| "Descrizione" by Radar (1854 pt) | 2026-Feb-03 12:22 |
HP 3AA1-1106 “Landshark” – CPU for RISC PA-8600 (v2.0) 64-bit systems, with large on-chip L1 caches and a Runway DDR bus
Definition
The HP 3AA1-1106, code-named Landshark, is a CPU developed by Hewlett-Packard for PA-8600 RISC systems, conceptually an evolution of the PA-8500 with modifications aimed at sustaining higher operating frequency. Another major change concerns the cache hierarchy, with a more aggressive approach and a very large on-chip L1 footprint.
The PA-8600 implements PA-RISC 2.0, the 64-bit version of the architecture (the “v2.0” in the designation above), and integrates architectural resources focused on high throughput: superscalar issue, many functional units, advanced branch prediction, and a high-bandwidth memory subsystem via the Runway bus.

Evolution from PA-8500: frequency and cache
A “PA-8500-like” design with changes for higher clocks typically translates into two practical design outcomes:
Greater attention to the internal pipeline and power/clocking in order to sustain higher frequencies.
Rebalancing of the cache subsystem, with very large on-chip L1 to reduce pressure on main memory and improve performance stability under server workloads.
Execution architecture: 4-way superscalar and 10 functional units
The PA-8600 2.0 is described as 4-way superscalar, meaning it can issue up to four instructions per cycle when the instruction stream and dependencies allow it. The internal organization includes 10 functional units:
2 integer ALUs
2 shift/merge units
2 complete load/store pipelines
2 floating-point multiply/accumulate units
2 floating-point divide/square-root units
Practically, this unit mix is meant to sustain parallelism on mixed code (integer, memory, floating point), reducing bottlenecks when the compiler and workload expose enough ILP (instruction-level parallelism).
Two address adders are also specified, helping compute addresses in parallel and feed load/store pipelines more effectively.
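To make the effect of the unit mix concrete, the following is a minimal sketch of a resource-bound estimate: a lower bound on the cycles needed for a given instruction mix. It assumes every unit accepts one instruction per cycle and ignores dependencies, cache misses, and multi-cycle latencies, none of which the text above specifies. The function name `min_cycles` and the two sample mixes are hypothetical.

```python
from math import ceil

# Issue width and unit counts are taken from the list above.
ISSUE_WIDTH = 4
UNITS = {
    "int_alu": 2,
    "shift_merge": 2,
    "load_store": 2,
    "fp_mac": 2,
    "fp_div_sqrt": 2,
}


def min_cycles(mix: dict[str, int]) -> int:
    """Idealized lower bound on cycles for an instruction mix.

    Assumption: each unit accepts one instruction per cycle, and there
    are no dependencies or stalls. Real hardware will need more cycles.
    """
    total = sum(mix.values())
    width_bound = ceil(total / ISSUE_WIDTH)
    unit_bound = max((ceil(n / UNITS[kind]) for kind, n in mix.items()), default=0)
    return max(width_bound, unit_bound)


# Hypothetical mixes of 100 instructions each.
balanced = {"int_alu": 40, "shift_merge": 10, "load_store": 30, "fp_mac": 20}
memory_heavy = {"int_alu": 20, "load_store": 70, "fp_mac": 10}

print(min_cycles(balanced))      # 25: limited by the 4-wide issue
print(min_cycles(memory_heavy))  # 35: limited by the 2 load/store pipelines
```

In this simplified model the balanced mix is bounded by issue width, while the memory-heavy mix is bounded by the two load/store pipelines, which is the kind of bottleneck the unit mix is meant to limit.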
Front-end and control flow: TLB, BTAC, BHT, branch prediction
The control-flow subsystem includes:
A 160-entry fully-associative, dual-ported TLB
A 32-entry BTAC
A 2048-entry BHT
Dynamic and static branch prediction modes
Practically, a large, highly associative TLB reduces translation misses (especially with large working sets), while BTAC/BHT and mixed prediction help limit pipeline bubbles caused by branches and calls, which strongly affect real performance on superscalar designs.
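As an illustration of what a branch history table does, here is a toy model with 2048 entries. The 2-bit saturating counter per entry and the indexing by low address bits are assumptions chosen because they are the textbook scheme; the text above does not state how the PA-8600 encodes or indexes its history, and the class and function names are hypothetical.

```python
BHT_ENTRIES = 2048  # table size stated above


class BranchHistoryTable:
    """Toy BHT: one 2-bit saturating counter per entry (an assumption)."""

    def __init__(self, entries: int = BHT_ENTRIES) -> None:
        self.entries = entries
        self.counters = [1] * entries  # start at "weakly not taken"

    def _index(self, branch_addr: int) -> int:
        # Drop the 2 low bits (4-byte instructions), then wrap to table size.
        return (branch_addr >> 2) % self.entries

    def predict(self, branch_addr: int) -> bool:
        # Counter values 2 and 3 predict "taken".
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(outcomes: list[bool], addr: int = 0x4000) -> float:
    """Fraction of correct predictions for one branch at a hypothetical address."""
    bht = BranchHistoryTable()
    hits = 0
    for taken in outcomes:
        hits += bht.predict(addr) == taken
        bht.update(addr, taken)
    return hits / len(outcomes)


# A loop branch: taken 9 times, then falls through, repeated 10 times.
loop_pattern = ([True] * 9 + [False]) * 10
print(f"{accuracy(loop_pattern):.2f}")
```

Under these assumptions a regular loop branch is mispredicted roughly once per loop exit, which is why dynamic prediction removes most branch bubbles on loop-heavy code.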
Queueing and reordering: instruction queue / reorder buffer
A 56-entry instruction queue / reorder buffer is specified. Operationally, this resource helps:
Absorb memory latency and temporary dependencies while keeping instructions “in flight”.
Improve superscalar effectiveness when reordering and out-of-order completion opportunities exist (as supported by the platform’s microarchitecture).
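A rough way to read the 56-entry figure is to ask how long the oldest instruction can stall before the window fills and the front-end has to stop. The sketch below assumes a constant insertion rate and no retirement during the stall; the function name `cycles_until_full` is hypothetical.

```python
ROB_ENTRIES = 56  # stated above


def cycles_until_full(entries: int, insert_rate: float) -> float:
    """Cycles before the window fills while the oldest instruction is stalled.

    Assumption: new instructions enter at a constant `insert_rate` per cycle
    and nothing retires during the stall.
    """
    if insert_rate <= 0:
        raise ValueError("insert_rate must be positive")
    return entries / insert_rate


for rate in (4, 2, 1):
    print(rate, cycles_until_full(ROB_ENTRIES, rate))
# 4 14.0
# 2 28.0
# 1 56.0
```

At the full 4-wide rate the window covers about 14 cycles of stall in this model; longer latencies, such as a miss that goes out to main memory, would still stall the pipeline.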
On-chip caches: very large L1, set-associative, 32/64-byte lines
The PA-8600 2.0 integrates on-chip L1 caches with the stated sizes:
0.5 MB (512 KB) instruction cache (I), 4-way set associative
1 MB data cache (D), 4-way set associative
Selectable cache line size 32 or 64 bytes
Also stated:
Quasi-LRU replacement policy for the instruction cache
Practically, L1 caches of this size significantly reduce main-memory traffic and improve predictability on server workloads, while associativity and replacement policy limit conflicts and thrashing on less regular access patterns.
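The cache geometry implied by these figures can be worked out with the usual relation sets = size / (ways × line size). The sketch below takes the instruction cache as 512 KiB and the data cache as 1 MiB (binary units are an assumption) and uses a generic tag/index/offset split; it does not claim to reproduce how the PA-8600 actually indexes its caches. The function names are hypothetical.

```python
KIB = 1024


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache: size / (ways * line)."""
    sets, rem = divmod(size_bytes, ways * line_bytes)
    if rem:
        raise ValueError("size must be a multiple of ways * line_bytes")
    return sets


def split_address(addr: int, sets: int, line_bytes: int) -> tuple[int, int, int]:
    """Generic split of an address into (tag, set index, byte offset)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


for name, size in (("I", 512 * KIB), ("D", 1024 * KIB)):
    for line in (32, 64):
        print(name, line, cache_sets(size, 4, line))
# I 32 4096
# I 64 2048
# D 32 8192
# D 64 4096
```

Doubling the line size halves the number of sets for the same capacity, which is the trade-off behind the selectable 32/64-byte line.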
Memory and system bus: Runway 125 MHz, 64-bit, DDR, ~2 GB/s peak
The system/memory link is based on Runway, with:
125 MHz, 64-bit, DDR
A stated peak bandwidth of about 2 GB/s
Practical implication: the bus is intended to keep the CPU fed in scenarios where L1 is insufficient (large datasets, heavy I/O, multiuser workloads), reducing wait time in the load/store pipelines.
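The stated peak figure follows directly from the bus parameters: clock rate × transfers per clock × bus width in bytes. A minimal check, with a hypothetical helper name:

```python
def peak_bandwidth_bytes(clock_hz: float, width_bits: int, transfers_per_clock: int) -> float:
    """Theoretical peak bandwidth in bytes per second."""
    return clock_hz * transfers_per_clock * (width_bits / 8)


# 125 MHz, 64-bit, DDR (2 transfers per clock), as stated above.
runway = peak_bandwidth_bytes(125e6, 64, 2)
print(runway / 1e9)  # 2.0 (GB/s, decimal)
```

This is a theoretical ceiling; sustained bandwidth depends on protocol overhead and the memory controller, which the text does not quantify.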
Support for up to 1 TB of physically addressable memory is also specified, consistent with enterprise-class positioning.
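As a quick sanity check on the 1 TB figure, the number of physical address bits needed can be computed as follows, assuming "TB" is meant in the binary sense (2^40 bytes); the helper name is hypothetical.

```python
TIB = 1 << 40  # assumption: 1 TB taken as 2**40 bytes


def address_bits(capacity_bytes: int) -> int:
    """Bits needed to address every byte of the given capacity."""
    return (capacity_bytes - 1).bit_length()


print(address_bits(TIB))  # 40
```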
Extensions and compatibility: MAX-2 and bi-endian
MAX-2 multimedia extensions
MAX-2 extensions are present for multimedia applications, with MPEG decoding given as an example. Practically, this implies instruction paths intended for vector-like or repetitive media-processing patterns, reducing cycles versus purely scalar routines.
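The idea behind this kind of extension is subword parallelism: several small values are packed in one wide register and processed by a single operation. The sketch below emulates a wrapping add of four 16-bit lanes in a 64-bit word. The lane width and the operation are assumptions for illustration, and the code does not reproduce the semantics of any specific MAX-2 instruction; all names are hypothetical.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF
HIGH_BITS = 0x8000_8000_8000_8000  # top bit of each 16-bit lane
LOW_BITS = 0x7FFF_7FFF_7FFF_7FFF   # remaining 15 bits of each lane


def packed_add16(a: int, b: int) -> int:
    """Add four 16-bit lanes held in 64-bit words, wrapping within each lane."""
    # Adding only the low 15 bits keeps carries from crossing lane boundaries.
    low = (a & LOW_BITS) + (b & LOW_BITS)
    # The top bit of each lane is then fixed up with XOR (carry-free add).
    return (low ^ ((a ^ b) & HIGH_BITS)) & MASK64


def pack(lanes: list[int]) -> int:
    """Pack four 16-bit values into one word, lane 0 in the low bits."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFFFF) << (16 * i)
    return word


def unpack(word: int) -> list[int]:
    """Inverse of pack()."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


a = pack([1, 0xFFFF, 0x7FFF, 100])
b = pack([2, 1, 1, 200])
print(unpack(packed_add16(a, b)))  # [3, 0, 32768, 300]
```

One packed operation replaces four scalar additions here, which is the source of the cycle savings on repetitive media kernels.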
Bi-endian support
Bi-endian support enables operation in little-endian or big-endian mode, useful in heterogeneous environments, migrations, and compatibility with software or devices that assume a specific endian format.
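The practical difference between the two modes is the order in which the bytes of a multi-byte value are stored in memory. A small illustration using Python's standard `struct` module; the helper `swap32` is a hypothetical name:

```python
import struct

value = 0x11223344

big = struct.pack(">I", value)     # big-endian: most significant byte first
little = struct.pack("<I", value)  # little-endian: least significant byte first

print(big.hex(), little.hex())  # 11223344 44332211


def swap32(x: int) -> int:
    """Reinterpret a 32-bit value with the opposite byte order."""
    return int.from_bytes(x.to_bytes(4, "big"), "little")


print(hex(swap32(value)))  # 0x44332211
```

Software that exchanges binary data across systems with different byte orders needs this kind of conversion unless the processor can run in the matching mode.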
Frequency and voltage: up to ~550 MHz at 2.0 V
A frequency of up to about 550 MHz is specified with a 2.0 V core voltage. Practically, achieving such clocks also depends on thermal design and the overall platform (board, power delivery, chassis), in addition to the specific stepping.
Deployment systems
The PA-8600 (Landshark) CPU is listed as used in the following systems:
A400-5X
B2000, B2600
C3600
J5600, J6000, J7600
L1000-5X, L2000-5X
L1500-5X, L3000-5X
N4000-5X
V2600
Superdome
Stratus Continuum 439, 449, 651-2, 1251-2, 1252-2 (Stratus Technologies platforms)
Sketch of the most important connections
    server/workstation platform (RAM, I/O, backplane)
    ┌──────────────────────────────────────────────────────────┐
    │             system controller + memory + I/O             │
    │      RAM (up to 1 TB), storage, network, interrupts      │
    └───────────────────────────────┬──────────────────────────┘
                                    │ Runway 64-bit DDR bus
                                    │ 125 MHz, ~2 GB/s peak
                                    ▼
                     ┌─────────────────────────────┐
                     │        HP 3AA1-1106         │
                     │         “Landshark”         │
                     │     PA-8600 v2.0 64-bit     │
                     │    L1 I 0.5 MB + D 1 MB     │
                     └──────────────┬──────────────┘
                                    │
                                    ├────────► load/store pipelines (2 complete)
                                    └────────► integer and FP units (10 FUs)
Table 1 – Identification data and specifications
| Characteristic | Indicative value |
|---|---|
| Device | HP 3AA1-1106 |
| Codename | Landshark |
| Family / platform | PA-8600 (version 2.0) |
| Architecture | 64-bit |
| Stated frequency | Up to ~550 MHz |
| Stated core voltage | 2.0 V |
| Superscalar | 4-way |
| Functional units | 10 (2 integer ALU, 2 shift/merge, 2 load/store, 2 FP mul/acc, 2 FP div/sqrt) |
| TLB | 160-entry fully-associative dual-ported |
| BTAC | 32-entry |
| BHT | 2048-entry |
| Branch prediction | Dynamic and static modes |
| On-chip L1 | I 0.5 MB 4-way + D 1 MB 4-way |
| Cache line | 32 or 64 bytes |
| Instruction queue / reorder buffer | 56 entries |
| Supported physical memory | Up to 1 TB |
| System/memory bus | Runway 125 MHz, 64-bit, DDR, ~2 GB/s peak |
| Extensions | MAX-2, bi-endian |
Table 2 – Operational and design considerations
| Aspect | Practical meaning |
|---|---|
| 4-way superscalar + many FUs | Higher throughput when code exposes parallelism and manageable dependencies |
| 2 load/store pipelines + 2 address adders | Better ability to feed the CPU with data and addresses, reducing stalls |
| 160-entry fully-associative dual-ported TLB | Reduces translation misses and penalties on large working sets |
| BTAC/BHT + mixed prediction | Improves control flow and limits branch bubbles on deeper pipelines |
| L1 I 0.5 MB + D 1 MB | Reduces main-memory accesses and increases predictability on server workloads |
| 32/64-byte cache lines | Allows tuning between latency and locality utilization depending on workload |
| Quasi-LRU on I-cache | Reduces conflicts and unfavorable replacements on non-trivial fetch patterns |
| Runway DDR ~2 GB/s | Higher bandwidth to memory, useful when L1 does not hold the working set |
| MAX-2 + bi-endian | Accelerates media processing and helps integration/compatibility across environments |
| Up to ~550 MHz @ 2.0 V | High performance target, dependent on platform power delivery and thermals |