HP 3AA1-1106
by Radar (1854 pt)
2026-Feb-03 12:22

HP 3AA1-1106 “Landshark” – PA-8600 CPU (PA-RISC 2.0) for 64-bit systems, with large on-chip L1 caches and a Runway DDR bus

Definition

The HP 3AA1-1106, code-named Landshark, is a CPU developed by Hewlett-Packard for PA-8600 RISC systems. It is conceptually an evolution of the PA-8500, modified to sustain higher operating frequencies. The other major change concerns the cache hierarchy, which pairs a very large on-chip L1 footprint with a quasi-LRU replacement policy for the instruction cache.

The PA-8600 implements the 64-bit PA-RISC 2.0 architecture and integrates resources focused on high throughput: superscalar issue, many functional units, advanced branch prediction, and a high-bandwidth memory subsystem via the Runway bus.

Evolution from PA-8500: frequency and cache

A “PA-8500-like” design with changes for higher clocks typically translates into two practical design outcomes:

  • Greater attention to the internal pipeline and power/clocking in order to sustain higher frequencies.

  • Rebalancing of the cache subsystem, with very large on-chip L1 to reduce pressure on main memory and improve performance stability under server workloads.


Execution architecture: 4-way superscalar and 10 functional units

The PA-8600 is described as 4-way superscalar, meaning it can handle up to four instructions per cycle when the instruction stream and its dependencies allow. The internal organization includes 10 functional units:

  • 2 integer ALUs

  • 2 shift/merge units

  • 2 complete load/store pipelines

  • 2 floating-point multiply/accumulate units

  • 2 floating-point divide/square-root units

Practically, this unit mix is meant to sustain parallelism on mixed code (integer, memory, floating point), reducing bottlenecks when the compiler and workload expose enough ILP (instruction-level parallelism).

Two address adders are also specified, helping compute addresses in parallel and feed load/store pipelines more effectively.


Front-end and control flow: TLB, BTAC, BHT, branch prediction

The control-flow subsystem includes:

  • A 160-entry fully-associative, dual-ported TLB

  • A 32-entry BTAC

  • A 2048-entry BHT

  • Dynamic and static branch prediction modes

Practically, a large, highly associative TLB reduces translation misses (especially with large working sets), while BTAC/BHT and mixed prediction help limit pipeline bubbles caused by branches and calls, which strongly affect real performance on superscalar designs.
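
How a branch history table limits bubbles can be sketched with a textbook predictor. The example below is only an illustration: the 2-bit saturating counters, the initial counter state, and the indexing by low program-counter bits are assumptions, not documented details of the PA-8600; only the 2048-entry size comes from the text above.

```python
class BranchHistoryTable:
    """Textbook 2-bit saturating-counter predictor (illustrative only)."""

    def __init__(self, entries=2048):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.mask = entries - 1
        # Counter states 0..3; assumed start state is 1 ("weakly not taken").
        self.counters = [1] * entries

    def _index(self, pc):
        # Assumed indexing: drop the 2 low bits (4-byte instructions),
        # then keep as many low bits as the table needs.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        # States 2 and 3 predict "taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def simulate(outcomes, pc=0x4000):
    """Return the number of mispredictions for one branch at `pc`."""
    bht = BranchHistoryTable()
    misses = 0
    for taken in outcomes:
        if bht.predict(pc) != taken:
            misses += 1
        bht.update(pc, taken)
    return misses


# A loop branch taken 9 times and then falling through, repeated 3 times.
loop_pattern = ([True] * 9 + [False]) * 3
print(simulate(loop_pattern), "mispredictions out of", len(loop_pattern))
```

With this scheme a regular loop branch mispredicts roughly once per loop exit, which is why a dynamic table pays off on loop-heavy code.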


Queueing and reordering: instruction queue / reorder buffer

A 56-entry instruction queue / reorder buffer is specified. Operationally, this resource helps:

  • Absorb memory latency and temporary dependencies while keeping instructions “in flight”.

  • Improve superscalar effectiveness when reordering and out-of-order completion opportunities exist (as supported by the platform’s microarchitecture).
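
A back-of-the-envelope view of what 56 entries buy can be sketched as follows. The model is an assumption for illustration (a steady 4-wide front end and a Little's-law bound on throughput); it is not a description of the actual PA-8600 scheduling logic, and the function names are hypothetical.

```python
ROB_ENTRIES = 56   # from the text
ISSUE_WIDTH = 4    # 4-way superscalar, from the text


def cycles_until_full(entries=ROB_ENTRIES, width=ISSUE_WIDTH):
    """Cycles a stalled instruction can block retirement before the buffer
    fills, if the front end keeps inserting `width` instructions per cycle."""
    return entries / width


def max_sustained_ipc(avg_residency_cycles, entries=ROB_ENTRIES, width=ISSUE_WIDTH):
    """Little's law bound: throughput <= entries in flight / time in flight,
    capped by the machine width."""
    return min(width, entries / avg_residency_cycles)


print(cycles_until_full())        # 14.0 cycles of latency can be hidden
print(max_sustained_ipc(10))      # 4: the width is the limit
print(max_sustained_ipc(28))      # 2.0: the buffer size is the limit
```

Under these assumptions, the buffer covers about 14 cycles of stall at full width; longer memory latencies start to reduce the sustainable instruction rate.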


On-chip caches: very large L1, set-associative, 32/64-byte lines

The PA-8600 integrates on-chip L1 caches with the stated sizes:

  • 0.5 MB (512 KB) instruction cache (I), 4-way set associative

  • 1 MB data cache (D), 4-way set associative

  • Selectable cache line size 32 or 64 bytes

Also stated:

  • Quasi-LRU replacement policy for the instruction cache

Practically, L1 caches of this size significantly reduce main-memory traffic and improve predictability on server workloads, while associativity and replacement policy limit conflicts and thrashing on less regular access patterns.
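
The relationship between size, associativity, and line size can be made concrete with a small calculation. This is a sketch under stated assumptions: "MB" is read as a binary megabyte, the instruction cache is taken as 512 KiB, and the helper name `cache_geometry` is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def cache_geometry(size_bytes, ways, line_bytes):
    """Derive line count, set count, and address-bit split for a
    set-associative cache whose parameters are all powers of two."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2(line size)
        "index_bits": sets.bit_length() - 1,         # log2(number of sets)
    }


# Data cache: 1 MiB, 4-way, with both selectable line sizes.
d64 = cache_geometry(1 * MIB, ways=4, line_bytes=64)
d32 = cache_geometry(1 * MIB, ways=4, line_bytes=32)
# Instruction cache: 512 KiB assumed, 4-way, 64-byte lines.
i64 = cache_geometry(512 * KIB, ways=4, line_bytes=64)

for name, geo in (("D/64B", d64), ("D/32B", d32), ("I/64B", i64)):
    print(name, geo)
```

Halving the line size doubles the number of sets for the same capacity, which is the trade-off behind the selectable 32/64-byte setting.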


Memory and system bus: Runway 125 MHz, 64-bit, DDR, ~2 GB/s peak

The system/memory link is based on Runway, with:

  • 125 MHz, 64-bit, DDR

  • A stated peak bandwidth of about 2 GB/s

Practical implication: the bus is intended to keep the CPU fed in scenarios where L1 is insufficient (large datasets, heavy I/O, multiuser workloads), reducing wait time in the load/store pipelines.

Support for up to 1 TB of physically addressable memory is also specified, consistent with enterprise-class positioning.
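
The stated peak bandwidth follows directly from the bus parameters, and the 1 TB limit implies a physical address width. The arithmetic can be checked with a short sketch; it assumes two transfers per clock for DDR and reads 1 TB as 2^40 bytes, and the function names are hypothetical.

```python
def peak_bandwidth_bytes_per_s(clock_hz, width_bits, transfers_per_clock):
    """Theoretical peak: clock rate x bytes per transfer x transfers per clock."""
    return clock_hz * (width_bits // 8) * transfers_per_clock


def physical_address_bits(addressable_bytes):
    """Smallest number of address bits that can address every byte."""
    return (addressable_bytes - 1).bit_length()


runway_peak = peak_bandwidth_bytes_per_s(125_000_000, 64, 2)
print(runway_peak / 1e9, "GB/s")             # 2.0 GB/s
print(physical_address_bits(2**40), "bits")  # 40 bits for 1 TB
```

This is a peak figure only; sustained bandwidth on a real bus is lower once arbitration and protocol overhead are included.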


Extensions and compatibility: MAX-2 and bi-endian

MAX-2 multimedia extensions
MAX-2 extensions are present for multimedia applications, with MPEG decoding given as an example. Practically, this implies instruction paths intended for vector-like or repetitive media-processing patterns, reducing cycles versus purely scalar routines.
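
The idea behind such subword-parallel instructions can be emulated in software. The sketch below assumes four 16-bit lanes packed into one 64-bit value and wraparound addition; it illustrates the general technique only and does not reproduce the exact semantics of any MAX-2 instruction. All function names are hypothetical.

```python
MASK16 = 0xFFFF
HIGH_BITS = 0x8000_8000_8000_8000  # top bit of each 16-bit lane
LOW_BITS = 0x7FFF_7FFF_7FFF_7FFF   # low 15 bits of each lane


def pack_halfwords(values):
    """Pack four 16-bit values into one 64-bit integer, first value on top."""
    assert len(values) == 4
    word = 0
    for v in values:
        word = (word << 16) | (v & MASK16)
    return word


def unpack_halfwords(word):
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(word >> shift) & MASK16 for shift in (48, 32, 16, 0)]


def parallel_add16(a, b):
    """Add four 16-bit lanes at once, with no carry crossing lane borders."""
    # Add the low 15 bits of each lane; the carry lands in the lane's own bit 15.
    low = (a & LOW_BITS) + (b & LOW_BITS)
    # Fold in the original top bits with XOR, which discards the lane carry-out.
    return low ^ ((a ^ b) & HIGH_BITS)


a = pack_halfwords([1, 0xFFFF, 0x7FFF, 100])
b = pack_halfwords([2, 1, 1, 200])
print(unpack_halfwords(parallel_add16(a, b)))  # [3, 0, 32768, 300]
```

One such operation replaces four scalar additions, which is where the cycle savings on pixel and audio-sample loops come from.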

Bi-endian support
Bi-endian support enables operation in little-endian or big-endian mode, useful in heterogeneous environments, migrations, and compatibility with software or devices that assume a specific endian format.
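
The difference between the two byte orders is easy to see on a single 32-bit value. The sketch below uses Python's standard `struct` module and is a generic illustration of endianness, not a description of how the processor switches modes; `swap32` is a hypothetical helper.

```python
import struct

value = 0x11223344

# Same 32-bit value, two memory layouts.
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

print(big.hex())     # 11223344
print(little.hex())  # 44332211


def swap32(x):
    """Reinterpret a 32-bit value written in one byte order as the other."""
    return int.from_bytes(x.to_bytes(4, "big"), "little")


print(hex(swap32(value)))  # 0x44332211
```

Software that exchanges binary data between machines with different conventions has to apply this kind of swap explicitly, which a bi-endian CPU can avoid by running in the mode the data expects.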


Frequency and voltage: up to ~550 MHz at 2.0 V

A frequency of up to about 550 MHz is specified with a 2.0 V core voltage. In practice, reaching such clocks also depends on the thermal design and the overall platform (board, power delivery, chassis), as well as on the specific stepping.


Deployment systems

The PA-8600 (Landshark) CPU/platform is indicated as used in:

A400-5X
B2000, B2600
C3600
J5600, J6000, J7600
L1000-5X, L2000-5X
L1500-5X, L3000-5X
N4000-5X
V2600
Superdome
Stratus Continuum 439, 449, 651-2, 1251-2, 1252-2 (Stratus Technologies platforms)


Sketch of the most important connections

     server/workstation platform (RAM, I/O, backplane)
┌──────────────────────────────────────────────────────────┐
│ system controller + memory + I/O                         │
│ RAM (up to 1 TB), storage, network, interrupts           │
└───────────────────────────────┬──────────────────────────┘
                                │ Runway 64-bit DDR bus
                                │ 125 MHz ~2 GB/s peak
                                ▼
                  ┌─────────────────────────────┐
                  │ HP 3AA1-1106                │
                  │ “Landshark”                 │
                  │ PA-8600 v2.0 64-bit         │
                  │ L1 I 0.5 MB + D 1 MB        │
                  └─────────────┬───────────────┘
                                │
                                ├────────► load/store pipelines (2 complete)
                                └────────► integer and FP units (10 FUs)

Table 1 – Identification data and specifications

Characteristic | Indicative value
Device | HP 3AA1-1106
Codename | Landshark
Family / platform | PA-8600 (version 2.0)
Architecture | 64-bit
Stated frequency | Up to ~550 MHz
Stated core voltage | 2.0 V
Superscalar | 4-way
Functional units | 10 (2 integer ALU, 2 shift/merge, 2 load/store, 2 FP mul/acc, 2 FP div/sqrt)
TLB | 160-entry, fully associative, dual-ported
BTAC | 32-entry
BHT | 2048-entry
Branch prediction | Dynamic and static modes
On-chip L1 | I 0.5 MB 4-way + D 1 MB 4-way
Cache line | 32 or 64 bytes
Instruction queue / reorder buffer | 56 entries
Supported physical memory | Up to 1 TB
System/memory bus | Runway 125 MHz, 64-bit, DDR, ~2 GB/s peak
Extensions | MAX-2, bi-endian


Table 2 – Operational and design considerations

Aspect | Practical meaning
4-way superscalar + many FUs | Higher throughput when code exposes parallelism and manageable dependencies
2 load/store pipelines + 2 address adders | Better ability to feed the CPU with data and addresses, reducing stalls
160-entry fully-associative dual-ported TLB | Reduces translation misses and penalties on large working sets
BTAC/BHT + mixed prediction | Improves control flow and limits branch bubbles on deeper pipelines
L1 I 0.5 MB + D 1 MB | Reduces main-memory accesses and increases predictability on server workloads
32/64-byte cache lines | Allows tuning between latency and locality utilization depending on workload
Quasi-LRU on I-cache | Reduces conflicts and unfavorable replacements on non-trivial fetch patterns
Runway DDR ~2 GB/s | Higher bandwidth to memory, useful when L1 does not hold the working set
MAX-2 + bi-endian | Accelerates media processing and helps integration/compatibility across environments
Up to ~550 MHz @ 2.0 V | High performance target, dependent on platform power delivery and thermals

