IBM zSeries Mainframes ~ Development
IBM Corporation
Charles F. Webb

Author: Edward York

Evolution of z/Architecture

• 1960s, S/360: 24-bit addressing
• 1970s, S/370: virtual addressing (DAT, dynamic address translation)
• 1980s, 370/XA: 31-bit addressing, new I/O
• 1990s, ESA/390: multiple address spaces
• 1998: IEEE 754 floating point added
• 2000, z/Architecture: 64-bit addressing
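Each widening of the address multiplies the storage a program can address directly. The arithmetic is shown here as a short Python sketch; the dictionary covers only the widths named in the timeline, and its name is illustrative.

```python
# Address widths named in the timeline; the dictionary name is illustrative.
ADDRESS_BITS = {"S/360": 24, "370/XA": 31, "z/Architecture": 64}


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with an address of `bits` bits."""
    return 1 << bits


if __name__ == "__main__":
    for arch, bits in ADDRESS_BITS.items():
        print(f"{arch:<15} {bits:>2}-bit: {addressable_bytes(bits):,} bytes")
```

The three widths give 16 MiB, 2 GiB and 16 EiB of addressable storage.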

Uniprocessor Performance

[Figure: relative performance (log scale, 1 to 200) vs. year (1970 to 2000) for bipolar and CMOS processors. Labeled points: 9370-50, 9221-150, 9221-170, CMOS-94 = 9672-R1, CMOS-95 = 9672-R2/R3, CMOS-96 = 9672-G3, CMOS-97 = 9672-G4, CMOS-98 = 9672-G5, CMOS-99 = 9672-G6. Trend annotations: "Performance ×2 every 4 to 5 years" and "Performance ×2 every 12 to 24 months".]
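The two trend annotations sound similar but compound very differently. A small Python sketch of the arithmetic, assuming steady exponential growth; the 4.5-year figure is simply the midpoint of the 4-to-5-year range, chosen here for illustration.

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Performance multiple after `years` if performance doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    # Compare the trend lines over one decade.
    for label, t_double in [("x2 every 4.5 years", 4.5),
                            ("x2 every 12 months", 1.0),
                            ("x2 every 24 months", 2.0)]:
        print(f"{label}: x{growth_factor(10, t_double):.1f} per decade")
```

Over a decade that is roughly ×4.7 for the slower trend versus ×32 to ×1024 for the faster one.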

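Processor Organization

[Figure: processor block diagram. Instruction side: 256 KB I-cache with tags, TLB and BTB; instruction fetch, instruction decode buffer, instruction queue; DAT engine coprocessor. Operand side: address generation (AGen), 256 KB D-cache with tags and TLB, operand buffer, store buffer (ST Buf). Execution: FXU with GRs, FPU with FPRs; results travel on the CBus to the checkpoint state, with compare logic.]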

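CP Pipeline: IF, Dec, AGen, C1, C2, E1, WR

• Instruction fetch: I-fetch request, address to cache, cache access, data into the instruction buffer (I Buff)
• Decode / operand fetch: instruction register (I REG); decode of base, index and displacement (B, X, D); operand request address; cache data into the operand buffer (Op Buff)
• Execute / commit: instructions from the instruction queue (I Queue) are executed, then pass through the result stage, the check stage and the checkpoint (Chkpt)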
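To visualize how instructions overlap in the seven stages named on the slide, here is a Python sketch of an idealized pipeline. It assumes one instruction enters per cycle and nothing ever stalls, an assumption the slide does not make; it is meant only to show the overlap.

```python
STAGES = ["IF", "Dec", "AGen", "C1", "C2", "E1", "WR"]


def pipeline_schedule(n_instructions: int) -> list[dict[str, int]]:
    """For each cycle, map stage name -> index of the instruction in that stage.

    Assumes an ideal in-order pipeline: one instruction enters IF per cycle
    and every instruction advances one stage per cycle with no stalls.
    """
    total_cycles = n_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth  # instruction that entered `depth` cycles ago
            if 0 <= instr < n_instructions:
                occupancy[stage] = instr
        schedule.append(occupancy)
    return schedule


if __name__ == "__main__":
    for cycle, occ in enumerate(pipeline_schedule(3)):
        row = "  ".join(f"{s}:i{occ[s]}" if s in occ else f"{s}:--" for s in STAGES)
        print(f"cycle {cycle}: {row}")
```

Under that assumption, n instructions complete in n + 6 cycles, and after the first six cycles one instruction finishes every cycle.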

CP Chip: 17.9 mm × 9.9 mm, 47 million transistors

[Figure: chip floorplan showing the 256 KB I-cache, BTB (2K × 4), I-cache controls, D-cache controls, 256 KB D-cache, duplicated I-units (I Unit A, I Unit B), FXUs (A, B) and FPUs (A, B), compression unit, translator and R-Unit.]
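The die figures on the slide allow a quick back-of-the-envelope calculation of area and average transistor density. A Python sketch using only the numbers given (constant names are illustrative):

```python
DIE_WIDTH_MM = 17.9
DIE_HEIGHT_MM = 9.9
TRANSISTORS = 47_000_000
L1_CACHE_KB = 256 + 256  # I-cache + D-cache

area_mm2 = DIE_WIDTH_MM * DIE_HEIGHT_MM      # die area in mm^2
density_per_mm2 = TRANSISTORS / area_mm2     # average transistors per mm^2

if __name__ == "__main__":
    print(f"die area  : {area_mm2:.1f} mm^2")
    print(f"density   : {density_per_mm2:,.0f} transistors/mm^2")
    print(f"L1 caches : {L1_CACHE_KB} KB on chip")
```

This gives about 177 mm² and roughly 265,000 transistors per mm², averaged over the whole die, with 512 KB of L1 cache on chip.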

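Binodal L2 System

[Figure: two-node system. Each node has 10 processing units (PU00 to PU09 on one node, PU0A to PU13 on the other) and an SC control chip (SCC0, SCC1) with SC data chips (SCD0 to SCD7, in pairs). Four MBAs (MBA0 to MBA3) connect to STI links. Labeled connections: 16-byte bidirectional buses, two unidirectional 16-byte buses between the SCs, and 4 buses to memory. CRP0 and CRP1 are also shown.]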
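The PU labels on the diagram are two-digit hexadecimal numbers, so PU00 to PU13 covers 20 processing units (10 per node), not 14. A short Python check of the numbering; the function name is illustrative.

```python
def pu_labels(count: int) -> list[str]:
    """Processing-unit labels with two-digit uppercase hexadecimal numbers."""
    return [f"PU{i:02X}" for i in range(count)]


labels = pu_labels(20)
node0, node1 = labels[:10], labels[10:]  # 10 PUs per node, as drawn

if __name__ == "__main__":
    print(node0)  # PU00 .. PU09
    print(node1)  # PU0A .. PU13
```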

Millicode

• Licensed Internal Code layer for complex functions
  – System/control ops, interruptions, service operations, etc.
• Variant of the z/Architecture ISA
  – Unique GRs and ARs
  – Includes all hardwired z/Architecture ops
  – Modest set of millicode-only ops
  – Access to all processor state via R-Unit
• Millicode mode entered under hardware control
  – Mode-changing branch with minimal context switch
• Uses the same instruction pipeline as normal code
  – Minimal unique hardware
• Enables architectural and design flexibility
  – New ops and features, workarounds
  – Full CISC support with manageable complexity
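The key idea is that millicode runs on the normal pipeline with its own registers, so entering it needs only a mode-changing branch rather than a full save and restore. The Python sketch below models just that idea. Everything in it is hypothetical: the op names, the split between hardwired and millicode ops, and the multiply routine are invented for illustration, and direct access to the program GRs stands in for millicode's access to processor state.

```python
MASK64 = (1 << 64) - 1
HARDWIRED_OPS = {"ADD", "LOAD"}  # hypothetical subset of hardwired ops
MILLICODE_ROUTINES = {}          # hypothetical complex op name -> millicode routine


def millicode(op):
    """Register a function as the millicode routine for a (hypothetical) complex op."""
    def register(fn):
        MILLICODE_ROUTINES[op] = fn
        return fn
    return register


class Processor:
    def __init__(self):
        self.grs = [0] * 16        # program-visible GRs
        self.milli_grs = [0] * 16  # millicode's own GRs: nothing to save on entry
        self.in_millicode = False
        self.mode_switches = 0

    def hardwired(self, op, regs, r1, r2):
        """Execute a hardwired op on a register set; both modes share this path."""
        if op == "ADD":
            regs[r1] = (regs[r1] + regs[r2]) & MASK64
        elif op == "LOAD":
            regs[r1] = regs[r2]
        else:
            raise ValueError(f"not hardwired: {op}")

    def execute(self, op, r1, r2):
        if op in HARDWIRED_OPS:
            self.hardwired(op, self.grs, r1, r2)
            return
        # Complex op: hardware takes a mode-changing branch into millicode.
        self.in_millicode = True
        self.mode_switches += 1
        try:
            MILLICODE_ROUTINES[op](self, r1, r2)
        finally:
            self.in_millicode = False


@millicode("MULT")
def _mult(cpu, r1, r2):
    """Hypothetical complex op: GR r1 = GR r1 * GR r2, built from hardwired ADDs."""
    m = cpu.milli_grs
    m[0] = 0                    # accumulator
    m[1] = cpu.grs[r1]          # stands in for millicode's access to program state
    for _ in range(cpu.grs[r2]):
        cpu.hardwired("ADD", m, 0, 1)
    cpu.grs[r1] = m[0]          # write the result back to program state
```

Because the routine works in its own GRs, it never saves or restores the program's registers; it only reads operands and writes back the result.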

R-Unit

• Focal point for hardware fault checking
  – Mirrored unit comparators and other checkers
• Buffers entire processor architected state
  – GRs, FPRs, ARs, CRs, PSW
  – Millicode CRs, SysRegs, timing facility, etc.
• Maintains CP checkpoint for recovery
  – Processor state protected via ECC or equivalent
  – Granularity: every hardware instruction (regular or millicode)
• Provides R/W access to processor state
  – Millicode special ops plus a few hardwired z/Architecture ops
  – State mapped into a 256 × 64-bit register space

[Figure: R-Unit block diagram. Labeled elements: CBus-A and CBus-B; EU-A, EU-B, IU-A, IU-B and IU/EU-A, IU/EU-B; error checkers; ECC generate/check (one per side); buffers; recovery control; write and read address; checkpoint state registers (timing facility, system registers, instruction address and PSW, processor state); asynchronous interrupts; system operations.]
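The 256 × 64-bit register space gives millicode one uniform way to reach every piece of architected state by an 8-bit address. A Python sketch of such a space follows; the placement of GRs, ARs, CRs and FPRs in `BASE` is hypothetical, since the slide does not give the actual map, and the sketch leaves out ECC and checkpointing.

```python
REGISTER_COUNT = 256  # 256 registers -> 8-bit register address
WIDTH_BITS = 64
MASK = (1 << WIDTH_BITS) - 1

# Hypothetical placement of a few register classes; the real map is not given on the slide.
BASE = {"GR": 0x00, "AR": 0x10, "CR": 0x20, "FPR": 0x30}


class RUnitRegisterSpace:
    def __init__(self) -> None:
        self.regs = [0] * REGISTER_COUNT

    @staticmethod
    def address(kind: str, number: int) -> int:
        """Map e.g. ('CR', 3) to a register-space address using the hypothetical BASE table."""
        if not 0 <= number < 16:  # z/Architecture has 16 each of GRs, ARs, CRs and FPRs
            raise ValueError(f"register number out of range: {number}")
        return BASE[kind] + number

    def write(self, addr: int, value: int) -> None:
        self._check(addr)
        self.regs[addr] = value & MASK  # every register is 64 bits wide

    def read(self, addr: int) -> int:
        self._check(addr)
        return self.regs[addr]

    @staticmethod
    def _check(addr: int) -> None:
        if not 0 <= addr < REGISTER_COUNT:
            raise IndexError(f"register address out of range: {addr}")
```

The whole space is 256 × 8 bytes = 2 KiB.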

Fault Checking

• Combination of checking schemes used
  – Mirrored units: complex logic and dataflow
  – Parity check: byte-coherent dataflow, BTB, etc.
  – Functional / state checks: cache controls, coprocessor
  – ECC / duplicate parity: checkpoint state in R-Unit
• All processor state updates sent to R-Unit
  – Checked at hardware-instruction granularity
• Results committed to checkpoint only if clean
  – All mirrored compares equal
  – No faults detected anywhere in processor
• Target: near-100% detection of hardware faults
  – Both hard/permanent and soft/transient varieties
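The commit rule (update the checkpoint only when all mirrored compares agree and no checker fires) can be sketched in Python as an all-or-nothing commit per hardware instruction. Only two of the four schemes are modeled (mirrored compare and byte parity); the class and field names are illustrative, and on the real chip this is done in hardware.

```python
from dataclasses import dataclass, field


def byte_parity(value: int, nbytes: int = 8) -> list[int]:
    """One parity bit per byte of a 64-bit value (1 if that byte has an odd number of 1 bits)."""
    return [bin((value >> (8 * i)) & 0xFF).count("1") & 1 for i in range(nbytes)]


@dataclass
class StateUpdate:
    register: str
    result_a: int      # result from mirrored unit A
    result_b: int      # result from mirrored unit B
    parity: list[int]  # per-byte parity generated where the result was produced


@dataclass
class Checkpoint:
    state: dict[str, int] = field(default_factory=dict)

    def commit(self, updates: list[StateUpdate]) -> bool:
        """Commit all updates of one hardware instruction, or none of them."""
        for u in updates:
            if u.result_a != u.result_b:             # mirrored compare failed
                return False
            if byte_parity(u.result_a) != u.parity:  # parity check failed
                return False
        for u in updates:                            # clean: update the checkpoint
            self.state[u.register] = u.result_a
        return True
```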

CP Chip: Checking Strategy by Unit

[Figure: CP chip floorplan (I-cache, BTB, I Unit A and B, I-cache and D-cache controls, compression unit, translator, FPU A and B, FXU A and B, D-cache, R-Unit) marked with the checking scheme used in each unit.]

zSeries RAS Priorities

1. Ensure data integrity
   • Requires ~100% error detection
   • zSeries is industry leader
2. Keep applications on the air
   • Whenever #1 is not compromised
   • Requires fine-grained recovery
   • zSeries is industry leader
3. Repair online
   • Primarily a second-level packaging constraint

Crash and reboot is not good enough!

Fault Recovery

[Figure: processor with I-Unit and E-Unit, each as an unchecked copy plus a mirror copy; parity-protected cache; R-Unit with ECC on the saved state. Flows shown: addresses, cache data, instructions, results/state updates, saved state data.]

• Check all state updates
• Preserve known good state
• If error:
  – Stop state updates
  – Refresh from saved state
  – Restart CPU
• If error persists:
  – Extract saved state (SE)
  – Load into spare CPU
  – Start spare CPU
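The recovery sequence above is essentially retry-from-checkpoint with escalation. A Python sketch, assuming a single retry before escalating (the slide does not say how many retries are attempted) and modeling the instruction as a function that reports whether a checker fired; the function name and the `"spare"` return value are illustrative.

```python
import copy
from collections.abc import Callable


def step_with_recovery(execute: Callable[[dict], tuple[dict, bool]],
                       checkpoint: dict, max_retries: int = 1) -> str:
    """Run one hardware instruction against a copy of the checkpointed state.

    `execute(state)` returns (new_state, error_detected). Updates reach the
    checkpoint only when no error was detected. On an error the CPU refreshes
    from the checkpoint and retries; if the error persists after `max_retries`
    retries, the caller is told to move the saved state to a spare CPU.
    """
    for _attempt in range(1 + max_retries):
        working = copy.deepcopy(checkpoint)  # refresh from saved state
        new_state, error = execute(working)
        if not error:
            checkpoint.clear()
            checkpoint.update(new_state)     # commit known-good state
            return "committed"
        # Error: stop state updates; the checkpoint is left untouched.
    return "spare"  # error persisted: extract saved state, load into spare CPU
```

A transient fault is absorbed by the retry; a persistent one leaves the checkpoint intact for the spare CPU to pick up.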


Dynamic CPU Sparing

Operating CPU:
• Check all state updates
• Preserve known good state
• If error:
  – Stop state updates
  – Refresh from saved state
  – Restart CPU
• If error persists:
  – Signal service processor

Service processor:
• Extract saved state from CPU
• Process CPU state
• Adjust CPU numbers
• Check for special conditions
• Store CPU state in memory
• Signal spare CPU

Spare CPU:
• Wait in idle loop until needed
• Load CPU state from buffer
• Special CPU instruction
• Replace R-Unit contents
• Refresh CPU state
• Restart CPU with new state

[Figure: operating CPU and spare CPU, each with I-Unit and E-Unit (unchecked and mirror copies), parity-protected cache and R-Unit with ECC on saved state; the service processor passes the saved state through a state buffer in system memory.]
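The sparing hand-off can be sketched in Python as a state copy through a memory buffer. The names are illustrative, and two points are assumptions: that "adjust CPU numbers" means the spare takes over the failed CPU's number so software sees no change, and that no special conditions need handling (that check is omitted).

```python
from dataclasses import dataclass, field


@dataclass
class CPU:
    physical_id: int
    cpu_address: int                            # CPU number seen by software
    r_unit: dict = field(default_factory=dict)  # checkpointed architected state
    running: bool = False


def service_processor_spare(failed: CPU, spare: CPU, memory: dict) -> None:
    # Extract the saved (ECC-protected) state from the failed CPU's R-Unit.
    state = dict(failed.r_unit)
    failed.running = False
    # Process the state: here only the CPU number is adjusted (assumed meaning).
    spare.cpu_address = failed.cpu_address
    # Store the state in a buffer in system memory, then signal the spare.
    memory["state_buffer"] = state
    spare_load_state(spare, memory)


def spare_load_state(spare: CPU, memory: dict) -> None:
    # The idle spare replaces its R-Unit contents from the buffer
    # (a special CPU instruction on real hardware), then restarts.
    spare.r_unit = dict(memory.pop("state_buffer"))
    spare.running = True
```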

Other RAS Features

• CP arrays (caches / tags / TLBs / BTB)
  – Data stored through to L2 to get ECC protection
  – Line and set deletion for persistent array faults
• L2 and memory
  – ECC on arrays and buses
  – Retry on failing commands
  – DRAM chip sparing
• I/O subsystem
  – Multiple paths to devices
  – Multiple identical hubs / channels
  – Retry on failing commands
• Power / cooling / service
  – N+1 redundancy
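The slide says only "ECC on arrays and buses", not which code is used. As an illustration of what single-error-correct, double-error-detect (SEC-DED) ECC does, here is a toy extended Hamming code over 4 data bits in Python; real arrays and buses protect much wider words, and this is not presented as IBM's code.

```python
def hamming_secded_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) codeword.

    Positions 1..7 form a Hamming(7,4) code with check bits at 1, 2 and 4;
    position 0 holds an overall parity bit over positions 1..7.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def hamming_secded_decode(code: list[int]) -> tuple[int, str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)  # do not modify the caller's list
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all 1 bits
    overall = sum(code) % 2  # 0 for any valid codeword
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1  # single-bit error; syndrome 0 means bit 0 itself
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return -1, "uncorrectable"
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status
```

Any single flipped bit is corrected and any two flipped bits are detected, which is the property the checkpoint and memory arrays rely on.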

Conclusion

• Custom CISC microprocessor
• Durable design point
• Industry-leading RAS
• More to come

Want to know more?

• C. F. Webb and J. S. Liptay, “A High-Frequency Custom CMOS S/390 Microprocessor”, IBM Journal of R&D, July/September 1997.
• T. J. Slegel et al., “IBM’s S/390 G5 Microprocessor”, IEEE Micro, March/April 1999.
• C. F. Webb, “S/390 Microprocessor Design”, IBM Journal of R&D, November 2000.
• E. M. Schwarz et al., “The Microarchitecture of the IBM eServer z900”, IBM Journal of R&D, July/September 2002.
• K. E. Plambeck et al., “Development and Attributes of z/Architecture”, IBM Journal of R&D, July/September 2002.

Questions?

Thanks for Listening!