Joel Emer November 30, 2005 6.823, L22-1
Vector Computers
Joel Emer
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology

Based on the material prepared by Krste Asanovic and Arvind
Supercomputers

Definition of a supercomputer:
• Fastest machine in world at given task
• A device to turn a compute-bound problem into an I/O bound problem
• Any machine costing $30M+
• Any machine designed by Seymour Cray

CDC6600 (Cray, 1964) regarded as first supercomputer
Supercomputer Applications
Typical application areas
• Military research (nuclear weapons, cryptography)
• Scientific research
• Weather forecasting
• Oil exploration
• Industrial design (car crash simulation)
• Bioinformatics
• Cryptography

All involve huge computations on large data sets

In 70s-80s, Supercomputer ≡ Vector Machine
Loop Unrolled Code Schedule
loop: ld   f1, 0(r1)
      ld   f2, 8(r1)
      ld   f3, 16(r1)
      ld   f4, 24(r1)
      add  r1, 32
      fadd f5, f0, f1
      fadd f6, f0, f2
      fadd f7, f0, f3
      fadd f8, f0, f4
      sd   f5, 0(r2)
      sd   f6, 8(r2)
      sd   f7, 16(r2)
      sd   f8, 24(r2)
      add  r2, 32
      bne  r1, r3, loop
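[Figure: schedule of the unrolled loop on a machine with functional-unit columns Int1, Int2, M1, M2, FP+ and FPx. The schedule places the four ld operations (f1 to f4), the four fadd operations (f5 to f8), the four sd operations (f5 to f8), the two add operations (r1, r2) and the bne.]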
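For readers who prefer source-level code, the unrolled assembly loop can be read as the C sketch below. It is an illustration only: the function names are hypothetical, and the element type is assumed to be double because the assembly steps through memory in 8-byte units.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form: one element per iteration (hypothetical helper name). */
void add_scalar(const double *src, double *dst, double f0, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + f0;
}

/* Unrolled by four, mirroring the assembly: four loads, four adds,
 * four stores, then one index update and one branch per iteration.
 * Like the assembly loop, this assumes n is a multiple of 4. */
void add_scalar_unrolled4(const double *src, double *dst, double f0, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        /* ld f1..f4 */
        double f1 = src[i];
        double f2 = src[i + 1];
        double f3 = src[i + 2];
        double f4 = src[i + 3];
        /* fadd f5..f8 */
        double f5 = f0 + f1;
        double f6 = f0 + f2;
        double f7 = f0 + f3;
        double f8 = f0 + f4;
        /* sd f5..f8 */
        dst[i]     = f5;
        dst[i + 1] = f6;
        dst[i + 2] = f7;
        dst[i + 3] = f8;
    }
}
```

The four loads, adds and stores inside one iteration do not depend on each other, which is what lets a scheduler spread them across several functional units.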
Vector Supercomputers

Epitomized by Cray-1, 1976:
• Scalar Unit
  – Load/Store Architecture
• Vector Extension
  – Vector Registers
  – Vector Instructions
• Implementation
  – Hardwired Control
  – Highly Pipelined Functional Units
  – Interleaved Memory System
  – No Data Caches
  – No Virtual Memory
Cray-1 (1976)
Core unit of the Cray 1 computer Image removed due to copyright restrictions. To view image, visit http://www.craycyber.org/memory/scray.php.
Cray-1 (1976)

[Figure: Cray-1 block diagram. Labels recoverable from the diagram:]
• Vector registers: V0 to V7, 64 elements each, with V. Mask and V. Length registers (operands Vj, Vk, result Vi)
• Scalar registers: S0 to S7, with 64 T Regs (Si, Tjk)
• Address registers: A0 to A7, with 64 B Regs (Ai, Bjk)
• Functional units: FP Add, FP Mul, FP Recip; Int Add, Int Logic, Int Shift; Pop Cnt; Addr Add, Addr Mul
• Memory: single port, 16 banks of 64-bit words + 8-bit SECDED; memory bank cycle 50 ns
• Memory paths: 80 MW/sec data load/store; 320 MW/sec instruction buffer refill; address expressions labeled ( (Ah) + j k m ) and (A0)
• Instruction fetch: 4 Instruction Buffers (64-bit x 16); NIP, CIP, LIP
• Processor cycle 12.5 ns (80 MHz)
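The timing figures on this slide are related by simple arithmetic, sketched below in C. The constants are the ones quoted on the slide; the function and constant names are hypothetical, and times are held in picoseconds only to keep the arithmetic in integers.

```c
#include <assert.h>

/* Figures quoted on the slide. */
enum {
    CYCLE_PS            = 12500,  /* processor cycle: 12.5 ns */
    BANK_CYCLE_PS       = 50000,  /* memory bank cycle: 50 ns */
    NUM_BANKS           = 16,
    DATA_MWORDS_PER_S   = 80,     /* data load/store rate */
    IFETCH_MWORDS_PER_S = 320     /* instruction buffer refill rate */
};

/* Clock frequency in MHz: 10^6 ps per microsecond divided by the cycle time. */
long clock_mhz(long cycle_ps)
{
    assert(cycle_ps > 0);
    return 1000000L / cycle_ps;
}

/* Number of processor cycles one bank stays busy after an access. */
long bank_busy_cycles(long bank_ps, long cycle_ps)
{
    assert(cycle_ps > 0);
    return bank_ps / cycle_ps;
}

/* Words transferred per processor cycle for a rate given in Mwords/s. */
long words_per_cycle(long mwords_per_s, long mhz)
{
    assert(mhz > 0);
    return mwords_per_s / mhz;
}
```

With these inputs the clock comes out at 80 MHz, a bank is busy for 4 processor cycles, data moves at 1 word per cycle and instruction refill at 4 words per cycle. Assuming consecutive words map to consecutive banks (the usual interleaving, not stated on the slide), a unit-stride stream returns to the same bank only every 16 cycles, comfortably longer than the 4-cycle busy time.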
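Vector Programming Model

• Scalar Registers: r0 ... r15
• Vector Registers: v0 ... v15, each holding elements [0], [1], [2], ..., [VLRMAX-1]
• Vector Length Register: VLR

Vector Arithmetic Instructions
  ADDV v3, v1, v2
  [Figure: elements [0], [1], ..., [VLR-1] of v1 and v2 are added pairwise to produce v3.]

Vector Load and Store Instructions
  LV v1, r1, r2
  [Figure: Base r1, Stride r2. Words are read from Memory starting at the base address and stepping by the stride, and placed in successive elements of Vector Register v1.]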
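The semantics of ADDV and LV in this programming model can be sketched as two small C functions. This is a software illustration, not the hardware: the VReg type and function names are hypothetical, VLRMAX is assumed to be 64, and the stride is assumed to be counted in elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>

#define VLRMAX 64  /* assumed value; the slide only names VLRMAX */

/* One vector register: VLRMAX elements (hypothetical type). */
typedef struct {
    double elem[VLRMAX];
} VReg;

/* LV vd, base, stride: vd[i] = memory[base + i*stride] for i < VLR.
 * The stride is in elements here, which is an assumption. */
void lv(VReg *vd, const double *base, ptrdiff_t stride, size_t vlr)
{
    assert(vlr <= VLRMAX);
    for (size_t i = 0; i < vlr; i++)
        vd->elem[i] = base[(ptrdiff_t)i * stride];
}

/* ADDV vd, va, vb: vd[i] = va[i] + vb[i] for i < VLR. */
void addv(VReg *vd, const VReg *va, const VReg *vb, size_t vlr)
{
    assert(vlr <= VLRMAX);
    for (size_t i = 0; i < vlr; i++)
        vd->elem[i] = va->elem[i] + vb->elem[i];
}
```

In both functions only the first VLR elements are touched, which is the role the Vector Length Register plays: one instruction stands for VLR independent element operations.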
Vector Code Example
# C code
for (i=0; i…

# Scalar Code
LI R4, 64
…

# Vector Code
LI VLR, 64
…

… fast clock) to execute element operations
• Simplifies control of deep pipeline because elements in vector are independent (=> no hazards!)
[Figure: vector registers V1, V2 and V3 connected to a six stage multiply pipeline.]
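To make the benefit of the deep pipeline concrete, the sketch below counts cycles for a vector operation. It assumes an idealized pipeline in which one new element enters every cycle and nothing stalls, which is reasonable only because the elements are independent; the function names are hypothetical.

```c
#include <assert.h>

/* Cycles to push vlen independent elements through a pipeline of the
 * given depth, assuming one element enters per cycle and no stalls:
 * the first result appears after `depth` cycles, then one per cycle. */
unsigned pipelined_cycles(unsigned depth, unsigned vlen)
{
    assert(depth > 0);
    return vlen == 0 ? 0 : depth + vlen - 1;
}

/* Cycles for the same work if each element had to leave the unit
 * before the next one could enter. */
unsigned unpipelined_cycles(unsigned depth, unsigned vlen)
{
    assert(depth > 0);
    return depth * vlen;
}
```

For a six stage multiplier and a 64-element vector, this gives 6 + 64 - 1 = 69 cycles pipelined against 6 * 64 = 384 cycles unpipelined.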