COMPUTER ARCHITECTURE VS. INSTRUCTION SET ARCHITECTURE

Author: Avice Watts
CMSC 411 Computer Systems Architecture Lecture 4 MIPS ISA & Basic Pipelining


CMSC 411 - 1

CS252 S05


Instruction Set Architecture: Critical Interface

[Figure: the instruction set sits between software (above) and hardware (below)]

• Properties of a good abstraction
– Lasts through many generations (portability)
– Used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels

Example: MIPS

Programmable storage
– 2^32 x bytes
– 31 x 32-bit GPRs (R0=0); registers r0, r1, ..., r31
– 32 x 32-bit FP regs (paired DP)
– PC

Data types? Format? Addressing modes?

Arithmetic logical
  Add, AddU, Sub, SubU, And, Or, Xor, SLT, SLTU,
  AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, SLL, SRL, SRA

Memory Access
  LB, LBU, LH, LHU, LW, SB, SH, SW

Control
  J, JAL, JR, JALR
  BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ

32-bit instructions on word boundary

Instruction Set Architecture (ISA)

“... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.” – Amdahl, Blaauw, and Brooks, 1964

The ISA covers:
– Organization of Programmable Storage
– Data Types & Data Structures: Encodings & Representations
– Instruction Formats
– Instruction (or Operation Code) Set
– Modes of Addressing and Accessing Data Items and Instructions
– Exceptional Conditions

ISA vs. Computer Architecture

• Old definition of computer architecture = instruction set design
– Other aspects of computer design called implementation
– Insinuates implementation is uninteresting or less challenging
• H&P’s view is computer architecture >> ISA
• Architect’s job much more than instruction set design; technical hurdles today more challenging than those in instruction set design
• Since instruction set design not where action is, some conclude computer architecture (using old definition) is not where action is
– H&P disagree on conclusion
– Agree that ISA not where action is (ISA in CA:AQA 4/e appendix)

Computer Architecture Is An Integrated Approach

• What really matters is the functioning of the complete system
– hardware, runtime system, compiler, operating system, and application
– In networking, this is called the “End to End argument”
• Computer architecture is not just about transistors, individual instructions, or particular implementations
– E.g., Original RISC projects replaced complex instructions with a compiler + simple instructions

Computer Architecture Is Design & Analysis

Architecture is an iterative process:
• Searching the space of possible designs
• At all levels of computer systems

[Figure: iterative loop between Design and Analysis. Creativity generates ideas; cost / performance analysis sorts them into good ideas, mediocre ideas, and bad ideas.]

MIPS INSTRUCTION SET ARCHITECTURE


A "Typical" RISC ISA

• 32-bit fixed format instruction (4 formats)
• 32 32-bit GPR (R0 contains zero, DP take pair)
• 3-address, reg-reg arithmetic instruction
• Single address mode for load/store: base + displacement
– no indirection
• Simple branch conditions
• Delayed branch

see: SPARC, MIPS, HP PA-Risc, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

CMSC 411 - 3 (from Patterson)


Example: MIPS

Register-Register
  bits:   31-26   25-21   20-16   15-11   10-6   5-0
  field:  op      rs      rt      rd      (not labeled)   opx
  rd ← rs OP rt

Register-Immediate
  bits:   31-26   25-21   20-16   15-0
  field:  op      rs      rt      immediate
  rt ← rs OP immed

Jump / Call
  bits:   31-26   25-0
  field:  op      target
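The field boundaries in these three formats can be pulled out of a 32-bit word with shifts and masks. The following is a minimal Python sketch, not part of the original slides: the names `decode` and `sign_extend16` are illustrative, and the encoding used for `add` (op 0, function code 0x20) is the standard MIPS one rather than something stated on the slide.

```python
def decode(word):
    """Split a 32-bit instruction word into the fields of all three formats.

    The caller picks the fields that apply to the format selected by "op".
    """
    return {
        "op": (word >> 26) & 0x3F,        # bits 31-26, present in every format
        "rs": (word >> 21) & 0x1F,        # bits 25-21
        "rt": (word >> 16) & 0x1F,        # bits 20-16
        "rd": (word >> 11) & 0x1F,        # bits 15-11 (register-register only)
        "opx": word & 0x3F,               # bits 5-0   (register-register only)
        "immediate": word & 0xFFFF,       # bits 15-0  (register-immediate only)
        "target": word & 0x3FFFFFF,       # bits 25-0  (jump / call only)
    }


def sign_extend16(value):
    """Interpret a 16-bit immediate as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


# add r1,r2,r3 in register-register format: rd=1, rs=2, rt=3
word = (0 << 26) | (2 << 21) | (3 << 16) | (1 << 11) | 0x20
fields = decode(word)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["opx"]))
```

Decoding every field unconditionally mirrors what fixed-format hardware can do: the register numbers sit in the same bit positions in every format, so the register file can be read before the opcode is fully decoded.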

5 Steps of MIPS Datapath

[Figure: unpipelined datapath divided into five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Components shown: instruction memory, adder (+4) producing Next SEQ PC, Next PC mux, register file (RS, RT, RD), sign extend (Imm), ALU with Zero? test, data memory, LMD, and WB Data mux.]

IR ← mem[PC];
PC ← PC + 4
Reg[IR_rd] ← Reg[IR_rs] op_IRop Reg[IR_rt]

CMSC 411 - 4 (from Patterson)

5 Steps of MIPS Datapath

[Figure: the five-step datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the steps. The destination register number RD is carried through the pipeline registers to Write Back.]

Register transfers, in stage order:
IR ← mem[PC]; PC ← PC + 4
A ← Reg[IR_rs]; B ← Reg[IR_rt]
rslt ← A op_IRop B
WB ← rslt
Reg[IR_rd] ← WB

Instruction Set Processor Controller

[Figure: controller state diagram]

Ifetch: IR ← mem[PC]; PC ← PC + 4
opFetch-DCD: A ← Reg[IR_rs]; B ← Reg[IR_rt]

Then, by instruction type:
– br: if bop(A,b) PC ← PC + IR_im
– jmp: PC ← IR_jaddr
– RR: r ← A op_IRop B; WB ← r; Reg[IR_rd] ← WB
– RI: r ← A op_IRop IR_im; WB ← r; Reg[IR_rt] ← WB
– LD: r ← A + IR_im; WB ← Mem[r]; Reg[IR_rt] ← WB
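The controller's register transfers can be traced in software. Below is a rough Python sketch under several assumptions that are not on the slide: instructions arrive as already-decoded dictionaries (so the fetch itself is not modeled), the `ALU_OPS` and `BRANCH_OPS` tables are a hypothetical subset of operations, and values are plain Python integers with no 32-bit wraparound.

```python
import operator

# Hypothetical subset of operations, for illustration only.
ALU_OPS = {"add": operator.add, "sub": operator.sub,
           "and": operator.and_, "or": operator.or_, "xor": operator.xor}
BRANCH_OPS = {"beq": operator.eq, "bne": operator.ne}


def step(state, instr):
    """Apply one instruction's register transfers to state (pc, reg, mem)."""
    reg, mem = state["reg"], state["mem"]
    state["pc"] += 4                       # Ifetch: PC <- PC + 4
    kind = instr["kind"]
    if kind == "jmp":
        state["pc"] = instr["jaddr"]       # PC <- IR_jaddr
        return
    a = reg[instr["rs"]]                   # opFetch-DCD: A <- Reg[IR_rs]
    b = reg[instr["rt"]]                   #              B <- Reg[IR_rt]
    if kind == "br":
        if BRANCH_OPS[instr["op"]](a, b):
            state["pc"] += instr["im"]     # PC <- PC + IR_im
        return
    if kind == "RR":
        wb, dest = ALU_OPS[instr["op"]](a, b), instr["rd"]
    elif kind == "RI":
        wb, dest = ALU_OPS[instr["op"]](a, instr["im"]), instr["rt"]
    elif kind == "LD":
        wb, dest = mem[a + instr["im"]], instr["rt"]   # WB <- Mem[r]
    else:
        raise ValueError(f"unknown instruction kind: {kind}")
    if dest != 0:                          # r0 always reads as zero
        reg[dest] = wb                     # Reg[dest] <- WB


state = {"pc": 0, "reg": [0] * 32, "mem": {8: 42}}
step(state, {"kind": "RI", "op": "add", "rs": 0, "rt": 2, "im": 8})   # r2 = 8
step(state, {"kind": "LD", "rs": 2, "rt": 3, "im": 0})                # r3 = mem[8]
step(state, {"kind": "RR", "op": "add", "rs": 2, "rt": 3, "rd": 1})   # r1 = r2 + r3
step(state, {"kind": "br", "op": "beq", "rs": 1, "rt": 1, "im": 16})  # taken
```

Note that RR writes the register named by rd while RI and LD write the one named by rt, exactly as in the three write-back transfers listed above.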

Visualizing Pipelining

[Figure: pipeline diagram. Horizontal axis: time in clock cycles (Cycle 1 to Cycle 7). Vertical axis: instruction order. Each of four instructions passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the previous instruction.]
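The staggered layout of such a pipeline diagram is easy to generate. This is a small Python sketch for illustration; the function name `pipeline_diagram` is made up here, and it assumes an ideal pipeline with no stalls.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_diagram(num_instrs):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1."""
    total_cycles = num_instrs + len(STAGES) - 1
    rows = []
    for i in range(num_instrs):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name              # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows


rows = pipeline_diagram(4)
for i, row in enumerate(rows):
    print(f"instr {i + 1}: " + " ".join(f"{cell:<7}" for cell in row))
```

With n instructions and a 5-stage pipe the last instruction finishes in cycle n + 4, so four instructions need eight cycles in total.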

Pipelining Is Not Quite That Easy!

• Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
– Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
– Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
– Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).

One Memory Port/Structural Hazards

[Figure: pipeline diagram over Cycle 1 to Cycle 7 for Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Ifetch, Reg, ALU, DMem, Reg. With one memory port, the Load's DMem access in Cycle 4 collides with the Ifetch of Instr 3 in the same cycle.]

One Memory Port/Structural Hazards

[Figure: pipeline diagram over Cycle 1 to Cycle 7 for Load, Instr 1, Instr 2, Stall, Instr 3. The Stall row is filled with Bubbles, and Instr 3 fetches one cycle later than it otherwise would.]

How do you “bubble” the pipe?

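Speed Up Equation for Pipelining

\[
\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}
\]

\[
\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}
\]

For simple RISC pipeline, ideal CPI = 1:

\[
\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}
\]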

Example: Dual-port vs. Single-port

• Machine A: Dual ported memory (“Harvard Architecture”)
• Machine B: Single ported memory, but its pipelined implementation has a 1.05 times faster clock rate
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed

SpeedUpA = Pipeline Depth/(1 + 0) x (clock_unpipe/clock_pipe)
         = Pipeline Depth
SpeedUpB = Pipeline Depth/(1 + 0.4 x 1) x (clock_unpipe/(clock_unpipe / 1.05))
         = (Pipeline Depth/1.4) x 1.05
         = 0.75 x Pipeline Depth
SpeedUpA / SpeedUpB = Pipeline Depth/(0.75 x Pipeline Depth) = 1.33

• Machine A is 1.33 times faster
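The dual-port vs. single-port comparison can be checked numerically. The sketch below is illustrative only: `pipeline_speedup` is a made-up helper that evaluates the speedup equation, and the pipeline depth of 5 is an assumption (it cancels out of the ratio between the two machines).

```python
def pipeline_speedup(depth, stall_cpi, clock_ratio=1.0, ideal_cpi=1.0):
    """Speedup over the unpipelined machine.

    clock_ratio is Cycle Time (unpipelined) / Cycle Time (pipelined).
    """
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * clock_ratio


depth = 5  # assumed; any depth gives the same A/B ratio

# Machine A: dual-ported memory, so loads cause no structural stalls.
speedup_a = pipeline_speedup(depth, stall_cpi=0.0)

# Machine B: 40% of instructions are loads, each costing 1 stall cycle,
# but the clock is 1.05 times faster.
speedup_b = pipeline_speedup(depth, stall_cpi=0.4 * 1, clock_ratio=1.05)

ratio = speedup_a / speedup_b
print(f"A = {speedup_a:.2f}, B = {speedup_b:.2f}, A/B = {ratio:.2f}")
```

The 5% clock advantage of Machine B does not come close to covering a 40% increase in CPI, which is why the dual-ported design wins.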

Data Hazard on R1

[Figure: pipeline diagram, time in clock cycles, with stages IF, ID/RF, EX, MEM, WB, for the instruction sequence below. Every instruction after the add reads r1.]

  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11

Three Generic Data Hazards

• Read After Write (RAW): InstrJ tries to read operand before InstrI writes it

  I: add r1,r2,r3
  J: sub r4,r1,r3

• Caused by a “true / flow dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

Three Generic Data Hazards

• Write After Read (WAR): InstrJ writes operand before InstrI reads it

  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Writes are always in stage 5

Three Generic Data Hazards

• Write After Write (WAW): InstrJ writes operand before InstrI writes it.

  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

• Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5
• Will see WAR and WAW in more complicated pipes
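The three dependence types can be told apart purely from the register names of an earlier instruction I and a later instruction J. The following Python sketch is illustrative (the function name and the tuple layout are assumptions); it reports which dependences exist, not whether a given pipeline actually turns them into hazards.

```python
def classify_dependences(instr_i, instr_j):
    """Return the dependences from earlier instr_i to later instr_j.

    Each instruction is a tuple (dest, src1, src2) of register names.
    """
    dest_i, *srcs_i = instr_i
    dest_j, *srcs_j = instr_j
    found = set()
    if dest_i in srcs_j:
        found.add("RAW")   # J reads what I writes (true / flow dependence)
    if dest_j in srcs_i:
        found.add("WAR")   # J writes what I reads (anti-dependence)
    if dest_i == dest_j:
        found.add("WAW")   # both write the same register (output dependence)
    return found


# The I/J pairs from the three slides above.
raw = classify_dependences(("r1", "r2", "r3"), ("r4", "r1", "r3"))  # add ; sub
war = classify_dependences(("r4", "r1", "r3"), ("r1", "r2", "r3"))  # sub ; add
waw = classify_dependences(("r1", "r4", "r3"), ("r1", "r2", "r3"))  # sub ; add
print(raw, war, waw)
```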

Forwarding to Avoid Data Hazard

[Figure: pipeline diagram, time in clock cycles, for the instruction sequence below, with forwarding paths carrying the new value of r1 to the instructions that follow the add.]

  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11

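HW Change for Forwarding

[Figure: datapath with forwarding. Muxes in front of the ALU inputs choose among the ID/EX register values, the Immediate, and results fed back from the EX/MEM and MEM/WR pipeline registers. Also shown: NextPC, Registers, Data Memory.]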
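The mux select for one ALU operand can be described as a priority check against the two later pipeline registers. This Python sketch uses the common textbook formulation, which the slide itself does not spell out; the function name, the dictionary fields `writes_reg` and `dest`, and the returned labels are all assumptions made for illustration.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where one ALU operand should come from.

    ex_mem / mem_wb describe the instructions currently held in those
    pipeline registers: {"writes_reg": bool, "dest": int}.
    """
    if src_reg == 0:
        return "REGFILE"                  # r0 is always zero, never forwarded
    if ex_mem["writes_reg"] and ex_mem["dest"] == src_reg:
        return "EX/MEM"                   # newest value wins
    if mem_wb["writes_reg"] and mem_wb["dest"] == src_reg:
        return "MEM/WB"
    return "REGFILE"


# sub r4,r1,r3 in EX while add r1,r2,r3 sits in EX/MEM:
sel_rs = forward_select(1, {"writes_reg": True, "dest": 1},
                        {"writes_reg": False, "dest": 0})
sel_rt = forward_select(3, {"writes_reg": True, "dest": 1},
                        {"writes_reg": False, "dest": 0})
print(sel_rs, sel_rt)
```

Checking EX/MEM before MEM/WB matters when both hold a write to the same register: the instruction closer behind the consumer produced the more recent value.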

Forwarding to Avoid LW-SW Data Hazard

[Figure: pipeline diagram, time in clock cycles, for the instruction sequence below.]

  add r1,r2,r3
  lw  r4, 0(r1)
  sw  r4,12(r1)
  or  r8,r6,r9
  xor r10,r9,r11

CMSC 411 - 5 (from Patterson)

Data Hazard Even with Forwarding

[Figure: pipeline diagram, time in clock cycles, for the instruction sequence below. The sub needs r1 in its ALU stage during the same cycle in which the lw is still reading it from DMem.]

  lw  r1, 0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9

Data Hazard Even with Forwarding

[Figure: pipeline diagram, time in clock cycles, for the instruction sequence below, with a Bubble inserted so that sub, and, and or each slip by one cycle.]

  lw  r1, 0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9
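The case that forwarding cannot fix is a load followed immediately by a consumer of the loaded register. A hedged Python sketch of that check follows; the function name and argument layout are illustrative assumptions, and a real interlock would also inspect control signals not modeled here.

```python
def needs_load_use_stall(prev_instr, curr_sources):
    """True if the instruction being decoded must wait one cycle.

    prev_instr: the instruction one stage ahead, {"is_load": bool, "dest": str}.
    curr_sources: source register names of the instruction being decoded.
    """
    return bool(prev_instr["is_load"]
                and prev_instr["dest"] != "r0"
                and prev_instr["dest"] in curr_sources)


lw_r1 = {"is_load": True, "dest": "r1"}       # lw r1, 0(r2)
add_r1 = {"is_load": False, "dest": "r1"}     # add r1,r2,r3

stall_after_load = needs_load_use_stall(lw_r1, ["r1", "r6"])   # sub r4,r1,r6
stall_after_add = needs_load_use_stall(add_r1, ["r1", "r3"])   # sub r4,r1,r3
print(stall_after_load, stall_after_add)
```

Only the instruction directly behind the load stalls; by the time the next one reaches its ALU stage, the loaded value can be forwarded.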

Software Scheduling Instead

Try producing fast code for
  a = b + c;
  d = e - f;
assuming a, b, c, d, e, and f in memory.

Slow code:
  LW   Rb,b
  LW   Rc,c
  ADD  Ra,Rb,Rc
  SW   a,Ra
  LW   Re,e
  LW   Rf,f
  SUB  Rd,Re,Rf
  SW   d,Rd

Fast code:
  LW   Rb,b
  LW   Rc,c
  LW   Re,e
  ADD  Ra,Rb,Rc
  LW   Rf,f
  SW   a,Ra
  SUB  Rd,Re,Rf
  SW   d,Rd

Compiler optimizes for performance. Hardware checks for safety.
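The benefit of the reordering can be counted directly. This Python sketch assumes full forwarding, so the only stall is one cycle when a load is immediately followed by an instruction that uses the loaded register; the tuple encoding and the name `count_load_use_stalls` are illustrative, not from the slides.

```python
def count_load_use_stalls(program):
    """Count one-cycle stalls caused by a load directly followed by its consumer."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_srcs = curr
        if prev_op == "LW" and prev_dest in curr_srcs:
            stalls += 1
    return stalls


# Each entry: (opcode, destination register or None, source registers)
SLOW = [
    ("LW", "Rb", []), ("LW", "Rc", []),
    ("ADD", "Ra", ["Rb", "Rc"]), ("SW", None, ["Ra"]),
    ("LW", "Re", []), ("LW", "Rf", []),
    ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]
FAST = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("LW", "Re", []),
    ("ADD", "Ra", ["Rb", "Rc"]), ("LW", "Rf", []),
    ("SW", None, ["Ra"]), ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]

slow_stalls = count_load_use_stalls(SLOW)
fast_stalls = count_load_use_stalls(FAST)
print(slow_stalls, fast_stalls)
```

In the slow ordering both ADD and SUB sit directly behind the load of one of their operands; the fast ordering places an independent instruction in each of those slots.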