CPSC 330 Computer Organization. Chapter 6-II Pipelining

Author: Kelley Murphy

Problem 6.2

• A computer architect needs to design the pipeline for a new microprocessor. The workload program core consists of 1E6 instructions. Each instruction takes 100 ps to finish.
• How long does it take to execute this program on a non-pipelined processor?
• The current processor has about 20 pipeline stages. Assuming it is perfectly pipelined, what potential speedup will it achieve compared to a non-pipelined processor?
• Pipelining introduces overhead per pipeline stage. Will this affect instruction latency, instruction throughput, or both?

CPSC330 CompOrg: Dr. Gerousis
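The arithmetic behind Problem 6.2 can be sketched as follows. This is a minimal illustration rather than an official solution: the function names are hypothetical, and the pipelined model assumes the 100 ps of work divides evenly across the 20 stages with no per-stage overhead.

```python
# Values taken from Problem 6.2; the timing model itself is an assumption.
N_INSTRUCTIONS = 1_000_000   # 1E6 instructions
INSTR_TIME_PS = 100          # each instruction takes 100 ps
N_STAGES = 20                # about 20 pipeline stages


def non_pipelined_time_ps(n, instr_time_ps):
    """Instructions run one after another, each taking the full time."""
    return n * instr_time_ps


def pipelined_time_ps(n, instr_time_ps, stages):
    """Perfect pipelining (assumed): cycle = instr_time / stages.
    The first instruction needs `stages` cycles; each later instruction
    completes one cycle after the previous one."""
    cycle_ps = instr_time_ps / stages
    return (stages + n - 1) * cycle_ps


t_np = non_pipelined_time_ps(N_INSTRUCTIONS, INSTR_TIME_PS)
t_p = pipelined_time_ps(N_INSTRUCTIONS, INSTR_TIME_PS, N_STAGES)
speedup = t_np / t_p

print(f"non-pipelined: {t_np} ps")
print(f"pipelined:     {t_p} ps")
print(f"speedup:       {speedup:.4f}")
```

Under these assumptions the non-pipelined run takes 1E8 ps (100 µs), and the speedup approaches the number of stages, 20, for a long program. Any per-stage overhead lengthens the clock cycle, so it increases instruction latency and reduces throughput.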

Hazards

3 types of pipelining hazards:

• structural hazards: attempt to use the same resource in two different ways at the same time
   – e.g., a register is being written in the write-back stage at the same time the same register is being read by the current instruction in the decode stage
• data hazards: attempt to use a register before its proper value is ready
   – an instruction depends on the result of a prior instruction still in the pipeline
• control hazards: attempt to make a decision when the information needed to make the decision is not yet available
   – branch instructions

Hazards - Review

• For R-type instructions there are 4 possible conflicts:
   – 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
   – 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
   – 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
   – 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

Example (case 1a, Rs):

    sub $2, $1, $3
    and $12, $2, $5

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the two instructions, with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers marked]

Hazards - Review

Case 1b (Rt): EX/MEM.RegisterRd = ID/EX.RegisterRt

    sub $2, $1, $3
    and $12, $5, $2

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the two instructions, with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers marked]

Hazards - Review

Case 2a (Rs): MEM/WB.RegisterRd = ID/EX.RegisterRs

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $2, $6

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the three instructions, with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers marked]

Hazards - Review

Case 2b (Rt): MEM/WB.RegisterRd = ID/EX.RegisterRt

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the three instructions, with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers marked]

Data Hazard - Problem

    ADD R1, R2, R3
    SUB R4, R5, R1
    OR  R8, R3, R9
    ADD R3, R2, R4

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the four instructions, with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers marked]

• Identify the instruction(s) affected by the data hazard.
• Fix the hazard by inserting ‘nops’ (note: not efficient).
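One way to work through the problem above is to let a small script place the nops. The sketch below is illustrative only: it assumes a five-stage pipeline with no forwarding, and a register file that is written in the first half of a cycle and read in the second half, so a consumer must issue at least three slots after its producer. The names `insert_nops` and `MIN_DISTANCE` are hypothetical.

```python
# Assumption: with no forwarding, the consumer's register read (ID) may not
# come before the producer's write-back (WB), i.e. 3 issue slots later.
MIN_DISTANCE = 3


def insert_nops(program):
    """program: list of (op, dest, src1, src2) tuples.
    Returns a new list with nops inserted to remove read-after-write hazards."""
    scheduled = []    # output instruction stream
    last_write = {}   # register name -> slot of its most recent writer
    for op, dest, src1, src2 in program:
        # Earliest slot at which every source register can be read safely.
        earliest = len(scheduled)
        for src in (src1, src2):
            if src in last_write:
                earliest = max(earliest, last_write[src] + MIN_DISTANCE)
        while len(scheduled) < earliest:
            scheduled.append(("nop", None, None, None))
        last_write[dest] = len(scheduled)
        scheduled.append((op, dest, src1, src2))
    return scheduled


program = [
    ("add", "R1", "R2", "R3"),
    ("sub", "R4", "R5", "R1"),
    ("or",  "R8", "R3", "R9"),
    ("add", "R3", "R2", "R4"),
]

for instr in insert_nops(program):
    if instr[0] == "nop":
        print("nop")
    else:
        print("{} {}, {}, {}".format(*instr))
```

Under these assumptions the script places two nops before SUB (which reads R1) and one nop before the final ADD (which reads R4), so three of the seven issue slots do no useful work.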

Fixing the data hazard using software: ‘NOPS’

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the instruction sequence with nops inserted]

Fixing the data hazard using data forwarding

• Use temporary results; don’t wait for them to be written
• Pipeline registers have data too!
• The “sub” ALU operation completes prior to “and” execution
• Same for “or”
• No conflict with “add”, since the write completes before the read in the same cycle

Program execution order (in instructions):

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the five instructions over clock cycles CC 1 to CC 9]

Fixing data hazard: Hardware for data forwarding

• The main idea (some details not shown)

[Figure: pipelined datapath (PC, instruction memory, IF/ID, Control, Registers, ID/EX, ALU, EX/MEM, data memory, MEM/WB) with a forwarding unit. Labeled signals: IF/ID.RegisterRs (Rs), IF/ID.RegisterRt (Rt), IF/ID.RegisterRd (Rd), EX/MEM.RegisterRd, and MEM/WB.RegisterRd feed the forwarding unit, which controls the muxes at the ALU inputs.]
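The four register comparisons from the review slides are what the forwarding unit evaluates for each ALU operand. The sketch below is a simplified model under stated assumptions: the `RegWrite` checks and the exclusion of register 0 are the usual textbook refinements and do not appear on these slides, and the function name and return labels are hypothetical.

```python
def forward_select(ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd, id_ex_reg):
    """Choose the source for one ALU operand.
    id_ex_reg is ID/EX.RegisterRs or ID/EX.RegisterRt (register number)."""
    # Cases 1a/1b: the result is sitting in the EX/MEM pipeline register.
    # Checked first because it is the most recent value of the register.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return "EX/MEM"
    # Cases 2a/2b: the result is sitting in the MEM/WB pipeline register.
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg:
        return "MEM/WB"
    # No hazard: use the value read from the register file.
    return "REGFILE"


# 'and $12, $2, $5' in EX while 'sub $2, $1, $3' is in MEM:
print(forward_select(True, 2, False, 0, 2))   # operand Rs = $2
print(forward_select(True, 2, False, 0, 5))   # operand Rt = $5

# 'or $13, $6, $2' in EX: 'and' ($12) is in MEM, 'sub' ($2) is in WB:
print(forward_select(True, 12, True, 2, 2))   # operand Rt = $2
```

Checking EX/MEM before MEM/WB is a design choice in this sketch: when both stages hold a result for the same register, the younger one is the value the program expects.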

Can't always forward

• Load word can still cause a hazard:
   – an instruction (and) tries to read a register immediately after a load instruction that writes the same register
   – the ‘lw’ to ‘and’ dependence goes backwards in time: the loaded value is not available until the end of the load's data memory access (CC 4), but ‘and’ needs it at the start of its ALU stage (also CC 4)
• Thus, we need a hazard detection unit to “stall” the instruction that follows the load

Program execution order (in instructions):

    lw  $2, 20($1)
    and $4, $2, $5
    or  $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the five instructions over clock cycles CC 1 to CC 9]

Data Hazards and “stalls”

• By stalling the dependent instructions in the pipe for 1 cycle, the dependence no longer goes backwards in time
• A “stall” is said to inject a “bubble” or “nop” into the pipe

Program execution order (in instructions):

    lw  $2, 20($1)
    (bubble: ‘and’ becomes nop)
    and $4, $2, $5
    or  $8, $2, $6
    add $9, $4, $2

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the sequence over clock cycles CC 1 to CC 10, with a bubble after the load]

Detecting “lw” hazard & injecting a “bubble” (nop)

If (ID/EX.MemRead AND ((ID/EX.RegRt = IF/ID.RegRs) OR (ID/EX.RegRt = IF/ID.RegRt))), then stall the pipeline.

Inject a “bubble” (a “nop”) into the pipe: set the control lines to 0.

[Figure: pipelined datapath with a hazard detection unit and a forwarding unit. Labeled signals on the hazard detection unit: ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, PCWrite, IF/IDWrite, and a mux that selects 0 for the control lines. Labeled signals on the forwarding unit: Rs, Rt, Rd, EX/MEM.RegisterRd, MEM/WB.RegisterRd.]
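The stall condition above translates directly into a small function. This is a minimal sketch: the function name and the dictionary keys are hypothetical, chosen to mirror the PCWrite and IF/IDWrite signals labeled in the figure.

```python
def hazard_detection(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard check from the slide:
    stall if the instruction in EX is a load whose destination (Rt)
    is a source register of the instruction in ID."""
    stall = bool(
        id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)
    )
    return {
        "stall": stall,
        "PCWrite": not stall,         # hold the PC so the next fetch repeats
        "IF/IDWrite": not stall,      # hold IF/ID so the decode repeats
        "zero_control_lines": stall,  # bubble: the instruction entering EX becomes a nop
    }


# 'lw $2, 20($1)' in ID/EX, 'and $4, $2, $5' in IF/ID: stall needed.
print(hazard_detection(True, 2, 2, 5))

# 'sub $2, $1, $3' in ID/EX (not a load): forwarding handles it, no stall.
print(hazard_detection(False, 2, 2, 5))
```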

Branch Hazards

• When we decide to branch, other instructions are in the pipeline!
• Assuming “branch not taken” (simple form of branch prediction)
   – need to ‘flush’ instructions, if we are wrong
   – add control line, IF.flush

Program execution order (in instructions):

    40 beq $1, $3, 28
    44 and $12, $2, $5
    48 or  $13, $6, $2
    52 add $14, $2, $2
    72 lw  $4, 50($7)

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the five instructions over clock cycles CC 1 to CC 9]

Control Hazards: branch prediction

• It’s easy to continue executing sequentially, assuming or predicting “branch not taken”
• If the prediction fails, simply flush the pipe, inject a bubble, and take the branch
• However, what if the branch is taken the majority of the time?
• We could record the result of the branch condition for future branches
• A “branch prediction buffer” or “branch history table” could provide this dynamically
• Hence, dynamic branch prediction: prediction of branches at runtime using runtime information
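A branch history table of the kind described above can be sketched in a few lines. This is a simplified, hypothetical model rather than any particular processor's design: it assumes one bit per entry (the last outcome), a table indexed by the low-order bits of the word-aligned branch address, and an initial prediction of “not taken”.

```python
class BranchHistoryTable:
    """1-bit dynamic predictor: predict whatever the branch did last time."""

    def __init__(self, entries=16):
        self.entries = entries
        self.taken = [False] * entries   # assumed initial state: not taken

    def _index(self, pc):
        # Instructions are 4 bytes, so drop the two low address bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.taken[self._index(pc)]

    def update(self, pc, was_taken):
        self.taken[self._index(pc)] = was_taken


def count_mispredictions(bht, pc, outcomes):
    """Run a sequence of actual branch outcomes through the predictor."""
    misses = 0
    for taken in outcomes:
        if bht.predict(pc) != taken:
            misses += 1
        bht.update(pc, taken)
    return misses


# A loop branch at address 40 that is taken 9 times, then falls through.
loop_branch = [True] * 9 + [False]
print(count_mispredictions(BranchHistoryTable(), 40, loop_branch))
```

For this loop the 1-bit scheme mispredicts twice, on the first and the last iteration, whereas always predicting “not taken” would mispredict nine times.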

Control Hazards: branch delayed decision

• To avoid a possible pipeline flush of an instruction as a result of a mispredicted branch, we could place instructions that execute regardless of the branch outcome (“non-conditional” instructions) in the delayed branch slots
• Compilers and assemblers could handle this, inserting “nop”s whenever suitable instructions can’t be found
• MIPS actually implements delayed branches

Program execution order (in instructions):

    beq $1, $2, 40
    add $4, $5, $6    (delayed branch slot)
    lw  $3, 300($0)

[Figure: pipeline timing diagram (Instruction fetch, Reg, ALU, Data access, Reg) of the three instructions on a time axis from 2 to 14, with 2 ns between the start of successive instructions]

Enhancing performance with pipelining: Summary

• What makes it easy/simple?
   – all instructions are the same length
   – just a few instruction formats
   – memory operands appear only in loads and stores
• What makes it tough/challenging?
   – structural hazards: suppose we had only one memory
   – control hazards: need to worry about branch instructions
   – data hazards: an instruction needs data that is not yet available

Comparing Performance (p.425)

• Compare the performance for single-cycle, multicycle, and pipelined control using the SPECint2000 instruction mix:
   – 25% loads
   – 10% stores
   – 11% branches
   – 2% jumps
   – 52% ALU
• The number of clock cycles for each instruction class in the multicycle design:
   – Loads: 5
   – Stores: 4
   – Branches: 3
   – Jumps: 3
   – ALU: 4

Comparing Performance

• Start with the performance of the single-cycle machine:
   – 200 ps for memory access
   – 100 ps for ALU operation
   – 50 ps for register file read or write
• What is the clock cycle time for the single-cycle datapath?
• What is the average CPI for the multicycle design?
• What is the average CPI for the pipelined design? (Loads, stores, and ALU instructions take 1 clock cycle. Branches take 1 clock cycle when predicted correctly and 2 when not. Jump CPI = 2.)
• Note that the long cycle time of memory is a performance bottleneck for the pipelined and multicycle designs.
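The three questions above reduce to a longest-path sum and two weighted averages. The sketch below uses the instruction mix, cycle counts, and unit delays given on these slides. Two points are assumptions, not slide content: the single-cycle clock is taken to be set by a load (instruction fetch, register read, ALU, data access, register write), and the branch misprediction rate is left as a parameter, with 25% used purely as an example value.

```python
# Instruction mix and multicycle cycle counts from the slides.
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
MULTICYCLE_CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}

# Functional-unit delays from the slides, in picoseconds.
MEM_PS, ALU_PS, REG_PS = 200, 100, 50


def single_cycle_clock_ps():
    # Assumed critical path: the load instruction, which uses every unit.
    return MEM_PS + REG_PS + ALU_PS + MEM_PS + REG_PS


def multicycle_cpi():
    return sum(MIX[k] * MULTICYCLE_CYCLES[k] for k in MIX)


def pipelined_cpi(mispredict_rate):
    # Branch: 1 cycle if predicted correctly, 2 if not (from the slide).
    branch_cpi = 1 * (1 - mispredict_rate) + 2 * mispredict_rate
    per_class = {"load": 1, "store": 1, "alu": 1, "branch": branch_cpi, "jump": 2}
    return sum(MIX[k] * per_class[k] for k in MIX)


print(f"single-cycle clock: {single_cycle_clock_ps()} ps")
print(f"multicycle CPI:     {multicycle_cpi():.2f}")
# 0.25 is an example misprediction rate, not a value given on the slide.
print(f"pipelined CPI:      {pipelined_cpi(0.25):.4f}")
```

With these inputs the single-cycle clock is 600 ps and the multicycle CPI is 4.12. The pipelined CPI works out to 1.02 + 0.11 × (misprediction rate), so it stays close to 1 for any reasonable predictor.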

Problem 6.3

• Show the forwarding paths needed to execute the following four instructions:

    add $3, $4, $6
    sub $5, $3, $2
    lw  $7, 100($5)
    add $8, $7, $2

Problem 6.3

    add $3, $4, $6
    sub $5, $3, $2
    lw  $7, 100($5)
    add $8, $7, $2

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the four instructions; the label ‘add $8, $7, $2’ appears twice in the diagram]

Problem 6.22


Problem 6.22

    add $2, $3, $5
    lw  $4, 100($2)
    sub $6, $4, $3

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the three instructions]

Problem 6.22 - continued

    add $2, $3, $5
    lw  $4, 100($2)
    sub $6, $4, $3

[Figure: pipeline diagram (IM, Reg, DM, Reg) of the three instructions, with clock cycle CC8 labeled]

Advanced Pipelining

• Computer Organization and Design covers advanced pipelining in 18 pages (Sections 6.9 – 6.10)
• Consult one of the advanced books, Computer Architecture: A Quantitative Approach (CPEN414)
• A Verilog Hardware Description Language description of a pipeline like that in the Pentium 4 would be on the order of thousands of lines.

Advanced Pipelining: Extracting More Performance

• Increase the depth of the pipeline
   – DEC Alpha 21264: 9 stage pipeline
• Launch multiple instructions in every pipeline stage (multiple issue)
   – Replicate units
• A 6 GHz four-way multiple-issue microprocessor (CPI = 0.25) can execute a peak of 24 billion instructions per second!

Concluding Remark

• This chapter gives you the background you need to learn more!
• Next time we will start Chapter 7: Memory