Giving credit where credit is due

Author: Lawrence Terry

Most of the slides for this lecture are based on slides created by Dr. Bryant, Carnegie Mellon University. I have modified them and added new slides.

CSCE 230J Computer Organization

Processor Architecture V: Making the Pipelined Implementation Work

Dr. Steve Goddard [email protected]

http://cse.unl.edu/~goddard/Courses/CSCE230J

Overview

Make the pipelined processor work!

Data Hazards
  Instruction having register R as source follows shortly after instruction having register R as destination
  Common condition, don't want to slow down pipeline

Control Hazards
  Mispredict conditional branch
    Our design predicts all branches as being taken
    Naïve pipeline executes two extra instructions
  Getting return address for ret instruction
    Naïve pipeline executes three extra instructions

Making Sure It Really Works
  What if multiple special cases happen simultaneously?

Pipeline Stages

Fetch
  Select current PC
  Read instruction
  Compute incremented PC
Decode
  Read program registers
Execute
  Operate ALU
Memory
  Read or write data memory
Write Back
  Update register file

[Figure: pipeline stage diagram showing the Fetch, Decode, Execute, Memory, and Write back stages separated by pipeline registers F, D, E, M, and W, with signals such as predPC, f_PC, valP, valA, valB, valE, and valM.]

PIPE- Hardware

  Pipeline registers hold intermediate values from instruction execution

Forward (Upward) Paths
  Values passed from one stage to next
  Cannot jump past stages
    e.g., valC passes through decode

[Figure: PIPE- hardware diagram with pipeline registers F, D, E, M, and W and the Select PC, Predict PC, Select A, ALU, and data memory blocks.]

Data Dependencies: 2 Nop's

# demo-h2.ys
                           1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: nop                       F  D  E  M  W
0x00d: nop                          F  D  E  M  W
0x00e: addl %edx,%eax                  F  D  E  M  W
0x010: halt                               F  D  E  M  W

Cycle 6
  W: R[%eax] <- 3
  D: valA <- R[%edx] = 10
     valB <- R[%eax] = 0    Error
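The cycle numbers in the diagram above can be reproduced with a small sketch. It assumes an unstalled five-stage pipeline in which instruction i (counting from 0) is fetched in cycle i + 1; the function name `stage_schedule` is made up for this illustration and is not part of the course material.

```python
STAGES = "FDEMW"  # fetch, decode, execute, memory, write back

def stage_schedule(num_instructions):
    """Map (instruction index, stage letter) -> clock cycle, with no stalls.

    Instruction i (0-based) is fetched in cycle i + 1 and then advances
    one stage per cycle.
    """
    return {
        (i, stage): i + 1 + k
        for i in range(num_instructions)
        for k, stage in enumerate(STAGES)
    }

# demo-h2.ys has six instructions: index 1 is irmovl $3,%eax, index 4 is addl.
sched = stage_schedule(6)
print(sched[(1, "W")], sched[(4, "D")])  # 6 6
```

Both numbers are 6: addl reads %eax in decode during the same cycle in which irmovl $3,%eax is still in write back, so the register file has not yet been updated.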

Data Dependencies: No Nop

# demo-h0.ys
                           1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: addl %edx,%eax            F  D  E  M  W
0x00e: halt                         F  D  E  M  W

Cycle 4
  M: M_valE = 10, M_dstE = %edx
  E: e_valE <- 0 + 3 = 3, E_dstE = %eax
  D: valA <- R[%edx] = 0
     valB <- R[%eax] = 0    Error

Stalling for Data Dependencies

# demo-h2.ys
                           1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: nop                       F  D  E  M  W
0x00d: nop                          F  D  E  M  W
bubble                                       E  M  W
0x00e: addl %edx,%eax                  F  D  D  E  M  W
0x010: halt                               F  F  D  E  M  W

If instruction follows too closely after one that writes register, slow it down
  Hold instruction in decode
  Dynamically inject nop into execute stage

Stall Condition

Source Registers
  srcA and srcB of current instruction in decode stage

Destination Registers
  dstE and dstM fields
  Instructions in execute, memory, and write-back stages

Special Case
  Don't stall for register ID 8
    Indicates absence of register operand

[Figure: PIPE- hardware diagram highlighting srcA and srcB in decode and the dstE and dstM fields of pipeline registers E, M, and W.]

Detecting Stall Condition

# demo-h2.ys
                           1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: nop                       F  D  E  M  W
0x00d: nop                          F  D  E  M  W
bubble                                       E  M  W
0x00e: addl %edx,%eax                  F  D  D  E  M  W
0x010: halt                               F  F  D  E  M  W

Cycle 6
  W: W_dstE = %eax, W_valE = 3
  D: srcA = %edx, srcB = %eax
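The stall condition described above can be sketched as a small predicate. This is a minimal illustration, not the course's HCL: the function name `must_stall` and the way the pending destinations are passed in are assumptions made for the example, and the value 8 for "no register" follows the slide's special case.

```python
# Register IDs used for illustration; RNONE = 8 follows the slide's
# "register ID 8" convention for "no register operand".
EAX, ECX, EDX, EBX, RNONE = 0, 1, 2, 3, 8

def must_stall(d_srcA, d_srcB, pending_dsts):
    """Return True if the instruction in decode reads a register that an
    instruction in execute, memory, or write back has not yet written.

    pending_dsts holds the dstE and dstM fields of the E, M, and W
    pipeline registers.
    """
    sources = {r for r in (d_srcA, d_srcB) if r != RNONE}
    pending = {r for r in pending_dsts if r != RNONE}
    return bool(sources & pending)

# Cycle 6 of demo-h2.ys: addl %edx,%eax is in decode, the two nops are in
# execute and memory, and irmovl $3,%eax is in write back (W_dstE = %eax).
cycle6 = [RNONE, RNONE, RNONE, RNONE, EAX, RNONE]
print(must_stall(EDX, EAX, cycle6))  # True
```

Filtering out RNONE on both sides is what keeps two instructions without register operands from stalling each other.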

Stalling X3

# demo-h0.ys
                           1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
bubble                                 E  M  W
bubble                                    E  M  W
bubble                                       E  M  W
0x00c: addl %edx,%eax            F  D  D  D  D  E  M  W
0x00e: halt                         F  F  F  F  D  E  M  W

Cycle 4
  E: E_dstE = %eax
  D: srcA = %edx, srcB = %eax
Cycle 5
  M: M_dstE = %eax
  D: srcA = %edx, srcB = %eax
Cycle 6
  W: W_dstE = %eax
  D: srcA = %edx, srcB = %eax

What Happens When Stalling?

# demo-h0.ys
0x000: irmovl $10,%edx
0x006: irmovl $3,%eax
0x00c: addl %edx,%eax
0x00e: halt

Stage contents by cycle (instructions shown by address; - marks a stage holding no instruction from this program):

Stage        Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8
Write Back   -         0x000     0x006     bubble    bubble
Memory       0x000     0x006     bubble    bubble    bubble
Execute      0x006     bubble    bubble    bubble    0x00c
Decode       0x00c     0x00c     0x00c     0x00c     0x00e
Fetch        0x00e     0x00e     0x00e     0x00e     -

Stalling instruction held back in decode stage
Following instruction stays in fetch stage
Bubbles injected into execute stage
  Like dynamically generated nop's
  Move through later stages

Implementing Stalling

Pipeline Control
  Combinational logic detects stall condition
  Sets mode signals for how pipeline registers should update

[Figure: pipe control logic block with inputs D_icode, d_srcA, d_srcB, E_dstE, E_dstM, M_dstE, M_dstM, W_dstE, and W_dstM and outputs F_stall, D_stall, and E_bubble.]

Pipeline Register Modes

Mode     stall  bubble  Before rising clock      After rising clock
Normal   0      0       Input = y, Output = x    Output = y
Stall    1      0       Input = y, Output = x    Output = x
Bubble   0      1       Input = y, Output = x    Output = nop
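The three register modes can be modeled in a few lines. The class below is a sketch for illustration only: the name `PipeReg`, its `clock` method, and the use of the string "nop" as the bubble value are assumptions, not part of the slides. Treating stall and bubble together as an error mirrors the pipeline error mentioned later in this lecture.

```python
class PipeReg:
    """Toy model of one pipeline register with normal, stall, and bubble modes."""

    def __init__(self, initial, nop="nop"):
        self.output = initial  # value currently visible to the next stage
        self.nop = nop         # value loaded when a bubble is injected

    def clock(self, new_input, stall=False, bubble=False):
        """Apply one rising clock edge and return the new output."""
        if stall and bubble:
            raise ValueError("pipeline error: stall and bubble both asserted")
        if bubble:
            self.output = self.nop      # Bubble: output becomes nop
        elif not stall:
            self.output = new_input     # Normal: output takes the input
        # Stall: output keeps its old value
        return self.output

print(PipeReg("x").clock("y"))               # y
print(PipeReg("x").clock("y", stall=True))   # x
print(PipeReg("x").clock("y", bubble=True))  # nop
```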

Data Forwarding

Naïve Pipeline
  Register isn't written until completion of write-back stage
  Source operands read from register file in decode stage
    Needs to be in register file at start of stage

Observation
  Value generated in execute or memory stage

Trick
  Pass value directly from generating instruction to decode stage
  Needs to be available at end of decode stage

Data Forwarding Example

# demo-h2.ys
                           1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: nop                       F  D  E  M  W
0x00d: nop                          F  D  E  M  W
0x00e: addl %edx,%eax                  F  D  E  M  W
0x010: halt                               F  D  E  M  W

Cycle 6
  W: R[%eax] <- 3, W_dstE = %eax, W_valE = 3
  D: srcA = %edx, srcB = %eax
     valA <- R[%edx] = 10
     valB <- W_valE = 3

irmovl in write-back stage
Destination value in W pipeline register
Forward as valB for decode stage

Bypass Paths

Decode Stage
  Forwarding logic selects valA and valB
  Normally from register file
  Forwarding: get valA or valB from later pipeline stage

Forwarding Sources
  Execute: valE
  Memory: valE, valM
  Write back: valE, valM

[Figure: pipeline stage diagram with a Forward block in decode fed by e_valE, M_valE, m_valM, W_valE, and W_valM.]

Data Forwarding Example #2

# demo-h0.ys
                           1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: addl %edx,%eax            F  D  E  M  W
0x00e: halt                         F  D  E  M  W

Cycle 4
  M: M_dstE = %edx, M_valE = 10
  E: E_dstE = %eax, e_valE <- 0 + 3 = 3
  D: srcA = %edx, srcB = %eax
     valA <- M_valE = 10
     valB <- e_valE = 3

Register %edx
  Generated by ALU during previous cycle
  Forward from memory as valA

Register %eax
  Value just generated by ALU
  Forward from execute as valB

Implementing Forwarding

Add additional feedback paths from E, M, and W pipeline registers into decode stage
Create logic blocks to select from multiple sources for valA and valB in decode stage

[Figure: PIPE hardware diagram with the Sel+Fwd A and Fwd B blocks in decode, fed by e_valE, M_valE, m_valM, W_valE, and W_valM.]

Implementing Forwarding

    ## What should be the A value?
    int new_E_valA = [
        # Use incremented PC
        D_icode in { ICALL, IJXX } : D_valP;
        # Forward valE from execute
        d_srcA == E_dstE : e_valE;
        # Forward valM from memory
        d_srcA == M_dstM : m_valM;
        # Forward valE from memory
        d_srcA == M_dstE : M_valE;
        # Forward valM from write back
        d_srcA == W_dstM : W_valM;
        # Forward valE from write back
        d_srcA == W_dstE : W_valE;
        # Use value read from register file
        1 : d_rvalA;
    ];
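The priority order of the HCL case expression above can be mirrored in Python as a rough sketch. The dictionary-of-signals representation, the function name `select_valA`, and the string register and opcode names are assumptions made for this example; only the order of the checks is taken from the slide.

```python
def select_valA(s):
    """Choose valA following the case order of new_E_valA.

    s is a dict of signal values keyed by the HCL signal names.
    """
    if s["D_icode"] in ("ICALL", "IJXX"):
        return s["D_valP"]            # use incremented PC
    forwarding_sources = (
        ("E_dstE", "e_valE"),         # valE from execute
        ("M_dstM", "m_valM"),         # valM from memory
        ("M_dstE", "M_valE"),         # valE from memory
        ("W_dstM", "W_valM"),         # valM from write back
        ("W_dstE", "W_valE"),         # valE from write back
    )
    for dst, val in forwarding_sources:
        if s["d_srcA"] == s[dst]:
            return s[val]             # first match wins
    return s["d_rvalA"]               # value read from register file

# Cycle 4 of demo-h0.ys: addl %edx,%eax in decode with srcA = %edx.
cycle4 = {
    "D_icode": "IOPL", "D_valP": 0x00E, "d_srcA": "%edx", "d_rvalA": 0,
    "E_dstE": "%eax", "e_valE": 3,
    "M_dstM": "RNONE", "m_valM": 0, "M_dstE": "%edx", "M_valE": 10,
    "W_dstM": "RNONE", "W_valM": 0, "W_dstE": "RNONE", "W_valE": 0,
}
print(select_valA(cycle4))  # 10
```

Because the first matching case is taken, a value still in the execute stage is preferred over one from an older instruction further down the pipeline.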

Limitation of Forwarding

# demo-luh.ys
                                         1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $128,%edx                  F  D  E  M  W
0x006: irmovl $3,%ecx                       F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                    F  D  E  M  W
0x012: irmovl $10,%ebx                            F  D  E  M  W
0x018: mrmovl 0(%edx),%eax  # Load %eax              F  D  E  M  W
0x01e: addl %ebx,%eax  # Use %eax                       F  D  E  M  W
0x020: halt                                                F  D  E  M  W

Load-use dependency
  Value needed by end of decode stage in cycle 7
  Value read from memory in memory stage of cycle 8

Cycle 7
  M: M_dstE = %ebx, M_valE = 10
  D: valA <- M_valE = 10
     valB <- R[%eax] = 0    Error
Cycle 8
  M: M_dstM = %eax, m_valM <- M[128] = 3

Avoiding Load/Use Hazard

# demo-luh.ys
                                         1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                  F  D  E  M  W
0x006: irmovl $3,%ecx                       F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                    F  D  E  M  W
0x012: irmovl $10,%ebx                            F  D  E  M  W
0x018: mrmovl 0(%edx),%eax  # Load %eax              F  D  E  M  W
bubble                                                        E  M  W
0x01e: addl %ebx,%eax  # Use %eax                       F  D  D  E  M  W
0x020: halt                                                F  F  D  E  M  W

Stall using instruction for one cycle
Can then pick up loaded value by forwarding from memory stage

Cycle 8
  W: W_dstE = %ebx, W_valE = 10
  M: M_dstM = %eax, m_valM <- M[128] = 3
  D: valA <- W_valE = 10
     valB <- m_valM = 3

Detecting Load/Use Hazard

Condition        Trigger
Load/Use Hazard  E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }

[Figure: decode and execute portion of the PIPE hardware diagram, highlighting d_srcA, d_srcB, E_icode, and E_dstM.]

Control for Load/Use Hazard

# demo-luh.ys (same pipeline diagram as in Avoiding Load/Use Hazard)

Stall instructions in fetch and decode stages
Inject bubble into execute stage

Condition        F      D      E       M       W
Load/Use Hazard  stall  stall  bubble  normal  normal
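The load/use trigger and its per-register actions can be written as a short sketch. The function name `load_use_hazard`, the `LOAD_USE_ACTIONS` table, and the string names for opcodes and registers are illustrative assumptions; the condition itself is the trigger given in the table above.

```python
def load_use_hazard(E_icode, E_dstM, d_srcA, d_srcB):
    """True when the instruction in execute loads a register that the
    instruction in decode wants to read."""
    return E_icode in ("IMRMOVL", "IPOPL") and E_dstM in (d_srcA, d_srcB)

# Mode of each pipeline register on the next clock edge when the hazard fires.
LOAD_USE_ACTIONS = {"F": "stall", "D": "stall", "E": "bubble",
                    "M": "normal", "W": "normal"}

# Cycle 7 of demo-luh.ys: mrmovl 0(%edx),%eax is in execute and
# addl %ebx,%eax is in decode.
print(load_use_hazard("IMRMOVL", "%eax", "%ebx", "%eax"))  # True
# One cycle earlier irmovl $10,%ebx is in execute, so there is no hazard.
print(load_use_hazard("IIRMOVL", "RNONE", "%edx", "RNONE"))  # False
```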

Branch Misprediction Example

demo-j.ys

0x000:    xorl %eax,%eax
0x002:    jne t              # Not taken
0x007:    irmovl $1, %eax    # Fall through
0x00d:    nop
0x00e:    nop
0x00f:    nop
0x010:    halt
0x011: t: irmovl $3, %edx    # Target (Should not execute)
0x017:    irmovl $4, %ecx    # Should not execute
0x01d:    irmovl $5, %edx    # Should not execute

Should only execute the first 7 instructions (through halt at 0x010)

Handling Misprediction

# demo-j.ys
                                       1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                  F  D  E  M  W
0x002: jne target  # Not taken            F  D  E  M  W
0x011: t: irmovl $2,%edx  # Target           F  D
bubble                                             E  M  W
0x017: irmovl $3,%ebx  # Target+1               F
bubble                                             D  E  M  W
0x007: irmovl $1,%eax  # Fall through              F  D  E  M  W
0x00d: nop                                            F  D  E  M  W

Predict branch as taken
  Fetch 2 instructions at target

Cancel when mispredicted
  Detect branch not-taken in execute stage
  On following cycle, replace instructions in execute and decode by bubbles
  No side effects have occurred yet

Detecting Mispredicted Branch

Condition            Trigger
Mispredicted Branch  E_icode == IJXX && !e_Bch

[Figure: execute portion of the PIPE hardware diagram, highlighting E_icode and e_Bch.]

Control for Misprediction

# demo-j.ys
                                       1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                  F  D  E  M  W
0x002: jne target  # Not taken            F  D  E  M  W
0x011: t: irmovl $2,%edx  # Target           F  D
bubble                                             E  M  W
0x017: irmovl $3,%ebx  # Target+1               F
bubble                                             D  E  M  W
0x007: irmovl $1,%eax  # Fall through              F  D  E  M  W
0x00d: nop                                            F  D  E  M  W

Condition            F       D       E       M       W
Mispredicted Branch  normal  bubble  bubble  normal  normal

Return Example

demo-retb.ys

0x000:    irmovl Stack,%esp  # Initialize stack pointer
0x006:    call p             # Procedure call
0x00b:    irmovl $5,%esi     # Return point
0x011:    halt
0x020: .pos 0x20
0x020: p: irmovl $-1,%edi    # procedure
0x026:    ret
0x027:    irmovl $1,%eax     # Should not be executed
0x02d:    irmovl $2,%ecx     # Should not be executed
0x033:    irmovl $3,%edx     # Should not be executed
0x039:    irmovl $4,%ebx     # Should not be executed
0x100: .pos 0x100
0x100: Stack:                # Stack: Stack pointer

Previously executed three additional instructions

Correct Return Example

# demo-retb
                                 1  2  3  4  5  6  7  8  9
0x026: ret                       F  D  E  M  W
bubble                              F  D  E  M  W
bubble                                 F  D  E  M  W
bubble                                    F  D  E  M  W
0x00b: irmovl $5,%esi  # Return              F  D  E  M  W

As ret passes through pipeline, stall at fetch stage
  While in decode, execute, and memory stage
Inject bubble into decode stage
Release stall when reach write-back stage

Cycle 5
  W: valM = 0x0b
  F: valC <- 5, rB <- %esi

Detecting Return

Condition       Trigger
Processing ret  IRET in { D_icode, E_icode, M_icode }

[Figure: decode, execute, and memory portion of the PIPE hardware diagram, highlighting D_icode, E_icode, and M_icode.]

Control for Return

# demo-retb
                                 1  2  3  4  5  6  7  8  9
0x026: ret                       F  D  E  M  W
bubble                              F  D  E  M  W
bubble                                 F  D  E  M  W
bubble                                    F  D  E  M  W
0x00b: irmovl $5,%esi  # Return              F  D  E  M  W

Condition       F      D       E       M       W
Processing ret  stall  bubble  normal  normal  normal

Special Control Cases

Detection

Condition            Trigger
Processing ret       IRET in { D_icode, E_icode, M_icode }
Load/Use Hazard      E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
Mispredicted Branch  E_icode == IJXX && !e_Bch

Action (on next cycle)

Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Load/Use Hazard      stall   stall   bubble  normal  normal
Mispredicted Branch  normal  bubble  bubble  normal  normal

Implementing Pipeline Control

Combinational logic generates pipeline control signals
Action occurs at start of following cycle

[Figure: pipe control logic block with inputs D_icode, d_srcA, d_srcB, E_icode, E_dstM, e_Bch, and M_icode and outputs F_stall, D_stall, D_bubble, and E_bubble.]

Initial Version of Pipeline Control

    bool F_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool D_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool E_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

Control Combinations

Special cases that can arise on same clock cycle

Pipeline states for each special case (contents of stages M, E, and D):

Case        M     E       D
Load/use    -     Load    Use
Mispredict  -     JXX     -
ret 1       -     -       ret
ret 2       -     ret     bubble
ret 3       ret   bubble  bubble

Combination A (Mispredict with ret 1)
  Not-taken branch
  ret instruction at branch target

Combination B (Load/use with ret 1)
  Instruction that reads from memory to %esp
  Followed by ret instruction
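To see what the initial control logic does for the two combinations, the four HCL expressions above can be transcribed into Python. This is a sketch under stated assumptions: the function name `initial_control`, the string opcode and register names, and the choice of %esp as both source registers of ret are made for illustration.

```python
def initial_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Evaluate the initial (uncorrected) pipeline control signals."""
    load_use = E_icode in ("IMRMOVL", "IPOPL") and E_dstM in (d_srcA, d_srcB)
    ret = "IRET" in (D_icode, E_icode, M_icode)
    mispredict = E_icode == "IJXX" and not e_Bch
    return {
        "F_stall": load_use or ret,
        "D_stall": load_use,
        "D_bubble": mispredict or ret,
        "E_bubble": mispredict or load_use,
    }

# Combination A: not-taken jump in execute, ret (fetched from the target) in decode.
comb_a = initial_control("IRET", "IJXX", "INOP", "RNONE", "%esp", "%esp", False)
# Combination B: load into %esp in execute, ret in decode.
comb_b = initial_control("IRET", "IMRMOVL", "INOP", "%esp", "%esp", "%esp", False)

print(comb_a)
print(comb_b["D_stall"] and comb_b["D_bubble"])  # True
```

For Combination B the logic asserts both D_stall and D_bubble, which is not a valid mode for a pipeline register.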

Control Combination A

Combination A: Mispredict (JXX in execute) together with ret 1 (ret in decode)

Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Mispredicted Branch  normal  bubble  bubble  normal  normal
Combination          stall   bubble  bubble  normal  normal

Should handle as mispredicted branch
Stalls F pipeline register
But PC selection logic will be using M_valA anyhow

[Figure: fetch stage of the PIPE hardware diagram, with Select PC choosing among predPC, M_valA, and W_valM to produce f_PC.]

Control Combination B

Combination B: Load/use (load in execute, use in decode) together with ret 1 (ret in decode)

Condition        F      D               E       M       W
Processing ret   stall  bubble          normal  normal  normal
Load/Use Hazard  stall  stall           bubble  normal  normal
Combination      stall  bubble + stall  bubble  normal  normal

Would attempt to bubble and stall pipeline register D
Signaled by processor as pipeline error

Handling Control Combination B

Combination B: Load/use (load in execute, use in decode) together with ret 1 (ret in decode)

Condition        F      D       E       M       W
Processing ret   stall  bubble  normal  normal  normal
Load/Use Hazard  stall  stall   bubble  normal  normal
Combination      stall  stall   bubble  normal  normal

Load/use hazard should get priority
ret instruction should be held in decode stage for additional cycle

Corrected Pipeline Control Logic

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode }
        # but not condition for a load/use hazard
        && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });

Condition        F      D       E       M       W
Processing ret   stall  bubble  normal  normal  normal
Load/Use Hazard  stall  stall   bubble  normal  normal
Combination      stall  stall   bubble  normal  normal

Load/use hazard should get priority
ret instruction should be held in decode stage for additional cycle
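The effect of the corrected D_bubble expression on Combination B can be checked with a short transcription into Python. As before, this is only a sketch: the function name `corrected_control` and the string opcode and register names are assumptions for the example.

```python
def corrected_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Pipeline control signals with the corrected D_bubble condition."""
    load_use = E_icode in ("IMRMOVL", "IPOPL") and E_dstM in (d_srcA, d_srcB)
    ret = "IRET" in (D_icode, E_icode, M_icode)
    mispredict = E_icode == "IJXX" and not e_Bch
    return {
        "F_stall": load_use or ret,
        "D_stall": load_use,
        # ret no longer bubbles D while a load/use hazard is stalling it
        "D_bubble": mispredict or (ret and not load_use),
        "E_bubble": mispredict or load_use,
    }

# Combination B: load into %esp in execute, ret in decode.
comb_b = corrected_control("IRET", "IMRMOVL", "INOP", "%esp", "%esp", "%esp", False)
print(comb_b)
```

With the correction, Combination B yields stall for F and D and bubble for E, matching the Combination row of the table; ret alone still bubbles D as before.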

Pipeline Summary

Data Hazards
  Most handled by forwarding
    No performance penalty
  Load/use hazard requires one cycle stall

Control Hazards
  Cancel instructions when detect mispredicted branch
    Two clock cycles wasted
  Stall fetch stage while ret passes through pipeline
    Three clock cycles wasted

Control Combinations
  Must analyze carefully
  First version had subtle bug
    Only arises with unusual instruction combination
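The penalties in the summary can be turned into a rough cycles-per-instruction estimate. The sketch below is an illustration only: the function names and the event counts in the example are invented, and the estimate ignores the cycles needed to fill the pipeline at start-up.

```python
# Wasted cycles per event, taken from the summary above.
PENALTY = {"load_use": 1, "mispredict": 2, "ret": 3}

def wasted_cycles(load_use, mispredict, ret):
    """Total cycles lost to stalls and cancelled instructions."""
    return (PENALTY["load_use"] * load_use
            + PENALTY["mispredict"] * mispredict
            + PENALTY["ret"] * ret)

def estimated_cpi(instructions, load_use, mispredict, ret):
    """Average cycles per instruction, ignoring pipeline fill time."""
    return (instructions + wasted_cycles(load_use, mispredict, ret)) / instructions

# Hypothetical run: 100 instructions with 10 load/use hazards,
# 5 mispredicted branches, and 2 ret instructions.
print(wasted_cycles(10, 5, 2))  # 26
print(estimated_cpi(100, 10, 5, 2))
```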
