Introduction to Pipelining: Datapath

Author: Joy Hines
Laboratorio de Tecnologías de Información

Introduction to Pipelining: Datapath
Arquitectura de Computadoras
Arturo Díaz Pérez
Centro de Investigación y de Estudios Avanzados del IPN
Laboratorio de Tecnologías de Información
[email protected]


Pipelining

A way of exploiting instruction-level parallelism

[Figure: two timing diagrams plotting instructions 1 to 4 against time, each with Latency and Throughput marked.]

Observations

♦ Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
♦ Pipeline rate is limited by the slowest pipeline stage
♦ Multiple tasks operate simultaneously
♦ Potential speedup = number of pipe stages
♦ Unbalanced lengths of pipe stages reduce speedup
♦ Time to “fill” the pipeline and time to “drain” it reduce speedup
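The observations can be checked with a little arithmetic. The sketch below is only an illustration: the stage latencies and task counts are made-up values, and the timing model (clock period equal to the slowest stage, one task finishing per cycle once the pipe is full) is the usual idealization, not something taken from the slides.

```python
def unpipelined_time(stage_ns, n_tasks):
    """Each task runs through all stages before the next task starts."""
    return n_tasks * sum(stage_ns)


def pipelined_time(stage_ns, n_tasks):
    """The clock period is set by the slowest stage. The first task needs
    len(stage_ns) cycles to fill the pipe; after that one task completes
    per cycle."""
    cycle = max(stage_ns)
    return (len(stage_ns) + n_tasks - 1) * cycle


def speedup(stage_ns, n_tasks):
    return unpipelined_time(stage_ns, n_tasks) / pipelined_time(stage_ns, n_tasks)


# Hypothetical stage latencies in ns (assumed values, same total work).
balanced = [10, 10, 10, 10, 10]
unbalanced = [5, 5, 10, 20, 10]

print(speedup(balanced, 4))          # short run: fill/drain time dominates
print(speedup(balanced, 10_000))     # long run: approaches the stage count
print(speedup(unbalanced, 10_000))   # slowest stage caps the gain
print(pipelined_time(balanced, 1))   # a single task is no faster than before
```

With balanced stages the speedup tends toward the number of stages as the run gets longer, while the latency of one task stays at the sum of the stage times.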

5 Steps of DLX Datapath

Instruction Fetch, Instruction Decode / Register Fetch, Execute / Address Calculation, Memory Access, Write Back

[Figure: DLX datapath showing the PC, an adder (+4), NPC, instruction memory, IR, the register file with outputs A and B, a 16-to-32-bit sign extend, multiplexers, the Zero? test, the ALU, ALU Output, data memory, LMD and SMD.]

Pipelined DLX Datapath

Data stationary control: local decode for each phase / pipeline stage

Visualizing Pipelining

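The diagram for this slide did not survive extraction. A stage-by-cycle chart of the usual kind can be printed with a few lines of Python. This is only a sketch: the function name `pipeline_chart` and the sample instruction names are hypothetical, and it assumes an ideal pipeline with no stalls, where each instruction enters IF one cycle after its predecessor.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # the five DLX steps


def pipeline_chart(instructions):
    """Return one text row per instruction; instruction k (0-based)
    occupies stage s during cycle k + s."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for k, name in enumerate(instructions):
        cells = ["."] * n_cycles          # "." marks a cycle with no activity
        for s, stage in enumerate(STAGES):
            cells[k + s] = stage
        rows.append(f"{name:<8}" + " ".join(f"{c:<3}" for c in cells).rstrip())
    return rows


for row in pipeline_chart(["load", "store", "r-type"]):
    print(row)
```

Reading down any column shows which stage each instruction is in during that cycle; in an ideal pipeline no stage name appears twice in a column.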

Single Cycle, Multiple Cycle, vs. Pipeline

[Figure: clock diagrams comparing three implementations.
Single Cycle Implementation: Load and Store in Cycle 1 and Cycle 2, with a “Waste” interval marked.
Multiple Cycle Implementation (Cycle 1 to Cycle 10): Load (Ifetch, Reg, Exec, Mem, Wr), then Store (Ifetch, Reg, Exec, Mem), then R-type (Ifetch, ...).
Pipeline Implementation: Load, Store and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, overlapped in time.]


Why Pipeline?

♦ Suppose we execute 100 instructions
♦ Single Cycle Machine
  ■ 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
♦ Multicycle Machine
  ■ 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
♦ Ideal pipelined machine
  ■ 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
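The three totals follow directly from cycle time, CPI and instruction count. A minimal sketch that reproduces them is shown here; the function names are made up for the illustration, and the default parameter values are the ones quoted on the slide.

```python
def single_cycle_ns(n_inst, cycle_ns=45):
    """One long cycle per instruction (CPI = 1)."""
    return cycle_ns * 1 * n_inst


def multicycle_ns(n_inst, cycle_ns=10, cpi=4.6):
    """Short cycles, but several of them per instruction on average."""
    return cycle_ns * cpi * n_inst


def ideal_pipeline_ns(n_inst, cycle_ns=10, drain_cycles=4):
    """One instruction completes per cycle, plus the cycles needed to drain."""
    return cycle_ns * (1 * n_inst + drain_cycles)


n = 100
print(single_cycle_ns(n), multicycle_ns(n), ideal_pipeline_ns(n))
print(single_cycle_ns(n) / ideal_pipeline_ns(n))  # gain over the single-cycle machine
```

Raising `n` shows the 4-cycle drain term becoming negligible, so the pipelined total approaches 10 ns per instruction.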

Limits to Pipelining

♦ It’s not that easy for computers
♦ Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  ■ Structural hazards: HW cannot support this combination of instructions
  ■ Data hazards: instruction depends on result of prior instruction still in the pipeline
  ■ Control hazards: pipelining of branches & other instructions that change the PC
♦ Common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline

1 Memory is Structural Hazard

Detection is easy in this case! (right half highlight means read, left half write)

1 Memory is Structural Hazard


Data Hazard on r1

• Dependencies backwards in time are hazards

[Figure: pipeline diagram with time (clock cycles) across the stages IF, ID/RF, EX, MEM, WB and instruction order down the side:
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11]
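The dependencies on r1 can be found mechanically. The sketch below is an assumption-laden illustration rather than the slide's own method: it assumes one instruction issued per cycle, register reads in the second stage (ID/RF), register writes in the fifth (WB), and no forwarding. The `split_regfile` flag (a hypothetical name) models a register file that writes in the first half of a cycle and reads in the second half.

```python
def parse(text):
    """'add r1,r2,r3' -> ('add', 'r1', ['r2', 'r3']); register-register ALU ops only."""
    op, operands = text.split(maxsplit=1)
    dest, *srcs = [r.strip() for r in operands.split(",")]
    return op, dest, srcs


ID_STAGE, WB_STAGE = 1, 4  # 0-based stage offsets: IF=0, ID/RF=1, EX=2, MEM=3, WB=4


def raw_hazards(program, split_regfile=True):
    """Return (producer, consumer, register) triples where the consumer would
    read the register before the producer has written it back."""
    parsed = [parse(t) for t in program]
    hazards = []
    for p, (_, dest, _) in enumerate(parsed):
        write_cycle = p + WB_STAGE
        for c in range(p + 1, len(parsed)):
            read_cycle = c + ID_STAGE
            too_early = read_cycle < write_cycle or (
                read_cycle == write_cycle and not split_regfile)
            if too_early and dest in parsed[c][2]:
                hazards.append((program[p], program[c], dest))
    return hazards


program = ["add r1,r2,r3", "sub r4,r1,r3", "and r6,r1,r7",
           "or r8,r1,r9", "xor r10,r1,r11"]
for producer, consumer, reg in raw_hazards(program):
    print(f"{consumer!r} reads {reg} too early after {producer!r}")
```

Under these assumptions `sub` and `and` are flagged, `or` reads r1 in the same cycle that `add` writes it (flagged only when `split_regfile=False`), and `xor` is safe.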

Data Hazard Solution

• “Forward” result from one stage to another

[Figure: pipeline diagram with time (clock cycles) across the stages IF, ID/RF, EX, MEM, WB and instruction order down the side:
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11]

• “or” OK if define read/write properly

3 Generic Data Hazards

♦ Instr_i followed by Instr_j
♦ Read After Write (RAW)
  ■ Instr_j tries to read operand before Instr_i writes it
♦ Write After Read (WAR)
  ■ Instr_j tries to write operand before Instr_i reads it
  ■ Can’t happen in DLX 5 stage pipeline because
    » all instructions take 5 stages
    » reads are always in stage 2, and
    » writes are always in stage 5
♦ Write After Write (WAW)
  ■ Instr_j tries to write operand before Instr_i writes it
    » Leaves wrong result (Instr_i not Instr_j)
  ■ Can’t happen in DLX 5 stage pipeline because
    » all instructions take 5 stages
    » writes are always in stage 5
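The three categories depend only on which registers each instruction reads and writes. A small sketch of that classification follows; the `(dest, sources)` tuple representation and the function name `classify` are assumptions made for the example. It reports the dependence between the two instructions; whether the dependence becomes an actual hazard depends on the pipeline, and in the 5-stage DLX only RAW can.

```python
def classify(instr_i, instr_j):
    """instr_i precedes instr_j. Each is (dest, sources), with dest set to
    None for an instruction that writes no register."""
    dest_i, srcs_i = instr_i
    dest_j, srcs_j = instr_j
    kinds = []
    if dest_i is not None and dest_i in srcs_j:
        kinds.append("RAW")   # j reads what i writes
    if dest_j is not None and dest_j in srcs_i:
        kinds.append("WAR")   # j writes what i reads
    if dest_i is not None and dest_i == dest_j:
        kinds.append("WAW")   # both write the same register
    return kinds


add = ("r1", ["r2", "r3"])   # add r1,r2,r3
sub = ("r4", ["r1", "r3"])   # sub r4,r1,r3

print(classify(add, sub))    # add first, then sub
print(classify(sub, add))    # sub first, then add
```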

Control Hazard: Wait

[Figure: pipeline diagram with time (clock cycles) across and instruction order down the side: Add, Beq, Load; the cycles before Load starts are marked “Lost potential”.]

♦ Stall: wait until decision is clear
♦ Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow
♦ Move decision to end of decode
  ■ save 1 cycle per branch

Control Hazard: Predict



[Figure: pipeline diagram with time (clock cycles) across and instruction order down the side: Add, Beq, Load.]

♦ Predict: guess one direction then back up if wrong
♦ Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right ~50% of time)
  ■ Need to “Squash” and restart following instruction if wrong
  ■ Produce CPI on branch of (1 * .5 + 2 * .5) = 1.5
  ■ Total CPI might then be: 1.5 * .2 + 1 * .8 = 1.1 (20% branch)
♦ More dynamic scheme: history of 1 branch (~90%)
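The CPI figures are weighted averages, which a short sketch makes explicit. The function names are hypothetical. The 50% and 20% figures come from the slide; applying the same 20% branch mix to the stall scheme (2 lost cycles) and to the 90%-accurate scheme is an extrapolation added for comparison, not a result stated in the slides.

```python
def branch_cpi(penalty_cycles, p_penalty):
    """Average cycles per branch: 1 base cycle plus the expected penalty."""
    return 1 + penalty_cycles * p_penalty


def total_cpi(cpi_branch, branch_fraction):
    """Weighted average, with every non-branch instruction taking 1 cycle."""
    return cpi_branch * branch_fraction + 1 * (1 - branch_fraction)


predict = branch_cpi(penalty_cycles=1, p_penalty=0.5)   # 1 * .5 + 2 * .5
stall = branch_cpi(penalty_cycles=2, p_penalty=1.0)     # always lose 2 cycles
history = branch_cpi(penalty_cycles=1, p_penalty=0.1)   # wrong about 10% of the time

for name, cpi in [("stall", stall), ("predict", predict), ("history", history)]:
    print(f"{name:8s} branch CPI = {cpi:.2f}, total CPI = {total_cpi(cpi, 0.2):.2f}")
```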

Control Hazard: Delayed Branch

[Figure: pipeline diagram with time (clock cycles) across and instruction order down the side: Add, Beq, Misc, Load.]

♦ Delayed Branch: redefine branch behavior (takes place after next instruction)
♦ Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the “slot” (~50% of time)
♦ Less useful as more instructions are launched per clock cycle

Forwarding to Avoid Data Hazard


Data Hazard even with Forwarding


Forwarding and Loads

• Dependencies backwards in time are hazards

[Figure: pipeline diagram with time (clock cycles) across the stages IF, ID/RF, EX, MEM, WB and instruction order down the side:
lw r1,0(r2)
sub r4,r1,r3]

• Can’t solve with forwarding: must delay/stall instruction dependent on loads

Forwarding and Loads

• Dependencies backwards in time are hazards

[Figure: pipeline diagram with time (clock cycles) across the stages IF, ID/RF, EX, MEM, WB and instruction order down the side, with a Stall inserted before the dependent instruction:
lw r1,0(r2)
sub r4,r1,r3]

• Can’t solve with forwarding: must delay/stall instruction dependent on loads
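The loaded value is only available at the end of MEM, which is the same cycle in which the next instruction would need it in EX, so one bubble is required even with forwarding. The sketch below inserts that bubble. It is a simplified illustration: the `(op, dest, sources)` tuples and the function name are hypothetical, only `lw` is treated as a load, and only the immediately following instruction is checked, on the assumption that forwarding covers later ones.

```python
def insert_load_stalls(program):
    """program: list of (op, dest, sources). Returns the sequence of ops with
    a 'stall' bubble after each load whose result the next instruction reads."""
    out = []
    for k, (op, dest, _srcs) in enumerate(program):
        out.append(op)
        nxt = program[k + 1] if k + 1 < len(program) else None
        if op == "lw" and nxt is not None and dest in nxt[2]:
            out.append("stall")
    return out


program = [
    ("lw", "r1", ["r2"]),          # lw  r1,0(r2)
    ("sub", "r4", ["r1", "r3"]),   # sub r4,r1,r3
]
print(insert_load_stalls(program))  # ['lw', 'stall', 'sub']
```

If the first instruction were an ALU operation instead of a load, no bubble would be inserted, since its result can be forwarded from the end of EX.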

Designing a Pipelined Processor

♦ Go back and examine your datapath and control diagram
♦ Associate resources with states
♦ Ensure that flows do not conflict, or figure out how to resolve them
♦ Assert control in the appropriate stage

Control and Datapath: Split state diag into 5 pieces IR