Recovery Mechanism for Latency Misprediction

Author: Alan Lamb

Enric Morancho, José María Llabería and Àngel Olivé
Departament d'Arquitectura de Computadors
Universitat Politècnica de Catalunya - Spain

Work supported by the Ministry of Education and Science of Spain (TIC98-0511-C02-01)


Motivation
• High-performance processors demand back-to-back execution of dependent instructions
[Figure: pipeline diagram. The producer "R1 ← ..." and the consumer "... ← R1" each go through stages IQ, R, exe, W, with the consumer issued one cycle after the producer.]
  ❑ The source instruction's latency must be known in its issue cycle
• Load instructions have unknown latency
  ❑ Delaying the issue of dependent instructions degrades IPC:
    ✓ Hit latency 3 cycles: 6% (+1), 11% (+2), 16% (+3)
  ❑ Back-to-back execution is achieved by:
    ✓ Latency prediction (hit)
    ✓ Speculative scheduling of dependent instructions
    ✓ Recovery mechanism on latency mispredictions


Outline
• Terminology
• Processor Model
• Recovery Mechanisms
  ❑ Issue-Queue Mechanism
  ❑ Recovery-Buffer Mechanism
• Methodology and Results
• Conclusions


Terminology
[Figure: pipeline timing diagram over cycles 1 to 7. The latency-predicted load "r1 ← load ..." goes through IQ (cycle 1), R (2), @ (3), M (4) and M/TC (5). The instructions issued in the following cycles, including "r2 ← r1" (1-cycle latency) and "r3 ← r2", go through IQ, R, exe, W. The Independent Window (IW) and the Speculative Window (SW) are marked on the cycle axis.]
• Independent Window (IW): interval in which the issued instructions are independent of the latency-predicted load instruction
• Speculative Window (SW): interval between issuing the first instruction potentially dependent on the load and tag-checking (TC)
• Verification Delay: duration of the Speculative Window
  ❑ Constant value


Tasks of a Recovery Mechanism
• Nullify some instructions issued during the Speculative Window
  ❑ They remain marked as uncompleted in the Reorder Buffer
  ❑ Nullification policy:
    ✓ Non-selective: all instructions
    ✓ Selective: only dependent instructions
• Sleep the instructions dependent on mispredicted and nullified instructions
• Re-issue nullified instructions
  ❑ Independent instructions: in the next cycles
  ❑ Dependent instructions: on data availability
• Keep speculatively issued instructions in a storage structure
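The two nullification policies can be illustrated with a minimal Python sketch. It assumes that registers are already renamed (every destination is a unique physical register) and that the instructions issued during the Speculative Window are listed in issue order. The names `Instr`, `nullified` and the register labels are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str          # destination physical register (assumed unique)
    srcs: tuple = ()   # source physical registers


def nullified(window, load_dest, selective):
    """Return the names of the instructions to nullify.

    window: instructions issued during the Speculative Window, in issue order.
    load_dest: destination register of the mispredicted load.
    """
    if not selective:
        # Non-selective policy: every instruction issued in the window.
        return [i.name for i in window]
    # Selective policy: only direct or indirect dependents of the load.
    tainted = {load_dest}
    result = []
    for i in window:
        if any(src in tainted for src in i.srcs):
            tainted.add(i.dest)
            result.append(i.name)
    return result


window = [
    Instr("b", dest="p2", srcs=("p1",)),  # depends on the load (p1)
    Instr("y", dest="p9", srcs=("p8",)),  # independent of the load
    Instr("c", dest="p3", srcs=("p2",)),  # depends on b
]
print(nullified(window, "p1", selective=False))  # ['b', 'y', 'c']
print(nullified(window, "p1", selective=True))   # ['b', 'c']
```

The selective policy needs the dependence information at nullification time, whereas the non-selective policy only needs to know which instructions were issued in the window.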


Ex.: Non-selective Recovery Mechanism
[Figure: pipeline timing diagram over cycles 1 to 7 showing the instructions issued per cycle (a, v, w, x, b, y, c, z). The load a goes through IQ, R, @, M, M/TC and is mispredicted. Annotations: b, y nullified; c, z not issued; y re-issued; b, c, z slept; Lost Cycles.]


Processor Model (1/2)
• Pipeline stages: Fetch → Decode/Rename → Issue Queue → Register Read → Execute → Write → Commit
  ❑ Instructions are extracted from IQ after issuing them
    ✓ Issue-Queue capacity < Reorder-Buffer capacity
• Latency predictor:
  ❑ Always predicts cache-hit latency


Processor Model (2/2)
• Structure of the Issue Queue
[Figure: Dependence Matrix with one row per queued instruction (instruction0 ... instructionm) and one column per physical register (R0, R1, ..., Rn). The latency counters of the Register Scoreboard Circuit drive the columns; each row produces a ready bit (ready0 ... readym).]
  ❑ Rows: related to queued instructions
  ❑ Columns: related to physical registers, mark data availability. Columns are set by latency counters
  ❑ Ready bits are evaluated every cycle
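A minimal Python sketch of the structure described above, under stated assumptions: a row is ready when every register column it needs is marked available, and a column is set when its latency counter expires. `RegisterScoreboard`, `start_counter`, `tick` and `ready_bits` are hypothetical names, not taken from the slides.

```python
class RegisterScoreboard:
    """Tracks data availability per physical register (one matrix column each)."""

    def __init__(self, num_regs):
        self.available = [False] * num_regs
        self.counters = {}  # register index -> cycles until its value is available

    def start_counter(self, reg, latency):
        # Called when the producer of `reg` is issued with a known/predicted latency.
        self.available[reg] = False
        self.counters[reg] = latency

    def tick(self):
        # Advance one cycle; expired counters set their column.
        for reg in list(self.counters):
            self.counters[reg] -= 1
            if self.counters[reg] <= 0:
                self.available[reg] = True
                del self.counters[reg]


def ready_bits(matrix, available):
    """matrix[i][r] is truthy when queued instruction i reads register r."""
    return [
        all(available[reg] for reg, needed in enumerate(row) if needed)
        for row in matrix
    ]


sb = RegisterScoreboard(3)
sb.available[0] = True             # R0 already holds its value
matrix = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
sb.start_counter(1, latency=2)     # producer of R1 issued, 2-cycle latency
print(ready_bits(matrix, sb.available))  # [True, False, False]
sb.tick()
sb.tick()
print(ready_bits(matrix, sb.available))  # [True, True, True]
```

Re-evaluating `ready_bits` after every `tick` mirrors the statement that ready bits are evaluated every cycle.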


Storage for speculative instructions
• Evaluated structures
  ❑ Issue Queue
  ❑ Recovery Buffer


Issue-Queue Mechanism
• Issued instructions are made non-visible to the select logic
  ❑ non-request bits are added to the IQ entries
[Figure: block diagram of the Issue Queue. The Dependence Matrix produces the ready signals, which are combined with the non-request bits before reaching the Select Logic (legend: a) ready • no-request, b) selected). The Destination Register of each selected instruction activates the latency counters of the Register Scoreboard Circuit. A Removal Circuit removes issued entries; the signals "latency-predicted", "result available" and "misprediction" control the non-visible/visible state.]
• On mispredictions, unsets columns / makes instructions visible
• non-visible instructions are extracted after Verification Delay cycles
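The life cycle of an Issue-Queue entry under this mechanism can be sketched in Python as follows. This is a simplified model with hypothetical names (`IssueQueue`, `issue`, `mispredict`, `end_of_cycle`): it covers only the non-request bit and the delayed extraction, assumes the non-selective policy, and leaves out the Dependence Matrix columns.

```python
class IssueQueue:
    """Issue queue whose issued entries stay, hidden, for the Verification Delay."""

    def __init__(self, verification_delay):
        self.verification_delay = verification_delay
        # name -> None while visible; cycles since issue while non-visible.
        self.entries = {}

    def insert(self, name):
        self.entries[name] = None

    def issue(self, name):
        self.entries[name] = 0  # set the non-request bit

    def visible(self):
        return [n for n, age in self.entries.items() if age is None]

    def mispredict(self):
        # Non-selective variant: every hidden entry becomes visible again.
        for name in list(self.entries):
            self.entries[name] = None

    def end_of_cycle(self):
        for name, age in list(self.entries.items()):
            if age is None:
                continue
            if age + 1 >= self.verification_delay:
                del self.entries[name]  # verified: extract from the queue
            else:
                self.entries[name] = age + 1


iq = IssueQueue(verification_delay=2)
iq.insert("b")
iq.insert("c")
iq.issue("b")          # b is issued speculatively and hidden
iq.end_of_cycle()
print(iq.visible())    # ['c']
iq.mispredict()        # tag check fails in the next cycle
print(iq.visible())    # ['b', 'c']
```

Without a misprediction, a second `end_of_cycle()` call would extract `b`, which is why issued entries occupy the queue for Verification Delay cycles in this model.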


Recovery-Buffer Mechanism (1/3)
• Keeping speculatively-issued instructions in the Issue Queue reduces its ability to look ahead for independent instructions
• The Recovery Buffer (RB) keeps issued instructions while they can be nullified
  ❑ As soon as an instruction is issued, it is extracted from the IQ
  ❑ An RB entry contains all the instructions issued concurrently
  ❑ Instructions are ordered in issue-cycle order
[Figure: pipeline Fetch → Decode/Rename → Issue Queue → Register Read → Execute → Write → Commit, plus the Recovery Buffer (RB) and the signal "correct/mispredicted result available".]


Recovery-Buffer Mechanism (2/3)
• On a misprediction:
  ❑ Nullifies issued instructions dependent on the misprediction
  ❑ Sleeps instructions dependent on the misprediction
    ✓ Same operations as in the IQ mechanism
  ❑ Nullified instructions are kept in the RB
    ✓ An entry range is related to every misprediction
[Figure: Recovery Buffer and Misprediction Buffer, labelled "Verification Delay - 1 entries".]
    ✓ The scheduling recorded in the RB is valid: re-issue does not need to account for latencies


Recovery-Buffer Mechanism (3/3)
• Re-issue performed from the RB
  ❑ Checks an RB entry per cycle
  ❑ Also issues instructions from the IQ in the free issue slots
  ❑ Wakes up dependent instructions recorded in the IQ
[Figure: block diagram. The Select Logic chooses between the IQ and the Recovery Buffer (RB); selected instructions go to the execution pipelines and to the Recovery Buffer. Their destination registers feed the Register Scoreboard Circuit, which drives the ready signals of the Dependence Matrix. The signals "latency-predicted" and "mispredictions" are also shown.]
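The re-issue step can be sketched in Python as follows. The sketch assumes that re-issue has already been triggered, that each RB entry is the list of nullified instructions that were issued together in one cycle, and that `iq_ready` lists the ready Issue-Queue instructions in priority order. The name `reissue_cycle` and its parameters are hypothetical; wake-up of dependent instructions is not modelled.

```python
from collections import deque


def reissue_cycle(recovery_buffer, iq_ready, issue_width):
    """Select the instructions issued in one cycle during re-issue.

    recovery_buffer: deque of RB entries, oldest issue cycle first.
    iq_ready: list of ready instructions in the issue queue (consumed in place).
    """
    # One RB entry is checked per cycle; it keeps its recorded grouping.
    from_rb = list(recovery_buffer.popleft()) if recovery_buffer else []
    # Free issue slots are filled with instructions from the issue queue.
    free_slots = max(issue_width - len(from_rb), 0)
    from_iq = iq_ready[:free_slots]
    del iq_ready[:free_slots]
    return from_rb + from_iq


rb = deque([["b", "c"], ["d"]])
iq_ready = ["m", "n", "o", "p"]
print(reissue_cycle(rb, iq_ready, issue_width=4))  # ['b', 'c', 'm', 'n']
print(reissue_cycle(rb, iq_ready, issue_width=4))  # ['d', 'o', 'p']
```

Because an entry holds instructions that were issued concurrently, it never exceeds the issue width, so the RB instructions can always be taken first in this model.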


Example: RB with selective nullification
[Figure: pipeline timing diagram over cycles 1 to 8 and i to i+3 for the issued instructions a, m, n, b, o, c, d and p. The load a goes through IQ, R, @, M, M, TC; nullified instructions are marked ✗ and are later re-issued from the RB. Recovery Buffer contents shown: (b,o,c) and (b,-,c).]


Methodology
• Cycle-by-cycle simulation of SPEC-95 benchmarks
• Simulations performed:
  ❑ 4-way processor
    ✓ first-level cache latency: 2 cycles
  ❑ Issue-Queue size:
    ✓ 15, 20 and 25-entry integer IQs
    ✓ 10, 15 and 20-entry floating-point IQs
  ❑ Verification Delay: 2, 3 and 4 cycles


Evaluated mechanisms
Storage structure | Non-selective nullification | Selective nullification
Issue Queue | IQNS | IQS
Recovery Buffer | RBNS | RBS
• After issuing an instruction, extracting it from the IQ is delayed:
  ❑ IQNS: Verification-Delay cycles
  ❑ IQS: 1 cycle (to decide dependencies)
  ❑ RBNS and RBS: 0 cycles


Results: Integer benchmarks (1/2)
• Sensitivity to the verification delay
  ❑ IQNS and IQS: function of Issue-Queue size
    ✓ significant for 15-entry Issue Queue
  ❑ RBNS and RBS: almost independent
[Figure: three panels (15-entry, 20-entry and 25-entry integer issue-queue) plotting IPC (axis from 1.4 to 1.9) against verification delay (verif=2, 3, 4) for RBS, RBNS, IQS and IQNS.]
• For 25-entry Issue Queues, IQS & RBNS are almost equivalent


Results: Integer benchmarks (2/2)
• Sensitivity to Issue-Queue size
  ❑ RBNS and RBS allow Issue-Queue size reductions of around 25% with respect to IQNS and IQS
[Figure: two panels (3-cycle and 4-cycle verification delay) plotting IPC (axis from 1.4 to 1.9) against issue-queue size (iq=15, 20, 25) for RBS, RBNS, IQS and IQNS.]


Results: Floating-Point benchmarks (1/2)
• Behave differently from integer benchmarks
  ❑ Latencies forbid the existence of a chain of dependent instructions longer than one in the Speculative Window
[Figure: pipeline timing diagram for "Fr1 ← load ...", "Fr2 ← float (Fr1, ...)" and "Fr3 ← float (Fr2, ...)". The load goes through IQ, R, @, M, M, TC; the floating-point instructions spend several cycles in exe before W.]


Results: Floating-Point benchmarks (2/2)
• Sensitivity to the verification delay
  ❑ Non-selective mechanisms present degradation
    ✓ Most nullified instructions are independent of the misprediction: 85% (floating-point) versus 53% (integer)
  ❑ Selective mechanisms are almost independent
[Figure: 20-entry floating-point issue-queue; IPC (axis from 1.7 to 2.2) against verification delay (verif=2, 3, 4) for RBS, RBNS, IQS and IQNS.]


Conclusions
• The Recovery Buffer increases the capacity of the scheduler to look ahead for independent instructions
• Results depend on the dominating instruction latency
  ❑ Integer benchmarks (1-cycle latency):
    ✓ Recovery-Buffer mechanisms are less sensitive to the Verification Delay than IQ mechanisms
    ✓ Recovery-Buffer mechanisms allow a reduction in the Issue-Queue size of around 25%
    ✓ The Recovery-Buffer mechanism with non-selective nullification policy is an attractive alternative
  ❑ Floating-point benchmarks (4-cycle latency):
    ✓ Non-selective nullification degrades performance


Comparison of prediction types
Question | Branch prediction | Memory dependence prediction | Latency prediction
Which instructions are predicted? | branches | loads | loads
Can the speculative instructions be issued before issuing the predicted instruction? | Yes | No | No
Which instruction performs the verification of the prediction? | branch | previous store | load
Speculative-Window duration? | Variable | Variable | Fixed
Which instructions must be re-executed? | New path | The nullified ones | The nullified ones


Verification-Delay values
[Figure: four pipeline timing diagrams, each labelled with its Verification Delay.
  Load pipeline IQ, R, @, M, M/TC: 2
  Decoupled TC (IQ, R, @, M, M, TC): 3
  2 Read Stages, Decoupled TC (IQ, R, R, @, M, M, TC): 4
  Pipelined scheduling logic, Decoupled TC (WU, S, R, @, M, M, TC): 4
In each diagram the instructions issued after the load go through the same front-end stages followed by exe and W.]
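The Verification-Delay values can be reproduced from the load pipelines with a short Python sketch. It rests on two assumptions that are not stated in the slides: a dependent instruction executes in the cycle after the load's last M stage, and the Verification Delay counts the issue cycles from the first dependent issue up to and including the tag-check (TC) cycle. The function name `verification_delay` and the stage-list encoding are hypothetical.

```python
def verification_delay(load_stages):
    """load_stages: stage names of the load, one per cycle, issue stage first.

    A combined stage such as "M/TC" performs both actions in the same cycle.
    """
    stages = [s.split("/") for s in load_stages]
    # Cycle numbers are 1-based; the load is issued in cycle 1.
    last_mem = max(c for c, s in enumerate(stages, 1) if "M" in s)
    tag_check = next(c for c, s in enumerate(stages, 1) if "TC" in s)
    # Assumption: dependents share the front-end stages that precede "@".
    front_end = load_stages.index("@")
    # Assumption: back-to-back execution, one cycle after the last M stage.
    first_dependent_exe = last_mem + 1
    first_dependent_issue = first_dependent_exe - front_end
    return tag_check - first_dependent_issue + 1


print(verification_delay(["IQ", "R", "@", "M", "M/TC"]))            # 2
print(verification_delay(["IQ", "R", "@", "M", "M", "TC"]))         # 3
print(verification_delay(["IQ", "R", "R", "@", "M", "M", "TC"]))    # 4
print(verification_delay(["WU", "S", "R", "@", "M", "M", "TC"]))    # 4
```

Under these assumptions, every extra stage between issue and execute, and every cycle by which TC trails the data, adds one cycle to the Speculative Window.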


Recovery Buffer & Branch Mispredictions
• On a branch misprediction, some instructions that belong to a wrong path can be recorded in the Recovery Buffer
  ❑ These instructions must not be re-issued
    ✓ A "structure" contains the instruction-identifier ranges related to wrong-path instructions, to filter them out
  ❑ The Recovery Buffer locally maintains the status of the physical registers:
    ✓ Set: on issue and re-issue
    ✓ Unset: on nullifications
• These actions are performed concurrently with the normal operations of the Recovery Buffer
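The wrong-path filter can be sketched in Python as follows, assuming that instruction identifiers are plain integers that increase in program order and that wrong-path ranges are inclusive (lo, hi) pairs. The name `filter_wrong_path` is hypothetical, and identifier wrap-around is not modelled.

```python
def filter_wrong_path(entry, wrong_path_ranges):
    """Drop the instructions of an RB entry that belong to a wrong path.

    entry: instruction identifiers recorded in one Recovery-Buffer entry.
    wrong_path_ranges: inclusive (lo, hi) identifier ranges to discard.
    """
    return [
        instr_id
        for instr_id in entry
        if not any(lo <= instr_id <= hi for lo, hi in wrong_path_ranges)
    ]


# Identifiers 11 and 12 were fetched after a mispredicted branch.
print(filter_wrong_path([10, 11, 12, 20], [(11, 12)]))  # [10, 20]
```

Only the surviving identifiers would be candidates for re-issue; the register-status bookkeeping described above is outside this sketch.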
