Designing MIPS Processor

Author: Candice Greene
CSE 675.02: Introduction to Computer Architecture

Designing MIPS Processor (Single-Cycle) Presentation G Reading Assignment: 5.1-5.4

Slides by Gojko Babić

Introduction
• We're now ready to look at an implementation of a system that includes a MIPS processor and memory.
• The design will support execution of only:
  – memory-reference instructions: lw & sw,
  – arithmetic-logical instructions: add, sub, and, or, slt & nor,
  – control flow instructions: beq & j,
  – exception handling: illegal instruction & overflow.
• That design still gives us the principles, so many more instructions could easily be added, such as: addu, lb, lbu, lui, addi, addiu, sltu, slti, andi, ori, xor, xori, jal, jr, jalr, bne, beqz, bgtz, bltz, nop, mfhi, mflo, mfepc, mfc0, lwc1, swc1, etc.


Single Cycle Design
• We will first design a simpler processor that executes each instruction in only one clock cycle.
• This is not efficient from a performance point of view, since:
  – the clock cycle time (i.e. clock rate) must be chosen so that the longest instruction can be executed in one clock cycle, and
  – that makes shorter instructions execute in one unnecessarily long cycle.
• Additionally, no resource in the design may be used more than once per instruction, so some resources will be duplicated.
• The single cycle design will require:
  – two memories (instruction and data),
  – two additional adders.


Elements for Datapath Design

[Figure: the building blocks used in the datapath]
a. Program counter (PC): 32-bit input, 32-bit output
b. Register file: three 5-bit register numbers (Read register 1, Read register 2, Write register), 32-bit Write data, 32-bit Read data 1 and Read data 2, RegWrite control
c. ALU: two 32-bit inputs, 4-bit ALU control, outputs Zero, overflow and a 32-bit ALU result
d. Adder: two 32-bit inputs, 32-bit Sum
e. Data memory unit: 32-bit Address, 32-bit Write data, 32-bit Read data, MemRead and MemWrite controls
f. Instruction memory: 32-bit Instruction address, 32-bit Instruction (MemRead=1, MemWrite=0)
g. Sign-extension unit: 16-bit input, 32-bit output
h. Shift left 2: 32-bit input, 32-bit output


Abstract /Simplified View (1st look)

[Figure: simplified datapath with PC, Instruction memory (Address, Instruction), Registers (Register #, Data), ALU, and Data memory (Address, Data)]

• This generic implementation:
  – uses the program counter (PC) to supply the instruction address,
  – gets the instruction from memory,
  – reads registers,
  – uses the instruction opcode to decide exactly what to do.

Abstract /Simplified View (2nd look)

Figure 5.1

• PC is incremented by 4 by most instructions, and by 4 + 4×offset by branch instructions.
• Jump instructions change PC differently (not shown).


Our Implementation
• An edge-triggered methodology.
• Typical execution:
  – read contents of some state elements at the beginning of the clock cycle,
  – send values through some combinational logic,
  – write results to one or more state elements at the end of the clock cycle.

[Figure 5.5: State element 1 feeds Combinational logic, which feeds State element 2, all within one clock cycle]

• An edge-triggered methodology allows a state element to be read and written in the same clock cycle.


Incrementing PC & Fetching Instruction

[Figure 5.6 with addition in red: the PC drives the Read address input of the Instruction memory and an Add unit whose other input is 4; the Instruction memory outputs the Instruction; the Clock input is shown.]


Datapath for R-type Instructions

[Figure: Instruction bits I25-21, I20-16 and I15-11 drive Read register 1, Read register 2 and Write register of the Registers; Read data 1 and Read data 2 feed the ALU (4-bit ALU control, Zero output); the ALU result returns to Write data of the Registers; RegWrite and Clock control the register write.]

R-type format:
bits     31-26    25-21   20-16   15-11   10-6    5-0
field    000000   rs      rt      rd      00000   funct

funct values: add = 32, sub = 34, slt = 42, and = 36, or = 37, nor = 39
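The field layout above can be illustrated with a short decoding sketch. This is only an illustration in Python, not part of the slides; the function name `decode_r_type` and the dictionary keys are hypothetical.

```python
# Funct codes as listed on the slide (decimal values).
FUNCT_NAMES = {32: "add", 34: "sub", 42: "slt", 36: "and", 37: "or", 39: "nor"}

def decode_r_type(word):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26, 000000 for R-type
        "rs":     (word >> 21) & 0x1F,  # bits 25-21, Read register 1
        "rt":     (word >> 16) & 0x1F,  # bits 20-16, Read register 2
        "rd":     (word >> 11) & 0x1F,  # bits 15-11, Write register
        "shamt":  (word >> 6) & 0x1F,   # bits 10-6, 00000 here
        "funct":  word & 0x3F,          # bits 5-0, selects the ALU function
    }

# Example word: add with rd = 8, rs = 9, rt = 10 (hand-assembled for illustration).
fields = decode_r_type(0x012A4020)
operation = FUNCT_NAMES[fields["funct"]]
```

Here `fields["rs"]`, `fields["rt"]` and `fields["rd"]` correspond to the three register-number inputs of the register file in the figure.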


Complete Datapath for R-type Instructions

Based on the contents of the op-code and funct fields, the Control Unit sets ALU control appropriately and asserts RegWrite, i.e. RegWrite = 1.

[Figure: the R-type datapath combined with instruction fetch. The PC drives the Read address of the Instruction memory and an Add unit that adds 4; instruction bits I25-21, I20-16 and I15-11 drive Read register 1, Read register 2 and Write register; Read data 1 and Read data 2 feed the ALU (4-bit ALU control, Zero output); the ALU result returns to Write data; RegWrite and Clock are shown.]


Datapath for LW and SW Instructions

lw/sw format:
bits     31-26             25-21   20-16   15-0
field    sw or lw opcode   rs      rt      offset

[Figure: I25-21 drives Read register 1; I20-16 drives Read register 2 and Write register; I15-0 is sign-extended from 16 to 32 bits and added to Read data 1 in the ALU; the ALU result is the Data memory Address; Read data 2 is the Data memory Write data; the Data memory Read data returns to Write data of the Registers. Controls shown: ALU control (4 bits), MemRead, MemWrite, RegWrite, Clock.]

Control Unit sets:
• ALU control = 0010 (add) for address calculation for both lw and sw
• MemRead=0, MemWrite=1 and RegWrite=0 for sw
• MemRead=1, MemWrite=0 and RegWrite=1 for lw
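The address calculation shared by lw and sw (base register plus sign-extended 16-bit offset, using the ALU add function) might be sketched as follows. The helper names `sign_extend16` and `effective_address` are hypothetical, and Python integers stand in for 32-bit buses.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python int (models the Sign extend unit)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(base, offset16):
    """ALU add of Read data 1 (base) and the sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF

# Control settings copied from the slide; the dictionary layout is an assumption.
SW_CONTROL = {"ALUControl": "0010", "MemRead": 0, "MemWrite": 1, "RegWrite": 0}
LW_CONTROL = {"ALUControl": "0010", "MemRead": 1, "MemWrite": 0, "RegWrite": 1}

# A base of 0x1000 with offset -4 (encoded as 0xFFFC) addresses 0x0FFC.
addr = effective_address(0x1000, 0xFFFC)
```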


Datapath for R-type, LW & SW Instructions

[Figure: combined datapath. The PC drives the Read address of the Instruction memory (MemRead=1, MemWrite=0) and an Add unit that adds 4; rs and rt drive Read register 1 and Read register 2; a multiplexor controlled by RegDst selects rt (0) or rd (1) as Write register; a multiplexor controlled by ALUSrc selects Read data 2 (0) or the sign-extended 16-bit offset (1) as the second ALU input; the ALU result is the Data memory Address; a multiplexor controlled by MemtoReg selects the ALU result (0) or the Data memory Read data (1) as Write data of the Registers. Controls shown: RegWrite, MemWrite, MemRead, ALU control (4 bits), Clock.]

Let us determine the settings of the control lines for R-type, lw & sw instructions.


Datapath for BEQ Instruction

beq format:
bits     31-26   25-21   20-16   15-0
field    beq     rs      rt      offset

Branch target = [PC] + 4 + 4×offset

[Figure 5.9 with additions in red: rs and rt drive Read register 1 and Read register 2; Read data 1 and Read data 2 feed the ALU (4-bit ALU control), whose Zero output goes to the branch control logic; the 16-bit offset is sign-extended to 32 bits, shifted left by 2, and added to PC + 4 (from the instruction datapath) to form the Branch target. RegWrite is shown on the register file.]
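The branch target formula above can be checked with a small sketch. The function names are hypothetical, and the choice to take the branch only when both a Branch control and the ALU Zero output are set is an assumption about the branch control logic, in line with the usual textbook datapath.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, offset16):
    """Branch target = [PC] + 4 + 4 x offset, wrapped to 32 bits."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF

def next_pc(pc, offset16, branch, zero):
    """Assumed selection: take the target only when Branch and Zero are both 1."""
    if branch and zero:
        return branch_target(pc, offset16)
    return (pc + 4) & 0xFFFFFFFF

taken = next_pc(0x00400000, 3, branch=1, zero=1)      # 0x00400000 + 4 + 12
not_taken = next_pc(0x00400000, 3, branch=1, zero=0)  # falls through to PC + 4
```

An offset of -1 (0xFFFF) makes the target equal to the address of the branch itself, since PC + 4 - 4 = PC.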


Datapath for R-type, LW, SW & BEQ

[Figure 5.15 with additions in red: the R-type/lw/sw datapath extended for beq. An Add unit computes PC + 4; a second Add unit adds PC + 4 to the sign-extended offset shifted left by 2; a multiplexor controlled by PCSrc selects PC + 4 (0) or the branch target (1) as the next PC. Instruction [25-21] (rs) and Instruction [20-16] (rt) drive Read register 1 and Read register 2; a multiplexor controlled by RegDst selects Instruction [20-16] (0) or Instruction [15-11] (rd) (1) as Write register; Instruction [15-0] (offset) is sign-extended from 16 to 32 bits; a multiplexor controlled by ALUSrc selects the second ALU input; a multiplexor controlled by MemtoReg selects Write data. The Instruction memory is marked MemRead=1, MemWrite=0. Controls shown: RegWrite, MemWrite, MemRead, ALU control (4 bits), Clock.]


Control Unit and Datapath

[Figure 5.17 with additions in red: the datapath of Figure 5.15 with the control units added. Instruction [31-26] (opcode) feeds the Control unit, which produces RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; Instruction [5-0] (funct) and ALUOp feed the ALU control unit; PCSrc selects between PC + 4 and the branch target. Labels added include opcode, rs, rt, rd, offset, funct, "Clock anded" on the register file and data memory, and MemRead=1, MemWrite=0 on the Instruction memory.]


Truth Table for (Main) Control Unit

Input: Op-code. Output: all other columns (d = don't care).

Instr.   Op-code   RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-type   000000    1       0       0         1         d        0         0       1       0
lw       100011    0       1       1         1         1        0         0       0       0
sw       101011    d       1       d         0         0        1         0       0       0
beq      000100    d       0       d         0         d        0         1       0       1

• ALUOp[1-0] = 00: signal to the ALU Control unit for the ALU to perform the add function, i.e. set Ainvert=0, Binvert=0 and Operation=10.
• ALUOp[1-0] = 01: signal to the ALU Control unit for the ALU to perform the subtract function, i.e. set Ainvert=0, Binvert=1 and Operation=10.
• ALUOp[1-0] = 10: signal to the ALU Control unit to look at bits I[5-0] and, based on their pattern, to set Ainvert, Binvert and Operation so that the ALU performs the appropriate function, i.e. add, sub, slt, and, or & nor.
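One way to read the truth table above is as a lookup keyed by op-code. The sketch below is a software illustration only, not the hardware design; `main_control`, `CONTROL_TABLE` and the use of `None` for a don't-care entry are assumptions made for this example.

```python
D = None  # stands for a don't-care ("d") entry in the truth table

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# Rows copied from the truth table, keyed by the 6-bit op-code.
CONTROL_TABLE = {
    0b000000: (1, 0, 0, 1, D, 0, 0, 1, 0),  # R-type
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0),  # lw
    0b101011: (D, 1, D, 0, 0, 1, 0, 0, 0),  # sw
    0b000100: (D, 0, D, 0, D, 0, 1, 0, 1),  # beq
}

def main_control(opcode):
    """Return the control signal values for a supported op-code."""
    if opcode not in CONTROL_TABLE:
        # The design treats unsupported op-codes as an illegal instruction.
        raise ValueError("illegal instruction: opcode %s" % format(opcode, "06b"))
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))

lw_signals = main_control(0b100011)
```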


Truth Table of ALU Control Unit

Input: ALUOp and Funct field. Output: ALU Control (Ainvert, Binvert, Operation).

ALUOp1  ALUOp0   F5  F4  F3  F2  F1  F0   Ainvert  Binvert  Operation   Function
0       0        d   d   d   d   d   d    0        0        10          add
0       1        d   d   d   d   d   d    0        1        10          sub
1       0        1   0   0   0   0   0    0        0        10          add
1       0        1   0   0   0   1   0    0        1        10          sub
1       0        1   0   0   1   0   0    0        0        00          and
1       0        1   0   0   1   0   1    0        0        01          or
1       0        1   0   1   0   1   0    0        1        11          slt
1       0        1   0   0   1   1   1    1        1        00          nor
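The ALU Control truth table can be mirrored in a few lines of Python. This is a sketch for checking the table by hand; the function name `alu_control` and the tuple layout (Ainvert, Binvert, Operation) are choices made for this example.

```python
# (Ainvert, Binvert, Operation) for each funct pattern, copied from the table.
FUNCT_TO_ALU = {
    0b100000: (0, 0, 0b10),  # add
    0b100010: (0, 1, 0b10),  # sub
    0b100100: (0, 0, 0b00),  # and
    0b100101: (0, 0, 0b01),  # or
    0b101010: (0, 1, 0b11),  # slt
    0b100111: (1, 1, 0b00),  # nor
}

def alu_control(alu_op, funct=0):
    """Return (Ainvert, Binvert, Operation) for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:       # lw/sw: add, funct is a don't care
        return (0, 0, 0b10)
    if alu_op == 0b01:       # beq: subtract, funct is a don't care
        return (0, 1, 0b10)
    if alu_op == 0b10:       # R-type: decode the funct field
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("ALUOp value not used in this design")

slt_setting = alu_control(0b10, 0b101010)
```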


Design of (Main) Control Unit

Op-code bits
5 4 3 2 1 0   RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
000000        1       0       0         1         d        0         0       1       0
100011        0       1       1         1         1        0         0       0       0
101011        d       1       d         0         0        1         0       0       0
000100        d       0       d         0         d        0         1       0       1
…

$RegDst = \overline{Op5}\,\overline{Op4}\,\overline{Op3}\,\overline{Op2}\,\overline{Op1}\,\overline{Op0}$

$ALUSrc = Op5\,\overline{Op4}\,\overline{Op3}\,\overline{Op2}\,Op1\,Op0 + Op5\,\overline{Op4}\,Op3\,\overline{Op2}\,Op1\,Op0$

[Figure C.2.5: inputs Op5 to Op0 are decoded into four product terms (R-format, lw, sw, beq), which are combined to form the outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1 and ALUOp0.]
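The decode-then-combine structure of the control unit can be sketched as sum-of-products logic. In this Python illustration every don't-care entry is resolved to 0, which is an assumption made here for concreteness; the function name `main_control_sop` is hypothetical.

```python
def main_control_sop(opcode):
    """Evaluate the main control outputs as sums of op-code product terms."""
    op = [(opcode >> i) & 1 for i in range(6)]  # op[i] is bit Op_i
    n = [1 - b for b in op]                     # n[i] is the complement of Op_i

    # One product term per supported instruction class.
    r_format = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]     # 000000
    lw       = op[5] & n[4] & n[3] & n[2] & op[1] & op[0]  # 100011
    sw       = op[5] & n[4] & op[3] & n[2] & op[1] & op[0] # 101011
    beq      = n[5] & n[4] & n[3] & op[2] & n[1] & n[0]    # 000100

    # Each output is the OR of the terms whose table entry is 1
    # (don't cares taken as 0, an assumption of this sketch).
    return {
        "RegDst":   r_format,
        "ALUSrc":   lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead":  lw,
        "MemWrite": sw,
        "Branch":   beq,
        "ALUOp1":   r_format,
        "ALUOp0":   beq,
    }

sw_outputs = main_control_sop(0b101011)
```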


Datapath for R-type, LW, SW, BEQ & J

j format:
bits     31-26   25-0
field    j       jump_target

PC ← PC[31-28] || jump_target || 00

[Figure 5.24 with correction in red: the datapath of Figure 5.17 extended for j. Instruction [25-0] (26 bits) is shifted left by 2 (adding 2 zeros) to give 28 bits, which are joined with four upper bits (labeled "PC[31-28]" and "PC+4 [31-28]" in the figure) to form Jump address [31-0]; an additional multiplexor controlled by Jump selects between the output of the branch multiplexor (0) and the jump address (1) as the next PC. The Control unit outputs are RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite.]
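The jump address concatenation can be sketched as below. The function name is hypothetical, and the source of the upper four bits is left as a parameter because the slide formula writes PC[31-28] while the figure also carries a PC+4 [31-28] label; the caller passes whichever 32-bit value applies.

```python
def jump_address(upper_bits_source, jump_target):
    """Form upper[31-28] || jump_target || 00 as a 32-bit address."""
    upper = upper_bits_source & 0xF0000000    # keep bits 31-28 only
    low28 = (jump_target & 0x03FFFFFF) << 2   # 26-bit target, shifted left 2
    return upper | low28

# A 26-bit target of 0x0100000 with upper bits 0000 gives address 0x00400000.
addr = jump_address(0x00400008, 0x0100000)
```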


Design of Control Unit (J included)

         Op-code bits
Instr.   5 4 3 2 1 0   RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-type   000000        1       0       0         1         d        0         0       1       0       0
lw       100011        0       1       1         1         1        0         0       0       0       0
sw       101011        d       1       d         0         0        1         0       0       0       0
beq      000100        d       0       d         0         d        0         1       0       1       0
J        000010        d       d       d         0         d        0         d       d       d       1
…

$Jump = \overline{Op5}\,\overline{Op4}\,\overline{Op3}\,\overline{Op2}\,Op1\,\overline{Op0}$

[Figure: the decoding logic of Figure C.2.5 (inputs Op5 to Op0; product terms R-format, lw, sw, beq) with the Jump output added alongside RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1 and ALUOp0.]

No changes in ALU Control unit.


Cycle Time Calculation
• Let us assume that the only delays introduced are by the following tasks:
  – Memory access (read and write time = 3 nsec)
  – Register file access (read and write time = 1 nsec)
  – ALU to perform function (= 2 nsec)
• Under those assumptions, here are the instruction execution times:

          Instr   Reg    ALU    Data     Reg
          fetch   read   oper   memory   write   Total
R-type    3     + 1    + 2             + 1     = 7 nsec
lw        3     + 1    + 2    + 3      + 1     = 10 nsec
sw        3     + 1    + 2    + 3              = 9 nsec
branch    3     + 1    + 2                     = 6 nsec
jump      3                                    = 3 nsec

• Thus the clock cycle time has to be 10 nsec, and the clock rate = 1/(10 nsec) = 100 MHz.
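The arithmetic in the cycle time table can be reproduced with a short script. The delays and the steps per instruction are taken from the slide; the variable and key names are made up for this sketch.

```python
# Delays in nanoseconds, as assumed on the slide.
DELAY_NS = {"instr_fetch": 3, "reg_read": 1, "alu": 2, "data_mem": 3, "reg_write": 1}

# Units used by each instruction class, following the table.
STEPS = {
    "R-type": ["instr_fetch", "reg_read", "alu", "reg_write"],
    "lw":     ["instr_fetch", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":     ["instr_fetch", "reg_read", "alu", "data_mem"],
    "branch": ["instr_fetch", "reg_read", "alu"],
    "jump":   ["instr_fetch"],
}

# Execution time of each instruction is the sum of its unit delays.
times_ns = {name: sum(DELAY_NS[step] for step in steps)
            for name, steps in STEPS.items()}

# In a single-cycle design the slowest instruction sets the clock period.
cycle_ns = max(times_ns.values())
clock_mhz = 1000 / cycle_ns  # 1 / (cycle_ns * 1e-9 s), expressed in MHz
```

With these numbers the longest path belongs to lw, so every other instruction also takes a full 10 nsec cycle.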


Single Cycle Processor: Conclusion
• Single Cycle Problems:
  – what if we had a more complicated instruction, like floating point?
  – the clock cycle would be much longer,
  – which wastes time on shorter and more often used instructions, such as add & lw.
• One Solution:
  – use a “smaller” cycle time, and
  – have different instructions take different numbers of cycles.
• And that is a “multi-cycle” processor.
