Chapter Four (Part A) The Processor: Datapath and Control
EE334 Spring 2010
The Processor: Datapath & Control
• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
  – memory-reference instructions: lw, sw
  – arithmetic-logical instructions: add, sub, and, or, slt
  – control flow instructions: beq, j
• Generic implementation:
  – use the program counter (PC) to supply the instruction address
  – get the instruction from memory
  – read registers
  – use the instruction to decide exactly what to do
• All instructions use the ALU after reading the registers. Why? Memory-reference? Arithmetic? Control flow?
More Implementation Details
• Abstract / simplified view:
[Figure: abstract datapath. Blocks: PC, instruction memory (Address in, Instruction out), Registers (three Register # inputs), ALU, data memory (Address, Data).]
• Two types of functional units:
  – elements that operate on data values (combinational)
  – elements that contain state (sequential)
An unclocked state element
• The set-reset latch
  – output depends on present inputs and also on past inputs
[Figure: two cross-coupled NOR gates with inputs R and S and outputs Q and Q̄]

NOR truth table:
  A  B  NOR
  0  0  1
  0  1  0
  1  0  0
  1  1  0

Set-reset latch:
  R  S  Q  Q̄
  0  0  Q  Q̄   (unchanged)
  0  1  1  0
  1  0  0  1
  1  1  0  0   Not allowed
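The set-reset table above can be checked with a small simulation. This is a minimal sketch, assuming the usual wiring Q = NOR(R, Q̄) and Q̄ = NOR(S, Q); the function names and the fixed number of settling passes are illustrative choices, not part of the slides.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1


def sr_latch(r: int, s: int, q: int, q_bar: int) -> tuple[int, int]:
    """Settle a cross-coupled NOR latch starting from the previous (q, q_bar)."""
    for _ in range(4):  # a few passes let the feedback loop settle
        new_q = nor(r, q_bar)      # assumed wiring: Q = NOR(R, Q-bar)
        new_q_bar = nor(s, q)      # assumed wiring: Q-bar = NOR(S, Q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar


state = sr_latch(r=0, s=1, q=0, q_bar=1)   # set
state = sr_latch(0, 0, *state)             # hold: depends on the past input
print(state)
```

With R = S = 1 both outputs are driven to 0, so Q and Q̄ are no longer complements, which is why that row is marked as not allowed.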
Latches and Flip-flops
• Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
• Change of state (value) is based on the clock
• Latches: state changes whenever the inputs change and the clock is asserted
  – "asserted" means logically true, which could be electrically low
• Flip-flop: state changes only on a clock edge (edge-triggered methodology)
• A clocking methodology defines when signals can be read and written
  – wouldn't want to read a signal at the same time it was being written
D-latch
• Two inputs:
  – the data value to be stored (D)
  – the clock signal (C) indicating when to read & store D
• Two outputs:
  – the value of the internal state (Q) and its complement (Q̄)
[Figure: D latch with inputs C and D and outputs Q and Q̄, with waveforms for D, C, and Q]
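D flip-flop
• Output changes only on the clock edge
[Figure: two D latches in series driven by the clock C; D enters the first latch, whose Q feeds the D input of the second latch, which drives Q and Q̄. Waveforms for D, C, and Q.]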
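The difference between the level-sensitive latch and the edge-triggered flip-flop can be sketched behaviorally. This is an illustrative model only: the class names are hypothetical, and building the flip-flop as a master-slave pair that updates Q on the falling edge of C is an assumption (the slides only state that the output changes on the clock edge).

```python
class DLatch:
    """Level-sensitive: Q follows D while the clock C is asserted."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, c: int, d: int) -> int:
        if c:
            self.q = d
        return self.q


class DFlipFlop:
    """Two latches in series; Q changes when C falls (assumed polarity)."""

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, c: int, d: int) -> int:
        self.master.update(c, d)                         # master open while C = 1
        return self.slave.update(1 - c, self.master.q)   # slave open while C = 0


ff = DFlipFlop()
trace = [ff.update(c, d) for c, d in [(1, 1), (0, 0), (0, 1), (1, 0), (0, 0)]]
print(trace)
```

While C is high the flip-flop output holds its old value even though D changes; the latch, by contrast, would pass D straight through during that time.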
Our Implementation
• An edge-triggered methodology
• Typical execution:
  – read contents of some state elements
  – send values through some combinational logic
  – write results to one or more state elements
[Figure: State element 1 → Combinational logic → State element 2, within one clock cycle]
Register File
• Built using D flip-flops
[Figure: read ports of the register file. Read register number 1 and Read register number 2 each drive a mux that selects among Register 0 ... Register n to produce Read data 1 and Read data 2. Block symbol: inputs Read register number 1, Read register number 2, Write register, Write data, Write; outputs Read data 1, Read data 2.]
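Register File
• Note: we still use the real clock to determine when to write
[Figure: write port of the register file. The register number feeds a decoder (labeled "n-to-1 decoder") with outputs 0 ... n; each decoder output, together with the Write signal, drives the C input of one register, and Register data drives the D input of every register (Register 0 ... Register n).]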
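The two register-file slides can be summarized as a behavioral model: reads are combinational mux selections, and a write happens only on a clock edge when Write is asserted. This is a rough sketch under those assumptions; the class and method names are hypothetical, and details such as MIPS register 0 always reading as zero are not modeled.

```python
class RegisterFile:
    """Behavioral register file: two read ports, one clocked write port."""

    def __init__(self, num_registers: int = 32) -> None:
        self.regs = [0] * num_registers

    def read(self, reg1: int, reg2: int) -> tuple[int, int]:
        """Combinational read: each register number acts as a mux select."""
        return self.regs[reg1], self.regs[reg2]

    def clock_edge(self, write: int, write_reg: int, write_data: int) -> None:
        """Call once per clock edge; only the decoded register is written."""
        if write:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write=0, write_reg=8, write_data=5)   # Write deasserted: no change
before = rf.read(8, 0)
rf.clock_edge(write=1, write_reg=8, write_data=5)   # Write asserted: register 8 updated
after = rf.read(8, 0)
print(before, after)
```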
Simple Implementation
• Include the functional units we need for each instruction
  – Instruction memory: Instruction address in, Instruction out
  – Program counter (PC)
  – Adder: Add, Sum
  – Registers: three 5-bit register numbers (Read register 1, Read register 2, Write register), Write data in, Read data 1 and Read data 2 out, RegWrite control
  – ALU: 3-bit ALU control in, Zero and ALU result out
  – Data memory unit: Address and Write data in, Read data out, MemWrite and MemRead controls
  – Sign-extension unit: 16 bits in, 32 bits out
• Why do we need this stuff?
Building the Datapath
• Use multiplexers to stitch them together
[Figure: single-cycle datapath. PC, instruction memory, two adders (PC + 4, and a second adder fed through Shift left 2), Registers, Sign extend (16 to 32 bits), ALU, and data memory, joined by muxes controlled by PCSrc, ALUSrc, and MemtoReg. Other control signals: RegWrite, ALU operation (3 bits), MemWrite, MemRead. ALU outputs: Zero, ALU result. Stages marked on the figure: Instruction Fetch, Instruction Decode/Operand Fetch, Execution, Memory, WB.]
Control
• Selecting the operations to perform (ALU, read/write, etc.)
• Controlling the flow of data (multiplexer inputs)
• Information comes from the 32 bits of the instruction
• Example: add $8, $17, $18
• Instruction format:

  op      rs     rt     rd     shamt  funct
  000000  10001  10010  01000  00000  100000

• ALU's operation based on instruction type and function code
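The field layout for add $8, $17, $18 can be checked by packing and unpacking the 32-bit word. This is a small sketch; the helper names encode_r_type and decode_fields are hypothetical, while the field widths (6, 5, 5, 5, 5, 6 bits) follow the instruction format shown above.

```python
def encode_r_type(rs: int, rt: int, rd: int, shamt: int, funct: int, op: int = 0) -> int:
    """Pack R-format fields into a 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_fields(word: int) -> dict[str, int]:
    """Split a 32-bit instruction word into its R-format fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $8, $17, $18: rd = 8, rs = 17, rt = 18, funct = 100000 (binary)
ADD_WORD = encode_r_type(rs=17, rt=18, rd=8, shamt=0, funct=0b100000)
print(f"{ADD_WORD:032b}")
print(decode_fields(ADD_WORD))
```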
Control
• e.g., what should the ALU do with this instruction?
• Example: lw $1, 100($2)

  op  rs  rt  16-bit offset
  35  2   1   100

• ALU control input:

  000  AND
  001  OR
  010  add
  110  subtract
  111  set-on-less-than

• Why is the code for subtract 110 and not 011?
ALU
• Needs to support logic and arithmetic operations
  – AND, OR
  – add, subtract (using two's complement)
• Needs to support the set-on-less-than instruction (slt)
  – remember: slt is an arithmetic instruction
  – produces a 1 if rs < rt and 0 otherwise
  – use subtraction: (a-b) < 0 implies a < b
• Needs to support test for equality (beq $t5, $t6, Label)
  – use subtraction: (a-b) = 0 implies a = b
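Both slt and the equality test reduce to a subtraction, which the following sketch illustrates on 32-bit values. The function names are hypothetical. Note one simplification, stated as an assumption: Python integers do not overflow, so the sign of a - b is exact here, whereas hardware must also account for overflow when deriving the slt result.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def subtract(a: int, b: int) -> int:
    """a - b computed as a + (not b) + 1, truncated to 32 bits."""
    return (a + (~b & MASK32) + 1) & MASK32


def slt(a: int, b: int) -> int:
    """1 if a < b (signed), using the sign of (a - b)."""
    return 1 if to_signed(a) - to_signed(b) < 0 else 0


def equal(a: int, b: int) -> bool:
    """beq-style test: (a - b) == 0 implies a == b."""
    return subtract(a, b) == 0


print(to_signed(subtract(5, 7)), slt(5, 7), equal(9, 9))
```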
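Supporting slt
• Can we figure out the idea?
[Figure a: 1-bit ALU with inputs a, b, Less, and CarryIn and controls Binvert and Operation; the output mux selects input 0, 1, 2, or 3 (Less) as Result; the adder produces CarryOut.]
[Figure b: 1-bit ALU for the most significant bit; same inputs and controls, plus a Set output and overflow detection producing Overflow.]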
[Figure: 32-bit ALU built from 1-bit ALUs ALU0 ... ALU31. Binvert and Operation go to every bit; CarryIn enters ALU0 and each CarryOut feeds the next CarryIn; the Less input is 0 for ALU1 ... ALU31; ALU31 produces Set and Overflow; outputs are Result0 ... Result31.]
Test for equality
• Notice control lines:

  000 = and
  001 = or
  010 = add
  110 = subtract
  111 = slt

• Note: Zero is a 1 when the result is zero!
[Figure: 32-bit ALU built from ALU0 ... ALU31 with control lines Bnegate and Operation, outputs Result0 ... Result31, Set and Overflow from ALU31, and a Zero output.]
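A bit-level sketch of the 32-bit ALU shows how the three control lines are used: the top bit acts as Bnegate (invert b and inject a carry of 1 into bit 0) and the low two bits select the mux input, which suggests why subtract is 110 rather than 011. This is an illustrative model, not the exact circuit: function names are hypothetical, Set is taken directly from the adder output of bit 31, and overflow handling is omitted.

```python
def alu_bit(a: int, b: int, less: int, bnegate: int, carry_in: int, op: int):
    """One bit slice: returns (result, adder_sum, carry_out)."""
    b ^= bnegate                                   # Binvert
    adder_sum = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, adder_sum, less)[op]   # mux inputs 0..3
    return result, adder_sum, carry_out


def alu32(a: int, b: int, control: int) -> tuple[int, int]:
    """Ripple the slices from bit 0 to bit 31; returns (result, zero)."""
    bnegate, op = control >> 2, control & 0b11
    carry, result, set_bit = bnegate, 0, 0         # Bnegate also drives CarryIn of bit 0
    for i in range(32):
        bit, set_bit, carry = alu_bit((a >> i) & 1, (b >> i) & 1, 0, bnegate, carry, op)
        result |= bit << i
    if op == 0b11:
        result |= set_bit                          # Set from bit 31 feeds Less of bit 0
    return result, int(result == 0)


print(alu32(7, 5, 0b110))   # subtract
print(alu32(3, 7, 0b111))   # slt
```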
Control
• Must describe hardware to compute the 3-bit ALU control input
  – given instruction type (ALUOp, computed from instruction type):
      00 = lw, sw
      01 = beq
      10 = arithmetic (see next slide)
  – function code for arithmetic
• Describe it using a truth table (can turn into gates):

  ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  ALU control  Operation
  0       0       X   X   X   X   X   X   010          lw/sw
  0       1       X   X   X   X   X   X   110          beq
  1       X       X   X   0   0   0   0   010          add
  1       X       X   X   0   0   1   0   110          sub
  1       X       X   X   0   1   0   0   000          and
  1       X       X   X   0   1   0   1   001          or
  1       X       X   X   1   0   1   0   111          slt
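The truth table can be written directly as a lookup. This is a minimal sketch; the function name alu_control is hypothetical, and the don't-care bits (F5, F4, and ALUOp0 when ALUOp1 = 1) are simply ignored, as the table permits.

```python
# F3..F0 of the funct field -> 3-bit ALU control, for ALUOp1 = 1
FUNCT_TO_CONTROL = {
    0b0000: 0b010,  # add
    0b0010: 0b110,  # sub
    0b0100: 0b000,  # and
    0b0101: 0b001,  # or
    0b1010: 0b111,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Compute the 3-bit ALU control input from ALUOp and the funct field."""
    if alu_op & 0b10:                       # 1X: arithmetic, decode funct
        return FUNCT_TO_CONTROL[funct & 0b1111]
    if alu_op == 0b01:                      # beq: subtract
        return 0b110
    return 0b010                            # lw/sw: add


print(f"{alu_control(0b10, 0b100010):03b}")   # sub
```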
Control
[Figure: single-cycle datapath with the Control unit. Instruction [31–26] drives Control, which produces RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. Instruction [25–21] and Instruction [20–16] select the read registers; a mux chooses Instruction [20–16] or Instruction [15–11] as the write register; Instruction [15–0] is sign-extended from 16 to 32 bits; Instruction [5–0] goes to ALU control.]

  Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
  R-format     1       0       0         1         0        0         0       1       0
  lw           0       1       1         1         1        0         0       0       0
  sw           X       1       X         0         0        1         0       0       0
  beq          X       0       X         0         0        0         1       0       1
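The control table maps naturally onto a lookup structure. The sketch below is illustrative: the names ControlSignals, CONTROL_TABLE, and pc_src are hypothetical, don't-care entries (X) are represented as None, and deriving the PC mux select as Branch AND Zero is an assumption not spelled out in the table.

```python
from typing import NamedTuple, Optional


class ControlSignals(NamedTuple):
    reg_dst: Optional[int]
    alu_src: int
    mem_to_reg: Optional[int]
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int
    alu_op: int          # ALUOp1 and ALUOp0 as a 2-bit value


X = None  # don't care

CONTROL_TABLE = {
    "R-format": ControlSignals(1, 0, 0, 1, 0, 0, 0, 0b10),
    "lw":       ControlSignals(0, 1, 1, 1, 1, 0, 0, 0b00),
    "sw":       ControlSignals(X, 1, X, 0, 0, 1, 0, 0b00),
    "beq":      ControlSignals(X, 0, X, 0, 0, 0, 1, 0b01),
}


def pc_src(signals: ControlSignals, zero: int) -> int:
    """Assumed rule: take the branch target only if Branch and Zero are both 1."""
    return signals.branch & zero


print(CONTROL_TABLE["lw"])
print(pc_src(CONTROL_TABLE["beq"], zero=1))
```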
Control (R-format instruction)
[Figure: single-cycle datapath annotated for an R-format instruction (fields: opcode, rs, rt, rd, shamt, funct). Control values: RegDst = 1, ALUSrc = 0, MemtoReg = 0, RegWrite = 1, MemRead = 0, MemWrite = 0, Branch = 0, ALUOp = 10.]
Control (lw instruction)
[Figure: single-cycle datapath annotated for lw (fields: opcode, rs, rt, immediate). Control values: RegDst = 0, ALUSrc = 1, MemtoReg = 1, RegWrite = 1, MemRead = 1, MemWrite = 0, Branch = 0, ALUOp = 00.]
Control (beq instruction)
[Figure: single-cycle datapath annotated for beq (fields: opcode, rs, rt, immediate). Control values: RegDst = X, ALUSrc = 0, MemtoReg = X, RegWrite = 0, MemRead = 0, MemWrite = 0, Branch = 1, ALUOp = 01.]
Control (Part 1)
• Simple combinational logic (truth tables)
[Figure: ALU control block. Inputs: ALUOp0, ALUOp1, and F (5–0), of which F3, F2, F1, F0 are used. Outputs: Operation2, Operation1, Operation0.]

  ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation
  0       0       X   X   X   X   X   X   010  lw/sw
  0       1       X   X   X   X   X   X   110  branch
  1       X       X   X   0   0   0   0   010  add
  1       X       X   X   0   0   1   0   110  sub
  1       X       X   X   0   1   0   0   000  and
  1       X       X   X   0   1   0   1   001  or
  1       X       X   X   1   0   1   0   111  slt
Control (Part 2)
• Simple combinational logic (truth tables)
• Inputs: Opcode [31:26], bits Op5 ... Op0 (from MIPS Data Reference)

  Opcode  Op5–Op0  Instruction
  0       000000   R-format inst.
  4       000100   branch inst.
  35      100011   lw inst.
  43      101011   sw inst.

• Outputs:

  Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
  R-format     1       0       0         1         0        0         0       1       0
  lw           0       1       1         1         1        0         0       0       0
  sw           X       1       X         0         0        1         0       0       0
  beq          X       0       X         0         0        0         1       0       1
Our Simple Control Structure
• All of the logic is combinational
• We wait for everything to settle down, and the right thing to be done
  – ALU might not produce the "right answer" right away
  – we use write signals along with the clock to determine when to write
• Cycle time determined by length of the longest path
[Figure: State element 1 → Combinational logic → State element 2, within one clock cycle]
• We are ignoring some details like setup and hold times
Single Cycle Implementation
• Calculate cycle time assuming negligible delays except:
  – memory (2ns), ALU and adders (2ns), register file access (1ns)
[Figure: complete single-cycle datapath with PC, instruction memory, Registers, Sign extend, Shift left 2, ALU, ALU control, data memory, two adders, and muxes; control signals PCSrc, RegWrite, RegDst, ALUSrc, MemWrite, MemRead, MemtoReg, ALUOp.]
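The cycle-time question can be worked through numerically. In this sketch the delays come from the slide, but the list of units on each instruction class's critical path is an assumption (the usual accounting: instruction fetch, register read, ALU, data memory, register write), and the names are hypothetical.

```python
DELAY_NS = {"memory": 2, "alu": 2, "regfile": 1}

# Assumed critical path per instruction class, in order of use.
CRITICAL_PATH = {
    "R-format": ["memory", "regfile", "alu", "regfile"],
    "lw":       ["memory", "regfile", "alu", "memory", "regfile"],
    "sw":       ["memory", "regfile", "alu", "memory"],
    "beq":      ["memory", "regfile", "alu"],
    "j":        ["memory"],
}


def path_delay(instruction: str) -> int:
    """Sum the unit delays along one instruction class's path."""
    return sum(DELAY_NS[unit] for unit in CRITICAL_PATH[instruction])


# A single-cycle clock must accommodate the slowest instruction.
CYCLE_TIME_NS = max(path_delay(name) for name in CRITICAL_PATH)

for name in CRITICAL_PATH:
    print(f"{name}: {path_delay(name)} ns")
print(f"cycle time: {CYCLE_TIME_NS} ns")
```

Under these assumptions lw sets the cycle time, so every other instruction class spends part of each cycle idle.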
Where we are headed
• Single cycle problems:
  – what if we had a more complicated instruction like floating point?
  – wasteful of area
• One solution:
  – use a "smaller" cycle time
  – have different instructions take different numbers of cycles
  – a "multicycle" datapath:
[Figure: multicycle datapath. PC, a single Memory (Address, Data) holding instructions or data, Instruction register, Memory data register, Registers (Register # inputs), registers A and B, ALU, ALUOut.]
Multicycle Approach
• We will be reusing functional units
  – ALU used to compute address and to increment PC
  – Memory used for instruction and data
• Our control signals will not be determined solely by instruction
  – e.g., what should the ALU do for a "subtract" instruction?
• We'll use a finite state machine for control
Review: finite state machines
• Finite state machines:
  – a set of states and
  – next-state function (determined by current state and the input)
  – output function (determined by current state and possibly input)
[Figure: Current state and Inputs feed the Next-state function, which produces Next state; Current state is updated on the Clock; the Output function produces Outputs.]
  – We'll use a Moore machine (output based only on current state)
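A Moore machine can be captured in a few lines: the output is looked up from the current state alone, and the state changes only when the machine is clocked. The sketch below is generic; the class name and the example machine (a hypothetical detector that outputs 1 after two consecutive 1 inputs) are illustrations, not the control FSM of the datapath.

```python
class MooreMachine:
    """Finite state machine whose output depends only on the current state."""

    def __init__(self, start, next_state, output) -> None:
        self.state = start
        self._next_state = next_state   # maps (state, input) -> next state
        self._output = output           # maps state -> output

    @property
    def output(self):
        return self._output[self.state]

    def clock(self, inp):
        """Advance one clock cycle and return the new output."""
        self.state = self._next_state[(self.state, inp)]
        return self.output


# Hypothetical example: output 1 once two consecutive 1s have been seen.
NEXT = {
    ("S0", 0): "S0", ("S0", 1): "S1",
    ("S1", 0): "S0", ("S1", 1): "S2",
    ("S2", 0): "S0", ("S2", 1): "S2",
}
OUT = {"S0": 0, "S1": 0, "S2": 1}

fsm = MooreMachine("S0", NEXT, OUT)
outputs = [fsm.clock(bit) for bit in [1, 1, 0, 1]]
print(outputs)
```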
Multicycle Approach
• Break up the instructions into steps, each step takes a cycle
  – balance the amount of work to be done
  – restrict each cycle to use only one major functional unit
• At the end of a cycle
  – store values for use in later cycles (easiest thing to do)
  – introduce additional "internal" registers
[Figure: multicycle datapath. PC, address mux, Memory (Address, Write data, MemData), Instruction register (Instruction [25–21], [20–16], [15–11], [15–0]), Memory data register, Registers, registers A and B, Sign extend (16 to 32 bits), Shift left 2, ALU input muxes (one with a constant 4 input), ALU (Zero, ALU result), ALUOut.]
Five Execution Steps
• Instruction Fetch
• Instruction Decode and Register Fetch
• Execution, Memory Address Computation, or Branch Completion
• Memory Access or R-type instruction completion
• Write-back
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
R-format instruction (ALU inst)
• Format: opcode, rs, rt, rd, shamt, funct
• Steps by cycle: 1 = IF, 2 = ID (OF), 3 = EX, 4 = WB
[Figure: multicycle datapath annotated with the instruction fields and the steps above]
Load instruction
• Format: opcode, rs, rt, immediate
• Steps by cycle: 1 = IF, 2 = ID (OF), 3 = EX, 4 = MEM, 5 = WB
[Figure: multicycle datapath annotated with the instruction fields and the steps above]
Branch instruction
• Format: opcode, rs, rt, immediate
• Steps by cycle: 1 = IF, 2 = ID (OF), 3 = EX
[Figure: multicycle datapath annotated with the instruction fields and the steps above]
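The three walkthroughs give cycle counts of 4 (R-format), 5 (load), and 3 (branch). A short sketch shows how those counts add up over an instruction sequence; the sequence itself is a made-up example, and the names are hypothetical.

```python
# Cycle counts taken from the R-format, load, and branch walkthroughs.
CYCLES = {"R-format": 4, "lw": 5, "beq": 3}


def total_cycles(sequence: list[str]) -> int:
    """Total clock cycles for a sequence of instruction classes."""
    return sum(CYCLES[kind] for kind in sequence)


# Hypothetical sequence: load a value, operate on it twice, then branch.
example = ["lw", "R-format", "R-format", "beq"]
print(total_cycles(example), "cycles for", len(example), "instructions")
```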
Step 1: Instruction Fetch
• Use PC to get instruction and put it in the Instruction Register.
• Increment the PC by 4 and put the result back in the PC.
• Can be described succinctly using RTL "Register-Transfer Language":
    IR = Memory[PC];
    PC = PC + 4;
• Can we figure out the values of the control signals?
• What is the advantage of updating the PC now?
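The two RTL statements of the fetch step can be mirrored in a few lines. This is a sketch only: modeling memory as a dictionary from byte address to 32-bit word is an assumption made for illustration, and the function name is hypothetical.

```python
def instruction_fetch(memory: dict[int, int], pc: int) -> tuple[int, int]:
    """Return (IR, new PC) for one fetch step."""
    ir = memory[pc]                  # IR = Memory[PC]
    new_pc = (pc + 4) & 0xFFFFFFFF   # PC = PC + 4
    return ir, new_pc


# Hypothetical memory holding one instruction word at address 0.
memory = {0: 0x02324020}
ir, pc = instruction_fetch(memory, pc=0)
print(hex(ir), pc)
```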
Step 2: Instruction Decode and Register Fetch
• Read registers rs and rt in case we need them
• Compute the branch address in case the instruction is a branch
• RTL:
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0])