Computer Architecture
ESE 345 Computer Architecture Multiply, Divide, Shift
CA: Multiply, Divide, Shift
MIPS Arithmetic Instructions
Instruction         Example          Meaning                        Comments
add                 add $1,$2,$3     $1 = $2 + $3                   3 operands; exception possible
subtract            sub $1,$2,$3     $1 = $2 – $3                   3 operands; exception possible
add immediate       addi $1,$2,100   $1 = $2 + 100                  + constant; exception possible
add unsigned        addu $1,$2,$3    $1 = $2 + $3                   3 operands; no exceptions
subtract unsigned   subu $1,$2,$3    $1 = $2 – $3                   3 operands; no exceptions
add imm. unsign.    addiu $1,$2,100  $1 = $2 + 100                  + constant; no exceptions
multiply            mult $2,$3       Hi, Lo = $2 x $3               64-bit signed product
multiply unsigned   multu $2,$3      Hi, Lo = $2 x $3               64-bit unsigned product
divide              div $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3   Lo = quotient, Hi = remainder
divide unsigned     divu $2,$3       Lo = $2 ÷ $3, Hi = $2 mod $3   Unsigned quotient & remainder
Move from Hi        mfhi $1          $1 = Hi                        Used to get copy of Hi
Move from Lo        mflo $1          $1 = Lo                        Used to get copy of Lo
MULTIPLY (Unsigned)
° Paper and pencil example (unsigned):
    Multiplicand       1000
    Multiplier       x 1001
                       1000
                      0000
                     0000
                    1000
    Product        01001000
° m bits x n bits = m+n bit product
° Binary makes it easy:
  • 0 => place 0 (0 x multiplicand)
  • 1 => place a copy (1 x multiplicand)
° 4 versions of multiply hardware & algorithm:
  • successive refinement
Unsigned Combinational Multiplier
[Figure: 4-bit combinational array multiplier with inputs A3..A0 and B3..B0, a 0 fed into each adder row, and product outputs P7..P0]
° Stage i accumulates A * 2^i if Bi == 1
° Q: How much hardware for 32 bit multiplier? Critical path?
Carry Save Addition of 4 Integers
° Add columns first, then rows!
° Can be used to reduce critical path of multiply
° Example: 53 bit multiply (for floating point):
  • At least 53 levels with naïve technique
  • Only 9 with carry save addition!
[Figure: network of 3=>2 carry save adders (inputs I1, I2, I3; outputs S1, S0) summing the bits A2..A0, B2..B0, C2..C0 and D2..D0]
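A minimal Python sketch of the carry save idea, assuming a bitwise 3=>2 adder applied to whole integers rather than the per-bit cells of the figure. The names csa, csa_levels and add_many are hypothetical helpers introduced for illustration only. Counting levels for 53 operands reproduces the figure of 9 quoted on the slide.

```python
def csa(a, b, c):
    """3=>2 carry save adder: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c                                  # per-column sum bits, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-column carries, moved one column left
    return s, carry


def csa_levels(operands):
    """Number of 3=>2 levels needed to reduce `operands` numbers down to 2."""
    levels = 0
    while operands > 2:
        operands = 2 * (operands // 3) + operands % 3
        levels += 1
    return levels


def add_many(values):
    """Sum a list by repeated 3=>2 reduction, then one ordinary (carry-propagate) add."""
    vals = list(values)
    while len(vals) > 2:
        nxt = []
        for i in range(0, len(vals) - 2, 3):
            nxt.extend(csa(vals[i], vals[i + 1], vals[i + 2]))
        nxt.extend(vals[len(vals) - len(vals) % 3:])   # operands left over at this level
        vals = nxt
    return sum(vals)
```

Only the final add propagates carries; every earlier level has the delay of a single full adder regardless of word width.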
How Does It Work?
[Figure: the 4-bit combinational multiplier again, with 0s entering the top row, inputs A3..A0 and B0..B3, and outputs P7..P0]
° At each stage shift A left (x 2)
° Use next bit of B to determine whether to add in shifted multiplicand
° Accumulate 2n bit partial product at each stage
Unsigned Shift-Add Multiplier (Version 1)
° 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit Multiplier reg
[Figure: datapath. Multiplicand (64 bits, Shift Left) feeds the 64-bit ALU; the ALU result is written to Product (64 bits, Write); Multiplier (32 bits, Shift Right) feeds Control]
° Multiplier = datapath + control
Multiply Algorithm Version 1
Start
1. Test Multiplier0
   Multiplier0 = 1: go to 1a
   Multiplier0 = 0: go to 2
1a. Add multiplicand to product & place the result in Product register
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
   No: < 32 repetitions: go to 1
   Yes: 32 repetitions: Done

Example values shown on the slide:
   Product      Multiplier   Multiplicand
   0000 0000    0011         0000 0010
   0000 0010    0001         0000 0100
   0000 0110    0000         0000 1000
   0000 0110
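The Version 1 flowchart can be sketched in Python as follows. This is an illustrative model, not the hardware itself: the function name multiply_v1 and the parameter n (register width of the multiplier, 32 on the slide) are assumptions made for the example.

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Unsigned shift-add multiply with a 2n-bit Multiplicand and Product register."""
    mask_n = (1 << n) - 1
    mask_2n = (1 << (2 * n)) - 1
    mcand = multiplicand & mask_n      # starts in the right half of the 2n-bit register
    mplier = multiplier & mask_n
    product = 0
    for _ in range(n):
        if mplier & 1:                               # 1.  test Multiplier0
            product = (product + mcand) & mask_2n    # 1a. add multiplicand to product
        mcand = (mcand << 1) & mask_2n               # 2.  shift Multiplicand left 1 bit
        mplier >>= 1                                 # 3.  shift Multiplier right 1 bit
    return product
```

With n = 4, multiply_v1(0b1000, 0b1001, 4) reproduces the paper and pencil product 01001000.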
Observations on Multiply Version 1
° 1 clock per step => about 100 clocks per multiply (32 repetitions x 3 steps)
° Ratio of multiply to add 5:1 to 100:1
° 1/2 bits in multiplicand always 0 => 64-bit adder is wasted
° 0’s inserted at the right of the multiplicand as it shifts left => least significant bits of product never changed once formed
° Instead of shifting multiplicand to left, shift product to right?
MULTIPLY HARDWARE Version 2
° 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, 32-bit Multiplier reg
[Figure: datapath. Multiplicand (32 bits) feeds the 32-bit ALU; Product (64 bits, Shift Right, Write); Multiplier (32 bits, Shift Right) feeds Control]
How to Think of This?
Remember original combinational multiplier:
[Figure: the 4-bit combinational array multiplier with inputs A3..A0 and B3..B0 and outputs P7..P0]
Simply warp to let product move right...
[Figure: the same array redrawn with A3..A0 aligned vertically in every row (B0..B3), so that the partial product shifts right toward P7..P0]
° Multiplicand stays still and product moves right
Multiply Algorithm Version 2
Start
1. Test Multiplier0
   Multiplier0 = 1: go to 1a
   Multiplier0 = 0: go to 2
1a. Add multiplicand to the left half of product & place the result in the left half of Product register
2. Shift the Product register right 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
   No: < 32 repetitions: go to 1
   Yes: 32 repetitions: Done

Example trace (2 x 3 with 4-bit operands):
   Step   Product      Multiplier   Multiplicand
          0000 0000    0011         0010
   1:     0010 0000    0011         0010
   2:     0001 0000    0011         0010
   3:     0001 0000    0001         0010
   1:     0011 0000    0001         0010
   2:     0001 1000    0001         0010
   3:     0001 1000    0000         0010
   1:     0001 1000    0000         0010
   2:     0000 1100    0000         0010
   3:     0000 1100    0000         0010
   1:     0000 1100    0000         0010
   2:     0000 0110    0000         0010
   3:     0000 0110    0000         0010
          0000 0110    0000         0010
Still More Wasted Space!
[Figure: the Multiply Algorithm Version 2 flowchart and its 2 x 3 trace, repeated from the previous slide]
Observations on Multiply Version 2
Product register wastes space that exactly matches size of multiplier => combine Multiplier register and Product register
MULTIPLY HARDWARE Version 3
° 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, (0-bit Multiplier reg)
[Figure: datapath. Multiplicand (32 bits) feeds the 32-bit ALU; Product (Multiplier) register (64 bits, Shift Right, Write); Control]
Multiply Algorithm Version 3
Example initial values: Multiplicand 0010, Product 0000 0011
Start
1. Test Product0
   Product0 = 1: go to 1a
   Product0 = 0: go to 2
1a. Add multiplicand to the left half of product & place the result in the left half of Product register
2. Shift the Product register right 1 bit.
32nd repetition?
   No: < 32 repetitions: go to 1
   Yes: 32 repetitions: Done
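A short Python sketch of Version 3, assuming the multiplier is preloaded into the right half of the Product register as the hardware slide describes. The function name multiply_v3 and the returned (hi, lo) pair are illustrative choices; the carry out of the left-half add is kept so that the following right shift can bring it back in.

```python
def multiply_v3(multiplicand, multiplier, n=32):
    """Unsigned multiply with the multiplier held in the right half of Product."""
    mask_n = (1 << n) - 1
    product = multiplier & mask_n                     # left half 0, right half = multiplier
    for _ in range(n):
        if product & 1:                               # 1.  test Product0
            product += (multiplicand & mask_n) << n   # 1a. add to left half (carry kept)
        product >>= 1                                 # 2.  shift Product right 1 bit
    return product >> n, product & mask_n             # (Hi, Lo) halves of the result
```

For the slide's example, multiply_v3(0b0010, 0b0011, 4) gives Hi = 0000 and Lo = 0110.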
Observations on Multiply Version 3
° 2 steps per bit because Multiplier & Product combined
° MIPS registers Hi and Lo are left and right half of Product
° Gives us MIPS instruction multu
° How can you make it faster?
° What about signed multiplication?
  • easiest solution is to make both positive & remember whether to complement product when done (leave out the sign bit, run for 31 steps)
  • apply definition of 2’s complement
    - need to sign-extend partial products and subtract at the end
  • Booth’s Algorithm is elegant way to multiply signed numbers using same hardware as before and save cycles
    - can handle multiple bits at a time
Motivation for Booth’s Algorithm
° Example 2 x 6 = 0010 x 0110:
           0010
         x 0110
     +     0000    shift (0 in multiplier)
     +    0010     add (1 in multiplier)
     +   0010      add (1 in multiplier)
     +  0000       shift (0 in multiplier)
       00001100
° ALU with add or subtract gets same result in more than one way:
     6 = –2 + 8
     0110 = –00010 + 01000 = 11110 + 01000
° For example:
           0010
         x 0110
           0000    shift (0 in multiplier)
     –    0010     sub (first 1 in multiplier)
         0000      shift (mid string of 1s)
     +  0010       add (prior step had last 1)
       00001100
Booth’s Algorithm
[Figure: multiplier bit string 0 1 1 1 1 0 with the end of run, middle of run and beginning of run marked]
   Current Bit   Bit to the Right   Explanation           Example      Op
   1             0                  Begins run of 1s      0001111000   sub
   1             1                  Middle of run of 1s   0001111000   none
   0             1                  End of run of 1s      0001111000   add
   0             0                  Middle of run of 0s   0001111000   none
° Originally for speed (when shift was faster than add)
° Replace a string of 1s in multiplier with an initial subtract when we first see a one and then later add for the bit after the last one
         –1
     +10000
      01111
Booth’s Example (2 x 7)
   Operation           Multiplicand   Product         next?
   0. initial value    0010           0000 0111 0     10 -> sub
   1a. P = P - m       1110           + 1110
                                      1110 0111 0     shift P (sign ext)
   1b.                 0010           1111 0011 1     11 -> nop, shift
   2.                  0010           1111 1001 1     11 -> nop, shift
   3.                  0010           1111 1100 1     01 -> add
   4a.                 0010           + 0010
                                      0001 1100 1     shift
   4b.                 0010           0000 1110 0     done
Booth’s Example (2 x -3)
   Operation           Multiplicand   Product         next?
   0. initial value    0010           0000 1101 0     10 -> sub
   1a. P = P - m       1110           + 1110
                                      1110 1101 0     shift P (sign ext)
   1b.                 0010           1111 0110 1     01 -> add
   2a.                 0010           + 0010
                                      0001 0110 1     shift P
   2b.                 0010           0000 1011 0     10 -> sub
   3a.                 0010           + 1110
                                      1110 1011 0     shift
   3b.                 0010           1111 0101 1     11 -> nop
   4a.                 0010           1111 0101 1     shift
   4b.                 0010           1111 1010 1     done
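The two traces above can be reproduced with a small Python model of Booth's algorithm. This is a sketch under stated assumptions: the name booth_multiply is hypothetical, the product register is modeled as 2n + 1 bits (left half, multiplier, extra bit to the right), and, as in the 4-bit traces, the left half is only n bits wide, so the most negative multiplicand (-2^(n-1)) is not handled.

```python
def booth_multiply(multiplicand, multiplier, n=4):
    """Signed n-bit x n-bit multiply using Booth's add/sub/shift rule."""
    mask_n = (1 << n) - 1
    width = 2 * n + 1                          # left half | multiplier | extra bit
    mask = (1 << width) - 1
    p = (multiplier & mask_n) << 1             # extra bit to the right starts at 0
    m = (multiplicand & mask_n) << (n + 1)     # multiplicand aligned with the left half
    for _ in range(n):
        pair = p & 0b11                        # current bit, bit to the right
        if pair == 0b10:                       # begins run of 1s: subtract
            p = (p - m) & mask
        elif pair == 0b01:                     # end of run of 1s: add
            p = (p + m) & mask
        sign = p >> (width - 1)
        p = (p >> 1) | (sign << (width - 1))   # arithmetic shift right (sign extend)
    result = p >> 1                            # drop the extra bit
    if result >> (2 * n - 1):                  # reinterpret 2n bits as two's complement
        result -= 1 << (2 * n)
    return result
```

booth_multiply(2, 7) and booth_multiply(2, -3) end with the same register contents as steps 4b of the two examples (0000 1110 and 1111 1010).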
Radix-4 Modified Booth’s
Multiple representations
Once we admit new symbols (i.e. a digit of value -1), a number can have multiple representations. In the Recode column each pair is shown as two digits, where -1 denotes the negative digit, followed by the value of the pair in parentheses.
   Current Bits   Bit to the Right   Explanation        Example           Recode
   00             0                  Middle of zeros    00 00 00 00 00    0 0   (0)
   01             0                  Single one         00 00 00 01 00    0 1   (1)
   10             0                  Begins run of 1s   00 01 11 10 00    -1 0  (-2)
   11             0                  Begins run of 1s   00 01 11 11 00    0 -1  (-1)
   00             1                  Ends run of 1s     00 00 11 11 00    0 1   (1)
   01             1                  Ends run of 1s     00 01 11 11 00    1 0   (2)
   10             1                  Isolated 0         00 11 10 11 00    0 -1  (-1)
   11             1                  Middle of run      00 11 11 11 00    0 0   (0)
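The recoding table can be expressed compactly: each pair of bits together with the bit to its right yields the digit -2*b_hi + b_lo + b_right. The Python sketch below (the name radix4_booth_digits is an assumption for illustration) produces the digits least significant first; summing digit * 4^i recovers the two's complement value of the multiplier.

```python
def radix4_booth_digits(multiplier, n):
    """Radix-4 Booth digits of an n-bit two's complement multiplier, LS digit first."""
    assert n % 2 == 0, "this sketch assumes an even number of bits"
    x = (multiplier & ((1 << n) - 1)) << 1     # append the initial 'bit to the right' = 0
    digits = []
    for i in range(0, n, 2):
        b_right = (x >> i) & 1                 # bit to the right of the pair
        b_lo = (x >> (i + 1)) & 1              # low bit of the current pair
        b_hi = (x >> (i + 2)) & 1              # high bit of the current pair
        digits.append(-2 * b_hi + b_lo + b_right)   # value in {-2, -1, 0, 1, 2}
    return digits
```

For example, 0110 (6) recodes to the digits [-2, 2], i.e. -2 + 2*4 = 6, so a 4-bit multiply needs two add/subtract steps instead of four.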
Divide: Paper & Pencil
                      1001       Quotient
    Divisor 1000 | 1001010       Dividend
                  –1000
                      10
                      101
                      1010
                     –1000
                        10       Remainder (or Modulo result)
° See how big a number can be subtracted, creating quotient bit on each step
  • Binary => 1 * divisor or 0 * divisor
° Dividend = Quotient x Divisor + Remainder
° 3 versions of divide, successive refinement
DIVIDE HARDWARE Version 1
° 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
[Figure: datapath. Divisor (64 bits, Shift Right) feeds the 64-bit ALU; the ALU result is written to Remainder (64 bits, Write); Quotient (32 bits, Shift Left); Control]
Divide Algorithm Version 1
Takes n+1 steps for n-bit Quotient & Rem.
Example initial values: Remainder 0000 0111, Quotient 0000, Divisor 0010 0000
Start: Place Dividend in Remainder
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
Test Remainder
   Remainder ≥ 0: go to 2a
   Remainder < 0: go to 2b
2a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
2b. Restore the original value by adding the Divisor register to the Remainder register, & place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
3. Shift the Divisor register right 1 bit.
n+1 repetition?
   No: < n+1 repetitions: go to 1
   Yes: n+1 repetitions (n = 4 here): Done
Divide Algorithm I example (7 / 2)
   Step   Remainder    Quotient   Divisor
          0000 0111    00000      0010 0000
   1:     1110 0111    00000      0010 0000
   2:     0000 0111    00000      0010 0000
   3:     0000 0111    00000      0001 0000
   1:     1111 0111    00000      0001 0000
   2:     0000 0111    00000      0001 0000
   3:     0000 0111    00000      0000 1000
   1:     1111 1111    00000      0000 1000
   2:     0000 0111    00000      0000 1000
   3:     0000 0111    00000      0000 0100
   1:     0000 0011    00000      0000 0100
   2:     0000 0011    00001      0000 0100
   3:     0000 0011    00001      0000 0010
   1:     0000 0001    00001      0000 0010
   2:     0000 0001    00011      0000 0010
   3:     0000 0001    00011      0000 0010
Answer: Quotient = 3, Remainder = 1
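The restoring division of Version 1 can be modeled in a few lines of Python. This is a sketch with assumed names (divide_v1, n for the operand width), not a description of the exact hardware; Python integers go negative directly, which stands in for testing the sign bit of the Remainder register.

```python
def divide_v1(dividend, divisor, n=4):
    """Unsigned restoring division, Version 1: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = dividend               # Start: place Dividend in Remainder
    div = divisor << n                 # Divisor starts in the left half of its 2n-bit register
    quotient = 0
    for _ in range(n + 1):             # n+1 repetitions
        remainder -= div               # 1.  subtract Divisor from Remainder
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # 2a. shift in a 1
        else:
            remainder += div                 # 2b. restore the original value
            quotient = quotient << 1         #     and shift in a 0
        div >>= 1                      # 3.  shift Divisor right 1 bit
    return quotient, remainder
```

divide_v1(7, 2) returns (3, 1), matching the answer of the trace above.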
Observations on Divide Version 1
° 1/2 bits in divisor always 0
  => 1/2 of 64-bit adder is wasted
  => 1/2 of divisor is wasted
° Instead of shifting divisor to right, shift remainder to left?
° 1st step cannot produce a 1 in quotient bit (otherwise too big)
  => switch order to shift first and then subtract, can save 1 iteration
Divide: Paper & Pencil
                      01010       Quotient
    Divisor 0001 | 00001010       Dividend
                   00001
                   –0001
                    0000
                      0001
                     –0001
                         0
                         00       Remainder (or Modulo result)
° Notice that there is no way to get a 1 in leading digit! (this would be an overflow, since quotient would have n+1 bits)
DIVIDE HARDWARE Version 2
° 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
[Figure: datapath. Divisor (32 bits) feeds the 32-bit ALU; Remainder (64 bits, Shift Left, Write); Quotient (32 bits, Shift Left); Control]
Divide Algorithm Version 2
Example initial values: Remainder 0000 0111, Quotient 0000, Divisor 0010
Start: Place Dividend in Remainder
1. Shift the Remainder register left 1 bit.
2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.
Test Remainder
   Remainder ≥ 0: go to 3a
   Remainder < 0: go to 3b
3a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, & place the sum in the left half of the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
nth repetition?
   No: < n repetitions (repeat)
   Yes: n repetitions (n = 4 here): Done
Observations on Divide Version 2
° Eliminate Quotient register by combining with Remainder as shifted left
  • Start by shifting the Remainder left as before.
  • Thereafter the loop contains only two steps, because shifting the Remainder register shifts both the remainder in the left half and the quotient in the right half.
  • The consequence of combining the two registers and of the new order of operations in the loop is that the remainder will be shifted left one time too many.
  • Thus the final correction step must shift back only the remainder in the left half of the register.
DIVIDE HARDWARE Version 3
° 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, (0-bit Quotient reg)
[Figure: datapath. Divisor (32 bits) feeds the 32-bit ALU; Remainder (Quotient) register (64 bits, halves labeled “HI” and “LO”, Shift Left, Write); Control]
Divide Algorithm Version 3
Example initial values: Remainder 0000 0111, Divisor 0010
Start: Place Dividend in Remainder
1. Shift the Remainder register left 1 bit.
2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.
Test Remainder
   Remainder ≥ 0: go to 3a
   Remainder < 0: go to 3b
3a. Shift the Remainder register to the left, setting the new rightmost bit to 1.
3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, & place the sum in the left half of the Remainder register. Also shift the Remainder register to the left, setting the new least significant bit to 0.
nth repetition?
   No: < n repetitions: go to 2
   Yes: n repetitions (n = 4 here): Done. Shift left half of Remainder right 1 bit.
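A Python sketch of Version 3, where the quotient bits are shifted into the right half of the Remainder register. The function name divide_v3 and the parameter n are assumptions for illustration. Python integers are unbounded, so this model does not reproduce register overflow; "restore" is modeled by simply not committing a negative subtraction result.

```python
def divide_v3(dividend, divisor, n=4):
    """Unsigned restoring division, Version 3: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    mask_n = (1 << n) - 1
    rem = dividend & mask_n            # Start: dividend in the right half of Remainder
    rem <<= 1                          # 1.  shift Remainder left 1 bit (done once)
    for _ in range(n):
        left = (rem >> n) - divisor    # 2.  subtract Divisor from the left half
        if left >= 0:
            rem = (left << n) | (rem & mask_n)
            rem = (rem << 1) | 1       # 3a. shift left, new rightmost bit = 1
        else:
            rem = rem << 1             # 3b. keep old left half, shift in a 0
    remainder = (rem >> n) >> 1        # Done: shift left half right 1 bit
    quotient = rem & mask_n            # right half now holds the n quotient bits
    return quotient, remainder
```

For 7 / 2 the register ends as 0010 0011: the right half is the quotient 3 and the left half, shifted back one place, is the remainder 1.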
Observations on Divide Version 3
° Same hardware as multiply: just need ALU to add or subtract, and 64-bit register to shift left or shift right
° Hi and Lo registers in MIPS combine to act as 64-bit register for multiply and divide
° Signed divides: simplest is to remember signs, make positive, and complement quotient and remainder if necessary
  • Note: Dividend and Remainder must have same sign
  • Note: Quotient negated if Divisor sign & Dividend sign disagree, e.g., –7 ÷ 2 = –3, remainder = –1
  • What about –7 ÷ 2 = –4, remainder = +1?
° Possible for quotient to be too large: if divide 64-bit integer by 1, quotient is 64 bits (called “saturation”)
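The sign rule for division stated above can be sketched in Python. The helper name signed_divide is hypothetical, and the unsigned divide is delegated to Python's own integer operators on absolute values. Python's built-in divmod rounds toward negative infinity instead, which happens to give the alternative answer (–4, remainder +1) raised on the slide.

```python
def signed_divide(dividend, divisor):
    """Signed divide by the 'remember signs, make positive' rule."""
    q = abs(dividend) // abs(divisor)          # unsigned quotient
    r = abs(dividend) % abs(divisor)           # unsigned remainder
    if (dividend < 0) != (divisor < 0):
        q = -q                                 # negate quotient if the signs disagree
    if dividend < 0:
        r = -r                                 # remainder takes the sign of the dividend
    return q, r
```

In every case the identity Dividend = Quotient x Divisor + Remainder still holds.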
MIPS Logical Instructions
Instruction                    Example          Meaning            Comment
and                            and $1,$2,$3     $1 = $2 & $3       3 reg. operands; Logical AND
or                             or $1,$2,$3      $1 = $2 | $3       3 reg. operands; Logical OR
xor                            xor $1,$2,$3     $1 = $2 ⊕ $3       3 reg. operands; Logical XOR
nor                            nor $1,$2,$3     $1 = ~($2 | $3)    3 reg. operands; Logical NOR
and immediate                  andi $1,$2,10    $1 = $2 & 10       Logical AND reg, constant
or immediate                   ori $1,$2,10     $1 = $2 | 10       Logical OR reg, constant
xor immediate                  xori $1,$2,10    $1 = $2 ⊕ 10       Logical XOR reg, constant
shift left logical             sll $1,$2,10     $1 = $2 << 10      Shift left by constant
shift right logical            srl $1,$2,10     $1 = $2 >> 10      Shift right by constant
shift right arithm.            sra $1,$2,10     $1 = $2 >> 10      Shift right (sign extend)
shift left logical variable    sllv $1,$2,$3    $1 = $2 << $3      Shift left by variable
shift right logical variable   srlv $1,$2,$3    $1 = $2 >> $3      Shift right by variable
shift right arithm. variable   srav $1,$2,$3    $1 = $2 >> $3      Shift right arith. by variable
Shifters
Two kinds:
° logical: value shifted in is always "0"
  [Figure: register from msb to lsb with "0" entering at either end]
° arithmetic: on right shifts, sign extend
  [Figure: register from msb to lsb; msb is replicated on right shifts, "0" enters at the lsb on left shifts]
Note: these are single bit shifts. A given instruction might request 0 to 31 bits to be shifted!
Combinational Shifter from MUXes
° Basic building block: 2-to-1 mux with data inputs A and B, select sel, output D
[Figure: 8-bit right shifter. Inputs A7..A0 pass through three rows of eight 2-to-1 muxes, controlled by S2, S1 and S0, to produce R7..R0]
° What comes in the MSBs?
° How many levels for 32-bit shifter?
° What if we use 4-1 muxes?
General Shift Right Scheme using 16 bit example
[Figure: four mux stages selecting shift amounts S0 (0, 1), S1 (0, 2), S2 (0, 4), S3 (0, 8)]
° If right-to-left connections were added, the scheme could support Rotate (not in MIPS but found in other ISAs)
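The staged scheme can be sketched in Python: stage k either passes the value through or shifts it right by 2^k, selected by bit k of the shift amount. The name shift_right_log and the width parameter are assumptions for the example. A 16-bit shifter needs 4 such stages, and by the same construction a 32-bit shifter needs 5.

```python
def shift_right_log(value, amount, width=16):
    """Logical right shift built from log2(width) stages of 'shift by 0 or 2**k'."""
    x = value & ((1 << width) - 1)
    stage = 0
    while (1 << stage) < width:            # stages shift by 1, 2, 4, 8, ...
        if (amount >> stage) & 1:          # select bit S<stage> picks the shifted input
            x >>= (1 << stage)             # zeros enter at the MSB side (logical shift)
        stage += 1
    return x
```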
Funnel Shifter
Instead: extract 32 bits of 64.
[Figure: 32-bit inputs Y and X concatenated into 64 bits; a Shift Right block extracts the 32-bit result R]
° Shift A by i bits (sa = shift right amount)
° Logical: Y = 0, X = A, sa = i
° Arithmetic? Y = _, X = _, sa = _
° Rotate? Y = _, X = _, sa = _
° Left shifts? Y = _, X = _, sa = _
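A Python sketch of the funnel shifter, with one possible way to fill in the blanks above. The function names (funnel, srl, sra, rotr, sll) are hypothetical, and the choices of Y, X and sa for the arithmetic, rotate and left-shift cases are suggested answers rather than something stated on the slide.

```python
def funnel(y, x, sa, n=32):
    """Extract n bits from the 2n-bit value {Y, X} after shifting right by sa (0..n)."""
    mask = (1 << n) - 1
    return ((((y & mask) << n) | (x & mask)) >> sa) & mask


def srl(a, i, n=32):                 # logical right: Y = 0, X = A, sa = i
    return funnel(0, a, i, n)


def sra(a, i, n=32):                 # arithmetic right: Y = n copies of the sign bit
    sign_fill = ((1 << n) - 1) if (a >> (n - 1)) & 1 else 0
    return funnel(sign_fill, a, i, n)


def rotr(a, i, n=32):                # rotate right: Y = A, X = A, sa = i
    return funnel(a, a, i, n)


def sll(a, i, n=32):                 # left shift by i: Y = A, X = 0, sa = n - i
    return funnel(a, 0, n - i, n)
```

All four operations reuse the same extract-n-of-2n datapath; only the values steered into Y and X and the shift amount change.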
Array Funnel Shifter
° Technology-dependent solutions: transistor per switch
[Figure: switch array with shift-select lines SR3..SR0, inputs A6..A0 and outputs D3..D0]
Summary
° Multiply: successive refinement to see final design
  • 32-bit Adder, 64-bit shift register, 32-bit Multiplicand Register
  • Booth’s algorithm to handle signed multiplies
  • There are algorithms that calculate many bits of multiply per cycle
° Shifter: successive refinement from a 1-bit-at-a-time shift register to a funnel shifter
Acknowledgements
These slides contain material developed and copyrighted by:
° Morgan Kaufmann (Elsevier, Inc.)
° Arvind (MIT)
° Krste Asanovic (MIT/UCB)
° Joel Emer (Intel/MIT)
° James Hoe (CMU)
° John Kubiatowicz (UCB)
° David Patterson (UCB)