A Design of a 16-bit Pipelined Serial/Parallel Multiplier

Team Members: Jing Ren, Krutika Patil

Advisor: Dr. Durga Misra

Index

1. Introduction
   Restrictions and Objective
2. I/O Setup
   Pin definitions
3. Working
4. Components Chosen for Design
   Full Adder
   D Flip Flop
   PISO
5. Transistor Level Schematic
   Full Adder
   D Flip Flop
   PISO
   16-Bit Multiplier
6. HSPICE Simulation
7. Output Verification
   Full Adder
   D Flip Flop
   4-Bit Multiplier
8. Delay Calculation
9. Layout
   Inverter
   NAND
   Latch
   DFF
   Mirror Adder
10. Summary and Future Work
11. Schematic Driven Layout
Appendix
   A.1 Final netlist
   A.2 The netlist generated by Mentor Graphics
   A.3 PEX Extraction


1. Introduction

The 16-bit pipelined serial/parallel multiplier is designed in the MOSIS (TSMC) n-well 0.35 μm CMOS (SUBM) process (λ = 0.2 μm), as described in the course. It multiplies two 16-bit numbers. Both operands are applied to the chip in parallel. One operand is then converted to a serial bit stream by a 16-bit parallel-in, serial-out (PISO) shift register, whose loading and shifting are controlled by the Shift/Load (S/L) signal line.

Restrictions and Objective

• Minimum total chip power dissipation: specify the initial and final power consumption of the chip and the steps taken to reduce the power.
• Minimum silicon chip area: specify the final area of the chip with and without the bonding pads, and demonstrate the compactness of the final design.
• Data output rate: preferably 500 MHz, and no less than 450 MHz.
• Good noise margin.
• Pin-out count of less than 40.


2. I/O Setup

Pin definitions

Since we have only 40 pins in total, and for the convenience of the user, we have kept both inputs parallel; consequently the output is serial. We use 38 pins as follows:

• 16 pins: Input Port M
• 16 pins: Input Port N
• 1 pin: Output Port L
• 1 pin: Clock
• 1 pin: S/L (Shift Register)
• 1 pin: CE
• 1 pin: VDD
• 1 pin: GND

Pin Count Assignment


3. Working: Basic Block diagram of the pipelined serial/parallel multiplier:

Fig 3.1. Serial/Parallel Multiplier

Approach to the design process: Fig. 3.1 above is the main logic block diagram of the serial/parallel multiplier. The 16-bit signals ‘Y’ and ‘X’ are applied to the 16-bit multiplier in parallel. Since input ‘X’ must be serial, a 16-bit parallel-to-serial converter is used, and the output is then computed. The multiplication is performed by successive additions of the columns of the shifted partial-product matrix. Because left-shifting by one bit in a serial system is obtained with a 1-bit delay element, the multiplier is successively shifted and gates the appropriate bit of the multiplicand. The delayed, gated instances of the multiplicand must all be in the same column of the shifted partial-product matrix; they are then added to form the required product bit for that column.
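The column-by-column accumulation described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: it models the arithmetic (one serial bit per clock, least significant bit first, unsigned operands) and not the cell-level structure of Fig. 3.1, and the function names are illustrative.

```python
def serial_parallel_multiply(x, y, n_bits=16):
    """Return the 2*n_bits product bits of x*y, least significant bit first.

    x is the serial operand (assumed to arrive LSB first, one bit per clock),
    y is the parallel operand. Both are assumed unsigned.
    """
    limit = 1 << n_bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("operands must be unsigned n_bits-wide integers")

    acc = 0            # running sum of the shifted partial products
    product_bits = []
    for cycle in range(2 * n_bits):
        # Serial operand: one bit per clock, zeros once all bits are sent.
        x_bit = (x >> cycle) & 1 if cycle < n_bits else 0
        # AND gates: the serial bit gates every bit of the parallel operand.
        partial = y if x_bit else 0
        acc += partial
        # The lowest bit of the column sum leaves as the next product bit.
        product_bits.append(acc & 1)
        # A one-bit delay moves the remaining sum to the next column.
        acc >>= 1
    return product_bits


def bits_to_int(bits_lsb_first):
    """Convert a list of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits_lsb_first))


if __name__ == "__main__":
    bits = serial_parallel_multiply(0b0011, 0b1010, n_bits=4)
    print(bits, bits_to_int(bits))
```

With 4-bit operands 0011 and 1010 the model emits the bits of 30 over eight clock cycles, which is the same number of cycles (2M) that the pipelined structure needs for a full-length product.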


Fig. 3.2 Pipelined Serial/Parallel Multiplier

The structure in Fig. 3.1 can be pipelined by introducing two delay elements in each cell, as shown in Fig. 3.2. If rounding or truncation of the product to the same word length as the input is tolerated, the time needed to produce a product is 2M clock cycles, where M is the number of bits in the multiplicand. In this case the multiplier accumulates partial-product sums, starting with the least significant partial product. After each addition, the result is an N-bit number that shortens to N-1 bits before the next partial product is added. Note that the chip area increases as the length of the multiplier increases.


4. Components chosen for the design

• Full Adder
• NAND and Inverter
• D Flip Flop
• Shift register (PISO)

Full Adder: Many adder topologies implement the full-adder truth table. The adder chosen for our design is the Mirror Adder with 28 transistors. We compared the mirror adder with other adder architectures on the following criteria:

I. Transistor count
II. Power consumption
III. Delay

Fig. 4.1: Adder architecture comparisons: Criterion - Transistor Count


Fig. 4.2: Adder architecture comparisons: Criterion - Power Consumption

Fig. 4.3: Adder architecture comparisons: Criterion - Delay

Since the mirror adder offers the best balance of transistor count, power consumption and delay, we chose the 28-transistor mirror adder for our design. The mirror adder also has fewer transistors in series than the other architectures, which is a significant advantage for delay.
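The logic that a mirror adder evaluates can be checked against the full-adder truth table with a few lines of Python. This is a gate-level sketch only, assuming the usual mirror-adder formulation in which the inverted carry is generated first and reused to form the sum; it does not model transistor sizing or delay, and the function names are illustrative.

```python
from itertools import product


def mirror_adder(a, b, cin):
    """Return (sum, cout) using the mirror-adder logic equations."""
    # First stage: inverted carry, NOT(A*B + Cin*(A + B)).
    carry_bar = 1 - ((a & b) | (cin & (a | b)))
    # Second stage reuses the inverted carry:
    # NOT(A*B*Cin + carry_bar*(A + B + Cin)).
    sum_bar = 1 - ((a & b & cin) | (carry_bar & (a | b | cin)))
    # Two output inverters restore the true polarities.
    return 1 - sum_bar, 1 - carry_bar


def truth_table():
    """List (a, b, cin, sum, cout) for all eight input combinations."""
    return [(a, b, cin) + mirror_adder(a, b, cin)
            for a, b, cin in product((0, 1), repeat=3)]


if __name__ == "__main__":
    for row in truth_table():
        print(row)
```

Every row satisfies a + b + cin = sum + 2·cout, which is the defining property of a full adder.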


D Flip Flop

The flip-flop chosen for our design is the D flip-flop, which is preferred in integrated-circuit applications. It avoids the indeterminate state of the S-R flip-flop when all the inputs are high, and it is much simpler than the J-K flip-flop. A D flip-flop can be composed of two latches and one inverter, and each latch can be constructed from two transmission gates (transgates) and three inverters. The schematics of the latch and the D flip-flop are shown in Fig. 4.4.

Fig. 4.4: (a) Latch (b) D flip flop
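The two-latch construction can be illustrated behaviourally. The sketch below assumes a rising-edge-triggered arrangement (master latch transparent while the clock is low, slave latch transparent while it is high); the class and method names are illustrative and the model ignores all analog effects.

```python
class Latch:
    """Level-sensitive latch: follows d while enable is 1, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class DFlipFlop:
    """Master-slave D flip-flop built from two latches and a clock inverter."""

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def apply(self, d, clk):
        """Apply input d with the clock at level clk; return the output q."""
        clk_bar = 1 - clk                        # the single inverter
        m = self.master.update(d, clk_bar)       # transparent while clk = 0
        return self.slave.update(m, clk)         # transparent while clk = 1


if __name__ == "__main__":
    ff = DFlipFlop()
    for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]:
        print(d, clk, ff.apply(d, clk))
```

The output changes only when the clock goes high, and a change on d while the clock is high is not passed through, because the master latch is holding at that time.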


Parallel-in, serial-out (PISO) design

This configuration has the data input on lines D1 through D4 in parallel format. To write the data to the register, the Write/Shift (W/S) control line must be held LOW. To shift the data, the W/S control line is brought HIGH and the registers are clocked. The arrangement now acts as a serial shift register, with D1 as the data input. However, as long as the number of clock cycles is not more than the length of the data string, the data output, Q, will be the parallel data read off in order. In our project we need to design a 16-bit PISO for input port M.

Fig. 4.5: 4-bit PISO Shift Register
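The load and shift behaviour can be sketched as follows. The model assumes that the last stage drives the serial output, that zeros are shifted in at the first stage, and that one call corresponds to one active clock edge; these conventions and all names are assumptions made for illustration.

```python
class PisoShiftRegister:
    """Behavioural parallel-in, serial-out shift register."""

    def __init__(self, width=16):
        self.width = width
        self.stages = [0] * width    # stages[-1] drives the serial output

    def clock(self, shift, parallel_data=None):
        """One clock edge: shift=0 loads parallel_data, shift=1 shifts."""
        if not shift:
            if parallel_data is None or len(parallel_data) != self.width:
                raise ValueError("a load needs exactly `width` data bits")
            self.stages = list(parallel_data)
        else:
            # Every bit moves one stage toward the output; 0 enters stage 0.
            self.stages = [0] + self.stages[:-1]
        return self.stages[-1]


if __name__ == "__main__":
    reg = PisoShiftRegister(width=4)
    out = [reg.clock(0, [1, 0, 1, 1])]            # load D1..D4
    out += [reg.clock(1) for _ in range(3)]       # three shift clocks
    print(out)
```

After one load and width - 1 shift clocks, the serial output has presented every loaded bit once, starting with the bit held in the last stage.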


5. Transistor Level Schematic of each component

We designed each schematic separately and verified its logic from the output waveforms. After verifying the outputs of all the components, we combined the schematics by creating instances, which keeps the top-level schematic clear.

• Full adder schematics

Fig. 5.1 : Transistor level schematic of Mirror adder

• D Flip Flop schematics

Fig. 5.2 : Transistor level schematic of D flip flop

• PISO schematics

Fig. 5.3: Transistor level schematic of 16 bit PISO

• 16 bit multiplier schematics

Fig. 5.4: Transistor level schematic of 16 bit Multiplier

6. HSPICE Simulation and Delay Analysis

Extraction of the netlist for each component from the schematic using HSPICE: The netlist for each component was extracted using HSPICE. Drawing all the layouts by hand takes a lot of time and can easily introduce logic errors. In this project, we use the “schematic driven layout” (SDL) function provided by Mentor Graphics to design the layout more efficiently.

7. Output Verification for each component

PRE-LAYOUT SIMULATION: The graphical output is expected to satisfy the truth table. Hence, for each component, the truth table is shown and the simulated waveform is verified against it.



Full adder:

Fig. 7.1: Truth table of the full adder

Fig. 7.2: Output verification of Mirror Adder



D Flip Flop

Fig. 7.3: Output verification of D-Flip Flop



4 bit Multiplier

Fig. 7.4: Output verification of 4-bit multiplier

Fig. 7.4 shows that when the input is 0011 × 1010, the output is 00011110. The output of the 16-bit multiplier is not presented in the report because the individual output bits become very difficult to identify in the waveform; hence we show only the 4-bit output. If the 4-bit multiplier works, the 16-bit multiplier, which repeats the same structure, is expected to work as well. As can be seen, the 4-bit multiplier works correctly at 500 MHz.
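The expected bit pattern for a waveform check, and the time one product occupies at a given clock rate, can be computed with a short helper. This is a sketch only: it assumes unsigned operands and the 2M-clock-cycle product time stated in Section 3, and the function names are illustrative.

```python
def expected_product_bits(a_bits, b_bits):
    """Return the product of two binary strings as a full-width binary string."""
    width = len(a_bits) + len(b_bits)
    product = int(a_bits, 2) * int(b_bits, 2)
    return format(product, "0{}b".format(width))


def product_time_ns(m_bits, clock_mhz):
    """Time for one product, assuming 2*M clock cycles per product."""
    cycles = 2 * m_bits
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    print(expected_product_bits("0011", "1010"))
    print(product_time_ns(4, 500), product_time_ns(16, 500))
```

For the operands used in the 4-bit simulation the helper returns 00011110, and at 500 MHz a 4-bit product takes 16 ns while a 16-bit product takes 64 ns under the same assumption.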

8. Delay Calculations and Analysis

Compared with the initial delays, the final delay measurements of each component, after optimizing W/L to improve the delay, are:

• Full Adder: tpdr = 46.75 ps, tpdf = 52.27 ps, tavg = 24.757 ps
• D Flip Flop: tpdr = 86.243 ps, tpdf = 91.20 ps, tavg = 44.362 ps
• PISO: tpdr = 81.5 ps, tpdf = 84 ps, tavg = 82.75 ps
• Inverter: tpdr = 43.7 ps, tpdf = 47.2 ps, tavg = 45.45 ps
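The averages can be cross-checked with a short script. This is a minimal sketch that assumes tavg is the arithmetic mean of tpdr and tpdf; the names are illustrative. Under that assumption the PISO and inverter values reproduce the listed tavg (82.75 ps and 45.45 ps), while the full adder and D flip-flop come out near 49.51 ps and 88.72 ps, so the listed tavg for those two components may follow a different definition.

```python
# Rising and falling propagation delays in ps, as listed in this section.
MEASURED_PS = {
    "Full Adder": (46.75, 52.27),
    "D Flip Flop": (86.243, 91.20),
    "PISO": (81.5, 84.0),
    "Inverter": (43.7, 47.2),
}


def average_delay_ps(tpdr, tpdf):
    """Arithmetic mean of the rising and falling propagation delays."""
    return (tpdr + tpdf) / 2.0


def delay_table(measured=MEASURED_PS):
    """Map each component name to its mean propagation delay in ps."""
    return {name: average_delay_ps(rise, fall)
            for name, (rise, fall) in measured.items()}


if __name__ == "__main__":
    for name, avg in delay_table().items():
        print("{}: {:.3f} ps".format(name, avg))
```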

Critical path analysis: Critical-path analysis is essential because the critical path is the slowest path in the circuit and therefore limits the clock rate. The critical path in our circuit is S0, as this is the longest path to the product. The critical path is determined by the number of transistors in series.


Noise Margin test:

Noise margin is the maximum voltage amplitude of an extraneous signal that can be algebraically added to the noise-free worst-case input level without causing the output voltage to deviate from the allowable logic voltage level. The term "input", as used here, refers to logic input terminals, power supply terminals, or ground reference terminals. A good noise margin is expected to tolerate a voltage swing of 10 – 20%. As a good noise margin is one of the specifications of the design, the noise margin was tested by varying the input voltage:

Vdd = 3.3 V, variation range: 2.64 V ~ 3.96 V

Over this range of variation the output did not fluctuate erroneously. The screenshot below shows the noise margin test.

Fig. 8.1: Test for noise margin (Higher range: 3.96V)
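The variation range used in the test corresponds to a ±20% swing around the nominal supply. A minimal sketch of that arithmetic, with an illustrative function name, is:

```python
def supply_window(vdd, tolerance):
    """Return the (low, high) voltages for a symmetric fractional swing."""
    low = round(vdd * (1.0 - tolerance), 3)
    high = round(vdd * (1.0 + tolerance), 3)
    return low, high


if __name__ == "__main__":
    # Nominal 3.3 V supply with the upper end of the 10 - 20% swing.
    print(supply_window(3.3, 0.20))
```

For Vdd = 3.3 V and a 20% swing this gives 2.64 V and 3.96 V, the two limits used in the noise margin test.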


Power Consumption

The power consumed by the transistor devices is another important measure, since it determines the heat dissipation of the circuit. The average and maximum power consumption for each cycle can be measured according to the specified clock period; the average is measured using the HSPICE command:

.measure pwr AVG P(Vdd) FROM=0ns TO=3ns

The power consumed by the 4-bit slice is 0.46 mW:

pwr= 4.6250E-04 from= 0.0000E+00 to= 3.0000E-09

The power consumed by the 16-bit slice is 7.49 mW.

9. Layout

Inverter

The common inverter used in our circuit has Wp = 0.8u and Wn = 0.4u. The layout of the inverter is shown in figure 9.1.

Area: 26*62

Fig 9.1 Layout of common inverter

NAND

The NAND gate used in this circuit has Wp = 0.8u and Wn = 0.4u. The layout of the NAND is shown in figure 9.2.

Area: 64*39

Fig 9.2 Layout of NAND


Latch

Once the layouts of the transgate and the inverter are completed, we combine the two to obtain the layout of the latch. (Figure 9.3)

Fig 9.3: layout of Latch


D Flip Flop

Since the D flip flop is composed of two latches and one inverter, combining the layouts of two latches and one inverter gives the layout of the D flip flop shown in figure 9.5.

Area: 62*180

Figure 9.5 Layout of D Flip Flop


Mirror Adder

Figure 9.6: layout of Full adder

Area Estimation:

• The area estimated initially for the 4-bit slice is 94 μm × 44 μm.
• The area occupied by the entire 16-bit slice is 3.008 mm × 2.408 mm.
• The total transistor count is 1796.
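The transistor count can be cross-checked against the hierarchy of the final netlist in Appendix A.1. The sketch below hard-codes the transistor counts of the leaf cells and the instance counts of each composite cell as read from that netlist; the dictionary layout and function name are illustrative.

```python
# Transistors in each leaf cell (MOS devices listed directly in the subcircuit).
PRIMITIVES = {"transgate": 2, "inv": 2, "nand2": 4, "adder": 28}

# Sub-cell instances inside each composite cell of the netlist.
COMPOSITES = {
    "df": {"transgate": 4, "inv": 5},
    "and": {"nand2": 1, "inv": 1},
    "nand3": {"nand2": 3},
    "piso16": {"df": 16, "nand3": 15, "inv": 1},
    "multi16": {"df": 45, "adder": 15, "and": 16, "piso16": 1},
}


def transistor_count(cell):
    """Recursively count the transistors in a cell of the hierarchy."""
    if cell in PRIMITIVES:
        return PRIMITIVES[cell]
    return sum(count * transistor_count(sub)
               for sub, count in COMPOSITES[cell].items())


if __name__ == "__main__":
    for cell in ("df", "and", "nand3", "piso16", "multi16"):
        print(cell, transistor_count(cell))
```

The top-level cell evaluates to 810 (flip-flops) + 420 (adders) + 96 (AND gates) + 470 (PISO) = 1796 transistors, in agreement with the count reported above.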


10. Summary and Future Work

Over the past month we designed this 16-bit pipelined serial/parallel multiplier. We tried several ways of implementing the function and finished the layout of all the basic components required for the multiplier. The project is not yet complete: the layout of the whole chip and a full post-layout simulation remain to be done. Nevertheless, we learned the principles of the 16-bit pipelined serial/parallel multiplier, and we learned how to use SDL, which greatly helped us in the layout work. Our sincere thanks to Dr. Misra for timely guidance.


11. Schematic Driven Layout (SDL)

Schematic-driven layout (SDL) is a design methodology that enables physical design engineers to create IC layouts based on information from a logic source. A recurring concern in layout work is whether the layout matches the schematic. If we enable SDL in Mentor IC Station, the tool reports how many instances, nets and ports are included in the schematic, and it also helps to place these parts into the layout.

Create Cell

First, click “File”, “Cell”, and “Create” (Figure 11.1). We need to specify the viewpoint of the cell we want to create. The viewpoint path is usually “sdl” under the schematic.

Figure 11.1

It is better to select “Flat” in Logic Loading Options when drawing the layout for a single gate. (Figure 11.2)

Figure 11.2

After that, click “OK”.

Process Layout

We can now see the main workspace of Mentor IC Station. On the right side there is a menu, in which we choose “DLA Layout”. (Figure 11.3)

Figure 11.3

Then we can check our schematic by clicking “Open”. (Figure 11.4)

Figure 11.4


A new window opens and displays the schematic of the cell. (Figure 11.5)

Figure 11.5

The right-side menu has three options: “AutoInst”, “Inst” and “Port”. “AutoInst” places the transistors onto the layout automatically. “Inst” asks for the location of each transistor. “Port” places all the ports in the schematic. Here, for example, we click “AutoInst” and “Port”. (Figure 11.6)

Figure 11.6


After that, the layout window contains several instances and ports. We can use shapes or paths to connect the instances according to the yellow lines. (Figure 11.7)

Figure 11.7

LVS Check

LVS means “Layout vs Schematic”. It is used to check whether the layout matches the schematic. With the previous step, we have finished the layout of the cell. On the right-side menu, we choose “ICTrace(M)”. (Figure 11.8)

Figure 11.8


Because we already selected the rules when we created the cell, we do not need to load the rules again. Here we just need to choose “LVS”. (Figure 11.9)

Figure 11.9

The “Source Name” here should be “lvs” under the schematic, and the “Source Type” should be “eddm” (Figure 11.10). Then click “OK”.

Figure 11.10


Now we can view the LVS report through “Report”, “LVS”. (Figure 11.11)

Figure 11.11

The LVS report is displayed as follows: (Figure 11.12)

Figure 11.12

The smiley face means that our layout matches the schematic. Otherwise, the report lists the differences between the layout and the schematic. (Figure 11.13)

Figure 11.13

W/L Ratio Modification The W/L ratio can be modified easily with “DLA Device”. (Figure 11.14)

Figure 11.14

We can modify the width and length of a transistor here after selecting it. (Figure 11.15)

Figure 11.15


Appendix

A.1 Final netlist

* ELDO netlist generated with ICnet by 'jr223' on Sat Dec 11 2010 at 12:52:21
*
* Globals.
*
.global gnd vdd
*
* Component pathname : $HOME/ECE658/Project/transgate
*
.subckt transgate clk clkb out in
mp1 out clkb in vdd p L=0.4u W=0.8u
mn1 out clk in gnd n L=0.4u W=0.4u
.ends transgate
*
* Component pathname : $HOME/ECE658/Project/inv
*
.subckt inv in out
mn1 out in gnd gnd n L=0.4u W=0.4u
mp1 out in vdd vdd p L=0.4u W=0.8u
.ends inv
*
* Component pathname : $HOME/ECE658/Project/df
*
.subckt df q d clk
x_transgate4 n$215 clk n$9 n$434 transgate
x_transgate3 clk n$215 n$8 n$227 transgate
x_transgate2 clk n$215 n$434 n$2 transgate
x_transgate1 n$215 clk n$227 d transgate
x_inv5 clk n$215 inv
x_inv4 n$2 n$8 inv
x_inv3 q n$9 inv
x_inv2 n$434 q inv
x_inv1 n$227 n$2 inv
.ends df
*
* Component pathname : $HOME/ECE658/Project/NAND2
*
.subckt nand2 a b y
mn2 n$4 b gnd gnd n L=0.4u W=0.8u
mn1 y a n$4 gnd n L=0.4u W=0.8u
mp2 y b vdd vdd p L=0.4u W=0.8u
mp1 y a vdd vdd p L=0.4u W=0.8u
.ends nand2
*
* Component pathname : $HOME/workspace/and
*
.subckt and a b out
x_nand21 a b n$3 nand2
x_inv1 n$3 out inv
.ends and
*
* Component pathname : $HOME/workspace/adder
*
.subckt adder cout sum a b cin
mn9 n$467 b gnd gnd n L=0.4u W=0.4u
mn8 n$457 b gnd gnd n L=0.4u W=0.4u
mn7 n$453 a n$457 gnd n L=0.4u W=0.4u
mn6 n$882 cin n$453 gnd n L=0.4u W=0.4u
mn5 n$214 a n$449 gnd n L=0.4u W=0.4u
mn4 n$449 b gnd gnd n L=0.4u W=0.4u
mn3 n$433 a gnd gnd n L=0.4u W=0.4u
mn2 n$433 b gnd gnd n L=0.4u W=0.4u
mn1 n$476 cin n$433 gnd n L=0.4u W=0.4u
mp13 n$882 cin n$223 vdd p L=0.4u W=0.8u
mp12 n$223 b n$221 vdd p L=0.4u W=0.8u
mp11 n$221 a n$220 vdd p L=0.4u W=0.8u
mp10 n$882 n$476 n$220 vdd p L=0.4u W=0.8u
mp9 n$476 a n$217 vdd p L=0.4u W=0.8u
mp8 n$214 cin n$213 vdd p L=0.4u W=0.8u
mp7 n$217 b n$213 vdd p L=0.4u W=0.8u
mp6 n$220 cin vdd vdd p L=0.4u W=0.8u
mp5 n$220 b vdd vdd p L=0.4u W=0.8u
mp2 n$213 b vdd vdd p L=0.4u W=0.8u
mp4 n$220 a vdd vdd p L=0.4u W=0.8u
mp3 cout n$476 vdd vdd p L=0.4u W=0.8u
mp1 n$213 a vdd vdd p L=0.4u W=0.8u
mn14 cout n$476 gnd gnd n L=2u W=5u
mn13 sum n$882 gnd gnd n L=2u W=5u
mp14 sum n$882 vdd vdd p L=2u W=5u
mn12 n$882 n$476 n$467 gnd n L=0.4u W=0.4u
mn11 n$467 cin gnd gnd n L=0.4u W=0.4u
mn10 n$467 a gnd gnd n L=0.4u W=0.4u
.ends adder
*
* Component pathname : $HOME/workspace/nand3
*
.subckt nand3 d slb sl q out
x_nand23 n$3 n$6 out nand2
x_nand22 sl q n$6 nand2
x_nand21 d slb n$3 nand2
.ends nand3
*
* Component pathname : $HOME/workspace/piso16
*
.subckt piso16 d0 d2 d3 d4 d5 d6 d7 d8 d9 d10 d12 d13 d14 d15 clk d1 out
+ d11 sl
x_df10 n$280 n$274 clk df
x_df6 n$270 n$266 clk df
x_df14 n$289 n$286 clk df
x_df5 n$267 n$263 clk df
x_df12 n$253 n$282 clk df
x_df4 n$264 n$13 clk df
x_df15 n$292 n$288 clk df
x_df3 n$261 n$259 clk df
x_df2 n$7 n$257 clk df
x_df9 n$275 n$272 clk df
x_df16 out n$291 clk df
x_nand310 d10 n$293 sl n$280 n$279 nand3
x_inv1 sl n$293 inv
x_nand315 d15 n$293 sl n$292 n$291 nand3
x_nand314 d14 n$293 sl n$289 n$288 nand3
x_nand313 d13 n$293 sl n$287 n$286 nand3
x_nand311 d11 n$293 sl n$283 n$282 nand3
x_nand312 d12 n$293 sl n$253 n$284 nand3
x_df1 n$256 d0 clk df
x_nand37 d7 n$293 sl n$276 n$277 nand3
x_nand38 d8 n$293 sl n$273 n$272 nand3
x_nand39 d9 n$293 sl n$275 n$274 nand3
x_nand36 d6 n$293 sl n$270 n$269 nand3
x_nand35 d5 n$293 sl n$267 n$266 nand3
x_nand34 d4 n$293 sl n$264 n$263 nand3
x_nand33 d3 n$293 sl n$261 n$13 nand3
x_nand32 d2 n$293 sl n$7 n$259 nand3
x_nand31 d1 n$293 sl n$256 n$257 nand3
x_df11 n$283 n$279 clk df
x_df8 n$273 n$277 clk df
x_df13 n$287 n$284 clk df
x_df7 n$276 n$269 clk df
.ends piso16
*
* MAIN CELL: Component pathname : $HOME/workspace/multi16
*
x_df22 n$285 n$286 clk df
x_df21 n$281 n$282 clk df
x_df20 n$277 n$278 clk df
x_and15 n14 n$299 n$288 and
x_df29 n$309 n$299 clk df
x_adder12 n$282 n$942 n$280 n$944 n$281 adder
x_and16 n15 n$309 n$304 and
x_adder10 n$266 n$267 n$264 n$1558 n$265 adder
x_adder8 n$255 n$2168 n$258 n$2376 n$256 adder
x_and7 n6 n$250 n$236 and
x_df32 n$939 n$291 clk df
x_df35 n$944 n$936 clk df
x_df45 n$3809 n$3811 clk df
x_df16 n$1965 n$1964 clk df
x_df25 n$297 n$296 clk df
x_and10 n10 n$272 n$264 and
x_df15 n$256 n$255 clk df
x_adder11 n$278 n$936 n$276 n$946 n$277 adder
x_df44 n$3198 n$213 clk df
x_df7 n$229 n$230 clk df
x_adder7 n$2374 n$2372 n$240 n$239 n$241 adder
x_adder6 n$238 n$2580 n$236 n$2587 n$237 adder
x_adder5 n$234 n$2579 n$232 n$2791 n$233 adder
x_adder4 n$230 n$2789 n$228 n$2995 n$229 adder
x_and4 n3 n$223 n$214 and
x_df6 n$223 n$222 clk df
x_df5 n$222 n$221 clk df
x_df4 n$221 n$3809 clk df
x_df3 n$215 n$216 clk df
x_df2 n$211 n$212 clk df
x_adder3 n$216 n$2994 n$214 n$3198 n$215 adder
x_adder2 n$212 n$213 n$210 n$938 n$211 adder
x_df43 n$2995 n$2994 clk df
x_df42 n$2791 n$2789 clk df
x_df41 n$2587 n$2579 clk df
x_df30 n$270 n$251 clk df
x_and8 n8 n$270 n$258 and
x_and9 n9 n$271 n$260 and
x_df17 n$265 n$266 clk df
x_df18 n$271 n$270 clk df
x_df19 n$272 n$271 clk df
x_piso161 m0 m2 m3 m4 m5 m6 m7 m8 m9 m10 m12 m13 m14 m15 clk m1
+ n$3811 m11 sl piso16
x_and3 n2 n$222 n$210 and
x_and5 n4 n$248 n$228 and
x_df14 n$251 n$250 clk df
x_df13 n$250 n$249 clk df
x_df12 n$249 n$248 clk df
x_df11 n$248 n$223 clk df
x_df10 n$241 n$2374 clk df
x_df9 n$237 n$238 clk df
x_df8 n$233 n$234 clk df
x_df31 n$938 n$937 clk df
x_adder9 n$1964 n$1557 n$260 n$2169 n$1965 adder
x_adder13 n$286 n$941 n$284 n$1149 n$285 adder
x_df28 n$305 n$306 clk df
x_df40 n$239 n$2580 clk df
x_df39 n$2376 n$2372 clk df
x_df38 n$2169 n$2168 clk df
x_df37 n$1558 n$1557 clk df
x_df36 n$946 n$267 clk df
x_df34 n$1149 n$942 clk df
x_df33 n$940 n$941 clk df
x_adder15 n$306 l n$304 n$939 n$305 adder
x_adder1 n$204 n$937 n$207 n$208 n$205 adder
x_df1 n$205 n$204 clk df
x_and1 n0 n$3809 n$208 and
x_and6 n5 n$249 n$232 and
x_and2 n1 n$221 n$207 and
x_adder14 n$290 n$291 n$288 n$940 n$289 adder
x_and14 n7 n$251 n$240 and
x_and13 n13 n$298 n$284 and
x_and12 n12 n$297 n$280 and
x_and11 n11 n$296 n$276 and
x_df27 n$299 n$298 clk df
x_df26 n$298 n$297 clk df
x_df24 n$296 n$272 clk df
x_df23 n$289 n$290 clk df
*
.end
