A Design Tradeoff Study with Monolithic 3D Integration

Chang Liu and Sung Kyu Lim
Georgia Institute of Technology, Atlanta, Georgia, 30332
Phone: (404) 894-0315, Fax: (404) 385-1746

Abstract: This paper studies various design tradeoffs in monolithic 3D integration technology. Different design styles for monolithic 3D ICs are studied, including transistor-level monolithic integration (MI-TR) and gate-level monolithic integration (MI-G). GDSII-level layouts of monolithic 3D designs are constructed and analyzed. Compared with their 2D counterparts, MI-TR designs have advantages in footprint area, wirelength, timing, and power because of the smaller footprint. The MI-G design style also demonstrates advantages in area, timing, and power over TSV-based designs because inter-tier vias are smaller and have lower parasitics than TSVs. To take further advantage of monolithic 3D technology, several technology improvement options are explored. In addition, some possible design challenges with monolithic 3D are studied, including global variation and signal integrity issues.


This material is based upon work supported by the Semiconductor Research Corporation (SRC) under the Integrated Circuit & Systems Sciences (ICSS, Task ID: 2193.001) and the Interconnect Focus Center (IFC, Theme ID: 2050.001) programs.


I. INTRODUCTION

3D integration technology is actively being studied as a solution to continue the scaling trajectory predicted by Moore’s Law. Compared with other existing 3D integration technologies (wirebonding, interposer, TSV, etc.), monolithic 3D integration is the only one that enables ultra-fine-grained vertical integration of devices and interconnects, thanks to the extremely small size of inter-tier vias (typically 50 nm in diameter). Monolithic 3D technology, by definition, is a 3D integration technology that fabricates two or more device tiers sequentially, rather than bonding two fabricated dies together using bumps or TSVs. Figure 1 shows a typical monolithic 3D structure. The two device tiers are connected by inter-tier vias, which are essentially local vias. Metal layers can be placed between the two device tiers. To fabricate the top device tier, a low-thermal-budget process must be applied.

Several monolithic 3D integration processes have been developed. CEA/LETI [1][2] developed a sequential integration flow based on a low-temperature bonding process. Samsung [3] developed an “S3” technology for a 3-tier SRAM cell using a low-thermal-budget TFT process. Rajendran [4] developed a 3D sequential integration process using wafer bonding or seeded crystallization.

Existing works [1][2][4] mainly focus on the monolithic 3D process and device-level studies. A few circuit- and system-level studies focus on memory design using monolithic 3D technology [3][5][6], which belongs to the highly regular custom design style. However, the following critical questions need to be answered before monolithic 3D ICs can be adopted: how to design a large logic system (CPU, DSP, etc.) with monolithic 3D technology, and how much benefit monolithic 3D ICs can provide for a digital system compared with existing technologies. This paper studies the new design opportunities that monolithic 3D offers in logic circuits, and compares them with existing 2D and TSV-based 3D circuits.
The rest of the paper is organized as follows. Section II analyzes the benefits monolithic 3D brings compared with TSV-based 3D. Section III demonstrates two design approaches using monolithic 3D. Section IV compares the monolithic 3D designs with the 2D and TSV-based 3D designs based on real layout and sign-off analysis. Section V further discusses the results and suggests options for technology improvement. Section VI analyzes some potential design challenges with monolithic 3D. Section VII concludes the paper.

Fig. 1. Monolithic 3D structure in this study.

II. BENEFITS OF MONOLITHIC 3D FOR 3D INTEGRATION

Compared with TSV-based 3D integration, monolithic 3D integration has the following merits from the designer’s perspective.

First, since the inter-tier via in monolithic 3D ICs is much smaller than a TSV, fine-grained vertical integration is feasible, which provides more design freedom for designers and EDA tools. 3D vertical integration can be categorized into several levels in terms of partitioning granularity. The first is core-level integration. A typical example is a core + memory stack [7], which provides very high memory access bandwidth. The second is block-level integration [8], where functional blocks are partitioned into different tiers based on their logical connections. In block-level integration, the number of vertical connections is usually larger than in core + memory stacking. The third is gate-level integration [9], where individual gates are assigned to tiers. Since the number of gates in a digital system is huge, the demand for vertical interconnections is very high. The last is transistor-level integration, which partitions the transistors into different tiers. In terms of vertical connections, transistor-level integration has an even finer granularity than gate-level integration.

With current TSV technology, the typical TSV diameter is about 5 µm, which is much larger than a standard cell. If we consider a 2.5 µm keep-out zone around each TSV to reduce mechanical reliability problems, the actual silicon area occupied by a single TSV is 100 µm² (10 µm × 10 µm), which is about 5 standard cell rows in 45 nm technology. Figure 2 shows the size comparison between a TSV and a standard cell.
We see that the area of a TSV is many times bigger than that of a gate. This huge size gap implies that fine-grained vertical integration with a lot of vertical connections cannot be achieved by TSVs. In other words, the number of TSVs that can be used in a 3D design is strongly limited by TSV size. For example, for a chip with a 1 mm × 1 mm footprint, if we limit the total TSV area to 30 % of the total area, the maximum number of TSVs we can use is only 3000. Therefore, core-level or block-level 3D partitioning is usually preferable for TSV-based 3D integration. Gate-level partitioning is acceptable only when the cut size is small.

Fig. 2. Size comparison among a TSV, a gate, and an inter-tier via (labeled dimensions: TSV 5 µm, NAND2_4X 1.4 µm, inter-tier via 50 nm diameter).

Recently, nano-scale TSVs have been actively studied and developed. The diameter of a future TSV can reach 0.1 µm, which will boost the vertical interconnection density significantly. However, despite the effort in reducing the TSV size, the alignment precision of the 3D bonding process becomes a major constraint on further improving the 3D IC design granularity. The current alignment precision is about 1 µm [1], and is very difficult to improve further.

In contrast, an inter-tier via is as small as a local via (50 nm in diameter). For the same design with a 1 mm × 1 mm footprint, the maximum number of inter-tier vias is 30 million, which means there is almost no limitation on the number of vertical connections between device tiers. Since the devices and the inter-tier vias are fabricated sequentially, the alignment precision is extremely high. Therefore, monolithic 3D technology is very suitable for gate-level, or even transistor-level, 3D integration.

Second, an inter-tier via has much better electrical characteristics than a TSV in terms of parasitics, mechanical stress, electrical coupling, etc., due to its small size. Consider the TSV in Figure 2 with a 5 µm diameter, a 0.1 µm thick liner, and a 50 µm height. The parasitic capacitance from the TSV to the substrate is about 80 fF [10], which is roughly equal to the capacitance of a 200 µm long wire. In contrast, the parasitic capacitance of an inter-tier via is less than 1 fF, which is negligible.
Figure 3 shows the timing comparison between two timing paths, where a TSV and an inter-tier via are each driven by a 4X inverter. We see that the delay on the TSV is about 0.73 ns, which is much larger than the delay on the inter-tier via (0.04 ns). Therefore, to reach the same timing performance, the TSV-based design needs more buffering and gate sizing, which in turn increases the power consumption.

Third, in some monolithic 3D design styles, existing 2D tools can handle the 3D design well without the need for 3D-specific tools. This feature is discussed in Section III.
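The vertical-connection budgets quoted in this section (3000 TSVs versus 30 million inter-tier vias on a 1 mm × 1 mm footprint with a 30 % area budget) can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the 25 nm clearance around each inter-tier via is an assumption chosen so that one via occupies 100 nm × 100 nm, since the text does not state the spacing behind the 30 million figure.

```python
# Integer arithmetic in nanometres avoids floating-point rounding surprises.

def occupied_side_nm(diameter_nm: int, clearance_nm: int) -> int:
    """Side of the square silicon region taken by one vertical connection."""
    return diameter_nm + 2 * clearance_nm


def max_vertical_connections(chip_w_nm: int, chip_h_nm: int,
                             budget_percent: int, side_nm: int) -> int:
    """How many connections fit when only budget_percent of the footprint may be used."""
    budget_nm2 = chip_w_nm * chip_h_nm * budget_percent // 100
    return budget_nm2 // (side_nm * side_nm)


CHIP_SIDE_NM = 1_000_000  # 1 mm footprint side, as in the text

# TSV: 5 um diameter with a 2.5 um keep-out zone on each side (from the text).
tsv_side_nm = occupied_side_nm(5_000, 2_500)
# Inter-tier via: 50 nm diameter (from the text); the 25 nm clearance is an assumption.
via_side_nm = occupied_side_nm(50, 25)

tsv_count = max_vertical_connections(CHIP_SIDE_NM, CHIP_SIDE_NM, 30, tsv_side_nm)
via_count = max_vertical_connections(CHIP_SIDE_NM, CHIP_SIDE_NM, 30, via_side_nm)

print(f"max TSVs: {tsv_count}")
print(f"max inter-tier vias: {via_count}")
```

Under these assumptions the ratio between the two budgets is simply the ratio of the occupied areas, (10 µm / 0.1 µm)² = 10,000.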

Fig. 3. Transient simulation of an INV driving a TSV and an inter-tier via (voltage in V versus time in s, 0 to 6 ns).

III. DESIGN STYLES IN MONOLITHIC 3D

This section demonstrates two design styles in monolithic 3D ICs. The pros and cons of each design style are analyzed as well.

A. Transistor Level Design

Since monolithic 3D technology is suitable for fine-grained vertical integration, we focus on gate-level and transistor-level designs for monolithic 3D technology.

In transistor-level 3D ICs, the basic units for tier partitioning are transistors. In a standard-cell-based ASIC flow, the most intuitive way of partitioning transistors is to split the NMOS and PMOS in each standard cell into two device tiers, as shown in Figure 4. The merits of this design style are two-fold. First, very few interconnect layers (usually one) are needed in the bottom device tier, because they are only used for local interconnection inside each cell. Second, existing 2D physical design tools can be used for place and route, since the two device tiers in each cell are strictly aligned.

However, to perform transistor-level monolithic 3D design, we first need to design monolithic 3D standard cells. Figure 6 shows monolithic 3D standard cell designs for INV, NAND2, NOR2, and DFF. The designs only use local interconnects in the bottom PMOS tier. The area reduction compared with their 2D counterparts is about 30 %. The area reduction does not reach 50 % for two reasons. First, the inter-tier vias occupy some area, which is not needed in a 2D standard cell. Second, the PMOS occupies more area than the NMOS, because it must be sized larger to compensate for its lower mobility. Fortunately, this area skew is alleviated in newer technologies, such as 32 nm and 22 nm, thanks to strained silicon for PMOS [11]. It is reported that with strained silicon, the mobility of PMOS is significantly improved and becomes comparable with that of NMOS. Therefore, the area skew between PMOS and NMOS can be eliminated. With balanced PMOS and NMOS, the area of MI-TR gates can be further reduced. Monolithic 3D ICs therefore show more advantages in advanced technologies below 32 nm. Since we use a 45 nm technology library in this work, we still consider the area skew between PMOS and NMOS.

Using these redesigned standard cells and their physical library, we can use existing physical design tools to construct the full-chip layout, which is a significant benefit over TSV-based 3D ICs considering that 3D-specific EDA tools have not been fully developed yet.

B. Gate Level Design

The second design style is gate-level monolithic 3D ICs. In this design style, each gate is placed either on the top device tier or on the bottom device tier, as shown in Figure 5. Each device tier has several metal layers for interconnect, and inter-tier vias are used to connect the two tiers. The major merit of this design style is that we can use existing 2D standard cells. Also, we can control the inter-tier via count by properly partitioning the design. However, gate-level monolithic 3D ICs tend to use more metal layers because the

bottom tier also needs enough metal layers for cell interconnection. Moreover, traditional 2D design tools are not applicable here. Instead, a 3D placer is required, which is not yet mature in industry.

Fig. 4. Illustration of transistor-level monolithic 3D design (NMOS tier and PMOS tier; example cells INV, NAND2, NOR2).

Fig. 5. Illustration of gate-level monolithic 3D design (top tier and bottom tier separated by ILD and connected by inter-tier vias).

Fig. 6. Monolithic standard-cell design. NMOS tier (left or top) uses 1 metal layer. PMOS tier (right or bottom) uses only local interconnect.

IV. DESIGN AND ANALYSIS

A. Design Technology

The monolithic 3D technology used in this study is similar to the CEA/LETI process [1]. The 3D structure is shown in Figure 1, where inter-tier vias and internal vias [2] are used to connect the gate/source/drain/metal of the upper device tier to the lower-tier metals. The CEA/LETI work in [2] claims that regular 2D performance can be achieved for a single transistor with their monolithic 3D process. Since the major purpose of this study is to explore the possible design options, we therefore assume that the transistor performance does not change much in monolithic 3D ICs, and hence that the 2D device model is still applicable.

In this study, we compare four design styles: 2D, transistor-level monolithic 3D ICs (MI-TR), gate-level monolithic 3D ICs (MI-G), and TSV-based 3D ICs. The TSV-based 3D structure is shown in Figure 8. We use via-first TSVs that are 2.5 µm in diameter and 30 µm in height. Each TSV is connected to the metal wires through M1 and Mtop landing pads.

B. Design and Analysis Flows

We use two different design and analysis flows for the four design styles. The 2D and MI-TR designs can be fully handled by existing commercial 2D tools. Therefore, we use Cadence Encounter to place and route these designs and obtain the layout. We also use Encounter to perform timing optimization and power analysis. Finally, we use Synopsys PrimeTime to perform timing analysis.

The MI-G and TSV-based 3D design styles require 3D layout construction and analysis tools. No commercial tools are available for 3D design. Therefore, we use the partition-based 3D placer in [9] for cell placement. The idea of this placer is to perform Z-direction cuts in addition to XY-direction cuts to assign cells to different tiers.
Then, we use Encounter to route each die separately. After the layout construction, we use a timing-scaling method to perform 3D timing optimization. Then we use PrimeTime to perform timing and power analysis. The 3D analysis is based on stitching the RC parasitic files (.SPEF) and the netlist files (.v), and inserting the TSV/inter-tier-via information. Figure 7 summarizes the two design flows. We see that the MI-TR design flow is much simpler than the MI-G design flow in terms of the number of design steps.

Fig. 7. Design flow comparison. (a) Gate-level monolithic 3D and TSV-based 3D: 3D placement, route each die separately (repeated for each die), analysis and optimization. (b) Transistor-level monolithic 3D: 2D placement, route only on the top tier, analysis and optimization.

Since the design tools used in these two design flows are very different, comparisons among designs produced with different flows are not fair. For example, the placement quality of Cadence Encounter is better than that of the partition-based 3D placer. Our experiments show that for the same 2D design, Encounter is about 10 % better than the 3D placer in terms of wirelength. Since the goal of this study is not to compare the 3D placer with the commercial tool, we only compare designs produced with the same design flow (2D vs. MI-TR and TSV-3D vs. MI-G) for fair comparisons.

C. Testbench Circuits

We choose three circuits of different gate counts as our benchmark circuits. They are an FIR filter, an FFT processor, and a JPEG decoder, which contain 130K, 591K, and 1.17 million gates, respectively. We implement four design styles for each circuit: 2D, TSV-based 3D, MI-TR, and MI-G. The physical design library we use is based on the Nangate 45 nm PDK. We use the same size for both local vias and inter-tier vias. We also design our own monolithic 3D standard cells for MI-TR, as shown in Figure 6.

Fig. 8. (a) TSV size, (b) TSV-based 3D structure.

Fig. 9. Layouts of different types of designs for the FIR filter: TR-level MI (overall, with NMOS-tier and PMOS-tier zoom-ins), gate-level MI (overall, with top-tier and bottom-tier zoom-ins), and TSV-based 3D (overall, with top-tier and bottom-tier zoom-ins). The yellow dots shown in the monolithic designs are inter-tier vias, and the blue squares shown in TSV designs are TSV M1 landing pads.

The layouts of the three 3D designs for the FIR filter are shown in Figure 9. The cell placement density of all the designs is 60 %. Tables I and II list the area and the vertical connection (TSV or inter-tier via) counts of each design. We see that MI-TR has the most fine-grained vertical connections, and it therefore shows the biggest footprint area among the 3D designs. Compared with the 2D designs, the MI-TR circuits have a smaller footprint because of the smaller standard cell footprint. In the TSV-based 3D designs, TSVs occupy 2 % to 8 % of the area due to the large size of the TSV. In contrast, the percentage of inter-tier via area in the MI-G designs is almost 0. This is why a TSV-based 3D design usually occupies a larger area than the MI-G design.
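As a rough consistency check on the 2 % to 8 % TSV area figures, the sketch below combines the footprints and TSV counts listed in Table II with one assumed number: that each TSV, including its keep-out region, occupies about 5 µm × 5 µm of a die. That cell size is a hypothetical value introduced here for illustration (the paper only states the 2.5 µm TSV diameter), and the function name is likewise illustrative.

```python
# Footprint side (um) and TSV count per benchmark, taken from Table II.
designs = {
    "FIR": (342, 373),
    "FFT": (778, 470),
    "JPEG": (933, 780),
}

# Assumption: silicon region blocked by one TSV, keep-out included (not from the paper).
TSV_CELL_SIDE_UM = 5.0


def tsv_area_percent(footprint_side_um: float, tsv_count: int,
                     cell_side_um: float = TSV_CELL_SIDE_UM) -> float:
    """Share of one die's footprint taken by TSV cells, in percent."""
    tsv_area = tsv_count * cell_side_um ** 2
    return 100.0 * tsv_area / footprint_side_um ** 2


tsv_percent = {name: tsv_area_percent(side, count)
               for name, (side, count) in designs.items()}

for name, pct in tsv_percent.items():
    print(f"{name}: {pct:.1f} % of the footprint is TSV area")
```

With this assumed cell size the estimates round to the percentages reported in Table II, which also shows why the smallest design (FIR) pays the largest relative area penalty for a similar number of TSVs.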

D. Analysis and Comparison

The metrics used in this study include area, total wirelength, timing, and power consumption. In terms of timing, we compare both longest path delay (LPD) and total negative slack (TNS). All the metrics reported are simulation results after timing optimization.

We first compare the 2D designs with the MI-TR designs. Table I shows the analysis results. In terms of total wirelength, we see that all the MI-TR designs show a shorter wirelength than the 2D designs, by 9 % to 17 %. This is expected because a reduced footprint naturally leads to a reduced wirelength. We also see from the FIR circuit that the MI-TR design tends to use more metal layers. This is because the routing space for MI-TR is smaller than for 2D designs, so the routability of MI-TR is worse than that of 2D ICs.

We also observe that all the MI-TR designs achieve a better longest path delay than the 2D designs, by approximately 3 % to 8 %, because of the reduced wirelength. The longest path delay reduction is not as significant as the wirelength reduction. This is because the path delay is also strongly affected by the devices, which we assume to be the same in the two design styles. As for the total negative slack, the improvement of MI-TR ranges from 7 % to 35 %, which is very significant. Compared with the longest path delay, the TNS is a metric that evaluates many timing paths together rather than only one path. To obtain a more comprehensive understanding of how MI-TR improves the timing compared with 2D ICs, we draw the path

delay distribution for the three designs, as shown in Figure 10. From the distribution, we clearly see that MI-TR has better potential for timing improvement, because it has far fewer paths that violate the timing constraints.

Fig. 10. Negative timing slack distribution of 2D and MI-TR designs for (a) FIR filter, (b) FFT processor, (c) JPEG decoder (number of timing paths versus timing slack in ns).

We also find that the power consumption of the MI-TR designs is better than that of the 2D designs for the FIR, FFT, and JPEG circuits by 1 % to 7 %. The power consumed by the wires and by the devices both decrease. The wire power decreases because of the reduced wirelength. The device power decreases because fewer buffers are added when the wirelength is shorter. The reduction in total power is not as significant as the reduction in wirelength, because the power consumption is also strongly affected by the devices, whose power reduction is not as significant as that of the wires. Experimental results show that the wire power accounts for 15 %, 28 %, and 40 % of the total power in FIR, FFT, and JPEG, respectively. This explains why the JPEG circuit achieves the biggest power reduction: it has the biggest wire power portion due to its large circuit size. Based on this result, we predict that larger MI-TR designs have more potential for power saving because of their bigger wire power portion.

Now, we compare the MI-G designs with the TSV-based 3D designs. Table II shows the analysis results. We observe that compared with the TSV-based 3D designs, the MI-G designs have smaller area, shorter wirelength, better timing, and lower power consumption. The reduction in area is due to the smaller size of the inter-tier via compared with the TSV. The improved timing of MI-G comes from both the reduced wirelength and the smaller parasitics of the inter-tier vias compared with the TSVs, as analyzed in Section II. Due to the small parasitics of the inter-tier vias, the timing optimizer does not need to insert as many buffers in the MI-G designs as in the TSV-based 3D circuits. Therefore, power is saved through the reduced number of buffers in the MI-G case.

V. DISCUSSIONS

A. The Impact of Chip Area

The analysis in Section II explains well why MI-G shows better performance than TSV-based 3D ICs. On the other hand, the benefits of MI-TR come from its smaller chip area compared with 2D ICs, as analyzed in Section IV. It is the smaller footprint area of the MI-TR designs that results in a reduced wirelength, and the reduced wirelength in turn leads to better timing and power. To better understand how MI-TR outperforms the 2D designs because of its footprint area, we study the impact of using different chip areas in MI-TR designs. We take the FIR filter as an example. To manipulate the chip area, we change the placement density of the FIR filter from 50 % to 85 %. To ensure routability at high placement density, we allow the router to use up to 10 metal layers. We record how the wirelength, timing, and power change with the area, as shown in Figure 11.

From Figure 11(a), we clearly see that a smaller chip area is beneficial for wirelength reduction. Of course, as we push the placement density to the upper limit, the routability becomes worse, and the wirelength begins to increase. This is because the routing quality may degrade as routing becomes more difficult. The LPD and TNS results both show that the MI-TR designs achieve better timing than the 2D ones. The general trend also reveals that the timing improves as the area reduces. The same trend holds for the power consumption. The timing and power improvement with area reduction is not as strong as the wirelength improvement, because timing and power are also strongly affected by the devices.

B. Exploration on Technology Improvement Options

As discussed above, the smaller area is the key factor that helps reduce the wirelength in MI-TR designs. However, in larger designs such as FFT and JPEG, we cannot push the placement density to as high a level as in the 2D designs. This is because in MI-TR designs, each standard cell footprint is shrunk by 30 %, resulting in a 30 % area reduction. However, the interconnect does not scale together with the standard cell, which makes routing difficult in MI-TR designs. For example, using a force-directed placer without routability awareness, we obtain more and more DRC errors as we keep reducing the area. Figure 12 shows the DRC errors of the JPEG circuit with different placement densities. We see that a larger circuit such as JPEG is severely wire constrained rather than device constrained. Therefore, the interconnect should be improved by either adding more metal layers or reducing the metal width and pitch.

We first examine the impact of adding more metal layers. Assume that the default number of metal layers is 10. We increase the available metal layers in the physical design library from 10 to 14. The routing results in Table III show that the DRC errors decrease as the number of metal layers increases, which is expected. However, the DRC error count is still huge even if we use 14 metal layers. This is because adding more top metal layers is not very efficient in solving the local interconnection problem. In addition, more metal layers significantly increase the fabrication cost. Therefore, we conclude that adding more metal layers is not an efficient solution to the routing problem in MI-TR designs.
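To make the conclusion about extra metal layers concrete, the following sketch recomputes the relative change in DRC errors and wirelength from the Table III values for the JPEG MI-TR design. Only the numbers come from the paper; the variable and function names are illustrative.

```python
# (metal layers, DRC errors, wirelength in um) for the JPEG MI-TR design, from Table III.
table_iii = [
    (10, 193519, 1.469e7),
    (11, 160116, 1.465e7),
    (12, 139039, 1.454e7),
    (13, 114008, 1.452e7),
    (14, 101137, 1.451e7),
]


def percent_reduction(baseline: float, value: float) -> float:
    """Relative reduction of value with respect to baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


_, base_errors, base_wl = table_iii[0]  # the 10-layer stack is the baseline

drc_reduction = {layers: percent_reduction(base_errors, errors)
                 for layers, errors, _ in table_iii}
wl_reduction = {layers: percent_reduction(base_wl, wl)
                for layers, _, wl in table_iii}

for layers, errors, _ in table_iii:
    print(f"{layers} layers: {errors} DRC errors "
          f"({drc_reduction[layers]:.1f} % fewer), "
          f"wirelength {wl_reduction[layers]:.1f} % shorter")
```

Four additional layers remove a little under half of the DRC errors and change the wirelength by only about 1 %, so more than 100,000 violations remain, which is the basis for calling this option inefficient.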

TABLE I
DESIGN AND ANALYSIS SUMMARY OF 2D VS. MI-TR

Design | Footprint (µm²) | Total silicon area (µm²) | Inter-tier via count | % area by inter-tier vias | Total metal layers | WL (µm) | LPD (ns) | TNS (ns) | Total power (mW) | Wire power (mW) | Device power (mW)
FIR filter (130K gates)
MI-TR | 365×361 | 263,530 | 550K | 4% | 7 | 5.65×10^5 | 4.40 | 792.8 | 58.6 | 9.5 | 49.1
2D | 449×445 | 199,805 | 0 | - | 6 | 6.65×10^5 | 4.75 | 843.5 | 59.4 | 10.0 | 49.4
FFT processor (591K gates)
MI-TR | 874×874 | 1,527,752 | 3.9M | 5% | 10 | 9.71×10^6 | 3.16 | 154.3 | 172.4 | 49.6 | 122.8
2D | 1126×1126 | 1,267,876 | 0 | - | 10 | 11.6×10^6 | 3.39 | 236.9 | 175.5 | 51.5 | 124.0
JPEG decoder (1.17M gates)
MI-TR | 1081×1081 | 2,337,122 | 7M | 6% | 10 | 1.33×10^7 | 5.79 | 355.6 | 309.9 | 119.9 | 190.0
2D | 1319×1312 | 1,730,528 | 0 | - | 10 | 1.41×10^7 | 5.98 | 500.2 | 330.2 | 133.8 | 196.4

TABLE II
DESIGN AND ANALYSIS SUMMARY OF MI-G AND TSV-BASED 3D ICS

Design | Footprint (µm²) | Inter-tier via / TSV count | % area by TSVs / inter-tier vias | Total metal layers | WL (µm) | LPD (ns) | TNS (ns) | Total power (mW) | Wire power (mW) | Device power (mW)
FIR filter (130K gates)
MI-G | 334×334 | 373 | almost 0 | 12 | 6.72×10^5 | 4.96 | 330 | 64.4 | 10.3 | 54.1
TSV-3D | 342×342 | 373 | 8% | 12 | 6.74×10^5 | 5.58 | 883 | 69.2 | 10.6 | 58.6
FFT processor (591K gates)
MI-G | 775×775 | 470 | almost 0 | 20 | 1.21×10^7 | 3.96 | 403 | 173.6 | 51.7 | 121.9
TSV-3D | 778×778 | 470 | 2% | 20 | 1.23×10^7 | 4.23 | 582 | 185.1 | 53.5 | 131.6
JPEG decoder (1.17M gates)
MI-G | 930×930 | 780 | almost 0 | 20 | 1.46×10^7 | 6.11 | 174 | 349.7 | 134.1 | 215.6
TSV-3D | 933×933 | 780 | 2% | 20 | 1.46×10^7 | 6.20 | 362 | 355.1 | 136.3 | 218.8
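The relative improvements of MI-TR over 2D discussed in Section IV-D can be recomputed directly from the Table I entries. The sketch below is a small helper for doing so; the dictionary layout and names are illustrative, and only the numeric values are taken from the table.

```python
# Table I values: wirelength (um), LPD (ns), TNS (ns), total power (mW).
table_i = {
    "FIR": {
        "MI-TR": {"WL": 5.65e5, "LPD": 4.40, "TNS": 792.8, "power": 58.6},
        "2D": {"WL": 6.65e5, "LPD": 4.75, "TNS": 843.5, "power": 59.4},
    },
    "FFT": {
        "MI-TR": {"WL": 9.71e6, "LPD": 3.16, "TNS": 154.3, "power": 172.4},
        "2D": {"WL": 11.6e6, "LPD": 3.39, "TNS": 236.9, "power": 175.5},
    },
    "JPEG": {
        "MI-TR": {"WL": 1.33e7, "LPD": 5.79, "TNS": 355.6, "power": 309.9},
        "2D": {"WL": 1.41e7, "LPD": 5.98, "TNS": 500.2, "power": 330.2},
    },
}


def improvement_percent(ref: float, new: float) -> float:
    """Reduction of a lower-is-better metric relative to the reference, in percent."""
    return 100.0 * (ref - new) / ref


improvements = {
    circuit: {metric: improvement_percent(rows["2D"][metric], rows["MI-TR"][metric])
              for metric in rows["2D"]}
    for circuit, rows in table_i.items()
}

for circuit, metrics in improvements.items():
    summary = ", ".join(f"{m} {v:.1f} %" for m, v in metrics.items())
    print(f"{circuit}: {summary}")
```

Every metric here is lower-is-better, so a positive percentage means MI-TR improves on the 2D design for that circuit.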

Fig. 11. Impact of chip area on (a) wirelength, (b) longest path delay, (c) total negative slack, (d) power.

We then reduce the metal pitch and width to see the impact. The original and new metal pitches are shown in Table IV. With the JPEG design above, we see from Table V that the DRC errors drop significantly at every placement density with the same number of metal layers. Therefore, reducing the metal pitch and width is a more efficient process option for solving the routability problem than adding more metal layers. We further explore the impact of smaller metal width/pitch on the

circuit performance. We re-characterize the interconnect models for the reduced metal pitch/width, finish the physical design, and perform timing analysis. Table VI lists the design and analysis results for the JPEG circuit. We observe that after reducing the metal width/pitch, the wirelength decreases further. This is because with a smaller metal pitch, more routing tracks are available on the same footprint, so the router has more freedom to find a better routing solution. We see that the timing also improves because of the reduced wirelength and reduced wire parasitics.

TABLE III
IMPACT OF ADDING MORE METAL LAYERS ON ROUTING FOR JPEG BENCHMARK, MI-TR DESIGN STYLE

# Metal layers | # DRC errors | Wirelength (µm)
10 | 193519 | 1.469×10^7
11 | 160116 | 1.465×10^7
12 | 139039 | 1.454×10^7
13 | 114008 | 1.452×10^7
14 | 101137 | 1.451×10^7

TABLE IV
DEFAULT AND REDUCED METAL WIDTH / PITCH USED IN THE EXPERIMENT

Layers | Default width (µm) | Default pitch (µm) | Reduced width (µm) | Reduced pitch (µm)
M1 - M3 | 0.065 | 0.19 | 0.045 | 0.125
M4 - M6 | 0.14 | 0.285 | 0.095 | 0.185
M7 - M8 | 0.4 | 0.885 | 0.265 | 0.535
M9 - M10 | 0.8 | 1.71 | 0.535 | 1.065

TABLE V
DRC ERROR COUNTS BASED ON DIFFERENT PLACEMENT DENSITIES USING SMALLER METAL PITCH / WIDTH

Placement density | 65% | 70% | 75% | 85%
Default pitch/width | 22 | 67 | 1384 | 193519
Reduced pitch/width | 0 | 0 | 5 | 832

Fig. 12. Number of DRC errors with different placement densities (65 %, 70 %, 75 %, 85 %; logarithmic error-count axis).

VI. DESIGN CHALLENGES WITH MONOLITHIC 3D ICS

In the above study, we showed the benefits of monolithic 3D ICs in terms of area, wirelength, timing, and power consumption compared with 2D and TSV-based 3D ICs. However, there are some unique design challenges associated with monolithic 3D technology. One challenge is routability, which we discussed in Section V. Moreover, with the denser wires in MI-TR designs, the coupling-induced signal integrity (SI) problem may become severe. If we reduce the metal width and pitch as suggested in Section V, the SI may become even worse. In addition, since the two device tiers are fabricated sequentially, there is a high chance of global variation between the two tiers, which affects timing and yield. This section discusses these two potential challenges in monolithic 3D designs.

A. Inter-tier Global Variation Study

Compared with a 2D process, one unique characteristic of the MI-TR circuit is that the PMOS and NMOS are fabricated sequentially. This process introduces global variation between the PMOS and NMOS tiers, which is known as global P-to-N skew. In this section, we analyze how the global P-to-N skew affects the performance of the circuit.

To analyze the impact of global P-to-N skew, we generate different timing libraries for each standard cell considering Vth variations of the top NMOS tier. We first examine the impact of the global NMOS Vth variation on the longest path delay by performing deterministic simulations. Figure 13 shows the impact of Vth variation on the longest path delay for the FIR filter design. Based on these timing libraries, we perform Monte Carlo timing simulations considering a Poisson distributed 30 mV global variation on Vth for the NMOS tier, as shown in Figure 14. We see that the global Vth variation causes more than 10 % variation in the longest path delay, which can significantly affect the yield. If we consider local variation as well, the problem could be more severe.

B. Signal Integrity Issues in Monolithic 3D ICs

As discussed in Section V, the smaller routing space in MI-TR designs may cause routability problems. The routing congestion also results in a potential signal integrity problem: coupling-induced delay degradation harms the timing performance of MI-TR circuits. This section analyzes the signal integrity problems in MI-TR designs.

We perform timing degradation analysis on the MI-TR FFT design and compare it with the 2D design. Figure 15 compares the delay degradation distributions. We observe that, due to wire coupling induced by routing congestion, the MI-TR design shows more timing degradation than its 2D counterpart. Therefore, SI issues deserve more effort in MI-TR designs than in 2D designs, and SI-aware routing and timing optimization should be adopted in the MI-TR design flow.

VII. CONCLUSIONS

Monolithic 3D technology provides new opportunities for further reducing wirelength and improving chip performance. We propose two physical design methodologies, namely gate-level monolithic 3D ICs (MI-G) and transistor-level monolithic 3D ICs (MI-TR). Experimental results show that the MI-TR design style has advantages in area, wirelength, timing, and power compared with 2D ICs because of its smaller footprint. In addition, thanks to the small size and parasitics of inter-tier vias, MI-G designs also demonstrate advantages in area, timing, and power compared with their TSV-based 3D counterparts. Since the smaller footprint is the key reason why MI-TR is superior to 2D designs, we analyze the impact of footprint on circuit performance. To overcome the routing problem caused by the smaller footprint, we suggest improving the process technology by reducing the metal width and pitch rather than adding more metal layers. Finally, the challenges in monolithic 3D designs are discussed. The simulation and analysis results show that designers should pay attention to inter-tier global variation and signal integrity issues when designing monolithic 3D ICs.

Fig. 13. The impact of Vth variation on the longest path delay for FFT design before timing optimization.

Fig. 14. Monte Carlo timing analysis considering the inter-tier global Vth variation.

Fig. 15. Coupling-caused delay degradation analysis on transistor-level monolithic 3D and 2D designs (number of victim nets, logarithmic axis, versus coupling-caused delay degradation in ps, binned as 50-100, 100-150, 150-200, 200-250, 250-300, and >300).

TABLE VI
IMPACT OF WIRE WIDTH / PITCH ON DESIGN QUALITIES

Design | Footprint (µm²) | # Metal layers | Wirelength (µm) | LPD (ns)
2D (default pitch/width) | 1319×1312 | 10 | 1.41×10^7 | 5.98
MI-TR (default pitch/width) | 1081×1081 | 10 | 1.33×10^7 | 5.79
MI-TR (reduced pitch/width) | 1081×1081 | 10 | 1.19×10^7 | 5.70

REFERENCES

[1] P. Batude et al., “Advances in 3D CMOS Sequential Integration,” in Proc. IEEE Int. Electron Devices Meeting, 2009.
[2] O. Thomas et al., “Compact 6T SRAM cell with robust Read/Write stabilizing design in 45nm Monolithic 3D IC technology,” in Proc. IEEE Int. Conf. on Integrated Circuit Design and Technology, 2009.
[3] S.-M. Jung et al., “A 500-MHz DDR High-Performance 72-Mb 3-D SRAM Fabricated With Laser-Induced Epitaxial c-Si Growth Technology for a Stand-Alone and Embedded Memory Application,” in IEEE Trans. on Electron Devices, 2010.
[4] B. Rajendran, “Sequential 3D IC Fabrication: Challenges and Prospects,” in IEEE Trans. on Electron Devices, 2010.
[5] P. Batude et al., “3D CMOS Integration: Introduction of Dynamic coupling and Application to Compact and Robust 4T SRAM,” in Proc. IEEE Int. Conf. on Integrated Circuit Design and Technology, 2008.
[6] L. Chang et al., “Stable SRAM Cell Design for the 32nm Node and Beyond,” in Symposium on VLSI Technology, 2005.
[7] M. B. Healy et al., “Design and Analysis of 3D-MAPS: A Many-Core 3D Processor with Stacked Memory,” in Proc. IEEE Custom Integrated Circuits Conf., 2010.
[8] D. H. Kim, R. Topaloglu, and S. K. Lim, “Block-level 3D IC Design with Through-Silicon-Via Planning,” in Proc. Asia and South Pacific Design Automation Conf., 2012.
[9] D. H. Kim, K. Athikulwongse, and S. K. Lim, “A Study of Through-Silicon-Via Impact on the 3D Stacked IC Layout,” in Proc. IEEE Int. Conf. on Computer-Aided Design, Nov. 2009, pp. 674–680.
[10] G. V. der Plas et al., “Design Issues and Considerations for Low-Cost 3-D TSV IC Technology,” in IEEE Journal of Solid-State Circuits, Jan. 2011, pp. 293–307.
[11] S.-Y. Wu et al., “A 32nm CMOS Low Power SoC Platform Technology for Foundry Applications with Functional High Density SRAM,” in Proc. IEEE Int. Electron Devices Meeting, 2007.