Physical Design for 3D System on Package

3D Integration

Sung Kyu Lim, Georgia Institute of Technology

Editor’s note: Systems on package constitute a specific class of 3D designs wherein multiple manufactured die are stacked atop one another in a package that embeds both active and passive components. This article considers the problem of physical design for such an environment. —Sachin Sapatnekar, University of Minnesota

THE SOC PARADIGM is a system integration approach that integrates large numbers of transistors as well as various mixed-signal active and passive components onto a single chip. However, the systems community is beginning to realize that this paradigm has fundamental engineering and investment limits.1 This realization led to the 3D system-in-package (SiP) approach, alternatively called 3D ICs or 3D stacked die/package. This approach lets designers stack multiple ICs or multiple-package stacked ICs at a far lower cost and in less space. The vertical die-to-die via pitch in a 3D stacked die is very small, so designers can arrange digital functional modules across multiple die at a fine level of granularity. This results in shorter wires, which translates into less wire delay and less power consumption. The SiP provides major opportunities in both miniaturization and integration for advanced and portable electronic products, but as a subsystem it is still limited by the CMOS process, just as the SoC is. Designers can take SiP a step further by embedding both active and passive components, but passive-component embedding is bulky and requires thick-film discrete components. Thick-film component embedding distinguishes SiP from system on package (SoP),1 an emerging 3D system integration concept that involves embedding both active and passive components. SoP, however, incorporates ultrathin films at microscale to embed the passive components, and the package rather than the board is the system. Figure 1 illustrates the different technologies.


0740-7475/05/$20.00 © 2005 IEEE

SoP can address the shortcomings of both SoC and SiP, as well as those of traditional packaging, which is bulky, costly, and lower in performance and reliability. With SoP, improvement comes in two ways: First, SoP uses CMOS-based silicon for its best purpose, that is, for transistor integration; second, SoP uses the package for its best purpose: RF, optical, and digital component integration using IC-package-system codesign. SoP overcomes both the computing and integration limitations of SoC, SiP, multichip modules (MCMs), and traditional system packaging by having global wiring as well as RF, digital, and optical component integration in the package instead of on the chip. Moreover, 3D SoP addresses the wire delay problem by enabling the replacement of long, slow global interconnects with short, fast vertical routes.

Copublished by the IEEE CS and the IEEE CASS
IEEE Design & Test of Computers
November–December 2005

Figure 1. Comparison between a SoC, or complete system on one chip (a); a multichip module (MCM), which interconnects components (b); a system in package (SiP), with stacked chips or packages (c); and a system on package (SoP), with optimization between chip and package (d).

Enabling technologies

Several fundamental enabling technologies support 3D SoP integration.

■ Electrical interconnect. With the availability of high-density substrates with 3.5-micron line widths, SoP provides a unique opportunity for offloading global wiring to the package for enhanced performance. Key recent developments in next-generation buildup microvias for the SoP substrate include the integration of ultra-low-loss and high-k dielectrics, conductor geometries with submicron precision, and low-cost processes for multilayer stacked via interconnects.2

■ Chip-to-package interface. The current approach of lead-free solders with underfill presents major challenges in both dispensing the underfill and guaranteeing fatigue resistance as solder bumps shrink in height. Recent SoP research advances concerning the chip-to-package interface include the extension of solder bumps to stretched-solder columns and improvements in underfill technology.3

■ High-quality embedded passives. Designers use multilayer ceramic and multilayer organic structures with liquid crystal polymer technology to embed passives efficiently, including high-Q inductors, capacitors, matching networks, low-pass and band-pass filters, baluns, combiners, and antennas.4 The 3D design approach using multilayer topologies leads to high-quality, compact components to support multiple bands and standards and wider bandwidth in a compact form factor and at low cost.

■ Analog/RF components. The recent development of thin-film RF materials and processes lets designers bring the SoP concept into the RF world to meet the stringent needs of wireless communication.4 Researchers are addressing critical issues such as board-compatible embedded antennas and switches, low-loss and low-cost boards, low-crosstalk embedded transmission lines, and single-mode packages, as well as design rules for vertically integrated transceivers over a wide frequency range.

■ Optical interconnect. A high-speed optical clock and data transport simplify the digital architecture by requiring fewer parallel transmission lines. Moreover, optical links have low crosstalk and are not susceptible to electromagnetic interference (EMI) noise. Researchers have developed a low-temperature polymer process for fabricating and integrating optoelectronic components such as a microlens array, lasers, waveguides, splitters, couplers, gratings, and photodetectors on printed wiring boards for mixed-signal SoP applications.5

Although 3D SoP manufacturing technology continues to advance, research into how to actually apply the technology lags. The complexity of designing large-scale 3D SoPs with various objectives and under multiple constraints has made CAD tools indispensable. In contrast to the active CAD research effort in mixed-signal SoCs6 and 3D stacked ICs,7 CAD research for 3D SoPs has a short history. Early efforts in pioneering physical design research for SoPs8 by no means cover existing and emerging 3D SoP physical CAD issues to their fullest extent. This article presents three physical design algorithms for fast design of reliable 3D SoPs.

3D circuits versus 3D packages

In the popular 3D stacked-IC form of 3D integration, the mostly digital subsystem components form a stack of multiple die, as Figure 2 shows. In such an IC, it’s possible to fabricate transistors atop other transistors, resulting in multiple layers of active components. These transistors can then be wired to other transistors on the same device layer, to transistors on different device layers, or both, depending on the process technology. The several approaches to fabricating 3D ICs or 3D-compatible transistors vary in terms of the maximum number of device layers and the maximum density of interconnects between these layers. One leading approach is the wafer bonding method,9 which glues discrete wafers together using a copper interconnect interface. This low-cost method lets designers implement 3D interconnects for many wafers, overcoming the limitations of other proposed methods.

Figure 2. Examples of 3D integration: 3D SoP (a) and expansion of the 3D stacked die with both face-to-face and face-to-back bonding (b).

Table 1. Comparison of 3D ICs with 3D SoPs.

Characteristic | 3D ICs | 3D SoPs
Enabling technology | Die stacking, bonding interface, die-to-die vias | Flip-chip mount, thin-film embedded passives, optical module/waveguide
Components | Gates, functional modules | Digital/analog die, embedded passives, optical modules, filters, antennas
Interconnect | Metal wires and buffers | Metal wires, optical waveguides
Component scale | Nanometer to micron in size, counts from millions to billions | Micron to millimeter in size, counts from hundreds to thousands
Signal type | Digital | Digital, analog, optical
Physical design flow | Hierarchical: partitioning, floorplanning, placement, global routing, detailed routing, and interconnect optimization | Nonhierarchical: module placement, module-level routing, interconnect optimization
Physical design challenges | Area, power, and performance optimization; thermal distribution; process variations; signal integrity; power/clock distribution | Area, power, and performance optimization; thermal distribution; mixed-signal substrate coupling and EMI; power supply noise; chip-package codesign; optical routing

Table 1 compares 3D ICs and 3D SoPs in terms of their basic enabling technologies and physical design challenges. The basic processes behind the physical design of 3D ICs and 3D SoPs are similar: placement of multiple components into multiple device layers, and routing using multiple groups of multiple metal layers and various types of vias. However, component sizes in 3D ICs are in the nano- to microscale, and the total

device and interconnect count is in the millions or billions. Therefore, it takes a hierarchical design methodology with design reuse to handle the complexity. On the other hand, 3D SoPs contain only hundreds to thousands of devices and interconnects, and these are in the micro- to milliscale. Therefore, a nonhierarchical design approach is sufficient for most cases. However, the SoP signal type is a mixture of digital, analog, and optical signals. Signal and power integrity challenges increase exponentially in mixed-signal systems that integrate RF front ends with optical signaling and digital baseband processing. Because designers use a highly complex and time-consuming mixed-signal noise-analysis tool to validate 3D SoP designs during the physical design process, design time and effort are at least as great as for 3D ICs.

Figure 3. Relieving thermal hot spots: thermal vias (a), 3D SoP with heat sink (b), and 3D mesh model (c).

Physical design algorithms for 3D SoPs

Our research team, which includes graduate students at the Georgia Tech Computer-Aided Design Laboratory, devised algorithms to address thermal via management, decoupling capacitor management, and interlayer via management.

3D SoP module placement with thermal vias

A 3D SoP structure’s increased module density exacerbates the thermal hot-spot problem: A larger module packed into a smaller footprint produces a higher maximum temperature. A popular choice for mitigating thermal issues is thermal vias that serve as thermal paths from a 3D SoP substrate’s core to the heat sink,10 as Figure 3 shows. A recent study reports a reduction of up to 50% in the maximum temperature using copper-based thermal vias.10 Because the impact of thermal vias on the thermal problem is potentially significant, addressing thermal via planning at earlier stages in the physical design process (say, during module placement) allows greater flexibility in thermal management.

We use a 3D thermal resistance mesh, shown in Figure 3c, for thermal analysis. Each node models a small volume of the 3D SoP substrate, and each edge denotes the connectivity between two adjacent regions. This is equivalent to using a discrete approximation of the steady-state thermal equation $-k \nabla^2 T = P$, where k is thermal conductivity, T is temperature, and P is power. This results in the matrix equation $Rp = t$, where R is a thermal resistance matrix, p is a power vector, and t is a temperature vector.

The goal during thermal via planning is to determine thermal via density for each node in the 3D thermal mesh and thereby obtain a desired thermal distribution. This assumes that each thermal via goes all the way through the 3D SoP’s substrate, connecting to its top and bottom heat sinks. Thus, a single thermal via insertion at a certain mesh node lowers the thermal resistance values for all related vertical edges in the 3D thermal mesh. For a given 3D placement, our algorithm first performs thermal analysis and increases the thermal via densities of the hot spots. A second thermal analysis then validates the temperature drop at the target hot spots and identifies the next targets. The iteration between thermal via insertion and thermal analysis repeats until all hot spots are removed.

At this point, it’s desirable to have an even distribution of the thermal via density because it helps reduce routing congestion later on. Therefore, a refinement step spreads out the thermal vias to provide a more uniform thermal via density. Refinement operates according to the principles of diffusion, moving some vias from high-density mesh nodes to low-density nodes. Thermal via sites with higher density require more thermal vias and therefore more layout area. Because the modules become obstacles for the through-the-substrate thermal vias, we use the existing white space in each device layer to accommodate thermal vias, or we add more white space by expanding the placement in both the x and y directions, as necessary. (Here, we use white space for thermal via insertion. In the next section, we use white space for decap insertion.)

Upon determination of a 3D placement, the described thermal via planning works as a postprocess. A more sophisticated approach is thermal-via-aware placement, whereby the algorithm determines module position and thermal via density simultaneously. This approach potentially has a greater impact on the overall placement quality because the module location directly affects the thermal profile. The basic idea is to use the sequence-pair method11 to encode 3D module placement solutions and explore the solution space through a simulated annealing scheme. For a given candidate 3D placement solution, the algorithm performs thermal analysis and thermal via insertion to compute the area overhead and temperature drop. Because this evaluation is time-consuming, we use an incremental thermal analysis with a quick estimation of area increase for the greatest part of the annealing process. During the later stage of annealing, we perform the full-length thermal via insertion for a more accurate solution evaluation.

A related experiment first reveals the trade-off between thermal cost and area cost. We varied the maximum temperature threshold and monitored the area overhead using our postplacement thermal via insertion scheme. As the temperature requirement (lower maximum temperature constraint) becomes more exacting, the number of thermal vias used increases along with their area overhead. Compared with the postplacement thermal via insertion method under the same thermal constraint, our thermal-via-aware placement scheme achieves lower area overhead at the cost of increased runtime.

Figure 4. 3D mesh modeling of a power supply network (a); 3D placement expansion for decoupling capacitor insertion, where the darker modules denote the neighboring modules of the decoupling capacitor (shown as white space) inserted (b). Modules from other layers can access this decoupling capacitor.
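The postplacement loop from the thermal via section (analyze, add vias at the hot spot, analyze again) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation. It solves the conductance form of the mesh equation (G t = p, which is equivalent to t = R p when R is the inverse of G) with Gauss-Seidel sweeps, and it adds one via at a time to the column under the hottest node. The function names, the conductance values g_lat, g_vert, and g_via, the per-site via limit, and the heat sinks held at temperature 0 above and below the stack are all hypothetical choices made for the example. The diffusion-style refinement step is not shown.

```python
def solve_temperatures(power, vias, g_lat=1.0, g_vert=0.5, g_via=2.0,
                       max_sweeps=5000, tol=1e-10):
    """Steady-state temperatures of a 3D thermal mesh (heat sinks at 0).

    power[z][y][x] is the heat injected at each mesh node and vias[y][x] is
    the thermal via count of each vertical column. All conductance values
    are illustrative assumptions, not data from the article.
    """
    nz, ny, nx = len(power), len(power[0]), len(power[0][0])
    t = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for _ in range(max_sweeps):
        largest_change = 0.0
        for z in range(nz):
            for y in range(ny):
                for x in range(nx):
                    # A through-the-substrate via raises the conductance of
                    # every vertical edge in its column.
                    g_col = g_vert + g_via * vias[y][x]
                    g_sum = 0.0
                    inflow = power[z][y][x]
                    for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < ny and 0 <= xx < nx:
                            g_sum += g_lat
                            inflow += g_lat * t[z][yy][xx]
                    for zz in (z - 1, z + 1):
                        g_sum += g_col
                        if 0 <= zz < nz:
                            inflow += g_col * t[zz][y][x]
                        # Otherwise the edge ends at a heat sink held at 0.
                    new_value = inflow / g_sum
                    largest_change = max(largest_change,
                                         abs(new_value - t[z][y][x]))
                    t[z][y][x] = new_value
        if largest_change < tol:
            break
    return t


def plan_thermal_vias(power, t_max, max_vias_per_site=10):
    """Add vias under the hottest node until the peak is at most t_max."""
    nz, ny, nx = len(power), len(power[0]), len(power[0][0])
    vias = [[0] * nx for _ in range(ny)]
    while True:
        t = solve_temperatures(power, vias)
        nodes = [(t[z][y][x], y, x)
                 for z in range(nz) for y in range(ny) for x in range(nx)]
        if max(nodes)[0] <= t_max:
            return vias, t
        # Hottest node whose column can still accept another via.
        open_sites = [n for n in nodes if vias[n[1]][n[2]] < max_vias_per_site]
        if not open_sites:
            return vias, t  # via budget exhausted before meeting t_max
        _, y, x = max(open_sites)
        vias[y][x] += 1
```

Calling plan_thermal_vias with a power map that has one dominant module concentrates vias in that module’s column. Lowering t_max raises the via count, which mirrors the temperature versus area trade-off reported in the experiment.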


3D SoP module placement with decoupling capacitors

A major problem with 3D SoP integration is the power supply noise coupling between the various mixed-signal modules constituting the system. The high-speed digital processor is the primary generator of the noise, which is coupled through the power distribution network. With today’s high-performance packaging design, the wide use of on-package decoupling capacitors (decaps) mitigates the power supply noise problem. By charging during the steady state, decaps can assume the power supply role and provide the current needed during the simultaneous switching of multiple functional modules. Once the circuit module locations are fixed, constraints such as voltage drop and current density can be so tight that no feasible power network design can keep power supply noise within a specified margin. Hence, it’s important to consider power supply planning during the early design stage, when designers can flexibly change the circuit module locations.

Figure 4a shows our 3D mesh-based power/ground network model. The edges in the mesh have inductive and resistive impedances. The mesh contains power supply points and current-consuming points. These consumers draw the current from all sources, and the amount of current drawn along a path is inversely proportional to the impedance of the path in the power supply mesh. The dominant path for a module is the path from the nearest current source to the module causing the greatest drop in voltage. Then, the power supply noise for a given module is the summation of IR drop and L di/dt change (drop or increase) along the module’s dominant path p, where L is on-chip inductance. There might be several edges in p that are shared with the dominant paths for other modules. In this case, the sum of IR drop and L di/dt change on these shared edges caused by the related modules serves as the noise computation of the individual modules. Finally, the algorithm computes the decap budget for each module according to its current demand and the noise level.

A straightforward 3D extension of the existing power supply noise-aware 2D placers would not take full advantage of the 3D technology’s potential. For example, using only the decaps adjacent to the modules, as Zhao et al. suggest,12 would limit interlayer access. Although neighboring decaps provide most of the current, a module can still draw current from non-neighboring decaps. We overcome this limitation on intralayer, neighboring decap access by formulating effective decap distance, whereby a decap’s effectiveness depends on the distance to the module that accesses it. Second, we devise an effective way to detect the existing white space for decap implementation. In case the existing white space is not enough, we perform footprint-aware decap insertion to minimize the overall area increase and allow functional modules to access decaps in other layers, as Figure 4b illustrates. Finally, we use a generalized network-flow-based approach for the decap-to-module assignment, whereby a flow approximation method accelerates the decap allocation step for faster annealing.

The decap insertion discussed so far applies to a given 3D placement as a postprocess. We can also perform decap-aware 3D placement by incorporating decap management into the optimization engine. In this case, a quick estimation of area overhead due to decap is the key to evaluating each candidate 3D solution.
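The dominant-path noise computation described above, including the treatment of edges shared by several modules, can be illustrated with a short sketch. The function name, the data layout, and the numeric values in the usage note are assumptions made for the example; the article gives no formula beyond summing IR drop and L di/dt along each dominant path, so the decap budget step is not modeled here.

```python
from collections import defaultdict


def supply_noise(paths, current, slew, edges):
    """Estimate power supply noise per module along its dominant path.

    paths   : module name -> list of edge ids on its dominant path
    current : module name -> current drawn (A)
    slew    : module name -> current slew rate di/dt (A/s)
    edges   : edge id -> (resistance in ohms, inductance in henries)
    """
    # An edge shared by several dominant paths carries the summed current
    # and the summed di/dt of every module that uses it.
    edge_current = defaultdict(float)
    edge_slew = defaultdict(float)
    for module, path in paths.items():
        for edge in path:
            edge_current[edge] += current[module]
            edge_slew[edge] += slew[module]

    noise = {}
    for module, path in paths.items():
        noise[module] = sum(
            edges[e][0] * edge_current[e] + edges[e][1] * edge_slew[e]
            for e in path
        )
    return noise
```

For example, if modules A and B share the first edge of their dominant paths, the voltage change on that edge reflects both modules’ currents, so A’s noise grows when B switches at the same time.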
Moreover, we can combine the thermal via planning discussed earlier and decap planning under the same framework, thereby achieving simultaneous thermal, power supply noise, and area optimization. In this case, we either perform thermal via and decap insertion sequentially as a postprocess or consider them during 3D placement optimization. In the latter case, our thermal and noise analyzer provides a quick estimate of area overhead for a given candidate solution.

Our related experiment first reveals the trade-off between power supply noise and area cost. We varied the maximum power supply noise threshold and monitored the area overhead using our postplacement decap insertion scheme. As the noise requirement becomes stricter (a lower noise threshold), the number of decaps used increases, along with their area overhead. Compared with the postplacement decap insertion method under the same power supply noise constraint, our decap-aware placement achieves lower area overhead at the cost of increased runtime. Finally, we observe from our simultaneous thermal/decap-aware placement results a strong correlation between thermal and power supply noise objectives. These conflicting objectives compete for the existing and added white space for thermal via and decap insertion.

3D SoP routing with through vias

Figure 5 shows the layer structure in a multilayer SoP. The placement layers contain the modules. A routing interval contains a stack of routing layers sandwiched between pin distribution layers. The pin distribution layers in each routing interval serve to evenly distribute pins from the nets assigned to this interval. A through via connects two pin distribution layers from different routing intervals. Nets that have all their terminals in the same placement layer are called i-nets, whereas those having terminals in different placement layers are x-nets. As Figure 5 shows, our 3D global router has the following steps:

■ Pin distribution: We first determine which set of i-net and x-net segments to assign to each routing interval. The algorithm evenly distributes the pins from these nets in the top and bottom pin distribution layers.

■ Topology generation: The algorithm generates Steiner trees for all nets in each routing interval, to optimize the routed design’s performance.

■ Layer assignment: The algorithm assigns the routed nets to a unique routing pair in the routing layer, to minimize the total number of layers used.

■ Channel assignment: The algorithm determines the location of the through via for each x-net in the routing channel. We also assign channels and finish the connections for the i-nets to be routed in each placement layer.

■ Local routing: The algorithm finishes connections between the pins now in the routing channels and the pins on the module boundaries.

Compared with existing MCM routing algorithms, our SoP routing requires more-sophisticated pin distribution and channel assignment steps to handle the complexity.
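The distinction between i-nets and x-nets drives the rest of the flow, so it may help to see it as code. The sketch below only classifies nets by the placement layers of their terminals. The function name and the netlist layout (a dictionary of terminal lists) are assumptions for illustration and are not taken from the router itself.

```python
def classify_nets(nets):
    """Split nets into i-nets and x-nets.

    nets maps a net name to a list of (module, placement_layer) terminals.
    Returns two dictionaries: i-nets mapped to their single placement layer,
    and x-nets mapped to the sorted list of placement layers they touch.
    """
    i_nets, x_nets = {}, {}
    for name, terminals in nets.items():
        layers = sorted({layer for _, layer in terminals})
        if len(layers) == 1:
            i_nets[name] = layers[0]   # all terminals in one placement layer
        else:
            x_nets[name] = layers      # needs at least one interlayer route
    return i_nets, x_nets
```

Only the x-nets returned here would be candidates for through vias in the later channel assignment step.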

Figure 5. SoP layer structure with through vias (a), pin distribution (b), net distribution (c), layer assignment (d), channel assignment (e), and local routing (f).

The first step, pin distribution, further divides into three steps: coarse pin distribution, net distribution, and detailed pin distribution. In coarse pin distribution, we construct a coarse 2D grid and evenly distribute the pins from all nets in all routing intervals in this single grid. We use a move-based partitioning approach to minimize the overall wire length and congestion. In net distribution, we assign the routing interval for each i-net to either above or below the placement layer it belongs to. We use max-cut partitioning to separate nets with high crosstalk into different routing intervals. In detailed pin distribution, we refine our pin distribution results for each routing interval on the basis of the net distribution result. In addition, the algorithm assigns each pin to a unique node in a finer grid in pin distribution layers. Our force-directed algorithm encourages all pins from the same net to be placed closer together while minimizing the distance between the old and new pin locations.

During the SoP channel assignment step, the algorithm maps each through via to a routing channel in the placement layer. In addition, some pins in the pin distribution layer having connections to the module boundary are also mapped to a routing channel. The objective of SoP channel assignment is to minimize the number of pin distribution layers and the wire length. The pitch and signal delay of through vias are larger than those of other types of vias. Because each channel has a capacity constraint, it’s important to assign the through vias to the nearest channels first. Therefore, we perform this channel assignment first to minimize the delay of x-nets that require through vias. In addition, we give priority to pins included in long nets. Our network-flow-based bipartite matching algorithm seeks the channel with the lowest mapping cost for a given pin on the basis of layer usage and wire length.

Our experiment first reveals the effect of the three SoP pin distribution steps. Our baseline algorithm skips coarse pin distribution, performs random net distribution, and uses our detailed distribution. Compared with this baseline, our three-step pin distribution algorithms achieve a significant improvement in estimated crosstalk, wire length, and congestion at the cost of additional runtime. Our through-via-aware SoP channel assignment algorithm accommodated all through vias with little area overhead resulting from routing-channel expansion. In addition, we observe the trade-off between the total layer count and wire length during SoP channel assignment. This trade-off occurs because some routing detour is necessary to avoid congested spots, and this decreases the total layer count. However, too great an increase in wire length adversely affects total layer usage as well.
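The net distribution step separates high-crosstalk i-nets by placing them in different routing intervals, which is a max-cut problem on a crosstalk graph. The article does not say which max-cut algorithm it uses, so the sketch below substitutes a simple local-search heuristic: it flips a net to the other interval whenever that increases the total crosstalk weight crossing the cut. The function name, the tuple-keyed crosstalk map, and the restriction to the i-nets of a single placement layer are assumptions made for the example.

```python
def distribute_nets(nets, crosstalk):
    """Assign each i-net of one placement layer to the interval below (0)
    or above (1) using a local-search max-cut heuristic.

    nets      : list of net names
    crosstalk : dict mapping (net_a, net_b) -> positive coupling weight
    """
    side = {net: 0 for net in nets}
    improved = True
    while improved:
        improved = False
        for net in nets:
            same = cross = 0.0
            for (a, b), weight in crosstalk.items():
                if net == a:
                    other = b
                elif net == b:
                    other = a
                else:
                    continue
                if side[other] == side[net]:
                    same += weight
                else:
                    cross += weight
            # Flipping swaps 'same' and 'cross', so it helps only when the
            # coupling to nets in the same interval dominates.
            if same > cross:
                side[net] ^= 1
                improved = True
    return side


def cut_weight(side, crosstalk):
    """Total crosstalk weight between nets placed in different intervals."""
    return sum(w for (a, b), w in crosstalk.items() if side[a] != side[b])
```

Each flip strictly increases the cut weight, so the loop terminates. At a local optimum the separated weight is at least half of the total, although the heuristic does not guarantee the true maximum cut.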

FUTURE RESEARCH should focus on integrating optical interconnects in SoP routing. Optomodule costs dictate an overall routing strategy that includes both electrical and optical routing integrated into a SoP. Efficient construction of ultrafast but currently expensive waveguides, placement of related optical modules, and selection of optical nets are important problems to be solved.

Researchers must develop design tools for power distribution networks for digital and mixed-signal systems, with emphasis on power supply noise simulation, decoupling, and emerging technologies. The power distribution network for 3D SoP consists of 3D interconnects in the chip and package, which together provide the required target impedance over a range of frequencies. The SoP trend requires viewing the power distribution network in the chip and package as a single network, warranting a chip-package codesign methodology for minimizing resonance in the system.

A cost/performance trade-off analysis for SoC versus SoP solutions for mixed-signal applications is crucial. Researchers should study design methodologies that quantitatively predict the performance and cost benefits of SoPs over SoCs. The performance analysis must evaluate the performance and power gain achievable through mixed-signal-component integration in SoPs versus SoCs. In addition, for SoP and SoC comparisons, we must address the effect of various mixed-signal isolation techniques between sensitive analog/RF circuits and noisy digital circuits. Moreover, the cost analysis should include new factors such as extra chip area and additional process steps for noise-free mixed-signal isolation and seamless integration of IP modules. Finally, we need a complete, systematic way to analyze tradeoffs for on- versus off-chip passive components. ■

References

1. R.R. Tummala, “SoP: What Is It and Why? A New Microsystem-Integration Technology Paradigm—Moore’s Law for System Integration of Miniaturized Convergent Systems of the Next Decade,” IEEE Trans. Advanced Packaging, vol. 27, no. 2, May 2004, pp. 241-249.
2. V. Sundaram et al., “Next-Generation Microvia and Global Wiring Technologies for SoP,” IEEE Trans. Advanced Packaging, vol. 27, no. 2, May 2004, pp. 315-325.
3. A. Tay et al., “Next Generation of 100-Micron Pitch Wafer-Level Packaging and Assembly for Systems-on-Package,” IEEE Trans. Advanced Packaging, vol. 27, no. 2, May 2004, pp. 413-425.
4. R.R. Tummala et al., “The SoP for Miniaturized, Mixed-Signal Computing, Communication, and Consumer Systems of the Next Decade,” IEEE Trans. Advanced Packaging, vol. 27, no. 2, May 2004, pp. 250-267.
5. G.-K. Chang et al., “Chip-to-Chip Optoelectronics SoP on Organic Boards or Packages,” IEEE Trans. Advanced Packaging, vol. 27, no. 2, May 2004, pp. 386-397.
6. K. Kundert et al., “Design of Mixed-Signal Systems-on-a-Chip,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 12, Dec. 2000, pp. 1561-1571.
7. S. Das, A. Chandrakasan, and R. Reif, “Design Tools for 3-D Integrated Circuits,” Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC 03), IEEE Press, 2003, pp. 53-56.
8. J. Minz et al., “Placement and Routing for 3D System-on-Package Designs,” to be published in IEEE Trans. Components and Packaging Technologies, 2005.
9. A. Fan, A. Rahman, and R. Reif, “Copper Wafer Bonding,” Electrochemical and Solid-State Letters, vol. 2, no. 10, Oct. 1999, pp. 534-536.
10. T. Chiang, K. Banerjee, and K. Saraswat, “Effect of Via Separation and Low-k Dielectric Materials on the Thermal Characteristics of Cu Interconnects,” IEEE Int’l Electron Devices Meeting, IEDM Technical Digest, IEEE Press, 2000, pp. 261-264.
11. H. Murata et al., “Rectangle Packing Based Module Placement,” Proc. IEEE Int’l Conf. Computer-Aided Design (ICCAD 95), IEEE CS Press, 1995, pp. 472-479.
12. S. Zhao, C. Koh, and K. Roy, “Decoupling Capacitance Allocation and Its Application to Power Supply Noise Aware Floorplanning,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 1, Jan. 2002, pp. 81-92.

Sung Kyu Lim is an assistant professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. His research interests include physical design automation for 3D ICs and 3D SoPs, microarchitectural physical planning, field-programmable analog arrays, and quantum cell automata. Lim has a BS, an MS, and a PhD, all in computer science, from the University of California at Los Angeles. He is a member of the IEEE Circuits and Systems Society and the ACM Special Interest Group on Design Automation.

Direct questions and comments about this article to Sung Kyu Lim, School of Electrical and Computer Engineering, Georgia Institute of Technology, 777 Atlantic Dr. NW, Atlanta, GA 30332-0250; limsk@ece.gatech.edu.