Impact of Spatial Intrachip Gate Length Variability on the Performance of High-Speed Digital Circuits


IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 5, MAY 2002

Michael Orshansky, Student Member, IEEE, Linda Milor, Member, IEEE, Pinhong Chen, Student Member, IEEE, Kurt Keutzer, Fellow, IEEE, and Chenming Hu, Fellow, IEEE

Abstract—In this paper we address both empirically and theoretically the impact of an advanced manufacturing phenomenon on the performance of high-speed digital circuits. Using data collected from an actual state-of-the-art fabrication facility, we conducted a comprehensive characterization of an advanced 0.18-μm CMOS process. The measured data revealed a significant systematic, rather than random, spatial intrachip variability of MOS gate length, leading to large circuit path delay variation. The delay of the critical path of a combinational logic block varies by as much as 17%, and the global skew is increased by 8%. Thus, significant timing error and performance loss take place if variability is not properly addressed. We derive a model that estimates performance degradation for the given circuit and process parameters. We demonstrate explicitly that intrachip gate variation has a significant detrimental impact on the overall circuit performance, shifting the entire distribution of clock frequencies toward slower values. This is in striking contrast to the impact of interchip gate variation, traditionally considered in statistical circuit analysis, which leads to the variation of chip clock frequencies around the average value. Moreover, analysis shows that the spatial, rather than the proximity-dependent, systematic gate variability is the main cause of large circuit speed degradation. The degradation is worse for circuits with a larger number of critical paths and shorter average logic depth. We propose a location-dependent timing analysis methodology that allows mitigation of the detrimental effects of gate variability and have developed a tool linking the layout-dependent spatial information to circuit analysis. We discuss the details of practical implementation of the methodology, and provide guidelines for managing design complexity.
Index Terms—Circuit modeling, integrated circuit design, integrated circuits manufacture, statistics, variational methods.

I. INTRODUCTION

THE INCREASING complexity of semiconductor processes makes the interaction between manufacturing and design more severe. This requires new models and methods to quantify this interaction, and new computer-aided design (CAD) tools capable of dealing with the new processing conditions [1]–[3]. In CMOS digital technologies, the single most important processing parameter affecting circuit performance is the gate length ($L_{gate}$) of the MOS transistor. Control and accurate modeling of $L_{gate}$ is thus of utmost importance for accurate modeling and design of ICs.

Deep submicron technologies, however, exhibit a new variability pattern, which is not addressed by the previously developed models and methods, i.e., the systematic spatial $L_{gate}$ variability. As a result, the printed transistors display a distinct spatial intrachip $L_{gate}$ map, making their characteristics dependent on the location within the chip. This variation is mainly caused by the stepper-induced illumination and imaging nonuniformity due to lens aberrations, which are worst near the optical resolution limit [4], [5]. Because the continuing scaling of semiconductor processes, following Moore’s Law, forces us to operate closer to the optical resolution limit of stepper systems, the intrachip $L_{gate}$ variability will only increase.

In this work we collected data from a state-of-the-art fabrication facility to study the complex interactions of design and manufacturing. We found a significant spatial variation of the circuit timing properties that leads to degradation of the overall circuit speed if not properly addressed at the design stage. We provide a novel analytical framework that allows estimation of performance degradation for the given circuit and process parameters. The systematic nature of intrachip variability makes previously used approaches to statistical circuit analysis, such as worst case analysis, insufficient and inaccurate. Instead, we propose a new approach that makes the device characteristics dependent on their location within the chip. By accurately predicting the spatial dependence of circuit characteristics, the detrimental effect of intrachip variability can be substantially reduced.

The paper is organized as follows. In Sections II and III, we describe the experimental procedure for accurate characterization of systematic $L_{gate}$ variability and circuit performance simulation results. In Section IV we present a set of analytical models for evaluating the impact of $L_{gate}$ variation on circuit performance. Section V discusses the methodology for location-dependent timing analysis. Section VI discusses the relation between location-dependent timing and traditional worst case analysis.

Manuscript received February 21, 2001; revised September 12, 2001 and January 15, 2002. This work was supported by fellowships from Semiconductor Research Corporation and Advanced Micro Devices, Inc. This paper was recommended by Associate Editor M. Pedram. M. Orshansky, P. Chen, K. Keutzer, and C. Hu are with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720 USA (e-mail: [email protected]). L. Milor is with the Department of Electrical Engineering and Computer Sciences, Georgia Institute of Technology, Atlanta, GA 31405 USA. Publisher Item Identifier S 0278-0070(02)02849-X.

II. EXPERIMENTAL CHARACTERIZATION OF SYSTEMATIC INTRACHIP VARIABILITY

We performed a comprehensive, silicon-based experimental characterization of an advanced production 0.18-μm logic process technology with the goal of capturing all the relevant variability patterns. One of the most important aspects of the characterization was to address the possible interaction between the global lens aberration and the local layout pattern-dependent nonuniformities due to the optical proximity effect. Toward this end, we classified all the gates into 18 categories depending on their orientation in the layout (vertical or horizontal) and the spacing to the nearest neighboring gate (Fig. 1). To capture a particular lens aberration, the coma effect, we also distinguished the relative position of the surrounding gates, i.e., the neighbor being on the left versus the neighbor being on the right.

Fig. 1. Spatial profiles depend on proximity effects. We categorize all gates according to their local layout patterns. Vertical gates are labeled VXY, where X is the distance to the nearest neighbor on the left and Y is the distance to the nearest neighbor on the right. Three categories of distance are used: dense, denso, and isolated. Dense refers to the situation where the distance to the nearest gate is equal to the minimum design rule. Isolated spacing corresponds to the situation where no other polysilicon line is within a specified distance. Denso is the intermediate category. Similarly, horizontal gates are labeled HXY, where X is the distance to the nearest neighbor above the gate and Y is the distance to the nearest neighbor below the gate.

In order to characterize the spatial $L_{gate}$ profile, a test chip was used which contained a 5 × 5 grid of test modules located across the area of the reticle field. Each module contained long and narrow polysilicon resistors, with a variety of distances to adjacent polysilicon lines. The polysilicon resistors were manufactured with the same process steps as polysilicon gates, including poly CVD, resist coating, exposure, development, and gate definition by plasma etching.

One possible source of variability is the variation in the sheet resistance across the reticle field that would confound the measurement results. In order to eliminate this component, each module contained a test structure to calibrate the sheet resistance. Another possible source of variation is the silicide resistance, which is known to cause a large standard deviation in the sheet resistance of thin lines. In order to avoid this source of variation, the polysilicon resistors were not silicided. Finally, a third source of variation would be due to variability in the widths of lines on the test chip mask. This component of variation could not be eliminated and is confounded with our measurement results. However, we believe that this component is small.

Data was collected by measuring the resistance of the polysilicon lines. $L_{gate}$ values were then computed. The data set came from measuring 18 test modules coming from three distinct wafers, thus allowing tests for statistical significance. Because the observed spatial and proximity-dependent variability is determined by the combined effect of the stepper and the lens, the same stepper-lens system was used for all the measurements.
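The conversion from a measured line resistance to a linewidth can be sketched as follows. This is a minimal illustration, assuming the standard resistor relation R = R_sheet × (length / width); the function name, the drawn resistor length, and all numeric values are hypothetical and are not taken from the measurements described above.

```python
def linewidth_from_resistance(r_line_ohm, r_sheet_ohm_per_sq, length_um):
    """Electrical linewidth in um, from R = R_sheet * (length / width).

    r_line_ohm         -- measured resistance of the polysilicon line
    r_sheet_ohm_per_sq -- sheet resistance calibrated in the same module
    length_um          -- drawn length of the resistor (hypothetical input)
    """
    if r_line_ohm <= 0 or r_sheet_ohm_per_sq <= 0 or length_um <= 0:
        raise ValueError("all inputs must be positive")
    return r_sheet_ohm_per_sq * length_um / r_line_ohm


# Hypothetical readings for two modules: (line resistance, local sheet resistance).
readings = {"module_center": (100_000.0, 100.0), "module_corner": (92_000.0, 101.0)}
widths = {
    name: linewidth_from_resistance(r_line, r_sheet, length_um=180.0)
    for name, (r_line, r_sheet) in readings.items()
}
for name, width in widths.items():
    print(f"{name}: {width:.4f} um")
```

Using the sheet resistance calibrated inside each module, rather than one global value, is what removes the across-field sheet resistance variation from the computed widths.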

Fig. 2. $L_{gate}$ map for category V53, the most frequent category in the design.

The spatial $L_{gate}$ maps were measured separately for each gate category (Fig. 2). The range of variation of the $L_{gate}$ surface is 8–12% depending on the category, with a mean of 10.2%. Statistical tests verified that the generated spatial maps of $L_{gate}$ variation over the chip are statistically significant, i.e., that the level of systematic variation is large in comparison with the random noise. It was also found that the spatial $L_{gate}$ maps for different categories have quite distinct behaviors, due to the interaction between global lens aberrations and the pattern-dependent optical proximity effect. Therefore, any accurate approach to timing analysis must consider variation of the mean $L_{gate}$, not limiting itself to the assumption of purely random $L_{gate}$ variation. Also, at least for some gate categories, the distinct spatial maps have to be used in the course of timing analysis.
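One way to compare systematic variation with random noise is sketched below. It assumes a one-way ANOVA F statistic with the field locations as groups and the wafers as replicates; the text does not identify which test was actually used, so this choice, the function name, and the sample values are all assumptions for illustration.

```python
def anova_f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and some replication")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # No residual scatter at all means the ratio is unbounded.
    return float("inf") if ms_within == 0 else ms_between / ms_within


# Hypothetical gate lengths (um): one list per field location, one value per wafer.
by_location = [
    [0.171, 0.172, 0.170],
    [0.180, 0.181, 0.179],
    [0.188, 0.189, 0.187],
]
print(f"F = {anova_f_statistic(by_location):.1f}")
```

A large F value means the location-to-location differences dominate the wafer-to-wafer scatter at a fixed location. Turning F into a significance level requires the F distribution with (k − 1, n − k) degrees of freedom, which is left out of this sketch.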

III. SIMULATION OF THE IMPACT OF SYSTEMATIC INTRACHIP VARIABILITY ON CIRCUIT PERFORMANCE

The presence of systematic spatial $L_{gate}$ variation significantly impacts the timing, and even functional, properties of integrated circuits. In this section we describe a tool capable of incorporating the spatial information into the verification flow. We use it to study the impact of systematic intrachip $L_{gate}$ variation on design and discuss the implementation details.

Incorporation of systematic $L_{gate}$ information into timing verification requires making the device properties dependent on the device’s location within the chip. For this we developed SpaceTimer, a tool with the following functionality (Fig. 3). A netlist is first extracted from the original circuit layout. The layout and the netlist are then passed to SpaceTimer, which classifies each gate as belonging to a particular category and determines the spatial location of the particular gate within the layout (chip). Using this information together with the set of $L_{gate}$ maps produced at the stage of characterization, SpaceTimer generates a modified netlist in which each gate has a proper location-dependent $L_{gate}$ value and simulates the critical paths using a circuit simulator. The simulator can be either dynamic (SPICE) or a static timing simulator.
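The annotation step of such a flow can be sketched as follows. This is not SpaceTimer's implementation: the `Gate` record, the normalized coordinates, the bilinear interpolation between grid points, and the map values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    category: str  # layout category, e.g. "V53"
    x: float       # position normalized to [0, 1] across the chip
    y: float


def bilinear(grid, x, y):
    """Interpolate a rectangular grid (list of rows) at normalized (x, y)."""
    rows, cols = len(grid), len(grid[0])
    if rows < 2 or cols < 2:
        raise ValueError("grid must be at least 2 x 2")
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("coordinates must lie in [0, 1]")
    fx, fy = x * (cols - 1), y * (rows - 1)
    c0, r0 = min(int(fx), cols - 2), min(int(fy), rows - 2)
    tx, ty = fx - c0, fy - r0
    top = grid[r0][c0] * (1 - tx) + grid[r0][c0 + 1] * tx
    bottom = grid[r0 + 1][c0] * (1 - tx) + grid[r0 + 1][c0 + 1] * tx
    return top * (1 - ty) + bottom * ty


def annotate(gates, maps):
    """Return {gate name: location-dependent gate length} from per-category maps."""
    return {g.name: bilinear(maps[g.category], g.x, g.y) for g in gates}


# Hypothetical per-category maps (um) sampled on a coarse grid.
maps = {
    "V53": [[0.185, 0.180, 0.186], [0.180, 0.172, 0.181], [0.186, 0.181, 0.187]],
    "H35": [[0.178, 0.176, 0.179], [0.176, 0.174, 0.177], [0.179, 0.177, 0.180]],
}
gates = [Gate("M1", "V53", 0.5, 0.5), Gate("M2", "H35", 0.0, 1.0)]
print(annotate(gates, maps))
```

In a real flow the resulting dictionary would be written back into the netlist as per-device length parameters before the critical paths are simulated.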

Fig. 3. SpaceTimer employs the original layout, netlist, and $L_{gate}$ spatial maps to produce a location-dependent timing report.

The most direct impact of spatial $L_{gate}$ variation is the resulting variation of CMOS gate delay. In fact, because of the nonlinear delay versus $L_{gate}$ relationship ([...]), the variation of circuit speed is larger than the $L_{gate}$ variation. We evaluate speed variation by analyzing a 151-stage NAND ring oscillator (RO), often used as a predictor of chip performance. To achieve the highest accuracy, a SPICE simulator is used in this case to generate a spatial RO frequency map [6]. Results showed that the ring oscillator frequency map is consistent with the spatial patterns of $L_{gate}$ variation (Fig. 2); the frequency is highest in the center of the chip, where $L_{gate}$ is minimal. A comparison with the measurements was made for four ring oscillators available for test within each chip, and a good agreement was observed, confirming the accuracy of the simulation result. The range of variation in RO speed across the chip is 14.5% (Fig. 4).

Such a large variation in device performance also strongly affects the timing behavior of critical paths in the design. We simulated the timing behavior of a benchmark combinational circuit from ISCAS’85 [7], containing 1764 CMOS devices, using the static timing simulator PathMill from Synopsys [8]. Analysis was done for nine spatial locations on the reticle field in a uniform 3 × 3 grid. For chip 4, located in the lower right quadrant, the delay of the same path placed at different corners (fast and slow) of the chip varied from [...] ps to [...] ps, a 16% difference. The variance of the path delay distribution is also different: [...] (Fig. 5). Thus, circuit paths with identical designed-for delays will, in reality, have considerably different delay distributions, depending on the physical location of the path within the chip. As a result, the overall critical path delay distribution is broadened around the designed-for delay, with some slower and some faster paths. (The consequences of this effect will be discussed later in the paper.)
Importantly, the order of critical paths also changes depending on the location of the combinational block within the chip. Let us consider the extreme case and compare the top 20 critical paths of the above benchmark ISCAS circuit associated with the spatial points giving the fastest and the slowest path delays: the locations with the smallest and largest $L_{gate}$. The top 20 critical paths in the fast corner of the chip form one set, and the top 20 critical paths in the slow corner of the chip form another. The comparison shows that only six of the paths found in the first set can also be found in the second. In particular, the paths [...] become the paths [...]. This regrouping significantly complicates the use of predesigned and precharacterized circuit blocks physically localized within the chip, such as hard intellectual property (IP) blocks, since their precharacterized behavior will not adequately correspond to the location-dependent $L_{gate}$ variation.

The systematic across-chip $L_{gate}$ variation also affects the global circuit properties, such as clock skew in clock distribution networks containing buffers for driving and restoring the signal. Control of clock skew is critical, since in determining a conservative clock cycle time, a percentage delay due to clock skew is additive to the setup times and hold times of the circuitry. We considered clock skew of a global clock network, distributed using the popular H-tree scheme. The basic intent of such a clock network is to equalize the arrival times of all of the clock signals to the output loads; thus, skew is defined as the maximum difference between any of the clock arrival times. Considering the delay of the clock from the central buffer to each output node, we define the skew as the difference between the maximum and minimum delay values for the 16 output nodes. A simulation sets [...] ps and [...] ps. Using these values we find that the maximum systematic skew, for the chip in the upper left quadrant of the field, is [...] ps. This is 8% of the total clock cycle. The minimum clock skew on this chip is found to be 47 ps, which is close to 5% of the clock cycle (Fig. 6). In general, the amount of clock skew introduced by systematic $L_{gate}$ variation will depend on the clock tree design and the size of the chip. Clearly, for the popular H-tree network, the clock skew increases as a function of the size of the chip.

IV. ANALYTICAL MODEL OF THE CIRCUIT SPEED DEGRADATION DUE TO $L_{gate}$ VARIATION

We now develop a theoretical framework that allows explicit study of the impact of intrachip gate length variation on complex VLSI circuits in mass production. We show that intrachip $L_{gate}$ variation has a significant detrimental effect on the overall circuit performance, shifting the entire distribution of clock frequencies toward slower values. In contrast, interchip variation, traditionally considered in statistical circuit analysis, leads to variation of chip clock frequencies around the average value.

A. Path Delay Variation Due to Intrachip $L_{gate}$ Variation

We start the analysis by introducing a statistical model of intrachip $L_{gate}$ variation that decomposes the overall variation into three distinct components: the proximity-dependent, the spatial, and the random residual

[...]  (1)

The model also includes the overall mean. The proximity-dependent term is modeled by a discrete random variable. Its distribution is determined by the frequency of each gate category in the layout and is found on a circuit-by-circuit basis through empirical analysis of the layout. The second term corresponds to the spatial variation component. In this section, our ultimate goal is to describe path delay variation, so we are primarily interested in the analysis of $L_{gate}$ variance, rather than the behavior of its mean at a particular position within the chip. For that reason we can approximate the spatial component by a random distribution. We model it by a normal distribution because empirical analysis shows that the assumption of normality is justified. The random residual component is also modeled by a normal distribution.

The clock frequency, or alternatively, the clock cycle, at which a circuit can be operated, is determined by the delay of the slowest path in the circuit

[...]  (2)

Thus, in order to assess the impact of $L_{gate}$ variation on clock frequency, we must ultimately link the variation of $L_{gate}$ to the variation of path delays. We start the analysis by noting that path delay is a sum of the delays of the individual gates. Delay of an individual CMOS gate can be calculated using the standard compact gate delay model [9]

[...]  (3)

where the two current terms are the drain currents of the NMOS and PMOS transistors, and the remaining parameters are the supply voltage and the load capacitance. For deep submicron MOS devices, the saturation current may be described by the universal empirical equation [9]

[...]  (4)

We can simplify the analysis by assuming that the parasitic junction capacitance is small [...]. Combining the above equations gives the delay of an individual gate [...]. Alternatively, we can write

[...]  (5)

which contains a lumped process-specific constant. Note that because we are considering the delay of a gate that is part of a gate chain, (3) represents the delay of one gate (the driver) driving another gate (the load). The load capacitance stands for the input capacitance of the load gate. Then, (5) can also be written as [...]. We can extend this analysis to a path consisting of $N$ gates. Summing the delays of the individual gate stages, the delay of the entire path is given by

[...]  (6)

Fig. 4. As a result of spatial $L_{gate}$ variability, the circuit timing properties significantly vary across the chip. (Chip 4 is shown.)

Fig. 5. Distribution of critical paths of a combinational logic cell. The distribution of critical paths of a combinational logic cell changes significantly if placed at different locations within chip 4.

Fig. 6. Global skew map for the H-tree clock (ps) for chip 4. The chip size is 5 × 5 mm. Maximum skew is 74 ps. This is 8% of the clock cycle.

The next step in the analysis is to find the variance of path delay. Using (6), the path delay variance is as follows:

[...]  (7)

We can find the variance of path delay using the statistical delta method. The method is based upon a series expansion of a function around the nominal point. We use a first-order expansion of the delay function, distinguishing the responses of delay to $L_{gate}$ variation of the driver gate and of the load gate. Denoting the nominal delay and the nominal $L_{gate}$ accordingly, we can write

[...]  (8)

From (5), the derivative is given by

[...]  (9)

It is useful to distinguish the contributions of the pull-up and pull-down networks within a logical gate stage. Then, (9) becomes

[...]  (10)

In this equation, the $L_{gate}$ term still refers to the gate length delta of the equivalent transistor network. Ultimately, we need to relate the delay derivation to the $L_{gate}$ of individual polysilicon lines and to the statistical model given by (1). However, since it is easier to carry out the delay analysis in terms of the equivalent transistor network, rather than individual transistors, we need to formulate the statistical model of the equivalent gate length from the statistical transistor model of (1).

In order to properly model the proximity-dependent component of $L_{gate}$ variability, we construct the critical path in such a way that it represents the actual distribution of gate categories in the layout, as found by its empirical analysis. First, the critical path has to contain noninverter logic gates, in order to represent the transistors belonging to different gate categories. The delay analysis of such logic gates is similar to that for the basic inverter, with the exception that we describe the parallel or series transistor connection by its equivalent transistor network. Then, the delay of a complex gate can also be accurately described by (3). Second, in order to recreate the gate category distribution found in the actual layout (Table I), we need to come up with a corresponding distribution of logic gates in the critical path. One feature of our critical path model is that the gates comprising both the pull-up and pull-down networks in a CMOS design belong to the same gate categories. (For example, within a NAND2 gate, the pull-up and pull-down networks are laid out as a V53–V35 transistor pair.) This is in fact consistent with many industrial layouts. The average fan-in of the gates in the critical path is 2, which is also typical for standard CMOS designs [10]. Table II describes the distribution of logic gates in the critical path.

TABLE I. DISTRIBUTION OF $L_{gate}$ CATEGORIES IN THE LAYOUT

TABLE II. DISTRIBUTION OF LOGIC GATES IN THE CRITICAL PATH. THE RESULTING $L_{gate}$ DISTRIBUTION IS IN ACCORDANCE WITH TABLE I

With respect to the spatial component of $L_{gate}$, we assume that the gate chain is spatially confined to a relatively small region of the chip that is significantly smaller than the range of spatial variation. That means that the spatial component takes the same value for all the gates within the critical path. This is a reasonable assumption, since it is good design practice to break up long paths. Finally, the random residual component is spatially uncorrelated, i.e., the values of this term for any two polysilicon lines within the layout are uncorrelated. Thus, when using the equivalent CMOS gate to describe the pull-up and pull-down networks (as discussed above), we can model the contribution of the residual by the averaged residual term, dependent on the fan-in of the gate. (We again use the observation that the average fan-in across multiple layouts is close to 2.) To simplify the algebra, we use a single term to represent the averaged random residual, whose distribution follows from that of the residual [...]. Let us rewrite [...] as

[...]  (11)

Then, substituting (11) into (10), the delay of a gate stage can be represented as

[...]  (12)

We now use the assumptions about the statistical properties of the various $L_{gate}$ variation components that we established above to simplify (12). Because of the assumed spatial confinement of the gates in the path, [...]. And, because of the assumed gate category composition of the transistor networks, [...]. Following (6), we now sum up the individual delay terms to get the total path delay

[...]  (13)

We can now find the variance of the full path delay by summing up the variances of the terms in (13). The variance of the first constant term is zero. The variance of the other terms is found using the standard statistical equations (e.g., if $a$ is a constant term and $X$ is a random variable, then $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$).
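The effect of the spatial-confinement assumption on the variance sum can be checked numerically. The sketch below is not equation (14): it assumes a linearized path delay in which every gate stage has unit sensitivity to its gate length deviation, with one spatial term shared by all $N$ gates and one independent residual per gate. Under that assumption the shared term enters as $N$ times a single random variable, so $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$ gives $N^2\sigma_s^2$, while the independent residuals add up to $N\sigma_r^2$. The function names and the sigma values are hypothetical.

```python
import random
import statistics


def path_variance_model(n_gates, sigma_spatial, sigma_resid):
    """Variance of a sum of n_gates terms: shared part plus independent parts."""
    return (n_gates * sigma_spatial) ** 2 + n_gates * sigma_resid ** 2


def path_variance_mc(n_gates, sigma_spatial, sigma_resid, trials=20000, seed=1):
    """Monte Carlo estimate of the same variance."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        shared = rng.gauss(0.0, sigma_spatial)  # one draw for the whole path
        totals.append(
            sum(shared + rng.gauss(0.0, sigma_resid) for _ in range(n_gates))
        )
    return statistics.pvariance(totals)


n, s_spatial, s_resid = 10, 1.0, 2.0  # arbitrary units, illustration only
print("model      :", path_variance_model(n, s_spatial, s_resid))
print("monte carlo:", round(path_variance_mc(n, s_spatial, s_resid), 1))
```

Because the shared term grows as $N^2$ and the independent term only as $N$, uncorrelated per-gate variation averages out along a path while the spatially correlated part does not.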

For large [...], and using [...], we can simplify the expression for path delay variance to

[...]  (14)

Fig. 7. Intrafield $L_{gate}$ variation leads to broadening of the distribution of path delays around the designed-critical path delay value $D_0$.

Equation (14) can also be modified to describe the situation in which the path is only partially localized. Let the gates in the path be divided into spatial partitions, each containing some number of gates, so that the partition sizes sum to the total number of gates in the path. Starting the analysis from (12), it is then straightforward to show that the path delay variance contributed by the spatial component of $L_{gate}$ variation is given by [...]. Then, (14) can be rewritten as

[...]

In the next section and beyond, we assume, however, that the combinational logic gate path is spatially localized and use (14) to predict the amount of circuit speed degradation due to the variance of path delay.

B. Clock-Cycle Degradation Due to Path Delay Variation

Let us consider a high-speed digital chip being manufactured in volume production. Complex high-performance silicon chips are designed in such a way that there are a large number of paths with delays close to the maximum designed-for delay $D_0$. At the very least, this is the intention of a circuit designer when he performs timing analysis and adjustment of his design. Thus, when no variability is present, the distribution of circuit path delays has a sharp peak bounded by $D_0$ (Fig. 7). Let us now analyze the space of path delays when a significant $L_{gate}$ variability is superimposed on the “ideal” timing simulation conditions. The impact of interchip $L_{gate}$ variation can be approximated by a shift of every path delay toward either slower or faster speeds, so that the entire path delay distribution shifts. In contrast, as we saw in Section IV-A, intrachip (spatial and proximity-dependent) $L_{gate}$ variability leads to path delay variation around $D_0$. This happens because, due to their spatial location and composition, some paths will become slower while others become faster (Fig. 7).

Now, consider the set of all circuit paths and the delay of each path in a given manufactured chip. Then, the clock period for that chip is

[...]  (15)

This equation becomes key to understanding the difference in impact of intra- and interchip variability on clock period. As we will demonstrate by the analysis below, interchip $L_{gate}$ variation leads to the variation of chip clock frequencies around the average value. In stark contrast, because the clock period is always defined by the maximum chip delay, intrachip $L_{gate}$ variation and its resultant path delay variance force the maximum path delay above the value it would have without intrachip variation. Thus, the average clock period is uniformly increased, degrading the overall circuit speed (Fig. 8).

Fig. 8. In contrast to between-field $L_{gate}$ variability, the intrafield variation component degrades average delay, shifting the whole distribution.

We now derive a set of analytical models that allow predicting the clock period degradation due to intrachip $L_{gate}$ variability. Let $m$ be the number of paths with delay close to $D_0$ and let [...] be the maximum delay for a given chip if there were no intrachip variation, i.e., if [...]. The path delays are random variables, and for tractability of analysis, we approximate their distribution as multivariate normal with a diagonal covariance matrix (the Monte Carlo analysis, described below, confirms the validity of this approximation)

[...]  (16)

Instead of finding [...] analytically, we estimate its expected value as the expected value of the maximum of $m$ normally distributed random path delays. The number of trials required, on average, for an event of probability $p$ to happen is $1/p$.

Theorem 1: For a deviation factor [...], let [...]. The expected clock period for a given chip is

[...]

In other words, intrachip path delay variation causes the clock period for a chip to deviate (on average) by [...] from the chip’s designed-for maximum. We can find the expected value of the clock period across the multiple chips if we take into account interchip $L_{gate}$ variability, which can be modeled by a normal distribution.

Theorem 2: For [...] chips, let

[...]  (17)

In other words, across all the chips, intrachip path delay variation causes the clock period to deviate (on average) by [...] from the designed-for critical path delay $D_0$. For example, for [...]. Table III gives the values of the deviation factor for several different values of $m$, and other values can be found from the table of the normal distribution.

TABLE III
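The upward shift of the expected maximum can be illustrated numerically. The sketch below draws $m$ independent normal path delays around a common designed-for delay and averages their maximum over many trials. It also prints a quantile heuristic, $D_0 + \sigma\,\Phi^{-1}(1 - 1/m)$, motivated by the $1/p$ remark above; that heuristic is an assumption of this sketch and is not claimed to be the formula of Theorem 1. The names `d0`, `sigma`, and all numeric values are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def expected_max_mc(m, d0, sigma, trials=5000, seed=7):
    """Monte Carlo mean of the slowest of m independent normal path delays."""
    rng = random.Random(seed)
    maxima = [max(rng.gauss(d0, sigma) for _ in range(m)) for _ in range(trials)]
    return fmean(maxima)


def quantile_heuristic(m, d0, sigma):
    """Delay exceeded with probability 1/m by a single path (assumed heuristic)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    return d0 + sigma * NormalDist().inv_cdf(1.0 - 1.0 / m)


d0, sigma = 1000.0, 30.0  # ps, illustration only
for m in (10, 100):
    mc = expected_max_mc(m, d0, sigma)
    qh = quantile_heuristic(m, d0, sigma)
    print(f"m={m:4d}  monte carlo={mc:7.1f} ps  heuristic={qh:7.1f} ps")
```

Both estimates lie above `d0` and grow with `m`, which is the qualitative behavior described above: the more near-critical paths a chip has, the further its expected clock period moves from the designed-for value.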

Theorems 1 and 2 clearly show that the interchip $L_{gate}$ variation component leads to variation of a chip’s critical path delay around the designed-for critical path delay, while the presence of intrachip $L_{gate}$ variation degrades the average circuit delay. Let us now compare the impact of both inter- and intrachip variation components on the clock period.

Theorem 3: The overall deviation of the actual critical path delay from the designed-for value is

[...]  (18)

The first term is the variance of [...] due to interchip $L_{gate}$ variation and, by analogy with (14), can be shown to be [...]. The second term in (18) is the shift of the average [...] and is given by Theorem 2. Then

[...]  (19)

This expression allows estimating the relative magnitude of the degradation of the average circuit speed compared to the variation around the average value. For example, if [...], so that [...], the squared deviation of the average speed from the designed-for speed is 1.7 times greater than the random variation around [...]. Clearly, the effect of intrachip variability on circuit performance is very significant.

A Monte Carlo simulation was performed for model verification by generating a number of random vectors following the specified distributions. We calculated the path delay for each of the vectors and compared the resulting delay variances with the model predictions. The results show that very good agreement is achieved between the model-based predictions of path delay variance [see (14)] and the variance given by the Monte Carlo simulation. The average error of prediction is just 1.7%. The model of the clock period [see (17)] also appears to be very accurate; the average error of prediction is only 1.2%.

We evaluated the impact of $L_{gate}$ variation on circuit performance using the measured characteristics of the production 0.18-μm CMOS process (Section II) for different values of the model variables. To study the potential gains of the location-dependent timing analysis methodology, we considered the reduction of circuit speed (Fig. 9). Both the Monte Carlo simulation and the model predict an up to 20% degradation of the average circuit speed as a result of intrachip $L_{gate}$ variation. Speed degradation is worse for more complex chips, since they contain more critical paths (larger $m$), and for shorter paths (smaller $N$).

Fig. 9. Circuit speed is reduced by up to 21% compared to designed-for speed. Degradation is worse for more complex circuits (larger $m$) and smaller average path length (smaller $N$).

We also studied the relative sensitivity of speed degradation to the two contributors to intrachip $L_{gate}$ variability, the proximity-dependent and the spatially-dependent components. The partial sensitivities were characterized by assessing the reduction in [...], and a corresponding reduction in [...], as a result of a change in the variance of the proximity-dependent and spatial components. We found that spatial intrachip variation has a much stronger effect on degradation of circuit speed than proximity-dependent variation. This is because the averaging of $L_{gate}$ of the gate stages within the path reduces the delay variation due to the proximity effect. The result implies that from the perspective of improving circuit speed, much more attention should be paid to improving the spatial intrachip uniformity rather than reducing the proximity-dependent variability.

V. PRACTICAL IMPLEMENTATION OF LOCATION-DEPENDENT TIMING ANALYSIS

The analysis of the previous section showed that systematic intrachip gate length variability has a large detrimental effect on the overall (average) circuit speed. These negative effects may be reduced by location-dependent circuit analysis, which takes the systematic variation into account. While this may bring much benefit, the practical implementation of location-dependent circuit analysis faces several difficulties. One difficulty is that the systematic, and thus correctable, pattern of spatial variation is specific to a unique combination of the stepper and the lens. Therefore, a correction approach would have to couple the modification of a layout to a specific system.

The other important complication is that the proper unit of analysis of the systematic and repeatable spatial profile is the reticle field of a photolithographic stepper machine. The National Technology Roadmap for Semiconductors 1997 projects that, for microprocessor products, the number of chips per reticle field will be 2–4 [11]. Hence, with several chips per reticle field, we have to keep track of as many distinct designs and optimize them individually, since each chip corresponding to a unique position within the reticle field will have different critical paths and timing properties. Consequently, the highest performance possible is only achievable when chips in each position are optimized individually. But this is expensive and can be justified only for high-end designs.

An alternative approach is to give up optimality in exchange for simplicity of working with a single design. This may be achieved by using a location-dependent timing analysis based on a combined map, which takes at each location the worst case value among the maps of all chips in the reticle field (20). This guarantees that the timing analysis based on the combined map is properly conservative, e.g., that the predicted clock period is no shorter than the one required by any individual chip. (Note that such “collapsing” is still more accurate than “standard” timing analysis, which does not consider location-dependent circuit timing properties. Indeed, unless there is no gate length variation, there must exist a point at which the actual gate length differs from the value assumed by the standard analysis, so that the clock cycle it predicts is not an accurate estimate.)

If the clock cycle has to be set by a rigidly designed clock generator, based on a fixed estimation of the circuit’s critical path, then such “collapsing” leads to suboptimal performance, since any clock-cycle time in excess of the actual (chip-specific) critical path is a direct performance loss. Specifically, a performance loss can be evaluated for each chip, together with the average and the maximum loss over the chips of the reticle field. For the benchmark combinational circuit c499, for four chips/reticle, the critical path delays are (in ps) 1190, 1330, 1280, and 1380. Collapsing sets the clock period by the slowest of the four chips, and the maximum performance loss is considerably larger than the average loss. In the clock-skew example, combining the four skew maps into the overall skew map (Fig. 10) raises the maximum skew to 88 ps. Thus, for the first chip (skew is 75 ps), the performance loss is 17%. For the given performance loss and potential gains, a suitable approach may be chosen to balance the tradeoff between higher performance (multiple designs) and simplicity (“collapsing”).
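The “collapsing” tradeoff can be made concrete with a short sketch. It assumes that the combined map is the pointwise maximum of the per-chip maps and that the performance loss of a chip is the excess of the collapsed value over the chip's own value, divided by the chip's own value. The second assumption is chosen because it reproduces the 17% figure quoted for the skew example (88 ps versus 75 ps); the exact definitions used by the authors are not recoverable from the text. The function names and the toy maps are hypothetical.

```python
def collapse_maps(maps):
    """Pointwise maximum across per-chip maps (each map is a list of rows)."""
    return [[max(cells) for cells in zip(*rows)] for rows in zip(*maps)]


def performance_loss(chip_values):
    """Collapsed (worst) value and per-chip relative loss.

    Assumed definition: loss_i = (collapsed - value_i) / value_i.
    """
    collapsed = max(chip_values)
    losses = [(collapsed - v) / v for v in chip_values]
    return collapsed, losses


# Toy 2x2 maps for two chips; the values are illustrative only.
combined = collapse_maps([[[1, 5], [3, 2]], [[4, 0], [3, 7]]])

# Critical path delays (ps) of c499 for the four chips, as quoted in the text.
c499_delays_ps = [1190, 1330, 1280, 1380]
collapsed_ps, losses = performance_loss(c499_delays_ps)
average_loss = sum(losses) / len(losses)

# Clock-skew example from the text: combined skew 88 ps, first chip 75 ps.
skew_loss = (88 - 75) / 75

if __name__ == "__main__":
    print("combined toy map:", combined)
    print("collapsed clock period (ps):", collapsed_ps)
    print("per-chip loss (%):", [round(100 * x, 1) for x in losses])
    print("average loss (%):", round(100 * average_loss, 1))
    print("skew loss (%):", round(100 * skew_loss))
```

Under the assumed normalization, the fastest chip (1190 ps) loses about 16% and the average over the four chips is about 6.9%. These values follow from the assumption and are not a restatement of the figures missing from the text.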


Fig. 10. Clock-skew map (ps) generated by “collapsing” the four chips of the reticle field. Maximum skew is now 87 ps.

TABLE IV

VI. LOCATION-DEPENDENT AND WORST CASE TIMING ANALYSIS

It is important to clarify the relation between the location-dependent delay variability analysis, which we considered so far, and the traditional worst case timing analysis, since both deal with deviations from the “nominal” case. In particular, we show that the systematic gate length variation cannot be accurately modeled by traditional statistical methods.

The goal of statistical circuit timing analysis is to determine the probability density function (pdf) of the circuit delay. In most cases, however, a tacit assumption is made that the mean is known a priori, and one is concerned only with finding the delay variance. This is the approach taken by the widely used statistical method, worst case analysis [12], [13]. In contrast, the location-dependent timing analysis is concerned with variation of the mean timing properties of the circuit as a function of position.

Despite the formal differences between the two approaches, one could argue that it is possible to get an accurate prediction of the statistical circuit behavior using the traditional worst case analysis if only the value of sigma properly included the spatial and proximity-dependent variability. This is so because systematic variability can always be absorbed in the random variation component. Simulations show, however, that a significant prediction error is likely to occur.

The traditional statistical worst case analysis assumes a statistical model in which the gate length is the sum of an overall mean and a random term that absorbs all of the variation. The worst case value is then set by the overall mean and the spread of this random term. For the location-dependent timing analysis, both the mean and variance are proximity-dependent, and the mean is location-dependent: it is given by the spatial proximity-dependent map. In this case, the worst case value is set by the local mean and the remaining random spread.
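The difference between the two worst case models can be sketched numerically. The sketch below assumes a worst case value of the form mean plus k standard deviations with k = 3, which is a common convention and not necessarily the multiplier used in the paper. The gate length values are invented for illustration and are not the data behind Table IV; the function names are hypothetical.

```python
import math
import statistics


def traditional_worst_case(local_means, sigma_random, k=3.0):
    """Single worst case value: overall mean plus k total sigmas.

    The systematic spread of the local means is folded into the random
    component, as the traditional analysis described above would do.
    """
    overall_mean = statistics.fmean(local_means)
    sigma_systematic = statistics.pstdev(local_means)
    sigma_total = math.hypot(sigma_random, sigma_systematic)
    return overall_mean + k * sigma_total


def location_worst_case(local_mean, sigma_random, k=3.0):
    """Worst case value at one location: local mean plus k random sigmas."""
    return local_mean + k * sigma_random


# Hypothetical mean gate lengths (micrometers) at two placements.
local_means_um = {"center": 0.175, "corner": 0.185}
sigma_random_um = 0.003

global_wc = traditional_worst_case(list(local_means_um.values()), sigma_random_um)
local_wc = {
    name: location_worst_case(mean, sigma_random_um)
    for name, mean in local_means_um.items()
}

if __name__ == "__main__":
    for name, value in local_wc.items():
        print(f"{name}: local {value:.4f} um, traditional {global_wc:.4f} um")
```

With these made-up numbers the traditional value exceeds the location-dependent value at both placements, and the gap is larger where the local mean is lower, which mirrors the kind of location-specific pessimism discussed in this section.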

We compared the accuracy of the above modeling approaches through statistical worst case simulations of the benchmark combinational circuit. The circuit was simulated using worst case values, assuming placement at two locations within the chip: a center point and a corner. The results (Table IV) suggest that such worst case analysis is overly pessimistic, at least for certain spatial locations. Thus, in predicting the worst case behavior of a circuit when it is located in the center of the chip, the traditional worst case analysis gives an error of 11%. This is a significant error for designs with tight timing constraints.

VII. CONCLUSION

In this paper, we demonstrated, using experimental evidence gathered from a state-of-the-art 0.18-µm fabrication facility, the presence of significant systematic intrachip gate length variability. This variability causes an error of up to 17% in timing analysis of critical paths, resulting in a corresponding performance loss. The variability also leads to increased global skew of about 8%, which is additive to the setup time error.

We developed a theoretical framework allowing explicit analysis of circuit speed degradation due to intrachip gate length variability. Analysis shows that the spatial, rather than proximity-dependent, systematic gate length variability is the main cause of large circuit speed degradation. The degradation is worse for circuits with a larger number of critical paths and shorter average logic depth.

We proposed a location-dependent timing analysis methodology that allows the analysis and mitigation of the detrimental effects of gate length variability and have developed a tool linking the layout-dependent spatial information to circuit analysis. We showed that the proposed methodology cannot be subsumed by a statistical, e.g., worst case, timing analysis. In a situation of multiple chips per reticle field, one can either treat the problem as a multidesign problem (high-performance) or “collapse” timing information into a single set (simplicity).

ACKNOWLEDGMENT

The authors would like to thank Advanced Micro Devices for providing the experimental data used in this work.


Pinhong Chen (S’98) received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1991 and 1993, respectively. He is currently a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at the University of California, Berkeley. From 1995 to 2001, he worked at TSMC as a member of technical staff focusing on design automation and design flow integration. In 2001, he joined Silicon Perspective, Inc., as a member of technical staff working on the development of EDA tools. His research interests include the areas of design automation, signal integrity, crosstalk analysis, static timing analysis, and system integration.

REFERENCES

[1] A. Kahng and Y. Pati, “Subwavelength optical lithography: Challenges and impact on physical design,” in Proc. Int. Symp. Physical Design, 1999, p. 112.
[2] L. Chen, L. Milor, C. Ouyang, W. Maly, and Y. Peng, “Analysis of the impact of proximity correction algorithms on circuit performance,” IEEE Trans. Semiconduct. Manuf., vol. 12, pp. 313–322, Aug. 1999.
[3] V. Mehrotra, S. Nassif, D. Boning, and J. Chung, “Modeling the effects of manufacturing variation on high-speed microprocessor interconnect performance,” in IEDM Tech. Dig., 1998, p. 767.
[4] C. Yu et al., “Use of short-loop electrical measurements for yield improvement,” IEEE Trans. Semiconduct. Manuf., vol. 8, May 1995.
[5] B. Stine, D. S. Boning, and J. E. Chung, “Analysis and decomposition of spatial variation in integrated circuit processes and devices,” IEEE Trans. Semiconduct. Manuf., vol. 10, pp. 24–41, Feb. 1997.
[6] HSPICE User Manual: Avant!, 1999.
[7] Benchmark Combinational Circuits: ISCAS, 1985.
[8] PathMill User Guide: Synopsys, 1999.
[9] K. Chen and C. Hu, “Performance and Vdd scaling in deep submicrometer CMOS,” IEEE J. Solid-State Circuits, vol. 33, Oct. 1998.
[10] D. Sylvester and K. Keutzer, “Getting to the bottom of deep-submicron,” in Proc. Int. Conf. Computer-Aided Design, 1998, p. 203.
[11] Semiconductor Industry Association, “Nat. technol. roadmap for semiconductors,” 1997.
[12] S. Nassif, “Statistical worst-case analysis for integrated circuits,” in Statistical Approaches to VLSI. New York: Elsevier Sci., 1994.
[13] P. Yang et al., “An integrated and efficient approach for MOS VLSI statistical circuit design,” IEEE Trans. Computer-Aided Design, Jan. 1986.

Michael Orshansky (S’96) received the Ph.D. degree in electrical engineering and computer science from the University of California, Berkeley, in 2001. He is currently a Research Scientist with the Electronics Research Lab at University of California, Berkeley, working on the development of algorithms and tools for low-power IC design and statistical timing analysis. His research interests include circuit design and analysis techniques for manufacturability of high-performance digital and mixed-signal circuits, statistical CAD algorithms for design-for-manufacturing and yield improvement, and modeling and simulation of semiconductor devices. He was a Fellow of Semiconductor Research Corporation and Advanced Micro Devices. He has published more than 20 technical papers in the area of statistical CAD algorithms, device physics, and modeling.

Linda Milor (S’86–M’90) received the Ph.D. degree in electrical engineering from the University of California, Berkeley, in 1992. She is currently an Associate Professor of Electrical and Computer Engineering at Georgia Institute of Technology, Atlanta. Prior to joining Georgia Tech, she served as Vice President of Process Technology and Product Engineering at eSilicon Corporation, as Product Engineering Manager at Advanced Micro Devices, Sunnyvale, CA, and was a faculty member at the University of Maryland in College Park. She has over 50 publications and three patents on yield and test of semiconductor ICs.

Kurt Keutzer (S’83–M’84–SM’94–F’96) received the B.S. degree in mathematics from Maharishi International University, Fairfield, IA, in 1978, and the M.S. and Ph.D. degrees in computer science from Indiana University, Bloomington, in 1981 and 1984, respectively. In 1984, he joined AT&T Bell Laboratories, where he worked to apply various computer science disciplines to practical problems in computer-aided design. In 1991, he joined Synopsys, Inc., where he continued his research in a number of positions culminating in his position as Chief Technical Officer and Senior Vice-President of Research. He left Synopsys in January 1998 to become a Professor of Electrical Engineering and Computer Science at the University of California, Berkeley, where he is currently Associate Director of the Gigascale Silicon Research Center. He coauthored Logic Synthesis (New York: McGraw-Hill, 1994). His research interests include areas related to synthesis and high-level design. Dr. Keutzer received three Best Paper Awards at the Design Automation Conference (DAC), a Best Paper Award at the International Conference in Computer Design (ICCD), and a Distinguished Paper Citation from the International Conference on Computer-Aided Design (ICCAD). He was an Associate Editor of IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS from 1989 to 1995 and currently serves on the editorial boards of Integration—The VLSI Journal, Design Automation of Embedded Systems, and Formal Methods in System Design. He has served on the technical program committees of DAC, ICCAD, and ICCD as well as the technical and executive committees of numerous other conferences and workshops.

Chenming Hu (S’71–M’76–SM’83–F’90) received the B.S. degree from National Taiwan University, Taiwan, and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Berkeley. He is the Chief Technology Officer of TSMC, Hsinchu, Taiwan. He is on leave from the University of California, Berkeley, where he is the TSMC Distinguished Professor of Electrical Engineering and Computer Sciences. He was an Assistant Professor at the Massachusetts Institute of Technology, Cambridge, for three years. He was the Board Chairman of the East San Francisco Bay Chinese School and is a frequent advisor to industry and educational institutions. He is a cofounder and cochairman of Celestry Design Technologies, Inc. His present research areas include microelectronic devices, thin dielectrics, circuit reliability simulation, and nonvolatile memories. He has authored or coauthored four books and over 600 research papers and supervised 60 doctoral students. He leads the development of the MOSFET model BSIM3v3 that has been chosen as the first industry standard model for IC simulation by the Electronics Industry Association Compact Model Council. Dr. Hu was given a Research and Development 100 Award as one of the 100 most technologically significant new products of the year for the MOSFET model BSIM3v3, in 1996. He is a member of the U.S. National Academy of Engineering, an Adjunct Professor of Peking University, and an Honorary Professor of the Chinese Academy of Science. In 1991, he received the Excellence in Design Award from Design News and the inaugural Semiconductor Research Corporation Technical Excellence Award for leading the research of the IC reliability simulator, BERT. He received the SRC Outstanding Inventor Award in 1993 and 1994. In 1997, he received the IEEE Jack A. Morton Award for contributions to the understanding of MOSFET reliability physics. 
In 1999, he received the DARPA Most Significant Technological Accomplishment Award for codeveloping FinFET, a promising MOSFET structure for scaling to 10-nm gate length. He is a member of the U.S. National Academy of Engineering, a fellow of the Institute of Physics, an Honorary Professor of the Chinese Academy of Science, Beijing, and of Chiao Tung University, Taiwan. He has received the University of California at Berkeley’s highest honor for teaching—the Distinguished Teaching Award.
