New Reliability Challenges for Electronics Deep Submicron Reliability issues


Philippe Perdu DCT/AQ/LE

EUROCALCE, May 3rd- 4th, TOULOUSE


Purpose
■ DSM lifetime has been identified as a key issue
  - This comes from a survey, not from direct experimentation
■ Presentation of compiled data collected from selected references (more than 100)
  - A Google search for CMOS + DSM + "wear out" OR lifetime returns 131,000 hits (limited to one year)
■ Purpose of this study
  - Better evaluation of the initial level of risk
  - R&T program for lifetime evaluation (models, key parameters…), Design For Reliability, adapted burn-in and life test…
■ Today, this is just an overview
  - Limited to bulk CMOS
  - Focused on the die rather than the packaged device

Outline
■ Introduction
■ Technology trends
  - Moore's Law
  - Do we need DSM devices?
■ Warning charts and reliability trends
  - Lifetime charts
  - Trends
■ DSM wear-out
  - Front End Of the Line (FEOL)
  - Back End Of the Line (BEOL)
■ Can we estimate and manage lifetime?
  - Mandatory studies
  - Design For Reliability
  - Trade-off
■ Conclusion

Introduction

■ According to Moore's Law, technologies are still evolving rapidly
■ DSM lifetime is a controversial issue
  - Frightening trends have been highlighted
  - Manufacturers maintain they can deliver high performance and long lifetime for specific applications
■ DSM should be used for space applications
  - We are not yet using deep DSM devices, but there are some requests driven by performance needs (telecom payloads)


Moore’s Law (1)

[Figure: area, speed, power, cost]

Moore’s Law (2)


OBC From 1970 to 2004 (1)
■ ARGOS / EOLE (D2A) control unit, magnetic core memory

OBC From 1970 to 2004 (2)
■ Myriade (microsat)

SoC and SIP


Scaling consequences (1)
■ Constant-field technology scaling (scaling factor α > 1)
  - Supply voltage: Vdd → Vdd/α
  - Gate length: L → L/α
  - Gate width: W → W/α
  - Gate-oxide thickness: tox → tox/α
  - Junction depth: Xj → Xj/α
  - Substrate doping: NA → NA × α
■ "No exponential is forever… but forever can be delayed" (Gordon Moore, ISSCC 2003)
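The constant-field rules above can be turned into per-device factors. The sketch below is an illustration, not taken from the slides: it assumes the usual first-order relations (gate capacitance C ~ W·L/tox, oxide field E = V/tox, switching frequency f ~ α under ideal scaling), and the function name `constant_field_scaling` is hypothetical.

```python
def constant_field_scaling(alpha):
    """First-order per-device factors for ideal constant-field scaling by alpha > 1.

    Assumes C ~ W*L/tox, E ~ V/tox and switching frequency f ~ alpha
    (ideal scaling; real technologies deviate from this).
    """
    v = 1 / alpha                  # Vdd -> Vdd/alpha
    w = l = tox = 1 / alpha        # W, L, tox -> /alpha
    f = alpha                      # gate delay ~ 1/alpha, so f ~ alpha
    area = w * l                   # device area W x L
    cap = w * l / tox              # gate capacitance ~ W*L/tox
    field = v / tox                # oxide field stays constant
    power = cap * v ** 2 * f       # dynamic power C*V^2*f
    return {"area": area, "capacitance": cap, "field": field,
            "power": power, "power_density": power / area}

factors = constant_field_scaling(1 / 0.7)   # one node: dimensions x0.7
for name, value in factors.items():
    print(f"{name:14s} x{value:.3f}")
```

With α = 1/0.7, both the area W × L and the per-device C × V² × f come out at 0.49, while the field and the power density stay at 1 in this idealized model.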

Scaling consequences (2)


Scaling limit


Do we need DSM?

[Table: generation, technology, number of transistors, surface]

■ Self-fulfilling prophecy? "It happened because everyone believed it was going to happen"
■ Device/die area: W × L → (1/α)² = 0.49 (more functionality in the same area)
■ Higher frequency and more power in the same area (high performance)…
■ …or lower consumption and smaller size for the same function: C × V² × f → (1/α)² = 0.49 (low power)

Do we need DSM?
■ IC logic technology is scaled into the deep-submicron regime to:
  - Increase speed and function density
  - Decrease power dissipation (and cost) per function
    • ATMEL 0.18 µm CMOS: 5.5 Mgates, 85 nW/MHz/gate
    • ST 90 nm CMOS: 15 Mgates, 18 nW/MHz/gate
■ Some space applications require VLSI with:
  - High density
  - High speed
  - Low consumption
■ Unfortunately, these satellites are expected to have a long lifetime (18 years and more)
■ This concerns only a small number of component types (even if they are key components)
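The nW/MHz/gate figures above are easier to compare as energy per gate per clock cycle, since 1 nW/MHz equals 1 fJ per cycle. A minimal sketch; the helper name is hypothetical:

```python
def energy_per_gate_cycle_fj(nw_per_mhz):
    """Convert a nW/MHz/gate figure into femtojoules per gate per clock cycle."""
    joules_per_cycle = nw_per_mhz * 1e-9 / 1e6   # nW/MHz -> W/Hz = J per cycle
    return joules_per_cycle * 1e15               # J -> fJ

atmel = energy_per_gate_cycle_fj(85)   # ATMEL 0.18 um
st = energy_per_gate_cycle_fj(18)      # ST 90 nm
print(f"{atmel:.0f} fJ vs {st:.0f} fJ per gate per cycle, ratio {atmel / st:.1f}")
```

The two quoted technologies differ by roughly a factor of 4.7 in energy per gate per cycle.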


Statements (1)
■ Systems are always built with available technologies
  - Scaling and integration deliver ever higher performance
  - Until now, VLSI has offered intrinsic lifetimes far greater than application lifetimes
■ This situation no longer seems to be guaranteed
  - Estimated lifetimes decrease steadily with technology evolution
  - Technical inputs confirm these estimations
  - Other reliability-related issues are arising (burn-in, soft errors)

Statements (2)
■ Some reference papers
  - « Impact of semiconductor technology on aerospace electronic system design production and support », 8th Joint NASA/FAA/DoD Conference on Aging Aircraft (February 2005)
  - « Advanced Test Methodologies and Strategies for Semiconductors », IPFA 2006 (July 2006)
  - « Rapport sur l'évolution du secteur des semi-conducteurs et ses liens avec les micro et nanotechnologies » (Report on the evolution of the semiconductor sector and its links with micro- and nanotechnologies), Office parlementaire d'évaluation des choix scientifiques et technologiques (January 2003)
  - « The Impact of Technology Scaling on Lifetime Reliability », The International Conference on Dependable Systems and Networks (June 2004)
  - « Is CMOS more reliable with scaling? », BAST, Pacific Northwest Test Workshop (March 2003)
  - …

Estimated lifetime (1)


Estimated lifetime (2)


Estimated lifetime (3)

Source: "Is CMOS more reliable with scaling?", T. M. Mak, Intel Corporation, BAST 2003

Estimated lifetime (4)

Source: Joseph B. Bernstein, University of Maryland, IRPS 2007

Estimated lifetime (5)

Source: Subhasish Mitra (Stanford University), 2005

Trends: ITRS 2006 (1)

[Table: Reliability Technology Requirements, near-term years]

Trends: ITRS 2006 (2)

■ [1] Failures during the first 4000 hours of operation (~1 year of use at a 50% duty cycle). Early failures are associated with defects.
■ [2] The long-term reliability rate applies over the specified lifetime of the IC.
■ [3] While the overall IC failure rate target does not change over time, the number of transistors per chip increases [from ORTC], so the relative failure rate per transistor must decrease.
■ [4] As the length of interconnect per chip increases [from Interconnect Technology Requirements Tables], the failure rate per m of interconnect must decrease. Even more important for reliability is the increase in the number of vias.
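Note [3] is simple arithmetic: with a fixed chip-level budget, the per-transistor budget shrinks in proportion to the transistor count. A minimal sketch assuming a series model (any transistor failure fails the chip) and an even split; the chip budget and transistor counts are hypothetical, not ITRS values:

```python
def per_transistor_fit(chip_fit, transistors):
    """Failure-rate budget per transistor if the chip budget is split evenly.

    FIT = failures per 1e9 device-hours; a series model (any transistor
    failure fails the chip) is assumed, so individual rates add up.
    """
    return chip_fit / transistors

budget = 100.0                       # hypothetical chip-level FIT target
for n in (100e6, 200e6, 400e6):      # hypothetical transistor counts
    print(f"{n:.0e} transistors -> {per_transistor_fit(budget, n):.2e} FIT each")
```

Each doubling of the transistor count halves the allowed per-transistor failure rate.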

Trends: ITRS 2006 (3)

[Table: MPU and ASIC Interconnect Technology Requirements, near-term years]

Trends: ITRS 2006 (4)

■ [1] Calculated by assuming that only one of every three minimum-pitch wiring tracks is populated for Metal 1 and five intermediate wiring levels. The wiring lengths for each level are then summed to calculate the total interconnect length per square centimeter of active area.
■ [2] This metric is calculated by assuming that a 5 FIT (failures in time: failures per 10⁹ device-hours) reliability budget is apportioned to interconnect for the highest-reliability-grade MPUs. This number is then divided by the total interconnect length to arrive at the FITs per meter of wiring per square centimeter of active area.
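The two notes describe a computation that can be sketched directly: sum the populated track length over the levels, then divide the 5 FIT interconnect budget by it. The wiring pitches below are hypothetical placeholders, not the ITRS 2006 table values, and the helper names are illustrative:

```python
CM = 1e-2  # one centimeter, in meters

def level_length_m(pitch_m, populated=1 / 3):
    """Wire length (m) on one level over 1 cm^2 when 1 track in 3 is populated."""
    tracks_per_cm = CM / pitch_m            # tracks across a 1 cm wide area
    return tracks_per_cm * CM * populated   # each track is 1 cm long

def fit_per_meter(pitches_m, budget_fit=5.0):
    """Apportion a FIT budget over the total wiring length per cm^2 of active area."""
    total_m = sum(level_length_m(p) for p in pitches_m)
    return total_m, budget_fit / total_m

# Hypothetical pitches: Metal 1 at 180 nm plus five intermediate levels at 200 nm
total, rate = fit_per_meter([180e-9] + [200e-9] * 5)
print(f"{total:.0f} m of wire per cm^2 -> {rate:.2e} FIT per meter")
```

Under these assumed pitches, about 1 km of wire per cm² leaves a budget of roughly 0.005 FIT per meter, which shows why the per-meter requirement tightens as pitches shrink.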

Trends: ITRS 2006 (5)

■ Red Brick = ITRS technology requirement with no known solution
■ Alternate definition: Red Brick = something that REQUIRES billions of dollars in R&D investment

Trends: Temperature (1)
■ Dissipated power density grows exponentially with technology evolution
  - Scaling factors are not ideal (e.g., mobility). Voltage decreases more slowly than the scaling factor in order to keep sufficient noise margin and performance (ultimately 0.3 V?)
  - Leakage currents increase under the combined effect of gate leakage and subthreshold current (off state)
  - (Operating frequency, number of gates… drive higher dynamic currents)
  - "To obtain the projected performance gain of 30% per generation, device designers have been forced to relax the device subthreshold leakage continuously from one to several nA/µm for the 250-nm node to hundreds of nA/µm for the 65-nm node. Consequently, passive power density is now a significant portion of the power budget of a high-speed microprocessor."
■ ⇒ IC temperature is increasing
  - This high temperature affects reliability (acceleration factor)
  - …and has side effects on the burn-in process
■ More critical for high-performance VLSI
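The "acceleration factor" link between temperature and reliability is usually expressed with the Arrhenius model. A minimal sketch, reusing the Ea = 0.7 eV quoted on the burn-in slide as an assumed activation energy; the function name and the 55 °C → 65 °C example are hypothetical:

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor between two junction temperatures (Arrhenius model)."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV * (1 / t_use - 1 / t_stress))

af = arrhenius_af(0.7, 55, 65)
print(f"+10 C at Ea = 0.7 eV multiplies the failure rate by {af:.2f}")
```

Under this model, a 10 °C rise in junction temperature roughly doubles the rate of a thermally activated mechanism with Ea = 0.7 eV.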

Trends: Temperature (2)


Trends: Temperature (3)


Trends: Temperature (4)


Evidence: PowerPC FIT

[Figure: PowerPC FIT, 180 nm vs. 65 nm]

Trends: Other parameters
■ Electric field
  - Constant (limited by oxide thickness)
  - From 1 MV/cm (1970) up to 6 MV/cm (maximum breakdown field around 12 MV/cm for SiO2)
■ Interconnect current density: 0.1 MA/cm² (1970) to 1 MA/cm²
■ Integration: more transistors, more interconnections
■ Process variability
  - Statistical effects ("atomic" scale)
  - Process complexity (OPC for FEOL lithography, CMP, damascene copper for BEOL)
■ New materials (metal gate, high-k, low-k… and thermomechanical issues)
■ Package…
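The oxide field figures follow from E = V/tox. A minimal sketch; the voltage and thickness pairs are hypothetical examples chosen to land on the two quoted fields, not data from the slide:

```python
def oxide_field_mv_per_cm(voltage_v, tox_nm):
    """Average field across the gate oxide, E = V / tox, in MV/cm."""
    tox_cm = tox_nm * 1e-7                 # 1 nm = 1e-7 cm
    return voltage_v / tox_cm / 1e6

SIO2_BREAKDOWN_MV_CM = 12.0                # approximate value quoted on the slide
old = oxide_field_mv_per_cm(5.0, 50.0)     # hypothetical 5 V across 50 nm
new = oxide_field_mv_per_cm(1.2, 2.0)      # hypothetical 1.2 V across 2 nm
print(f"{old:.1f} MV/cm -> {new:.1f} MV/cm, {SIO2_BREAKDOWN_MV_CM / new:.1f}x below breakdown")
```

Because the voltage scales more slowly than the oxide thickness, the margin to the SiO2 breakdown field shrinks from about 12x to about 2x in this example.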

Trends: T and V effects (1)
■ Burn-in is used to get rid of early failures
  - Two acceleration parameters: temperature and voltage
  - The acceleration factor decreases by a factor of 10 between 180 nm and 90 nm (γ = 4, Ea = 0.7 eV)

[Figure: burn-in issue]
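Burn-in acceleration is commonly modeled as a thermal factor times a voltage factor. The sketch below assumes an Arrhenius thermal term with Ea = 0.7 eV and an exponential voltage term exp(γ·ΔV) with γ = 4 per volt. The exact voltage model behind the slide's γ is not stated, so this form is an assumption, and all stress conditions are hypothetical:

```python
import math

K_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def burn_in_af(t_use_c, t_bi_c, v_use, v_bi, ea_ev=0.7, gamma_per_v=4.0):
    """Combined burn-in acceleration factor = thermal AF x voltage AF.

    Thermal part: Arrhenius with activation energy ea_ev.
    Voltage part: exponential model exp(gamma * (V_bi - V_use)), an assumed
    form for how gamma = 4 is applied (gamma in 1/V).
    """
    af_t = math.exp(ea_ev / K_EV * (1 / (t_use_c + 273.15) - 1 / (t_bi_c + 273.15)))
    af_v = math.exp(gamma_per_v * (v_bi - v_use))
    return af_t * af_v

# Hypothetical conditions: the newer node tolerates less temperature and overvoltage.
old = burn_in_af(55, 125, 1.8, 2.5)
new = burn_in_af(55, 100, 1.2, 1.4)
print(f"AF older node ~ {old:.0f}, AF newer node ~ {new:.0f}")
```

The computed ratio depends entirely on the assumed stress conditions. The qualitative point is that lower allowable burn-in temperature and overvoltage both shrink the achievable acceleration.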

Trends: T and V effects (2)
■ Thermal runaway during burn-in
  - Parts heat up → heat increases leakage currents → more heat is generated → thermal runaway
  - With each generation, the power dissipation of parts grows
■ Reduced margins
  - Increased on-chip electric fields limit the ability to apply overvoltages
■ Less acceleration from overvoltage as IC voltage is scaled down
  - Assuming the overvoltage is also scaled down
■ Danger that burn-in could "turn on" defects
■ From Dr. Ted Dellin, Sandia National Laboratories
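The runaway loop above is a feedback iteration: T = T_amb + R_th · P(T), where leakage power grows with temperature. A minimal sketch, assuming leakage doubles every 10 °C (a rule-of-thumb assumption) and using hypothetical thermal resistance and power values:

```python
def junction_temperature(t_amb_c, r_th_c_per_w, p_dyn_w, p_leak_ref_w,
                         t_ref_c=125.0, doubling_c=10.0, t_max_c=300.0,
                         iters=1000, tol=1e-6):
    """Iterate T = T_amb + R_th * (P_dyn + P_leak(T)); return (T, stable).

    P_leak(T) = p_leak_ref_w * 2 ** ((T - t_ref_c) / doubling_c), an assumed
    leakage-vs-temperature law. stable is False if T exceeds t_max_c.
    """
    t = t_amb_c
    for _ in range(iters):
        p_leak = p_leak_ref_w * 2 ** ((t - t_ref_c) / doubling_c)
        t_new = t_amb_c + r_th_c_per_w * (p_dyn_w + p_leak)
        if t_new > t_max_c:
            return t_new, False        # runaway: temperature keeps climbing
        if abs(t_new - t) < tol:
            return t_new, True         # converged to a stable operating point
        t = t_new
    return t, False

# Hypothetical burn-in at 125 C ambient, 5 W dynamic, 1 W leakage at 125 C
print(junction_temperature(125, 0.5, 5.0, 1.0))   # good heat sinking
print(junction_temperature(125, 5.0, 5.0, 1.0))   # poor heat sinking
```

With a low thermal resistance, the loop settles a few degrees above ambient; with a high one, each pass adds more leakage than the last and the temperature diverges.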

Trends: Weibull distribution (1)

$\lambda(t) = \frac{\beta}{c^{\beta}}\, t^{\beta - 1}$

■ Weibull distribution (empirical generalization of the exponential distribution): two parameters (shape parameter β, time-scale parameter c) provide a wide variety of shapes
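The failure-rate formula can be evaluated directly to show the three regimes discussed on the next slides. A minimal sketch; the function name and sample values of β are illustrative:

```python
def weibull_hazard(t, beta, c):
    """Weibull failure rate lambda(t) = (beta / c**beta) * t**(beta - 1)."""
    return beta / c ** beta * t ** (beta - 1)

c = 1.0  # scale parameter; times below are normalized t/c
for beta in (0.5, 1.0, 3.0):
    rates = [weibull_hazard(t, beta, c) for t in (0.5, 1.0, 2.0)]
    if rates[0] > rates[-1]:
        trend = "decreasing"
    elif rates[0] == rates[-1]:
        trend = "constant"
    else:
        trend = "increasing"
    print(f"beta = {beta}: {trend} failure rate", [f"{r:.3f}" for r in rates])
```

β < 1 gives a decreasing rate (early failures), β = 1 a constant rate, and β > 1 an increasing rate (wear-out).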

Trends: Weibull distribution (2)

[Figure: Weibull curves on a normalized time scale t/c]

Trends: Weibull distribution (3)

$\lambda(t) = \frac{\beta}{c^{\beta}}\, t^{\beta - 1}$

β < 1

■ Caused by "defects"; correlates with defect-related yield loss
■ Reduced by improved quality and by screens (e.g., burn-in)

Trends: Weibull distribution (4)

$\lambda(t) = \frac{\beta}{c^{\beta}}\, t^{\beta - 1}$

β = 1

■ Caused by random defects and random events
■ Often used to design tests that demonstrate a given level of reliability
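With β = 1 the failure count is Poisson, which is what makes demonstration tests easy to size. The probability of zero failures in T device-hours is exp(−λT), so demonstrating λ ≤ λ_target at confidence CL with zero failures requires T = −ln(1 − CL)/λ_target. A minimal sketch with hypothetical target and confidence values:

```python
import math

def zero_failure_device_hours(target_fit, confidence):
    """Device-hours with zero failures needed to show failure rate <= target_fit.

    Assumes a constant failure rate (Weibull beta = 1), so the number of
    failures is Poisson and P(0 failures) = exp(-lambda * T).
    """
    lam = target_fit * 1e-9                  # FIT -> failures per device-hour
    return -math.log(1 - confidence) / lam

hours = zero_failure_device_hours(10, 0.60)  # hypothetical: 10 FIT at 60% confidence
print(f"{hours:.3g} device-hours, e.g. {hours / 1000:.0f} parts for 1000 h")
```

The required sample size grows inversely with the target failure rate. This is one reason explicit reliability demonstrations become impractical at very low FIT levels.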

Trends: Weibull distribution (5)

$\lambda(t) = \frac{\beta}{c^{\beta}}\, t^{\beta - 1}$

β > 1

■ Intrinsic wear-out depends on design, materials, process, application, environment…
■ The onset of intrinsic wear-out should lie beyond the lifetime requirement
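"Onset of wear-out" can be made concrete as the time at which a small fraction of parts has failed. Inverting the Weibull cumulative distribution F(t) = 1 − exp(−(t/c)^β) gives t_F = c·(−ln(1 − F))^(1/β). A minimal sketch with a hypothetical 100-year characteristic life:

```python
import math

def weibull_time_to_fraction(fraction, beta, c):
    """Time at which a given fraction has failed, from F(t) = 1 - exp(-(t/c)**beta)."""
    return c * (-math.log(1 - fraction)) ** (1 / beta)

c_years = 100.0   # hypothetical characteristic life (63.2% failed)
for beta in (1.0, 2.0, 4.0):
    t = weibull_time_to_fraction(0.001, beta, c_years)   # 0.1% cumulative failures
    print(f"beta = {beta}: 0.1% failed after {t:.2f} years")
```

For the same characteristic life, the 0.1% point moves from about 0.1 year (β = 1) to about 18 years (β = 4). Both the scale c and the shape β therefore matter when comparing wear-out onset with an 18-year mission.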

Trends: Weibull distribution (6)

Source: E. Y. Wu et al., Microelectronics Reliability 43 (2003) 1175-1184

Trends: Weibull distribution (7)
■ Field failures increasingly show a constant-rate occurrence
■ Joseph B. Bernstein, University of Maryland, IRPS 2007

Trends: Soft Defects

[Figure: power supply voltage (V) vs. technology node (0.5 µm, 0.35 µm, 0.18 µm, 90 nm, 65 nm) for the I/O and core power supplies (axis values 5.0, 3.3, 2.5, 1.5, 0.7 V); annotations: increase of transient noise, increase of "white" noise]

Trends: Soft defect

■ INTEL: "Soft errors are the second biggest [reliability] concern after leakage current in submicron design"
■ Tim Dell, IBM: "for every 256 Mbytes of memory, you will get one soft error a month due to cosmic-ray-generated neutrons"
■ Link with robustness
  - Ageing can reduce robustness (cumulative effects)
  - Stresses below "threshold" can:
    • Reduce the lifetime (ESD ⇒ TDDB, EMI)
    • Reduce the robustness (cumulative ESD)
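Tim Dell's figure can be converted to the FIT/Mbit unit usually used for soft-error rates. A minimal sketch, assuming an average month of 730 hours and 256 Mbytes = 2048 Mbit:

```python
HOURS_PER_MONTH = 8760 / 12   # average month, an assumption

def fit_per_mbit(errors, period_hours, mbits):
    """Soft-error rate in FIT/Mbit (FIT = failures per 1e9 device-hours)."""
    return errors / period_hours * 1e9 / mbits

# One error per month per 256 Mbytes (= 2048 Mbit)
rate = fit_per_mbit(1, HOURS_PER_MONTH, 256 * 8)
print(f"~{rate:.0f} FIT/Mbit")
```

This gives roughly 670 FIT/Mbit, far above the few-FIT budgets quoted for hard failures in the ITRS tables.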

Trends: Reliability improvements (1)
■ Manufacturers could improve it
  - For instance, barrier layers could be optimized to limit copper diffusion into low-k dielectrics (TDDB)
  - …but this could jeopardize electromigration robustness, performance and cost, and block impurities inside low-k dielectrics
■ …They won't do it
■ Performance and cost are the main drivers for manufacturers (consumer market); longer lifetime and reliability are not the main objectives
■ The space market is too small and lacks the economic weight to modify or direct process or design trends

Trends: Reliability improvements (2)
■ Richard Goering, EE Times, 4 Sept. 2006: "Chipping away at design for reliability"
  - … at 65 nanometers and below, … current densities go through the roof, exacerbating electromigration.
  - Problems such as hot-carrier degradation loom larger.
  - Ultra-thin gate oxides are prone to breakage.
  - Without DFR, many 65- and 45-nm chips will ultimately break.
  - That may not matter for a volume consumer product with a short life. But it matters a lot for chips that go into airplanes, pacemakers or cars.
■ Overview of wear-out mechanisms
■ A look at DFR


Scaling: Electrical performance
■ Increase frequency, decrease propagation time
  - Decrease RC (BEOL)
    • Lower R ⇒ metal choice: copper rather than aluminum
    • Lower C ⇒ low-k material (porous silicon dioxide…)
  - Decrease switching times: we want IDsat as high as possible

$I_{Dsat} = \frac{C_{ox}\,\mu\,Z\,(V_{GS} - V_T)^2}{2L}, \qquad C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}$

■ Decrease power consumption
  - Ileak as small as possible (scaling worsens it)
    • Ioff
    • Igate
  - nW/MHz/gate targeted by scaling (Vt, IDsat)
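The IDsat and Cox expressions above can be evaluated numerically. The sketch uses the same long-channel square-law model, so it ignores velocity saturation and overestimates current in real DSM devices. The device values are hypothetical, and Z is the gate width:

```python
EPS0 = 8.854e-12        # vacuum permittivity, F/m
K_SIO2 = 3.9            # relative permittivity of SiO2

def c_ox(t_ox_m, k=K_SIO2):
    """Gate-oxide capacitance per unit area, Cox = eps_ox / tox (F/m^2)."""
    return k * EPS0 / t_ox_m

def i_dsat(mu_m2_vs, z_m, l_m, vgs, vt, t_ox_m):
    """Long-channel saturation current Cox*mu*Z*(VGS - VT)^2 / (2L), in amperes."""
    return c_ox(t_ox_m) * mu_m2_vs * z_m * (vgs - vt) ** 2 / (2 * l_m)

# Hypothetical device: 300 cm^2/Vs, Z = 1 um, L = 0.1 um, tox = 2 nm
i1 = i_dsat(0.03, 1e-6, 0.1e-6, 1.2, 0.3, 2e-9)
i2 = i_dsat(0.03, 1e-6, 0.1e-6, 1.2, 0.3, 1e-9)   # oxide thickness halved
print(f"IDsat = {i1 * 1e3:.2f} mA; halving tox gives x{i2 / i1:.1f}")
```

Thinner oxides raise IDsat through Cox. This is why tox scaling is attractive for speed, and also why it raises gate leakage and oxide field.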

ITRS 2006 identified challenges: Difficult Challenges ≥ 32 nm, with summary of issues

■ High-κ gate dielectrics with metal gate electrodes
  - Dielectric breakdown characteristics (hard and soft breakdown)
  - Transistor stability (charge trapping, work function stability, metal ion drift or diffusion)
  - Impact of implantation
  - Metal gate thermomechanical issues (coefficient of thermal expansion mismatch)
■ Copper/low-κ interconnects
  - Stress migration of Cu vias and lines
  - Cu via and line electromigration performance
  - Impact of degradation of properties with lowering k (strength, adhesion, thermal conductivity, coefficient of thermal expansion)
  - Time-dependent dielectric breakdown of the Cu/low-κ system
  - Impact of packaging
■ Negative bias temperature instability
  - Degradation of p-channel current
  - Dependence on scaling and nitrogen in gate insulator
  - Impact on burn-in

FEOL: known wear-out mechanisms
■ Reliability at transistor level
  - Hot carrier degradation (HCI)
  - Gate oxide degradation
    • Gate oxide breakdown
    • Time-dependent dielectric breakdown (TDDB)
  - Negative bias temperature instability (NBTI)
■ Scaling down channel length (L) and gate oxide thickness (tox)
  - E-fields in oxide and channel increase
  - Device reliability issues (HCI and TDDB) become severe

FEOL: Gate Oxide Breakdown (1)
■ Dielectric breakdown mechanism

FEOL: Gate Oxide Breakdown (2)
■ Hard breakdown
  - Current flowing through a short in the oxide raises the temperature; the electrode melts and diffuses into the oxide
  - Low-resistance ohmic path through the gate insulator
  - Definitely an IC failure
■ Soft breakdown
  - Less power dissipation results in fewer thermal effects
  - High-resistance ohmic path through the gate insulator
  - Increase in noise
  - The IC may still function after soft breakdown
■ Trends
  - Occurs more frequently in thinner oxides at lower voltages
  - Can happen very early in the life of a DSM device
  - Can induce side effects (power dissipation, soft defects due to increased noise level)

FEOL: Hot Electron Injection (1)
■ Electrons are injected into the channel (NMOS on)
■ Impact ionization creates electron-hole pairs
■ Holes drift to the substrate (Isub)
■ Hot electrons damage the oxide
■ Isub is a measure of the hot-carrier generation rate
■ Injected carriers produce damage that reduces transistor current
  - Eventually, the device becomes too slow
  - Lifetime issue

[Figure: CHE (channel hot electron) and DAHC (drain avalanche hot carrier) injection]

FEOL: Hot Electron Injection (2)
■ Was an NMOS problem
  - N-channel: increase in substrate current
■ With scaling, it also becomes a PMOS issue
  - P-channel: increase in off-state leakage current

FEOL: Mission Profile Dependence (1)
■ DRAM 90 nm: 1 GB, DDR2, 266 MHz, Vdd = 1.8 V, temperature = 75 °C, simulation result

FEOL: Mission Profile Dependence (2)
■ If one single bit is accessed constantly, HCI failure will dominate
■ In addition, a 10-year DC lifetime is hard to achieve in the deep-submicron region

FEOL: NBTI (1)
■ Stress conditions
  - Negative electric field across the gate oxide
  - p-MOS device in inversion
  - Elevated temperature
■ Damage
  - Stress-induced interface state trapping
  - Fixed positive oxide charges
■ Electrical effects
  - Increase of the absolute value of Vth
  - Decrease of the drain current
  - Decrease of carrier mobility
■ Still under investigation; the importance of NBTI could be related to Si-H bonds in nitrided gate oxide

FEOL: NBTI (2)
■ Comparison of PMOS NBTI lifetimes vs. NMOS and PMOS HCI, 0.13 µm technology
■ C.H. Jeon, IEEE Integrated Reliability Workshop 2002, final report, pp. 130-132

[Figure: lifetimes at 110 °C and 150 °C]

FEOL: NBTI (3)
■ From Joseph B. Bernstein (University of Maryland/Bar-Ilan University)

FEOL: New materials (1)
■ Nitrided gate oxide
  - Boron penetration is a problem for ultra-thin oxides; it lowers TDDB lifetime
  - Nitrogen doping limits boron penetration and improves oxide reliability

But it worsens NBTI!

FEOL: New materials (2)
■ High-k dielectric
  - Equivalent oxide thickness: $EOT = T_{HighK} \times \frac{3.9}{K}$
  - HfO2 (Keff ~ 15-30); HfSiOx (Keff ~ 12-16); La-based in the future
  - Materials, process and integration issues to solve (thermal stability, thermal and chemical compatibility, interface with the Si substrate and gate electrode)
  - Potential side effect (radiation robustness)

[Figure: gate stacks compared, SiO2 (electrode / Tox / Si substrate) vs. high-k material (electrode / TK / Si substrate)]
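The EOT relation works in both directions: it gives the SiO2-equivalent thickness of a high-k film, or the physical film needed for a target EOT. A minimal sketch with K values picked from the quoted ranges; the thicknesses are hypothetical:

```python
def eot_nm(t_highk_nm, k):
    """Equivalent oxide thickness: EOT = T_highK * 3.9 / K."""
    return t_highk_nm * 3.9 / k

def physical_thickness_nm(target_eot_nm, k):
    """High-k film thickness needed to reach a target EOT."""
    return target_eot_nm * k / 3.9

print(f"3 nm HfO2 (K = 20): EOT = {eot_nm(3.0, 20):.3f} nm")
print(f"EOT 1 nm with HfSiOx (K = 14): {physical_thickness_nm(1.0, 14):.2f} nm physical")
```

A physically thicker high-k film delivers the capacitance of a much thinner SiO2 layer. This is the point of the change: gate tunneling falls steeply with physical thickness.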

FEOL: New materials (3)
■ Polysilicon depletion in the gate electrode
  - $T_{ox,elec} = T_{ox} + W_{poly\,depletion}$
  - Decreases C
  - Reduces IDsat
■ Potential solutions
  - $W_{poly\,depletion} \propto (\text{poly doping})^{-0.5}$
  - Increase poly doping to reduce Wpoly depletion with scaling, but the maximum poly doping is limited
  - Poly depletion becomes more critical with Tox scaling
  - Metal gate electrodes
  - These induce new reliability issues

[Figure: polysilicon gate with depletion layer (Wd,Poly), gate oxide, substrate and inversion layer]

FEOL: New materials (4)
■ Stressed Si, SiGe
  - Increase the mobility μ, and therefore IDsat:

$I_{Dsat} = \frac{C_{ox}\,\mu\,Z\,(V_{GS} - V_T)^2}{2L}$

FEOL: New materials (5)
■ Stress engineering can deliver large performance gains through mobility enhancement
■ It can also degrade device reliability (NBTI)
  - Even though compressively stressed silicon nitride films could significantly increase mobility in the pFET channel, excess hydrogen in the nitride could degrade NBTI
  - Hwa Sung Rhee, Samsung Electronics

BEOL: Electromigration
■ Metal atoms can migrate due to currents and/or stresses
■ Electromigration
  - Requires an electrical current
  - Atoms move due to collisions with electrons
■ Stress migration
  - Atoms move to relieve compressive stresses
  - Stress gradients from processing and/or electromigration
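Electromigration lifetime is commonly modeled with Black's equation, MTTF = A · J⁻ⁿ · exp(Ea/kT). The slides do not give this model; it is added here as the standard illustration. The exponent n = 2 and Ea = 0.9 eV are assumed values, and A is left in arbitrary units, so only ratios are meaningful:

```python
import math

K_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def black_mttf(j_ma_cm2, temp_c, a=1.0, n=2.0, ea_ev=0.9):
    """Black's equation MTTF = A * J**(-n) * exp(Ea / kT), arbitrary units via A."""
    return a * j_ma_cm2 ** (-n) * math.exp(ea_ev / (K_EV * (temp_c + 273.15)))

ratio_j = black_mttf(0.1, 105) / black_mttf(1.0, 105)   # 0.1 -> 1 MA/cm^2
ratio_t = black_mttf(1.0, 105) / black_mttf(1.0, 125)   # +20 C
print(f"10x current density: MTTF / {ratio_j:.0f}; +20 C: MTTF / {ratio_t:.1f}")
```

With n = 2, the tenfold rise in current density listed under "Other parameters" (0.1 to 1 MA/cm²) alone cuts the electromigration lifetime by about 100x, before any temperature effect.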

BEOL: Cu and Low K (1)


BEOL: Cu and Low K (2)
■ Low-k dielectrics present many processing and reliability challenges
■ Compared to SiO2, low-k dielectrics are less robust
  - Weaker: makes chemical mechanical polishing more difficult
  - Porous: can trap process gases and chemicals
  - Poorer adhesion: can lead to reliability problems
  - Cu diffuses more easily along surfaces than through the bulk, especially under the top cap layer
  - Surface effects are larger in thinner lines
  - Thinner lines + lower-k dielectric
    • Weaker adhesion
    • Decrease in EM lifetime
■ May have to be implemented in a stack using more robust, higher-k dielectrics to protect the low-k dielectrics
  - Increases the effective dielectric constant, reducing speed

BEOL: Cu and Low K (3)

[Figure: relative time to failure vs. line width (µm)]

■ Lifetime worsens with scaling
■ Sato & Ogawa, 2001 Interconnect Technology Conference

BEOL: Cu and Low K (4)

[Figure: properties vs. dielectric constant]

■ From Ted Dellin's IRPS tutorial
■ Proposed low-k interlevel dielectrics have reduced thermal conductivity and strength
■ Other properties that get worse with lower k: interfacial adhesion, electrical breakdown and coefficient of thermal expansion mismatch

BEOL: Cu and Low K (5)
■ Packaging challenges: the poor mechanical properties of low-k dielectrics

BEOL: Cu and Low K (6)
■ Leakage currents (and TDDB) between Cu lines degrade as k is lowered
■ Systematic reduction in dielectric breakdown strength with lower k
  - Copper extruding into low-k makes things worse
  - Sensitive to process damage, porosity, …

BEOL: Cu and Low K (7)
■ Size effect: linewidths shrink below about 100 nm
  - Close to the mean free path of electrons in copper (39 nm)
  - Increased copper resistivity caused by electron scattering at the line surface and at grain boundaries
■ Estimates of the impact of the size effect on interconnect delay have been overstated
  - For the next few device generations
  - The size effect can be effectively managed through interconnect design


Mandatory studies
■ DSM lifetime has to be taken into account early
■ DSM reliability parameters have to be fine-tuned for lifetime simulation purposes
  - Technology-dependent
  - Application-dependent (mission profile)
■ Manufacturer involvement is a critical issue
  - We need them
  - High reliability / long lifetime is a small market

DFR: example (1)
■ Design for reliability example: layout of Cu lines and vias

DFR: example (2)
■ CMOS device reliability: dynamic NBTI recovery
  - Lifetime can improve by a factor of 10-30
  - The same fraction recovers in every cycle
■ S. Chakravarthi, IEEE IRPS 2004, pp. 173

DFR: technology tolerance (1)
■ Methodologies for Adaptation to Process Variations, Manufacturing Defects, and Transient Errors in Scaled CMOS (Georgia Institute of Technology, August 2007)
■ Variation-tolerant design
  - Increase of process parameter variations in CMOS technologies
  - Variation-aware placement
    • The large leakage variation problem is addressed by examining how gate placement affects the leakage distribution (clusters)
    • Algorithms for placing gates in a dual-Vt circuit mitigate the large leakage variation by reducing the variation caused by correlated within-die process variation
    • Sub-threshold leakage variation reduced by an average of 17% and a maximum of 31%
    • Obtained with a small increase in wire length
  - Post-manufacture tuning architecture
    • Tunable gates, supply voltage, or body bias
    • Deal with delay and leakage variation
    • Self-test/self-adaptation can improve the delay yield by 40%

DFR: technology tolerance (2)
■ Defect-tolerant CMOS gate design
  - Significant defectivity due to manufacturing defects, random process variations, and wear-out
  - Future circuits must be equipped with significant defect-tolerance capability
  - Little delay overhead (less than 6%), but a leakage power overhead (less than 20%) in the presence of defects
■ Probabilistic Checksum-Based Error
  - Relaxing the requirement of 100% correctness for devices and interconnects may dramatically reduce the costs of manufacturing, verification and test
  - 100% correctness is hard to achieve because of an increasing transient error rate
  - SNR improvements (up to 13 dB) can be obtained in the presence of soft errors

DFR: Dual Vt, Tox … (1)
■ Leakage optimization using dual threshold voltage
■ Off-state leakage current
  - Subthreshold leakage (ISUB)
  - Gate-induced drain leakage (IGIDL)
  - Edge direct tunneling leakage (IEDT)
  - Band-to-band tunneling leakage (IBTBT)
■ On-state leakage current
  - Gate leakage (IGON)

DFR: Dual Vt, Tox … (2)
■ Gate delay and leakage trade-off

$T_{pd} \propto \frac{C\,V_{dd}}{(V_{dd} - V_{th})^{\alpha}}$

■ Propagation delay has the above dependence on Vt
  - Higher Vt means a slower gate (larger propagation delay)
  - But higher Vt means smaller subthreshold leakage (exponential dependence!)
■ The trade-off between delay and leakage is made at design level (fast gate or low-consumption gate)
■ The other possibility is to increase the gate oxide thickness
  - Trade-off between delay (IDsat-driven) and leakage
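The trade-off can be quantified by pairing the delay expression above with the usual exponential subthreshold model, I_leak ~ exp(−Vth/(n·kT/q)). The α = 1.3, n = 1.5 and the two Vt options are hypothetical values chosen for illustration:

```python
import math

VT_THERMAL = 0.02585   # kT/q at about 300 K, in volts

def relative_delay(vdd, vth, alpha=1.3):
    """Alpha-power-law delay Tpd ~ C*Vdd / (Vdd - Vth)**alpha, with C normalized to 1."""
    return vdd / (vdd - vth) ** alpha

def relative_subthreshold_leakage(vth, n=1.5):
    """Subthreshold leakage ~ exp(-Vth / (n * kT/q)), with the prefactor normalized."""
    return math.exp(-vth / (n * VT_THERMAL))

vdd = 1.0
low, high = 0.20, 0.35   # hypothetical low-Vt and high-Vt options
d_ratio = relative_delay(vdd, high) / relative_delay(vdd, low)
l_ratio = relative_subthreshold_leakage(low) / relative_subthreshold_leakage(high)
print(f"High-Vt gate: {d_ratio:.2f}x slower, {l_ratio:.0f}x less leakage")
```

A modest delay penalty (about 30% here) buys a large leakage reduction. This asymmetry is what dual-Vt design exploits: high-Vt gates off the critical paths, low-Vt gates on them.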

Dual Vt Results
■ Results for ISCAS benchmark circuits [Wei, et al., DAC98]

DFR: Low Voltage (1)
■ Voltage derating
  - Many delay-causing defects have a much greater impact at reduced VDD
    • Voltage variation has a much greater impact on delay at low VDD than at high VDD
    • Latent defects are likely to be more pronounced at low VDD; numbers may be larger than 1-2%
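The alpha-power-law delay model also explains why voltage variation hurts more at low VDD: as VDD approaches Vth, the same relative supply droop eats a larger share of the gate overdrive. A minimal sketch with hypothetical Vth = 0.3 V, α = 1.3 and a 5% droop:

```python
def relative_delay(vdd, vth=0.3, alpha=1.3):
    """Alpha-power-law delay Tpd ~ Vdd / (Vdd - Vth)**alpha (capacitance normalized)."""
    return vdd / (vdd - vth) ** alpha

def delay_increase_for_droop(vdd, droop=0.05, vth=0.3, alpha=1.3):
    """Fractional delay increase when the supply drops by `droop` (e.g. 5%)."""
    slowed = relative_delay(vdd * (1 - droop), vth, alpha)
    return slowed / relative_delay(vdd, vth, alpha) - 1

for vdd in (1.2, 0.8):
    print(f"Vdd = {vdd} V: 5% supply droop -> +{delay_increase_for_droop(vdd) * 100:.1f}% delay")
```

Under these assumptions, the same 5% droop costs noticeably more delay at 0.8 V than at 1.2 V.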

DFR: Low Voltage (2)
■ Reliability implications
  - Reduced VDD will reduce some wear-out mechanisms
    • Oxide breakdown
    • NBTI
    • Some thermal effects (due to reduced heat)
  - Others will get worse
    • Latent delay defects
    • Some thermal effects (due to increased thermal cycles)
■ Rob Aitken, IOLTS 2006

DFR: Lifetime Reliability-Aware µP (1)
■ Ensuring long processor lifetimes by limiting wear-out-related hard errors is a critical requirement for all microprocessor manufacturers
  - Average increase of 316% in processor failure rates when scaling from 180 nm to 65 nm
  - Some performance and/or die area (and the resulting cost) will have to be sacrificed for reliability
■ Microarchitecture-level model
  - RAMP: electromigration, stress migration, time-dependent dielectric breakdown and thermal cycling, + NBTI
  - Dynamically tracks processor lifetime reliability, accounting for the behavior of the executing application
■ Dynamic reliability management (DRM)
  - Processor scaling and increasing power densities
  - Increasing transistor count
    • More transistors result in more failures, which results in shorter processor lifetimes
    • Hence, not only is the reliability of individual transistors decreasing, the number of transistors that can fail is also increasing
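Per-mechanism failure rates are often combined with a sum-of-failure-rates (SOFR) rule: assuming constant rates and a series system, the total FIT is the sum of the mechanism FITs and MTTF = 10⁹/FIT hours. This sketch assumes that combination. The FIT values are hypothetical, and reading "an increase of 316%" as a factor of 4.16 is also an assumption:

```python
HOURS_PER_YEAR = 8760

def sofr_fit(mechanism_fits):
    """Sum-of-failure-rates: total FIT is the sum of per-mechanism FITs."""
    return sum(mechanism_fits.values())

def mttf_years(fit):
    """MTTF for a constant total failure rate given in FIT (failures per 1e9 h)."""
    return 1e9 / fit / HOURS_PER_YEAR

# Hypothetical per-mechanism FIT values for a processor at one operating point
fits = {"EM": 1000, "SM": 500, "TDDB": 1500, "TC": 500, "NBTI": 500}
total = sofr_fit(fits)
print(f"total {total} FIT -> MTTF {mttf_years(total):.1f} years")
scaled = total * 4.16   # "an increase of 316%" read as x4.16 (assumption)
print(f"after a 316% increase: {scaled:.0f} FIT -> {mttf_years(scaled):.1f} years")
```

Because rates add, any mechanism that worsens with scaling directly shortens the whole-processor MTTF. The same holds for more transistors at constant per-transistor rate.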

DFR: Lifetime Reliability-Aware µP (2)
■ Architectural awareness of lifetime reliability
■ Workload
■ Over-designed processors
  - Current reliability qualification is based on worst-case temperature and utilization
  - However, most applications run at lower temperature and utilization, resulting in higher reliability and longer processor lifetimes than required
  - If the processor cooling solution can handle it, this excess reliability can be used by the processor to increase application performance
■ Under-designed processors
  - Beneficial for commodity processors, where increasing yield and reducing cooling costs would have a significant impact on profits, even if they incur some performance loss

DFR: Lifetime Reliability-Aware µP (3)
■ J. Srinivasan, University of Illinois; P. Bose, IBM T.J. Watson Research Center
■ Two methods of structural redundancy to enhance lifetime reliability
■ Structural duplication
  - Certain redundant microarchitectural structures are added to the processor
  - Spare structures can be turned on when the original structure fails, increasing the processor's lifetime
■ Graceful performance degradation (GPD)
  - Uses replicated structures that modern processors already include to increase performance for some high-parallelism applications
  - Replicated structures are not required for functional correctness, so the processor can shut down a failed structure and still maintain functionality, thereby increasing lifetime
  - A processor with GPD fails only when all redundant structures of a type fail
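"Fails only when all redundant structures of a type fail" is a parallel-redundancy model. A minimal sketch, assuming k identical structures with independent, exponentially distributed lifetimes. This is a simplifying assumption, since wear-out is not exponential. The 10-year structure MTTF is hypothetical:

```python
import math

def parallel_reliability(t, lam, k):
    """Probability that at least one of k identical structures still works at time t.

    Assumes independent, exponentially distributed lifetimes with rate lam.
    """
    return 1 - (1 - math.exp(-lam * t)) ** k

def parallel_mttf(lam, k):
    """MTTF of k parallel copies: (1/lam) * (1 + 1/2 + ... + 1/k)."""
    return sum(1 / i for i in range(1, k + 1)) / lam

lam = 1 / 10.0   # hypothetical: one structure has a 10-year MTTF
for k in (1, 2, 4):
    print(f"k = {k}: MTTF {parallel_mttf(lam, k):.1f} y, "
          f"R(10 y) = {parallel_reliability(10, lam, k):.3f}")
```

Each extra copy adds a diminishing amount of MTTF (harmonic series). It does, however, strongly raise the probability of surviving a fixed mission time.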

DFR: Lifetime Reliability-Aware µP (4)
■ The main drivers are cost and performance
■ Done to target a minimal acceptable lifetime (7 years), and it is costly ($$$$)

DFR: FLAW (1)
■ Altera Stratix III CMOS 65 nm
  - PowerPlay (development tool Quartus II version 6.1) automatically analyzes the design
    • 0.9 V for low power
    • 1.1 V for high performance and critical paths
■ Xilinx FPGAs: Spartan-3 / UMC-12A 90 nm qualification report
  - Claims more than 10 years lifetime
■ FPGA lifetime awareness
  - FPGA is a "low-volume" model
  - It also targets the military and space markets
  - FPGA manufacturers are involved in proving long lifetime

DFR: FLAW (2)
■ Test structure (65 nm)
  - Look-up tables (LUTs) in FPGAs (made with 16 × 1 multiplexers)
  - Studied mechanisms: TDDB, EM, HC
■ Region Constrained Placement for Reliability (RCPFR)
  - Periodic re-mapping of the design to less-used regions to increase the lifetime of the device
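The idea behind periodic re-mapping can be illustrated with a simple wear-leveling count. This is not the RCPFR algorithm, whose details are not given here. It is a hypothetical round-robin scheme in which a design occupying `used` of `n_regions` regions is shifted every epoch, and the worst-case accumulated stress is compared with a static placement:

```python
def accumulated_stress(n_regions, used, epochs, remap):
    """Stress units accumulated per region (hypothetical round-robin re-mapping).

    Each epoch, the design occupies `used` consecutive regions starting at
    `offset`; with remap=True the offset advances by `used` after every epoch.
    """
    stress = [0] * n_regions
    offset = 0
    for _ in range(epochs):
        for i in range(used):
            stress[(offset + i) % n_regions] += 1
        if remap:
            offset = (offset + used) % n_regions
    return stress

static = accumulated_stress(4, 2, 12, remap=False)
rotated = accumulated_stress(4, 2, 12, remap=True)
print("static :", static, "max", max(static))
print("rotated:", rotated, "max", max(rotated))
```

Spreading usage over twice as many regions halves the peak stress in this toy case. Since lifetime is set by the most stressed region, this roughly extends lifetime for usage-driven mechanisms.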

Conclusions (1)
■ Lifetime is a real issue for long-term use of high-performance devices
  - Lifetime decreases for the same area with technology scaling (more transistors, higher frequency)
  - Thermal issues, high electric fields
  - Derating after design is difficult to manage
    • Lower voltage has to be decided at design level (optimization)
    • Cooling is always possible (HCI?)
■ Questionnaire
  - What is the expected gain when using DSM? (Low power or high performance?)
  - What are the acceptable trade-offs?
■ Only a review approach

Conclusions (2)
■ From testing-in reliability
  - Use of end-of-line testing and screening to measure and ensure reliability
  - Multiple problems
    • Large number of samples that need to be tested
    • By the time problems are discovered, a lot of product has been affected
■ To building-in reliability
  - Control reliability through process control and control of the design process
  - Emphasis on preventing problems
  - Testing is used to validate physical/statistical models and to find critical process variables
  - The customer has to move from demanding explicit reliability demonstrations to confidence that reliability processes are under control
