Integrated Circuits and Systems

Author: Ellen Higgins

• High Speed Circuits
• Novel Techniques for Power Reduction, Speed Increase and Optical Clock Distribution in Deep-sub-micron Technologies
• Evaluation of Software Energy Consumption on Microprocessors
• Energy Efficient Transmitter for Wireless Microsensors
• Low Power RF Front-End for Wireless Microsensor Systems
• Hardware Architecture for a Power-Aware Microsensor Node
• Subthreshold Circuit Design for Ultra Low Power Sensor Systems
• Low Power, Application-Driven Communication Hierarchy for Wireless Microsensor Networks
• Ultra Wideband Radio
• Circuit Techniques for Subthreshold Leakage Reduction in a Deep SubMicron Process
• Power Aware Hardware Reconfigurable Digital Signal Processing Architecture for Wireless Communications
• Novel Techniques Addressing Delay and Power Issues in Deep Sub-Micron Buses
• Active Optical Clock Distribution
• Software Tools for Process-Sensitive Reliability Assessments of IC Designs
• The Low-Power Bionic Ear Project
• Intelligent Transportation Systems
• Parking Assistant System
• Image Sensor Network
• A New Night Visionary Pedestrian Detection and Warning Systems
• A Binocular Vision System for Automated Vehicle
• Wireless Gigabit Local Area Network
• Analog Base-band Processor for Wireless Gigabit LAN
• Integration of Multiple RF Front-Ends
• Biasing and Power Combining Techniques for Power Amplifiers
• 5.8 GHz Wideband Receiver for Wireless Gigabit LAN
• A CMOS-Compatible Compact Display
• Smart Active-Matrix Display Drivers For Organic Light Emitting Devices
• A Differential CMOS Passive Pixel Imager
• Characterization Methodology of CMOS Processes for Image Sensor Applications
• Optoelectronic Integrated Circuits for Diffuse Optical Tomography
• Mixed-Signal Design in Deeply Scaled CMOS Technology
• Substrate Noise Characterization Shaping in Mixed-Signal Systems
• RF Analog Circuit Design with Scaled CMOS Devices
• Circuit Design and Technological Limitations of Silicon RFICs
• Superconducting Bandpass Delta-Sigma A/D Converter
• Radio Frequency Digital-to-Analog Converter
• Oversampled Pipeline A/D Converters with Mismatch Shaping
• Low Power Reconfigurable Analog-to-Digital Converter
• A CMOS Bandgap Current and Voltage References
• A Programmable, Wide Dynamic Range CMOS Imager with On-Chip Automatic Exposure Control
• Spike-Based Hybrid Computers Project
• The Visual Motion and Inertial Motion Sensing Project


High Speed Circuits

Personnel M. Perrott

Sponsorship N/A

Research in the area of high speed integrated circuit design for communication applications involves an overlap of signal processing and communication theory, analog and digital circuit design techniques, and an understanding of basic device physics. Current focus areas in this effort include RF circuits for wireless systems, broadband circuits for optical networking and backplane communication, and low jitter clocking circuits. We tend to focus on problems that involve interplay between the overall system architecture and the circuit design of the individual components, and we use a combination of analog and digital circuits to achieve a robust implementation. Current projects include a high resolution, wide bandwidth 5 GHz frequency synthesizer, a low jitter 10 Gigasample/s clock and data recovery circuit, and an efficient power amplifier for driving transducer arrays for ultrasound applications.

An example system illustrating some of the research issues we encounter is shown below. Here we have a transmitter for wireless applications that consists of a high resolution frequency synthesizer that can be directly modulated by changing its divide value. Application of signal processing techniques allows the achievement of high data rates through digital compensation of the PLL dynamics. A mixture of digital and analog design techniques is used to achieve high speed operation of the frequency divider, low power operation of the digital ΣΔ modulator, and accurate setting of the loop filter time constants. The net result is a highly integrated, low power CMOS transmitter capable of 2.5 Mbit/s GFSK modulation.
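As a rough illustration of how a synthesizer can reach fine frequency resolution by changing its divide value, the sketch below dithers an integer divider between N and N+1 with a first-order digital accumulator, so that the average divide value is a fraction. This is a minimal sketch under stated assumptions: the text does not give the modulator order or word lengths, and the names `divide_sequence`, `n_int`, `frac_num`, and `modulus` are hypothetical.

```python
from fractions import Fraction


def divide_sequence(n_int, frac_num, modulus, cycles):
    """Return the divide value used in each reference cycle.

    Assumed first-order accumulator: each cycle adds frac_num; on overflow
    past modulus the divider uses n_int + 1 for that cycle, otherwise n_int.
    """
    if not 0 <= frac_num < modulus:
        raise ValueError("frac_num must satisfy 0 <= frac_num < modulus")
    acc = 0
    seq = []
    for _ in range(cycles):
        acc += frac_num
        if acc >= modulus:
            acc -= modulus          # carry out: divide by N + 1 this cycle
            seq.append(n_int + 1)
        else:
            seq.append(n_int)
    return seq


def average_divide(seq):
    """Exact average divide value over the sequence."""
    return Fraction(sum(seq), len(seq))
```

Over any whole number of accumulator periods the average divide value equals `n_int + frac_num / modulus`, so the output frequency would be that value times the reference frequency. Modulating `frac_num` with data is one way to picture modulation through the divide value.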

Fig.1:


Novel Techniques for Power Reduction, Speed Increase and Optical Clock Distribution in Deep-sub-micron Technologies

Personnel P. Sotiriadis (A. P. Chandrakasan)

Sponsorship MARCO Focused Research Center on Interconnect (MARCO/DARPA)

A methodology for designing low-power coding schemes that is aware of both the data distribution and the bus structure has been developed. A general class of coding schemes for low power, termed Transition Pattern Coding schemes, has been introduced and its energy behavior has been mathematically analyzed in detail. Two algorithms have been proposed for deriving efficient coding schemes that are optimized for the desired bus structure and data distribution. Bus partitioning has been mathematically analyzed as a way to reduce the complexity of the encoder/decoder.

In this work we asked: what is the maximum possible reduction in power consumption in deep-sub-micron buses using coding techniques? We answered the question in two steps. First, we derived the minimum energy per information bit required for communicating through deep-sub-micron buses. Then we showed that this minimum energy is asymptotically achievable using coding. In addition, a simple differential coding scheme was proposed that achieves most of the possible energy reduction. The methodology used here also applies to more general communication and computation models.

We also introduced the idea of using coding to increase the throughput and communication speed of deep-sub-micron buses. A detailed transmission line model of the bus was used to estimate the time needed for the different transitions to complete. After classifying the transitions according to their delays, we proposed a coding scheme, of the kind used for constrained communication channels, to increase the communication speed. The idea is to encode the data so as to eliminate the types of transitions that take a long time to complete. With proper encoding, the bus delay can be reduced by a factor of 2.

Finally, a class of circuits that accurately compare the phase difference between periodic optical and electrical signals of the same frequency has been developed. These novel topologies have allowed the development of a new optical on-chip clock distribution architecture that bypasses the disadvantages of trans-conductance amplifiers.
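To make the energy cost of individual bus transitions in a deep-sub-micron bus concrete, the sketch below evaluates the energy drawn from the supply for one transition under a commonly used lumped model, in which every line has a capacitance to ground and a coupling capacitance to each adjacent line. The model, the normalization (supply voltage and line-to-ground capacitance both set to 1), and the names `transition_energy` and `lam` (ratio of coupling to ground capacitance) are assumptions made for illustration; the text does not spell out its energy model.

```python
def transition_energy(old, new, lam):
    """Normalized supply energy for a bus transition old -> new.

    old, new: sequences of 0/1 line values.
    lam: assumed ratio of inter-line coupling capacitance to line-to-ground
         capacitance.
    Only lines that end at the supply voltage exchange charge with the supply;
    a negative contribution means charge is pushed back into the supply.
    """
    n = len(old)
    if len(new) != n:
        raise ValueError("old and new must have the same width")
    delta = [b - a for a, b in zip(old, new)]
    energy = 0
    for k in range(n):
        if new[k] == 0:
            continue  # line ends at ground: no charge taken from the supply
        neighbors = [j for j in (k - 1, k + 1) if 0 <= j < n]
        # charge delivered to line k: its own ground capacitance plus the
        # coupling capacitances, which see the relative voltage change
        charge = delta[k] + lam * sum(delta[k] - delta[j] for j in neighbors)
        energy += charge
    return energy
```

For example, with `lam = 2` two lines rising together cost 2 units, while two adjacent lines switching in opposite directions cost 5 units. Differences like this are what a coding scheme matched to the bus structure can exploit.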


Evaluation of Software Energy Consumption on Microprocessors

Personnel M. Osqui (A. P. Chandrakasan)

Sponsorship DARPA, Raytheon Systems Company, and Miccioli Scholarship

The energy efficiency of wireless systems is an important issue. The goal of this research was to evaluate the factors that affect software energy efficiency and identify techniques that can produce energy optimal software. The following are some questions that this work addressed:

• How much energy do various instructions consume?
• How much variation exists in energy consumption across instructions (i.e., does the energy consumed per instruction depend on the functionality of the particular instruction)?
• How can a program be made more energy efficient with knowledge of instruction energy profiling?
• What portion of total energy consumption is attributed to leakage currents?

Two state-of-the-art low-power processors were used for evaluation: the Intel StrongARM SA-1100 and the Intel XScale processor. A comprehensive profiling of the energy consumption per instruction was performed for the instruction set of the processors, taking into account the different modes of operation. The results of this profiling provided insight into the power consumption of the two processors under consideration. They indicated that, to a first-order approximation, optimizing a program for performance also optimizes energy.

The leakage current and the current consumed during idle modes of the processors were evaluated, and an analysis of their impact on overall energy consumption was presented. Thus energy consumption was explored for the two processors from both a dynamic and a static perspective.

Energy Efficient Transmitter for Wireless Microsensors

Personnel S. Cho (A. P. Chandrakasan)

Sponsorship ABB

The communication module of a wireless microsensor node must consume low energy in order to maximize the battery lifetime of a sensor network. In order to save power in the radio module, the electronics must be turned off during idle periods. Unfortunately, frequency synthesizers require a significant overhead in terms of time and energy dissipation to go from the sleep state to the active state. For short packet sizes, the transient energy during start-up can be significantly higher than the energy required by the electronics during the actual transmission. Therefore reducing the start-up time is a key issue in designing an energy efficient radio for microsensors. In addition, the power consumption of a transmitter in short range transmission at GHz frequencies is dominated by the radio electronics (frequency synthesizer, mixers, etc.) and not by the output transmit power. Hence reducing the power consumption and the on-time of the transmitter is also important for achieving an energy efficient transmitter.

These design goals were incorporated into a 6.5 GHz BFSK modulator that is capable of achieving fast start-up time, high data rate and low power consumption. To reduce the start-up time, a variable loop bandwidth technique was employed in a fractional-N synthesizer, which switches from a large to a small loop bandwidth. It achieves a 20 µs start-up time, which is 4 times shorter than that of a fixed loop bandwidth.

In order to reduce the power consumption of the frequency synthesizer, trade-offs between the analog and digital components were exploited. In detail, the loop bandwidth of the sigma-delta synthesizer is optimized, exploiting the noise and power properties of the divider, the VCO and the sigma-delta complexity. For high data rate FSK modulation, a closed loop direct VCO modulation technique was employed. The chip uses 0.25 µm CMOS in a SiGe BiCMOS technology. The block diagram and the die photo of the chip are shown in Figures 2 and 3, respectively. A summary of the chip test results is shown in Table 1.
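The effect of start-up time on short packets can be estimated with a few lines of arithmetic. The sketch below is illustrative only: it assumes the transmitter draws its full active power during start-up as well as during transmission (the text does not state the start-up power), and the function names are hypothetical. The numbers in the usage note are the ones quoted in the text and test summary (22.2 mW, 20 µs start-up, 2.5 Mbps effective data rate).

```python
def packet_energy_j(n_bits, power_w, startup_s, bit_rate_bps):
    """Energy for one packet: start-up time plus on-air time at constant power."""
    return power_w * (startup_s + n_bits / bit_rate_bps)


def energy_per_bit_j(n_bits, power_w, startup_s, bit_rate_bps):
    """Average energy per transmitted bit for a packet of n_bits."""
    return packet_energy_j(n_bits, power_w, startup_s, bit_rate_bps) / n_bits


def startup_fraction(n_bits, startup_s, bit_rate_bps):
    """Fraction of the packet's on-time (and energy, under the constant-power
    assumption) that is spent on start-up rather than on sending bits."""
    return startup_s / (startup_s + n_bits / bit_rate_bps)
```

For a 100-bit packet, `startup_fraction(100, 20e-6, 2.5e6)` is one third, and with an 80 µs start-up (four times longer, as for the fixed loop bandwidth case) it rises to two thirds. This is why start-up time dominates the energy budget for short packets.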

Fig. 2: Architecture of the energy efficient transmitter

Fig. 3: Die photo of the chip

Frequency:          6.1 – 6.9 GHz
VCO phase noise:    -112 dBc/Hz @ 1 MHz
Data rate:          5 Mbps BFSK (2.5 Mbps effective)
Start-up time:      20 µs
Power consumption:  22.2 mW (VCO: 17.7 mW, Rest: 4.5 mW)
Size:               1.2 mm x 1.3 mm

Table 1: Summary of chip test results

Low Power RF Front-End for Wireless Microsensor Systems

Personnel A. Y. Wang (C. G. Sodini)

Sponsorship ABB and NSF Fellowship

The design of wireless microsensor systems has gained increasing importance for a variety of civil and military applications. With the objective of providing short-range connectivity with significant fault tolerance, these systems find usage in such diverse areas as environmental monitoring, industrial process automation, and field surveillance.

The main design objective is maximizing the battery life of the sensor nodes while ensuring reliable operation. For many applications, the sensors need to "live" for 1-5 years without battery replacement. To achieve this goal, the microsensor system has to be designed in a highly integrated fashion and optimized across all levels of system abstraction. This also means that all the characteristics particular to the microsensor system must be exploited. One such characteristic is that the RF output power is small due to the short transmission distance, which makes the transmitter electronics the dominant source of energy dissipation.

In this research, the impact of circuit non-idealities, including noise, nonlinearity, and modulation errors, upon system performance is analyzed, and these effects are incorporated into the design of key front-end components. In addition, the effect of increasing the RF transmit power, which is small, to compensate for the SNR loss due to circuit non-idealities is investigated. This can potentially relax the performance specification of the RF front-end circuits and reduce the overall power consumption.

Hardware Architecture for a Power-Aware Microsensor Node

Personnel N. Ickes, F. S. Lee, and P. Phanaphat (A. P. Chandrakasan)

Sponsorship DARPA, ARL Collaborative Technology Alliance, and Hewlett-Packard under the MIT Alliance

The second prototype of a power-aware microsensor node has been developed for the MIT µAMPS project. The improvements over the previous generation include a smaller node footprint to facilitate unobtrusive sensing, baseband and slave snap-on modules, additional hardware controls at all levels for precise power manipulation, improvements in the radio performance that allow for transmission distances up to 100 meters, and improved circuits. Additionally, a link layer was fused with the radio board, and a MAC layer has been developed to demonstrate the innovative power-aware features of the µAMPS system-driven design.

SENSORS: Microphone amplifier stages and an integrated 12-bit A/D are added to the processor board. The higher order filters of the microphone circuit and improved noise filtering allow for accurate data gathering. The first µAMPS application will be acoustic sensing.

PROCESSOR AND MEMORY: The µAMPS processor module is a 55mm square PCB with a StrongARM CPU, SRAM, and flash ROM. The module also includes high-efficiency dc/dc converters that provide regulated power for all of the digital electronics in the µAMPS node. The StrongARM core is powered by a special variable-voltage dc/dc converter that allows the core voltage to be adjusted, on the fly, to match the processor's clock speed (which also can be adjusted on the fly). This significantly reduces the leakage power dissipated in the processor. The operating system used on the µAMPS node is Red Hat eCos. This operating system was chosen because of its high degree of configurability, which allows the system's memory requirements to be minimized. Custom power management functionality has been developed and added to eCos in order to exploit the power scaling capabilities of the µAMPS hardware.

RADIO: The second prototype of the µAMPS radio matches the 55mm square footprint of the node module stack. The radio module operates on the ISM 2.45GHz band and is built from commercial parts. A 6-level, power-aware power amplifier is used to allow the radio to react to environmental and system-driven requirements. A small feature size antenna is integrated onto the PCB, as well as circuitry to provide automatic tuning of the varactor in the discriminator circuit. The maximum bit-rate of a point-to-point wireless link is 1Mbps. The link layer is integrated onto the radio PCB and acts as a memory storage block as seen by the µAMPS processor board. Extensive power routing, microstrip layout, and noise isolation techniques are used in the layout of the µAMPS radio board.

The power-aware features of the µAMPS board allow for thirteen relevant power consumption states. Among those thirteen states are six states of different power amplifier gains to support transmission distances from 10 meters to 100 meters. In the off state, the radio consumes no power. In the idle state, the radio consumes 60mW, mostly due to the off-chip VCO and the idle current of the commercial transceiver chip. In the receive state, the radio consumes 280mW. In the lowest transmit state, the radio consumes 330mW. In the highest transmit state, the radio consumes 1.1W. There are other minor intermediate states, not described here, which can be used intelligently in local data management algorithms to improve power awareness. Through system flexibility, low power circuit design and creative system-driven algorithms for the MAC layer, this versatile radio is able to help demonstrate some basic techniques that contribute to a power-aware wireless sensor network.

MAC LAYER: In a distributed wireless sensor network, hundreds of randomly scattered microsensor nodes are capable of ubiquitous sensing and data gathering. In such a system, prolonging the lifetime of the network is crucial, and that lifetime is limited by the battery capacity. Because communication traffic among sensor nodes is driven by sensing events, power-aware features can be integrated into each layer of the protocol stack design. Specifically, embedding power-aware features in the Media Access Control (MAC) and network layers promises to extend the lifetime of the sensor network. We have chosen TDMA as the MAC layer protocol for its inherent power-aware mechanism: the radio shuts down outside its TDMA slot and in the absence of sensing events. Another level of power-aware features can be deployed in MAC ID and TDMA slot assignments. In a field of scattered sensor nodes, not all the nodes are in radio range of one another or of the base station. Hence, assigning N TDMA slots for a network of N sensor nodes that are not all in radio range of one another will waste receiver energy and link bandwidth. To solve this problem, TDMA slots can be assigned efficiently by being mapped to each node's MAC ID, given that MAC IDs are reused in the network. As the number of different MAC IDs needed for node assignments varies with the number of nodes that are in 2-hop radio range of one another, varying the transmit power and the node density can optimize the system lifetime. Power scalability will be illustrated on µAMPS sensor node prototypes, with TDMA Media Access and a vehicle tracking application.

Fig. 4: Radio Board

Subthreshold Circuit Design for Ultra Low Power Sensor Systems

Personnel A. Wang (A. P. Chandrakasan)

Sponsorship DARPA, ARL Collaborative Technology Alliance, and Lucent Fellowship

Energy efficient Digital Signal Processors (DSPs) are becoming increasingly important with the growth of portable, wireless, battery-operated appliances such as cellular phones, Personal Digital Assistants (PDAs) and laptops. In the constantly changing environment of a portable device, an energy-aware DSP is required for long battery lifetimes and high system efficiency. An energy-aware DSP will be able to adapt its energy consumption as the energy resources of the system diminish or as performance requirements change. This is in contrast to low power design, which targets the worst case scenario and may not be globally optimal for systems with varying conditions.

Our research into energy-aware DSPs focuses on several aspects of energy-aware design: algorithm, system, architecture and circuits. In particular, we have been exploring the use of subthreshold circuits for ultra low power computational engines. Subthreshold circuit design is the design of circuits whose supply voltage is scaled down below the threshold voltage. These circuits use the subthreshold leakage currents to switch load capacitances; these currents are orders of magnitude lower than those in the strong inversion regime. We will explore different logic families and memory designs for the subthreshold regime, and these techniques will be demonstrated through the design and implementation of a 1024-point FFT processor operating at supply voltages as low as 100 mV.
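The reason subthreshold currents are orders of magnitude smaller than strong-inversion currents is the exponential dependence of drain current on gate voltage below threshold. The sketch below uses the standard textbook relation, in which the current changes by a factor of ten for every n·(kT/q)·ln 10 of gate voltage. The slope factor `n = 1.5` is an assumed, illustrative value, not a figure from this work, and the function names are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def thermal_voltage(temp_k=300.0):
    """kT/q in volts."""
    return BOLTZMANN_J_PER_K * temp_k / ELECTRON_CHARGE_C


def subthreshold_swing(n=1.5, temp_k=300.0):
    """Gate-voltage change (V) needed for a 10x change in subthreshold current."""
    return n * thermal_voltage(temp_k) * math.log(10)


def current_ratio(delta_vgs, n=1.5, temp_k=300.0):
    """Factor by which subthreshold current changes for a gate-voltage change."""
    return math.exp(delta_vgs / (n * thermal_voltage(temp_k)))
```

With an ideal slope factor of 1 the swing is about 60 mV per decade at room temperature, so lowering the gate drive by a few hundred millivolts below threshold reduces the switching current by several orders of magnitude.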


Low Power, Application-Driven Communication Hierarchy for Wireless Microsensor Networks

Personnel R. Min (A. P. Chandrakasan)

Sponsorship DARPA, ARL Collaborative Technology Alliance, Hewlett-Packard under the MIT Alliance, and NDSEG Fellowship

Low power communication is crucial for maximizing the operational lifetime of energy-constrained microsensor networks. An energy-efficient communication system must trade off energy gracefully in exchange for some quality metric and incorporate application-specific components that are tuned for specific tasks and operational scenarios. These guiding principles appear in design examples at three levels of the communication hierarchy: the routing protocol, the middleware/API, and digital integrated circuits, as illustrated in the figure below.

Fig. 5: Power aware design across the communication hierarchy.

Designing energy-efficient protocols for high-density networks of thousands of nodes can be a daunting task. One productive approach is to borrow the lesson from application-specific circuits that reducing unnecessary functionality reduces energy. Application-specific protocols improve communication efficiency by tailoring their behavior to the expected needs of the application. We have designed an address-free forwarding scheme that offers a simple and elegant solution to the problem of forwarding sensor data to a base station in an environment where radio receivers may shut down frequently and arbitrarily to conserve network energy. Nodes obtain a metric of their distance to the base station, and packets are forwarded dynamically based upon the receiving nodes' distance to the base station rather than a specific address.

Application and protocol designers that utilize wireless communication typically do not wish to concern themselves with processor voltages, transmit power, or the constraint lengths of convolutional codes. It is therefore imperative that an Application Programming Interface (API) bridge the gap between these low-level “knobs” for energy scalability and the performance parameters more accessible to an application. We have proposed a power-aware communication API that allows the application to bound four properties of the wireless transmission: range, reliability, latency, and energy. Power-aware middleware converts these application-level constraints into the least-energy parameter settings for the energy-scalable hardware. In other words, the energy consumption of the hardware, which is typically expressed in terms of physical parameters such as voltage, is evaluated as a function of the application-level parameters of latency, reliability, and range.

We are currently designing a communication processor for wireless sensor networks. The processor will feature an instruction set and datapath organization that is conducive to wireless media access and protocol handling. Energy-quality tradeoffs will be enabled through dynamic voltage scaling, possibly into subthreshold operation, and dynamically reconfigurable functional units.

In summary, application-specific, energy-scalable design across the communication subsystem, from protocols to integrated circuits and the glue between them, will be a key enabler for energy-efficient microsensor communication.
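The address-free forwarding idea described in this section, in which a packet is relayed by whichever awake receiver is closer to the base station instead of by a named next hop, can be sketched as follows. This is a minimal sketch under stated assumptions: the distance metric is taken to be a hop count obtained by breadth-first search, ties are broken by node name, and the names `hop_distances` and `forward` are hypothetical. The text specifies neither how the metric is obtained nor how contention among candidate relays is resolved.

```python
from collections import deque


def hop_distances(neighbors, base):
    """Hop count from every reachable node to the base station (BFS)."""
    dist = {base: 0}
    queue = deque([base])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def forward(source, neighbors, dist, awake):
    """Relay a packet toward the base station without addressing a next hop.

    At each step any awake neighbor whose distance metric is smaller than the
    current holder's may take the packet. Returns the path taken, or None if
    no awake neighbor is closer to the base station.
    """
    path = [source]
    current = source
    while dist[current] > 0:
        closer = [n for n in neighbors[current]
                  if n in awake and dist[n] < dist[current]]
        if not closer:
            return None
        current = min(closer, key=lambda n: (dist[n], n))  # assumed tie-break
        path.append(current)
    return path
```

Because the sender never names a relay, a node that has shut its receiver down is simply skipped. The packet still reaches the base station as long as some awake neighbor is closer to it.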

Ultra Wideband Radio

Personnel R. Blazquez and P. Newaskar (A. P. Chandrakasan)

Sponsorship La Caixa Fellowship and Presidential Fellowship

Recent approval by the Federal Communications Commission of Ultra-Wideband (UWB) Radio has rekindled interest in the development of this promising technology. This wireless system uses a bandwidth greater than 1 GHz to transmit information, while keeping the average transmitted power below the noise floor so as not to interfere with existing services that already use the same band. A typical UWB signal is a collection of narrow pulses (0.2 to 0.5 ns) with a very low duty cycle (1%), as shown in Figure 6. Each user is assigned a unique Pseudo-random Noise (PN) sequence that is used to encode the train of pulses, either in position or polarity. UWB radio offers high bit rates, low probability of interception and detection, precise locationing capability (stemming from the sub-nanosecond timing resolution) and the possibility of implementation using an exceedingly simple architecture.

Fig. 6: Structure of a UWB signal

The goal of this project is the development of a novel transceiver architecture in which the signal is digitized as close to the antenna as possible. An “all-digital” architecture means low cost, ease of design and all the associated benefits of CMOS technology scaling (Figure 7). Furthermore, it allows for full programmability in terms of synchronization, signal processing and demodulation algorithms, thus approaching the vision of a software-configurable radio.

Fig. 7: Architectural Comparison

Architectural issues aside, there are some interesting circuit-level challenges that include high-speed, low-power analog-to-digital conversion, low-jitter clocking, antenna co-design and pulse shaping.

As a first step in this project, a behavioral model of a UWB system is being developed in Matlab to understand and quantify the different trade-offs involved. The design of a prototype using discrete, off-the-shelf components and the design of a test chip are underway. We intend to eventually integrate the entire system on a single chip.
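A small sketch can make the signal structure described in this section more tangible: the pulse spacing implied by the quoted pulse width and duty cycle, and polarity encoding of a bit with a PN sequence followed by correlation at the receiver. It is illustrative only. It assumes polarity (not position) modulation, one pulse per PN chip, and an arbitrary example PN sequence; none of these details are taken from the actual system, and the function names are hypothetical.

```python
def frame_period_s(pulse_width_s, duty_cycle):
    """Spacing between pulses implied by a pulse width and a duty cycle."""
    return pulse_width_s / duty_cycle


def encode_bit(bit, pn):
    """Polarity-modulate a PN sequence of +1/-1 chips with one data bit."""
    sign = 1 if bit else -1
    return [sign * chip for chip in pn]


def decode_bit(received, pn):
    """Correlate received pulse polarities with the user's PN sequence."""
    correlation = sum(r * chip for r, chip in zip(received, pn))
    return 1 if correlation > 0 else 0
```

With 0.2 ns pulses at a 1% duty cycle the pulses are about 20 ns apart. Because the decision is based on the correlation over the whole sequence, a single corrupted pulse does not change the decoded bit in this sketch.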

Circuit Techniques for Subthreshold Leakage Reduction in a Deep SubMicron Process

Personnel B. Calhoun (A. P. Chandrakasan)

Sponsorship Texas Instruments

The trend of process scaling for CMOS technology has made subthreshold leakage reduction a growing concern for submicron circuit designers. Power consumption has become a principal design consideration as device sizes decrease and many more devices fit on a single chip. Since switching power is proportional to Vdd², new processes are tailored for lower supply voltages. The decrease in Vdd slows down devices, which requires that the threshold voltage, Vt, be lowered to maintain performance. This reduction of Vt produces the well-known exponential increase in subthreshold leakage currents.

Field Programmable Gate Arrays (FPGAs) are one type of chip that could benefit from subthreshold leakage reduction techniques. The programmable nature of FPGA designs is amenable to standby mode leakage reduction since some of the logic blocks might be unused. Leakage reduction for programmed blocks would make FPGAs more attractive to the designers of battery-operated devices. Our research demonstrates circuit techniques to reduce subthreshold leakage for the TSMC 0.13 µm process. Although applicable to generic CMOS circuits in this process, the techniques in this work are specifically intended for use in low power FPGA design.

A test chip uses Multi-Threshold CMOS (MTCMOS) style logic design to implement a new type of FPGA architecture for distributed arithmetic. The circuits are designed to provide maximum reduction in standby leakage current without degrading the performance of the circuits by more than 10% relative to an all low-Vt design. Sections of the FPGA which are not used in a given configuration are automatically placed in a low leakage state. The design also uses flip-flops that retain state in standby mode.

Power Aware Hardware Reconfigurable Digital Signal Processing Architecture for Wireless Communications

Personnel F. Honoré (A. P. Chandrakasan)

Sponsorship MARCO Focused Research Center on Interconnect (MARCO/DARPA)

Energy dissipation is a critical design constraint for integrated wireless systems, particularly for mobile applications. For deep submicron designs, subthreshold leakage is becoming an increasing component of overall energy consumption. This research explores the use of architecture and circuit techniques to address energy consumption in both standby and active modes. The proposed system includes a novel architecture for an energy efficient FPGA core. FPGAs have been shown to be computationally efficient for implementing signal processing functions based on Distributed Arithmetic (DA). Tuned for DA signal processing, the logic blocks in our design can significantly increase design density, thereby relieving pressure on interconnect resources. The logic block architecture features the ability to automatically power down inactive elements to reduce leakage current. Interconnect elements, a dominant source of power consumption, are tuned to match this DA structure. The FPGA core is currently being targeted for a 1.0 V, 0.13 micron, dual threshold voltage logic process with a full custom design flow.

The FPGA core, when embedded with a microprocessor core and environment monitoring hardware (shown in the figure), can be dynamically reconfigured to allow for tradeoffs in energy usage versus algorithm complexity.
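Distributed Arithmetic, the style of signal processing these FPGA logic blocks target, replaces multipliers with a lookup table addressed by one bit from each input sample, plus shift-and-add accumulation. The sketch below shows the technique for a fixed-coefficient inner product. It is a generic textbook formulation, not the architecture of the chip; the function names are hypothetical, and samples are assumed to be two's-complement integers that fit in `bits` bits.

```python
def build_lut(coeffs):
    """Table of all partial sums of the coefficients, indexed by a bit mask.

    Bit k of the address selects whether coeffs[k] is included in the sum.
    """
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, samples, bits):
    """Compute sum(c * x) bit-serially using only table lookups and shifts."""
    lut = build_lut(coeffs)
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one table address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        partial = lut[addr] << b
        if b == bits - 1:
            acc -= partial  # the sign bit carries negative weight
        else:
            acc += partial
    return acc
```

For example, `da_inner_product([3, -1, 2], [5, -4, 7], 4)` gives the same result as the direct sum 3·5 + (−1)·(−4) + 2·7. The table grows as 2^N with the number of coefficients, which is one reason DA maps naturally onto small lookup-table logic blocks.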


Novel Techniques Addressing Delay and Power Issues in Deep Sub-Micron Buses

Personnel T. Konstantakopoulos (A.P. Chandrakasan)

Sponsorship MARCO/DARPA

In deep-submicron technologies the primary component of delay is shifting from logic gates to the interconnect network. Buses can no longer be considered as a set of independent lines that don’t interact. A more appropriate model treats the bus as a distributed system, where a transition on one line affects adjacent lines as well. The transitions on the bus can, however, be grouped into delay classes, depending on the effective capacitance that the driver circuit needs to charge. A very effective way to reduce delay in buses is to eliminate the transitions that are relatively time consuming. We use coding schemes to accomplish that, increasing the number of lines in the bus and thus introducing some redundancy. In our implementation we map a 4-line bus to a 6-line bus.

Another significant problem in modern VLSI is the way power is distributed to the various components of a chip. The portion of the power dissipated in the interconnect network is increasing rapidly with technology downscaling. Therefore, smart power-aware techniques have to be introduced in order to minimize this portion, which in turn reduces the total power consumption of the whole circuit. Our approach exploits charge recycling. The stored charge is redistributed and shared among the lines that make a transition. We are building smart and effective bus drivers that generate the control signals implementing the charge recycling technique. The resulting partial charge conservation reduces power consumption.
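The grouping of bus transitions into delay classes can be illustrated with a simple lumped model. In the sketch below, a switching line sees an effective capacitance of C_L·(1 + m·λ), where λ is the ratio of coupling to ground capacitance and the integer m (0 to 4) depends on what the two neighbors do. This model, the treatment of the outermost lines as having quiet neighbors, and the function names are assumptions made for illustration; the actual work uses a more detailed bus model.

```python
def delay_class(old, new, k):
    """Return m such that line k's effective capacitance is C_L * (1 + m * lam).

    Returns None if line k does not switch. Missing neighbors at the bus edges
    are treated as quiet lines (an assumption, e.g. shielded edges).
    """
    n = len(old)
    delta = [b - a for a, b in zip(old, new)]
    if delta[k] == 0:
        return None
    left = delta[k - 1] if k > 0 else 0
    right = delta[k + 1] if k < n - 1 else 0
    # A neighbor moving the same way removes one coupling term; a neighbor
    # moving the opposite way adds one.
    return 2 - delta[k] * (left + right)


def worst_class(old, new):
    """Largest delay class over all switching lines (0 if nothing switches)."""
    classes = [delay_class(old, new, k) for k in range(len(old))]
    return max((c for c in classes if c is not None), default=0)


def allowed(old, new, max_class):
    """True if the transition avoids every delay class above max_class."""
    return worst_class(old, new) <= max_class
```

A code that maps data onto a wider bus can then be chosen so that every transition between consecutive codewords satisfies `allowed(old, new, max_class)` for some small `max_class`, which removes the slowest transitions at the cost of extra lines.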

Fig. 8:


Active Optical Clock Distribution

Personnel T. Simpkins (A. P. Chandrakasan)

Sponsorship MARCO Focused Research Center on Interconnect (MARCO/DARPA) and NDSEG Fellowship

Clock distribution has become a major problem in integrated circuits. Although clock cycle times continue to decrease, the time allocated to uncertainty in the clock due to skew and jitter has remained constant. Therefore, the percentage of the clock budget devoted to uncertainty has become significant.

One solution to the clock uncertainty problem is to distribute the clock optically. Conventionally, this has involved using a transimpedance pre-amplifier to convert the optical current pulses from the photodetector into voltage waveforms. An inverter-based cascade is then used to amplify the clock pulses into full-swing signals that drive the local clock buffers. Past research has shown that this approach is limited by the imperfect matching of amplifiers from one block to another. These mismatches, which arise from process, voltage, and temperature variations, can significantly increase the skew, thus negating the benefits of distributing a skewless optical clock.

Our research focuses on an alternative approach to optical clock distribution. Whereas the cascaded amplifier approach attempts to convert optical current pulses into an electrical waveform, our proposed architecture uses an optical reference clock to deskew an electrical clock. The architecture resembles that of a Delay-Locked Loop (DLL) in that a voltage-controlled delay line is used to synchronize the fully-buffered electrical clock with the optical current pulses from the photodetector. The use of a feedback-based architecture allows the loop to compensate for variations due to process, voltage, and temperature, and thus minimize skew.

Fig. 9:

A test chip demonstrating this concept was fabricated using the TSMC 0.18 µm process. The chip comprised three instances of the architecture, including both open-loop and closed-loop versions. The photodetectors on the test chip were of the lateral PIN type and were implemented in standard CMOS, although the architecture is also compatible with other types. Finally, a low-resolution time-to-digital converter was included to measure the skew between the optical current input and the electrical clock output.
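As a rough behavioural illustration of the DLL-like deskewing idea, the sketch below models a first-order loop that adjusts a delay line until the buffered electrical clock edge lines up with the optical reference edge. It is not the circuit on the test chip: the loop gain, the delay range of one clock period, and all names are assumptions chosen for illustration.

```python
def wrap_phase(dt, period):
    """Wrap a time difference into [-period/2, period/2)."""
    return (dt + period / 2.0) % period - period / 2.0


def simulate_deskew(buffer_delay_ps, period_ps=1000.0, gain=0.25, steps=60):
    """Return the final delay-line setting and the skew measured at each step.

    buffer_delay_ps stands for the unknown, PVT-dependent insertion delay of
    the clock buffers; the optical reference edge is taken to arrive at t = 0.
    """
    delay_ps = period_ps / 2.0                 # start the delay line mid-range
    history = []
    for _ in range(steps):
        electrical_edge = (delay_ps + buffer_delay_ps) % period_ps
        skew = wrap_phase(electrical_edge, period_ps)
        history.append(skew)
        # Feedback: a late clock (positive skew) shortens the delay line.
        delay_ps = min(max(delay_ps - gain * skew, 0.0), period_ps)
    return delay_ps, history
```

Because the correction is driven by the measured skew rather than by matched amplifier delays, the same loop settles for different values of `buffer_delay_ps`, which is the property the feedback architecture relies on.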

Software Tools for Process-Sensitive Reliability Assessments of IC Designs

Personnel S. M. Alam, D. E. Troxel, and C.V. Thompson

Sponsorship MARCO Focused Research Center on Interconnect (MARCO/DARPA)

Integrated circuits are often designed using simple and conservative ‘design rules’ to ensure that the resulting circuits will meet reliability goals. This simplicity and conservatism leads to reduced performance for a given circuit and metallization technology. To address this problem, we have developed a TCAD tool, ERNI, which allows process-sensitive and layout-specific reliability estimates for fully laid out or partially laid out integrated circuits (Figure 10).

Fig. 10: A flowchart for a full hierarchical circuit-level reliability assessment, the basis for the prototype tool ERNI.

Circuit-level reliability analyses require reliability assessment of a large number of sometimes complexly connected interconnect trees. We have shown through modeling and experiments that the resistance saturation observed in straight via-to-via Al lines, which can lead to immunity from electromigration-induced failure, also occurs in more complex interconnect trees. We have also shown that trees will be ‘immortal’ if their effective current-density line-length product, (jL)eff, is below a critical value. The jL product that defines immortality can be determined from experimental characterization or simulation of the reliability of straight via-to-via lines. Simple tests for tree immortality can be used in a hierarchical way to eliminate trees from further more computationally intensive reliability assessments. After filtering of immortal trees, the reliability of mortal trees must be assessed. This can be done through reliability simulations with individual trees, but this computationally intensive method should be reserved for the most problematic trees, those with the least reliability, and which are least convenient to ‘fix’ through layout modifications. We have suggested computationally simple and conservative ‘default’ models for assessment of tree reliabilities based on the Korhonen analysis and have tested models and simulations through experiments on simple interconnect trees.
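The immortality filter can be sketched as follows. The sketch assumes that the effective product (jL)eff of a tree is the largest magnitude, over its via-to-via paths, of the sum of the signed j·L products of the segments on that path; the report does not give the formula, so this definition, the data layout, and the critical value used in any example are illustrative assumptions rather than ERNI's implementation.

```python
def effective_jl(paths):
    """Largest |sum of j*L| over the via-to-via paths of one interconnect tree.

    Each path is a list of (j, L) pairs: signed current density in A/cm^2
    and segment length in cm.
    """
    return max(abs(sum(j * length for j, length in path)) for path in paths)


def split_trees(trees, jl_crit):
    """Separate trees into immortal and mortal lists by name.

    jl_crit is the critical jL product in A/cm, to be taken from experiments
    or simulations on straight via-to-via lines.
    """
    immortal, mortal = [], []
    for name, paths in trees.items():
        if effective_jl(paths) < jl_crit:
            immortal.append(name)
        else:
            mortal.append(name)
    return immortal, mortal
```

Only the trees returned in the mortal list would then be passed on to the more expensive reliability models.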

Recent development in semiconductor processing technology has enabled the fabrication of a single integrated circuit with multiple device-interconnect layers or dice stacked on each other. This approach is commonly referred to as the three-dimensional or 3D integration of ICs. Although there has been some research on the impact of 3D integration on chip size, interconnect delay, and overall system performance, reliability issues in 3D interconnect arrays are largely unknown. We have extended the reliability concepts in ERNI and developed a framework for reliability analysis in 3D circuits with a novel Reliability Computer Aided Design (RCAD) tool, ERNI-3D. Using ERNI-3D, circuit designers can get interactive feedback on the reliability of their circuits associated with electromigration, 3D bonding, and Joule heating.

As 3D integration technology is not yet widespread, and no CAD tool supports IC layouts for such a technology, we first developed a comprehensive 3D-circuit layout methodology. The circuit on each wafer or device-interconnect layer can be laid out separately with inter-wafer via information embedded in the layout. The inter-wafer via information is generalized into three categories sufficient for defining all types of interconnection between wafers in a 3D stack (Figure 11). A strategy for layout-file management that incorporates the orientation of each wafer in the bonding process was also developed. We have implemented the layout methodology in 3D-MAGIC, an extension of MAGIC originally developed at UC Berkeley and widely used in academia. Test circuits designed with 3D-MAGIC include a 3D 8-bit adder and an 8-bit encryption processor mapped into a 3D FPGA.


Fig. 11: Different types of via/contact for 3D ICs.

The reliability CAD tool, ERNI-3D, parses 3D circuit layouts and extracts both conventional and 3D interconnect trees. It employs the Hierarchical Reliability Analysis approach, and filters out a group of immortal trees using their current-density length products. After the filtering process, more accurate but more computation-intensive reliability models are applied to the remaining interconnect trees to compute their median and mean times to failure. Finally, the individual times to failure are combined using a joint probability distribution to report a single reliability figure for the whole chip. This initial version of ERNI-3D treats 3D circuits with two wafers or device-interconnect layers in the stack (see Figure 12). However, the data structures and algorithms in the tool are generic enough to make it compatible with 3D circuits with more than two device-interconnect layers and to allow the incorporation of more sophisticated reliability models in the future. Future work will also include modification of ERNI-3D to account for newly discovered differences in the geometry dependence of the reliability of Cu-based interconnects compared with Al-based interconnects.
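One simple way to combine per-tree failure statistics into a single chip-level figure is sketched below. It rests on assumptions the report does not state: each mortal tree fails independently, its time to failure is lognormal with a known median and shape parameter, and the chip fails as soon as any tree fails. The function names are hypothetical and this is not necessarily the combination rule used in ERNI-3D.

```python
import math


def lognormal_cdf(t, t50, sigma):
    """Probability that a tree with median life t50 has failed by time t."""
    if t <= 0:
        return 0.0
    z = math.log(t / t50) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chip_reliability(t, trees):
    """Probability that every tree survives to time t (series system).

    trees is a list of (t50, sigma) pairs, one per mortal tree.
    """
    r = 1.0
    for t50, sigma in trees:
        r *= 1.0 - lognormal_cdf(t, t50, sigma)
    return r
```

Immortal trees contribute a factor of one under these assumptions, which is why filtering them out first does not change the result.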

Fig. 12: Graphical User Interface of ERNI-3D. Here ERNI-3D is run on a 2-wafer 3D 8-bit Adder layout.


The Low-Power Bionic Ear Project

Personnel M. Baker, C. Salthouse, J.-J. Sit, and S. Zhak (R. Sarpeshkar)

Sponsorship The David and Lucile Packard Foundation

The aim of the project is to construct a cochlear-implant processor for the deaf that has the potential to reduce the current power consumption of such processors by more than an order of magnitude via low-power analog VLSI processing. In addition, a cochlear-implant processor that is based on the architecture of a silicon cochlea, i.e., on an analog electronic model of the inner ear, is being explored for its potential to revolutionize patients’ speech recognition in noise (Rahul Sarpeshkar, Lorenzo Turicchia, George Efthivoulidis, and Luc Van Immerseel, “The Silicon Cochlea: From Biology to Bionics”, accepted paper, Proceedings of The Biophysics of the Cochlea: Molecules to Models Conference, Titisee, Black Forest, Germany, July 27-August 1, 2002). Several building-block circuits for such a processor, including a 100 µW analog front end, a programmable bandpass filter, and a logarithmic map circuit, were designed. Figure 14 shows a chip photograph of a DAC-programmable fourth-order bandpass filter that operates on 6 µW of power with over 60 dB of dynamic range on a 2.8 V supply (Christopher Salthouse and Rahul Sarpeshkar, “A Micropower Bandpass Filter for Use in Cochlear Implants”, accepted paper, IEEE International Symposium on Circuits and Systems, Arizona, May 2002).

Fig. 13: Overview of a Bionic Ear System.

Figure 13 shows the overall system architecture of a current bionic ear system (cochlear implant system). Sound that is transduced from a microphone is eventually converted into electrode stimulation in surgically implanted electrodes. The aim of this project is to reduce the power consumption to levels that will enable fully implanted systems to become a reality.

Fig. 14: Capacitive Attenuation Filters built on AMI’s 1.5µm BiCMOS process.


Intelligent Transportation Systems

Personnel M. E. Ben-Akiva, J. F. Coughlin, B. K. P. Horn, H.-S. Lee, I. Masaki, M. Parent, T. B. Sheridan, C. G. Sodini, J. M. Sussman, and J. L. Wyatt

Sponsorship Intelligent Transportation Research Center at MIT's MTL

US citizens are spending, on average, about $1,000 per year for cars, trucks, and roads. Transportation is an important infrastructure for our society. The goal of this project is to develop a technical foundation for tomorrow’s transportation systems. Currently we have a number of infrastructures that are independent of each other. Examples include infrastructures for transportation, communication, finance, health care, emergency care, and others. In the next generation, these independent infrastructures will be integrated more closely with advanced information technologies. For example, highway tolls can be charged to a driver’s bank account automatically with electronic toll gates connected to bank computers. As another example, a car accident can be detected by an air-bag sensor and reported automatically through a wireless network to ambulance stations. The ambulance and hospital can then hold a teleconference on the way from the scene to the hospital so that care can begin quickly.

With this project, we are working on various research topics ranging from small-scale systems to large-scale systems and from fundamental to application-oriented subprojects. Examples of small-scale subprojects are an adaptive dynamic range image acquisition chip, an array processor chip, and a time-to-collision chip. Medium-scale systems include a personal-computer-based real-time three-dimensional machine vision system, a fusion system of machine vision and radar sensors, and an image recognition system for compressed three-dimensional images without decompression. Examples of large-scale systems are a network for real-time image transfer, train control architecture, and policies for intermodal systems, which consist of cars, trucks, trains, airplanes, and other means of transportation.

The research is being carried out at the Intelligent Transportation Research Center in MIT’s Microsystems Technology Laboratories. The center is being sponsored by several member companies.

Parking Assistant System

Personnel D. Putthividhya (I. Masaki and B. K. P. Horn)

Sponsorship Intelligent Transportation Research Center at MIT's MTL

The parking assistant system is a small part of a bigger project that attempts to develop smart vehicles with additional functionality to assist aged drivers. The goal of the parking assistant project is to develop a real-time system that aids drivers in different difficult parking scenarios. With the conventional method, drivers are forced to look back through the rear window to estimate the distance to obstacles. Passengers sitting in the back seat can sometimes block this view, and the driver's ability to estimate the distance to obstacles deteriorates. The parking assistant project attempts to solve this and other similar problems in parking.

The system takes in a sequence of images captured at video rate (30 frames/second) and outputs a 3D reconstruction of the environment in the parking lot. A video camera with a wide-angle lens is mounted next to the rear window to look over the back of the vehicle. Feature points are then extracted from the images, and the correspondence problem is solved to track the feature points along the sequence of frames. Due to the poor quality of the images extracted from a video sequence, most of the algorithms for detecting feature points in grey-level images that we have explored do not yield satisfactory results. We therefore resorted to a color edge detection method that gives more promising results, shown comparatively in Figure 15 and Figure 16.

Geometrical constraints are then applied to compute the distance of the objects in the environment from the vehicle using the disparity of the matching feature points that have been tracked. A sparse depth map of the environment in the parking lot is then obtained. After this step, depending on the application, we might proceed to plan a path for parking in the space available. The implementation of the whole system needs to be robust against the different lighting conditions in different types of parking lots, outdoor vs. indoor. The accuracy in estimating the distance is also being studied, as the acceptable range tolerance of a reliable system should be no more than 10 cm.

Image Sensor Network

Personnel N. S. Love (I. Masaki and B. K. P. Horn)

Sponsorship Intelligent Transportation Research Center

Traffic control centers use several methods to monitor traffic conditions. Currently, the most popular methods involve the use of cameras (image sensors) at high-density traffic locations. The cameras are controlled at the traffic control centers. The traffic control center determines which images to view based on whether an accident has been reported; in some cases images from various cameras are cycled through or directly selected by an operator. Our system distributes the control to the image sensor. Each image sensor sets its own priority based on the contents of the image and uses mobile agents to decide if an image is sent to the traffic control center. This shift in the control of the images from the traffic control center to the image sensor is more efficient than centralized control. The goal of this work is to reduce the transmission load of image sensor networks by distributing processing tasks and providing select images in an efficient and timely manner.


Fig. 15:

Process distribution involves using the image sensor network to perform object recognition and image compression on the images at the image sensor before the image reaches the control center. Figure 17 shows the system overview. Providing select images to the user is achieved by using mobile agents. The control center dispatches mobile agents, which search for images according to user-established priority criteria. At an image sensor, the mobile agent checks when the image was updated and the priority level of the image. The mobile agent then decides, based on the priority criteria, whether the image is sent back to the dispatcher.
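The mobile agent's decision at an image sensor can be sketched in a few lines. The record layout, the function name, and the numeric priority scale are hypothetical; the only behaviour taken from the text is that the agent looks at when the image was updated and at its priority level, with congestion ranked below accidents as in this report's example.

```python
from dataclasses import dataclass

# Priority levels follow the report's example; "normal" is an added placeholder.
PRIORITY = {"normal": 0, "congestion": 1, "accident": 2}


@dataclass
class SensorImage:
    camera_id: str
    content: str        # label produced by on-sensor object recognition
    updated_at: float   # time of the latest update, in seconds


def agent_select(images, min_level, last_visit):
    """Images a mobile agent would send back to its dispatcher.

    An image qualifies if it changed since the agent's last visit and its
    priority is at least the level requested by the user.
    """
    return [
        img for img in images
        if img.updated_at > last_visit
        and PRIORITY[img.content] >= PRIORITY[min_level]
    ]
```

Only the selected images cross the network, which is where the reduction in transmission load comes from.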


Each sensor has a processor, which acquires images and performs the 3D contour-based image compression. The 3D image compression divides the image into three components (contour, color, and distance information) and compresses them individually. The components can be used together or separately to aid in object recognition without decompression of the image. Object recognition is performed on the compressed images to determine traffic flow and to detect incidents. The image sensor assigns a priority level to each image based on its contents (e.g., traffic congestion has medium priority and accidents have high priority). Using the image sensor and mobile agents to complete processing tasks and to retrieve select images reduces the network’s transmission load. For example, a police station dispatches a mobile agent to the cameras and requests images that meet the criterion of high-priority accidents. The mobile agent will retrieve only those images that show accidents. Transmitting traffic accident images rather than all available images improves the network efficiency.

The research develops a demonstration in which mobile agents are sent with given criteria to several camera locations, where the desired images are retrieved. Reduction of the transmission load will enable more users to obtain information in a timely manner without loss of performance. Distributed processing helps to minimize the transmission load by using the image sensors to complete normal processing tasks, as opposed to processing at the control center. Mobile agents are equipped with the appropriate criteria to sift through the traffic information and to provide up-to-date traffic images and information to the user. The agents complement the image sensor network by providing select images based on criteria set by the user.

Fig. 16:

A New Night Visionary Pedestrian Detection and Warning Systems

Personnel Y. Fang, I. Masaki, and B.K.P. Horn

Sponsorship Intelligent Transportation Research Center

In order to offer more security and safety for pedestrians and drivers at night, it is becoming more and more important to extend a driver's night vision capability, especially for older drivers or drivers with visual limitations. For this purpose, several night vision systems have been developed that rely on infrared cameras, which detect heat from objects and are calibrated to be sensitive to the wavelengths emitted by humans and animals, giving the advantage of a broader and longer view than conventional headlights. Usually, such systems have separate display screens that show infrared image views (as in Figure 18) when the night vision system is turned on at night.

Fig. 18: Infrared Image at night

Fig. 19: Processed Infrared Image Edge corresponding to Fig.18.

Current systems require a driver’s attention to switch between the normal view through the windshield and an infrared display screen, which leads to the following disadvantages. First, it is unnatural for drivers, and it is hard for them to combine the information on the infrared display with the normal view. Second, switching attention back and forth is not safe when driving. Finally, drivers may still overlook potential danger without a special warning aid, because current night vision systems do not provide reliable collision warning functions.

Fig. 17:


Our projects are expected to overcome the disadvantages of current night vision systems. Instead of forcing drivers to look for extra information provided by infrared sensors, our systems will automatically project the information onto the windshield. It is important to reliably detect pedestrians at night. However, pedestrians are more difficult to detect than vehicles, which are made visible by their headlights. Therefore, our focus is to detect the pedestrians whom a driver might hardly see under limited night lighting conditions and to project this extra pedestrian information onto the windshield, so that drivers can see as if they were driving under normal lighting conditions. Pedestrian information can be accentuated in the form of bounding boxes or detailed contours (as in Figure 19) on the windshield screen. The improved windshield view will give drivers a clearer picture of the driving situation and support better judgment. In this way, drivers can easily recognize a potential obstacle without changing their driving habits.

For traditional optical camera systems, pedestrian detection is not an easy task and usually depends on machine learning algorithms, such as neural networks, wavelets, and support vector machines, which are computationally heavy. For night driving situations, pedestrian detection becomes even harder. By introducing infrared images and systematically fusing both types of information, our scheme has the potential to provide better perception ability, decrease drivers’ tiredness, and reduce the loss caused by potential collisions.

A Binocular Vision System for Automated Vehicle

Personnel M.J. Kais (M. Parent and I. Masaki)

Sponsorship INRIA and Intelligent Transportation Research Center

In many urban environments, the usage of the private automobile has led to severe problems with respect to pollution, noise, safety, and general degradation of the quality of life. Alternative solutions to the private automobile with the same flexibility now appear with a new concept of mobility: the automobile is part of the public transportation system and is used as a complement to mass transit and non-motorized transportation. A new form of vehicle sharing is now appearing with a new type of vehicle: the automated vehicle. These vehicles have automated driving capabilities on an existing road infrastructure where they just need a right of way, such as a dedicated bus lane. Of course, with the existing technologies, the speed of these vehicles is limited to around 30 km/h, but this is quite sufficient in many urban environments, and the technology, as well as the infrastructure (with dedicated high-speed sections), will certainly evolve. Some of these vehicles also allow traditional manual driving in order to run among normal traffic. In these cases, the vehicles are called dual-mode, and their automated capabilities allow them to be put in platoons, for example in order to collect them.

We are currently working on binocular vision, which is an interesting technology since it makes it possible to get a map of the distance between each object in the image and the vision system. Moreover, this technology allows us to segment the objects belonging to the plane of the road (lane markers, contact points between the obstacles and the road) from the obstacles (cars, pedestrians, crash barriers, and so on). This system is being developed to perform three key functions for an automated vehicle: lane marker detection, obstacle detection, and the detection and location of a specific target on the front of the vehicles to provide platooning capability.
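The distance map obtained from binocular vision can be illustrated with the standard relation for a rectified pinhole stereo pair, Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity in pixels. The report does not describe the camera geometry actually used, so the rectified-pair assumption, the parameter values in any example, and the function names are all illustrative.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Distance Z = f * B / d in metres; None where no valid match exists."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def depth_map(disparities, focal_px, baseline_m):
    """Apply the relation to every entry of a 2D disparity map (list of rows)."""
    return [
        [depth_from_disparity(d, focal_px, baseline_m) for d in row]
        for row in disparities
    ]
```

Because depth is inversely proportional to disparity, a fixed matching error of one pixel costs far more range accuracy for distant objects than for near ones.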


Wireless Gigabit Local Area Network

Personnel A. Chandrakasan, H. S. Lee, and C. G. Sodini

Sponsorship Center for Integrated Circuits and Systems

The exploding number of electronic devices or “appliances” requiring high-bandwidth communication will continue to drive the need for higher speed (Gigabit-per-second, Gb/s) networking. We assume that the Next Generation Internet (NGI) will carry high-speed data to and from the home or office. However, a Local Area Network (LAN) within these structures is necessary to continue high-speed data transmission to and from end-use devices, such as cameras, displays, printers, high-resolution video, mobile communicators, and novel devices. The enabling technology for this rich set of applications is a wireless Gb/s LAN (WiGLAN) connected to the NGI.

A block diagram of the Wireless Gigabit Local Area Network, WiGLAN, is shown in Figure 20. We envision a network server being the gateway between the NGI and the local area network. Each appliance is attached to the network through a WiGLAN adapter, which is capable of providing a wireless connection to the network. This adapter should be physically small, implying a high degree of integration of the electronic functions required to interface digital data from the appliance to and from the network. The quality of service, QoS, which is a function of data rate and bit error rate, should be scaleable with power dissipation to permit battery operation of many appliances.

The WiGLAN offers several research challenges. First, there is a wide range of data rates, quality of service, and need for real time transmission to and from the appliances. For example, voice transmission over the network will not require high data rates but may require low power dissipation for portability. Interactive video transmission requires real time transmission and very high data rates especially as high resolution video and 3D graphics become available. System resources will need to be adaptive in order to support this wide range of appliances. Second, since many of the appliances will require portability, low power design techniques at the circuit, chip architecture and overall system level will be required. Third, this research requires synergy between a variety of disciplines including, communication system design at the physical layer, low power circuit and system design, digital signal processing algorithm and IC design, mixed signal IC design, and RFIC design. It also lends itself to a number of demonstration projects using some of the technology which results from this research. Besides the educational component of the PhD researchers directly involved, this program will generate a number of IC’s and algorithms which can be demonstrated by Masters student design projects.

The network requirements of high bandwidth efficiency and real-time transfer led to our choice of a multi-carrier modulation, such as Orthogonal Frequency Division Multiplexing (OFDM) using M-ary Quadrature Amplitude Modulation (MQAM) signal constellations. We plan to digitize the entire signal bandwidth (150 MHz) available at the 5.8 GHz ISM band and adapt the bit rate (change M) within sub-bands according to the available Signal-to-Noise Ratio (SNR) and interference in the sub-band. A programmable digital signal processor will perform this adaptive modulation. The adaptive bit rate processor located in the network server will estimate the channel capacity by measuring the SNR and interference within sub-bands across the entire 150 MHz signal band. The channel estimation algorithm is a subject of this research. Depending on the SNR and interference, data modulation will range from simple phase shift keying (PSK) up to 256-level QAM with intermediate levels of QAM (e.g., 4-QAM and 16-QAM), allowing for transmission of approximately 1 b/Hz for PSK up to 8 b/Hz for 256-QAM.

In order to provide the capacity enhancements required to support the target data rates, the system to be developed will make extensive use of multiple-element antenna arrays for both transmission and reception. A key component of the proposed research will therefore be the development of computationally and power-efficient space-time coding and space-time processing algorithms that exploit the substantial diversity benefit inherent in the use of such antenna arrays. At the implementation level, multiple-element antenna arrays require a separate receive and transmit channel for each antenna element. To efficiently meet this requirement we propose to build a system of parallel radios divided into three distinct integrated circuits, namely RF, mixed signal, and DSP. The WiGLAN network adapter consists of three functions: digital signal processing for multi-carrier adaptive bit rate QAM, a baseband analog processor performing data conversion and filtering, and an RF transceiver function which interfaces the modulated baseband data to a 5.8 GHz carrier. We will design and characterize integrated circuits to perform these functions.

Analog Base-band Processor for Wireless Gigabit LAN

Personnel M. Spaeth (H.-S. Lee)

Sponsorship SRC

The base-band analog processor performs the necessary signal processing on the 150 MHz base-band signal in the transmit and receive signal paths for a wide-band wireless local area network. The individual channel characteristics depend on the RF signal fade and interference. Broadcasting to multiple appliances requires channel equalization at the receiver. In the receive (Rx) section, the wide-band amplifier amplifies the received signal from the RF transceiver network and is followed by an equalization filter. The amplitude of the signal following the channel equalization filter can vary greatly, depending on the channel conditions, so a programmable gain amplifier is needed to better match the signal amplitude to the dynamic range of the subsequent analog-to-digital converter. The demodulation of the carrier is then carried out in the digital domain by a DSP.

There are tremendous technical challenges in the development of the base-band analog processor. The analog circuits in both the transmit and receive sections of the processor must handle 150MHz of signal bandwidth with high signal-to-noise ratio. These analog circuits include the Wide-Band Amplifier (WBA), the Programmable Gain Amplifier (PGA), the anti-alias filter, the channel equalization filter, the D/A converter in the Tx section, and the A/D converter in the Rx section. In order to digitize the wide 150 MHz signal band, the A/D converter must have an effective sampling rate of at least 300 MHz, and preferably above 600 MHz to ease the anti-alias and digital filtering requirements. The preliminary estimate of the A/D converter resolution needed to handle the wide dynamic range of the received signal is 12 bits. At present, such high performance is beyond the capability of monolithic silicon integrated circuits. Additionally, any harmonic and intermodulation distortion in the signal path produces spurious signals in other sub-bands, so the WBA, the PGA, the anti-alias and channel equalization filters and the A/D converter must exhibit very high SpuriousFree Dynamic Range (SFDR) in addition to wide band-

Fig. 20: Wireless Gigabit Local Area Network

continued

23

continued

Fig. 21: A top-level block diagram of the proposed A/D converter. 129 identical pipeline A/D channels are organized into 16 banks of 8 converters, with one additional converter used only as a skew timing reference. In this scheme 2 banks are pulled out at a time for calibration, so the remaining 112 converters operate at about 5.5 MHz to achieve the desired 600MHz aggregate sampling rate. 14 bit pipelines are used to generate 12 bit digitally error-corrected outputs. The converter bank that is actively digitizing the input signal receive the output of the frontend anti-aliasing filter. The converter banks that are under calibration may digitize DC values for gain and offset measurements or the fast ramp for timing skew measurements. The converter has two sets out outputs so that digitized signal samples and calibration data may be output simultaneously. The back-end DSP averages the calibration data, and generates the algebraic coefficients needed to correct the gain, offset, and timing mismatch errors.

continued

24

continued

Integration of Multiple RF Front-Ends

Personnel J. Liang (C. G. Sodini)

Sponsorship MARCO

The Wireless Gigabit Local Area Network (WiGLAN) project makes use of space-time diversity and adopts multiple antennas to improve the system capacity. Each antenna has a front-end analog circuit consisting at a minimum of an LNA, mixer and filtering before analogto-digital conversion and subsequent digital signal processing. We will implement multiple RF front-ends on the same chip; however, the coupling between these parallel radios can severely degrade the system performance, imposing major challenges for integration. The coupling can be reduced by adequate isolation among multiple signal paths.

width. In the Tx section, the D/A converter and reconstruction filter must posses similar performance levels. In order to address these technical challenges, we propose to investigate innovative techniques for the baseband analog processor. This work focuses on the implementation of the extremely high speed, high resolution, and wide-bandwidth A/D converter in the Rx section. To achieve high speed operation, some degree of parallelism is often employed. In a parallel time-interleaved converter, any mismatch in the gain, offset, or timing of the constituent channels results in undesirable harmonics in the output spectrum, related to the sampling rate of the individual channels. Therefore, the present time interleaving typically employs a small degree of parallelism, so that the harmonics either out of the signal band of interest, or below the quantization noise floor. Our approach is to use large-scale parallelism (64 or 128 channels) in a time-interleaved pipeline A/D converter. Back-end digital calibration is applied to account for static gain, offset, and timing mismatch errors between channels, so that the resulting calibrated output has sufficiently low spurious harmonics.

The focus of the research is to explore various isolation techniques and to design schemes to suppress the signal interference and RF coupling. In particular, the emphasis of the research is the interference reduction of the multiple RF front-ends. Balanced circuit configurations, which require two signal conductors for each signal of interest, can significantly reduce radiation and cross-talk, improve noise immunity, and essentially eliminate ground noise. Deliberate partitioning of the system and careful spacing of the building blocks can also reduce signal interference. Building guard rings and isolation structures around sensitive circuits, inserting ground planes or low resistivity layers, adopting heavily-doped or low resisitivity substrate materials can effectively suppress substrate coupling. We will investigate the effectiveness and limitation of different isolation methods and use them selectively with other novel techniques such as using downstream digital algorithms to achieve sufficient isolation for the integration of at least eight RF front-ends in the WiGLAN project.

Gain and offset errors are measured and corrected using standard calibration techniques. By digitizing a fast ramp using one converter as a fixed timing reference for the remaining converters, the relative timing skew between channels can be discerned. The calculated timing offsets are then used to re-time the output data stream using polynomial interpolation in the back-end DSP. Thus all of the calibration is performed using simple algebraic operations with minimal latency. To allow all of the calibration operations to be performed in the background, a small fraction of the available channels is systematically pulled out for calibration, while a novel token-passing control scheme selects which of the 'active' converters will sample the incoming signal.
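The static part of the back-end calibration described above can be illustrated with a minimal sketch. It assumes the per-channel gain and offset errors have already been measured; the four-channel mismatch values, the test tone, and all function names are hypothetical, and timing skew and its polynomial-interpolation correction are not modelled.

```python
import math

def interleaved_samples(n_samples, gains, offsets, freq=0.013):
    """Sample a unit sine with a time-interleaved converter whose channels
    have individual static gain and offset errors (no timing skew)."""
    n_channels = len(gains)
    out = []
    for n in range(n_samples):
        ch = n % n_channels                      # channel that takes sample n
        ideal = math.sin(2 * math.pi * freq * n)
        out.append(gains[ch] * ideal + offsets[ch])
    return out

def calibrate(samples, gains, offsets):
    """Undo the static errors with one subtraction and one division per sample."""
    n_channels = len(gains)
    return [(x - offsets[n % n_channels]) / gains[n % n_channels]
            for n, x in enumerate(samples)]

def rms_error(samples, freq=0.013):
    """RMS deviation of a sample stream from the ideal test tone."""
    err = [x - math.sin(2 * math.pi * freq * n) for n, x in enumerate(samples)]
    return math.sqrt(sum(e * e for e in err) / len(err))

# Hypothetical mismatch values for a 4-channel example.
GAINS = [1.00, 1.02, 0.97, 1.01]
OFFSETS = [0.000, 0.010, -0.020, 0.005]

raw = interleaved_samples(1024, GAINS, OFFSETS)
fixed = calibrate(raw, GAINS, OFFSETS)
```

Comparing `rms_error(raw)` with `rms_error(fixed)` shows the mismatch error collapsing to floating-point rounding once each sample is corrected with the coefficients of the channel that produced it.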

Biasing and Power Combining Techniques for Power Amplifiers

Personnel A. Pham (C. G. Sodini)

Sponsorship SRC

Power amplifiers with conventional fixed biasing attain their best efficiency when operated at the maximum output power. For lower output levels, these amplifiers are very inefficient. This is the major shortcoming for use in recent wireless applications with an adaptive power design, where the output power is a function of the bit-error rate, channel characteristics, modulation scheme, etc. Such applications require the power amplifier to perform well not only at the peak output level, but also across the power range. One way to improve low-level efficiency is to use adaptive biasing, where the biasing circuitry senses the input signal, averages it, and adjusts the bias current based on the detected RF input level. The adaptive biasing technique is implemented in the IBM 6HP SiGe BiCMOS process for a class-A power amplifier. The simulated efficiency curve and the test chip layout are shown in the following figures.

Fig. 22: Power added efficiency vs. input power with and without adaptive biasing.

Another way to improve efficiency across the output range employs multiple power amplifiers with a power combiner. In this case, a number of smaller power amplifiers, each of which is optimized for a fraction of the maximum required output power, feed their outputs to a low-loss power combiner. Depending on the input level, the controlling circuits turn on an appropriate number of power amplifiers. Such a configuration ensures that each power amplifier operates at or near its highest output level and improves efficiency across the output power range. The multiple-amplifier combining technique also relaxes constraints caused by low transistor breakdown voltages as technology scales. As a proof of concept, a four-port printed-circuit-board microstrip combiner has been fabricated for use with four identical power amplifiers. On-chip combiners will be studied for the possibility of a fully integrated power amplifier using these power combining techniques.
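The selection step performed by the controlling circuits can be sketched as follows. This is only an illustration: it assumes identical unit amplifiers and an ideal lossless combiner, and the function names and power values are hypothetical.

```python
import math

def amplifiers_to_enable(p_out_mw, p_unit_mw, n_amplifiers):
    """Number of identical unit amplifiers that must be on to deliver
    p_out_mw, assuming an ideal lossless combiner (an assumption)."""
    if p_out_mw <= 0:
        return 0
    needed = math.ceil(p_out_mw / p_unit_mw)
    if needed > n_amplifiers:
        raise ValueError("requested power exceeds the combined capability")
    return needed

def backoff_db(p_out_mw, p_unit_mw, n_amplifiers):
    """Back-off of the enabled amplifiers from their combined peak power, in dB."""
    n_on = amplifiers_to_enable(p_out_mw, p_unit_mw, n_amplifiers)
    return 10 * math.log10(n_on * p_unit_mw / p_out_mw)

# Hypothetical case: four 25 mW unit amplifiers versus one 100 mW amplifier.
combined = backoff_db(30, 25, 4)   # two units on, 50 mW available
single = backoff_db(30, 100, 1)    # one large amplifier, 100 mW available
```

For a 30 mW request the combined arrangement runs closer to its peak than the single large amplifier does, which is the condition under which a fixed-bias amplifier is most efficient.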

Fig. 23: Layout of power amplifier with adaptive biasing using IBM BiCMOS 6HP process.


5.8 GHz Wideband Receiver for Wireless Gigabit LAN

Personnel L. Khuon (C. G. Sodini)

Sponsorship Center for Integrated Circuits and Systems (CICS) and SRC

To take advantage of space-time diversity algorithms, multiple receiver front ends are needed on a single chip. Direct conversion does not require an image reject filter and simplifies the Radio Frequency (RF) filtering requirements. The presence of multiple receivers on chip, however, implies that a homodyne receiver's local oscillator radiation would significantly interfere with nearby receivers, since its frequency is in-band to the desired RF signal. In addition, a direct conversion receiver performs in-phase and quadrature (I/Q) demodulation in the analog domain, where an I/Q phase imbalance directly impacts the bit-error-rate performance. With a heterodyne architecture, the received signal can be digitized at the IF, and the functions of I/Q demodulation and channel selection can be performed in the digital domain.

The receiver for the WiGLAN performs amplification, filtering and downconversion of the 150 MHz signal centered at 5.8 GHz. The receiver downconverts the Radio Frequency (RF) signal to an Intermediate Frequency (IF) that is fed to the analog baseband processor where it is equalized and digitized. The design approach for the receiver is based upon block level analyses that consider the gain, noise, and linearity tradeoffs necessary for the WiGLAN’s adaptive modulation scheme. The focus of this research is the design of on-chip filters within the framework of the Wireless Gigabit LAN receiver design. To reduce the effect of image frequencies for the heterodyne receiver, a dual conversion architecture is selected to allow for optimized frequency planning. As such, filters are needed for the band selection and image rejection at the RF carrier frequency, band selection at the first IF, and anti-aliasing at the low IF. The primary challenges are to obtain the necessary band and image rejection filtering with an integrated approach without severely degrading the system’s noise and linearity performance.

An initial integrated image reject filter was designed, simulated, and fabricated in the IBM BiCMOS 6HP process technology (Figure 24). The filter incorporates an on-chip inductor that has its quality factor enhanced through the use of a negative resistance circuit. The filter’s center frequency is tunable externally with a DC voltage. In addition, by controlling the DC current of the negative resistance circuit, the rejection response is also adjustable (Figure 25). This simple notch filter circuit performs rejection at the image frequency. This initial circuit serves as the building block for more complex responses for both the image band rejection and the signal band selection at the RF and IF stages. Besides the impact on the receiver’s noise and linearity performance, the design of integrated filters must also consider issues of stability and possible automatic frequency response adjustments to account for device tolerances.

Fig. 24: Layout of notch filter for image rejection on IBM BiCMOS 6HP process.
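The frequency-planning relation behind image rejection can be written out in a few lines. The sketch below covers a single mixing stage only; the 5.8 GHz carrier is taken from the text, while the 4.8 GHz local oscillator is a hypothetical value chosen purely for illustration and is not the WiGLAN frequency plan.

```python
def intermediate_frequency_hz(f_rf_hz, f_lo_hz):
    """IF produced by mixing an RF signal with a local oscillator."""
    return abs(f_rf_hz - f_lo_hz)

def image_frequency_hz(f_rf_hz, f_lo_hz):
    """Image of f_rf: the frequency mirrored about the LO, which lands
    on the same IF and must therefore be filtered before the mixer."""
    return 2 * f_lo_hz - f_rf_hz

F_RF = 5_800_000_000   # carrier frequency from the text
F_LO = 4_800_000_000   # hypothetical first LO (low-side injection)

f_if = intermediate_frequency_hz(F_RF, F_LO)
f_image = image_frequency_hz(F_RF, F_LO)
```

The image sits twice the IF away from the carrier, so a higher first IF moves it further from the signal band and relaxes the notch filter, at the cost of more demanding filtering at the IF itself.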

Fig. 25: Tunable notch for device tolerances.


A CMOS-Compatible Compact Display

Personnel A. Chen (A.I. Akinwande and H.S. Lee)

Sponsorship 3M and MARCO

The proliferation of portable electronic systems has created demand for high-resolution displays which are compact and highly energy-efficient. We have designed and built a proof-of-concept for a display that meets these design constraints. Our display uses a standard digital CMOS integrated circuit to produce a low-brightness image, and an image intensifier to increase brightness to a visible level. Since only a very low light level needs to be generated by the CMOS chip, the power efficiency is primarily determined by the intensifier, which typically exhibits high efficiency. Moreover, by exploiting the high level of integration achieved by the CMOS IC, low-power techniques such as pixel memory and data compression can be implemented, further lowering the system power consumption. A display using our design should produce a daylight-visible image using approximately half a watt of power. Silicon devices can convert electrical energy into light, although their efficiency is very low. We use silicon light-emitting diodes to produce a very faint image which is optically coupled into an image intensifier. The image intensifier is a compact vacuum device that uses cathodoluminescence to increase the brightness of an image. It is commonly found in night vision scopes and scientific equipment. Cathodoluminescence, using a phosphor to convert electrons to photons, is an established technology used in cathode-ray tubes. Cathodoluminescent devices have high conversion efficiency (40 lumens/watt), high reliability, and can achieve very high output brightness (projection televisions).

Our first research objective was to produce a laboratory demonstration of the system, and to quantify its performance. An integrated circuit with light-emitting arrays was fabricated in a commercial 0.18 µm CMOS logic process. Each array measured 16x32 pixels and included a wordline decoder. Each pixel contained a 1-bit digital memory along with light emitter and driver circuits. We used the p+/nwell junction as a light-emitting structure. Power conversion efficiency was approximately 10^-6 (W/W), and we observed a broad emission spectrum peaking at 700 nm. A test system consisting of the integrated circuit, microscope optics, and image intensifier has been constructed. Sample images were recorded, as shown below. Grayscaling was demonstrated at 32 levels, limited by the speed of the microcontroller.

Fig. 26: Top left: image from test system captured with CCD camera, Top right: 32-level grayscale demonstration, bottom left: circuit board including IC in 0.18 µm CMOS process.

In the near future, we will investigate circuit designs to support the integration of light emitters onto CMOS integrated circuits. Memory can be added to the display to eliminate the need for refreshing, thus reducing switching power. Analog or digital multiple bits-per-pixel memories are being investigated. In addition, row-parallel current-level addressing is being investigated. This addressing allows analog and digital calibration for precise brightness control of each pixel. High-level computation can be integrated on-chip to perform image processing, data compression/decompression, or intelligent power management.

High-resolution displays require large input data bandwidth; for example, computer monitors typically require over 2 GHz of bandwidth, and interface circuits dissipate high power. For example, the Silicon Image Sil 161B digital video interface receiver dissipates 800 mW. Interface circuits using compression and/or circuit techniques such as low-swing signaling can reduce the interface power dramatically, lowering the overall system power.

We will also be investigating an RF wireless link between the display and the host. For high-resolution displays, even with on-chip data compression, the I/O data rate will still be very high. For this reason, the traditional narrowband wireless link is not a suitable technology. We propose ultra-wideband data communication technology for host-to-display data communication. This technology can potentially be extended to chip-to-chip and backplane data communication as well. Ultra-wideband communication, which has been in limited use for medium-to-long range (~mile), low-data-rate communication, employs a train of impulses rather than a single-frequency RF carrier. The impulse train has a very wide frequency spectrum, typically from DC to the GHz range. Since the energy is spread over such a wide frequency range, there is negligible interference with traditional narrowband RF systems. Unlike narrowband transceivers, highly frequency-selective circuits are unnecessary, facilitating the integration of the entire transceiver. Also, the effect of multipath can be mitigated, and even exploited, by measuring the arrival time and the phase of the multipath signals. For this reason, ultra-wideband technology is more suitable for short-range, fixed-environment communication than for the applications in which it has been used so far. Host-to-display, chip-to-chip, and backplane communication can benefit from ultra-wideband communication because they are typically short-range, fixed-environment links. The short-range nature of these links could provide a reasonable signal-to-noise ratio, which, combined with ultra-wideband, could provide the very high data rate required in such data communication. Also, the host-to-display wireless link has the added possibility of broadcasting to multiple displays.
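As a rough check on the scale of the display interface problem discussed above, the raw pixel data rate can be estimated from the display format. The 1600 x 1200, 24-bit, 60 Hz format below is an assumed example rather than a figure from the text, and blanking intervals and link coding overhead are ignored.

```python
def raw_video_rate_bps(width, height, bits_per_pixel, refresh_hz):
    """Uncompressed pixel data rate in bits per second,
    ignoring blanking intervals and link coding overhead."""
    return width * height * bits_per_pixel * refresh_hz

# Assumed example format, not taken from the text.
rate = raw_video_rate_bps(1600, 1200, 24, 60)
rate_gbps = rate / 1e9
```

Even before overhead this lands in the multi-gigabit-per-second range, which is why compression, low-swing signaling, or a wideband link is attractive for the host-to-display interface.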

Smart Active-Matrix Display Drivers For Organic Light Emitting Devices

Personnel E. Lisuwandi (V. Bulovic and C. G. Sodini)

Sponsorship MARCO Focused Research Center for Circuits and Systems (C2S2) and Center for Integrated Circuits and Systems

In this project we are developing pixelated active-matrix “smart drivers” for displays consisting of Organic Light Emitting Devices (OLEDs). Organic LEDs are perhaps the most promising novel technology for development of efficient, pixelated, and brightly emissive flat-panel displays. They naturally emit over large areas, and offer the advantage of growth on lightweight and rugged substrates such as metal foils and plastic, with no requirement for lattice-matching.

Organic LED devices, however, exhibit non-linear light output responses that complicate their use in applications requiring fine control of the output light intensity. Specifically, the I-V characteristics of OLEDs depend on the cathode/anode type, organic compound layer thickness, and operating temperature. The power efficiency of pixels in a display will drift over time due to operational degradation. The individual pixels in a display can then exhibit different aging, in accordance with their use. The brightness non-uniformities due to this differential aging will reduce the useful display lifetime.

Our smart active-matrix circuitry compensates for the OLED non-uniformities by monitoring light output and adjusting the driving conditions according to the OLED performance. The adjusted output provides a defect-free picture. In the final design, a Si p-n detector integrated behind each pixel will give feedback to the driver circuits, which will adjust the current level to produce a constant brightness output. Figure 27 shows a typical integrated structure in which a transparent OLED is used. In this design both OLED electrodes are capable of transmitting the emitted light, which is mostly observed on the top, but is also partially absorbed in the detector.

Fig. 27: An OLED pixel integrated with a “smart” Si active matrix driver. The Si photodetector monitors the intensity of the OLED pixel during the on state and provides feedback to the driving circuit to keep the light output intensity constant as the device efficiency changes with operation.

The initial demonstration of the concept uses discrete components assembled on a PC board according to the 5 x 5 pixel array design scheme drawn in Fig. 28. Notice that three integrated transistors control each pixel. Also, each column of 5 pixels shares one feedback circuit. Integrator compensation is used for each feedback circuit to ensure that the light output is matched to the reference input. The values of the discrete components were chosen to stabilize the feedback loop. We use a digital video camera to serve the function of the integrated sensor. The digital part of the system and the image processing from the camera are done on a CPLD.

The next phase of this project will involve developing an integrated circuit that interfaces to the transparent OLEDs. We can then explore the implementation of the same circuit on flexible substrates using low-cost semiconductors (such as amorphous Si, or printed organic transistors).
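The effect of integrating the brightness error in the feedback path can be illustrated with a simple discrete-time model. This is a behavioral sketch only: the linear light-versus-current relation, the loop gain `k_int`, and the efficiency values are assumptions made for illustration and do not describe the actual board-level circuit.

```python
def settle_pixel(reference, efficiency, k_int=0.5, steps=200):
    """Discrete-time integrator loop: the drive current is adjusted until
    the detected light (efficiency * current) equals the reference.
    The loop converges when 0 < k_int * efficiency < 2."""
    current = 0.0
    for _ in range(steps):
        light = efficiency * current             # photodetector reading
        current += k_int * (reference - light)   # integrate the brightness error
    return current, efficiency * current

# Hypothetical fresh pixel and aged pixel (40% efficiency loss).
i_new, light_new = settle_pixel(reference=1.0, efficiency=1.0)
i_aged, light_aged = settle_pixel(reference=1.0, efficiency=0.6)
```

Both pixels settle to the same light output; the aged pixel simply ends up with a larger drive current, which is the compensation behavior the smart driver is intended to provide.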


Fig. 28: (Left) A 5 x 5 pixel driver array with feedback. (Right) A single pixel detail.

A Differential CMOS Passive Pixel Imager

Personnel I. L. Fujimori (C. G. Sodini)

Sponsorship Lucent GRPW Fellow and Intelligent Transportation Research Center

Passive pixel sensors provide an alternative to the conventional Active Pixel Sensor (APS) for high-density CMOS imaging arrays. Similar to the history of the single-transistor DRAM cell, this one-transistor pixel cell boasts one main advantage over the APS. It can achieve a high fill-factor in a smaller area, leading to a high-density array of pixels with high quantum efficiency. Experiments reveal that a major weakness of passive pixels is a signal-dependent parasitic current that can contaminate charge signals in different parts of the array. In this project, we explain the origin of this parasitic current and demonstrate a Correlated Double Sampling (CDS) circuit in a differential architecture that removes its effects.

The passive pixel consists of a high-efficiency n-well photodiode and one transistor for reset and row select. The charge difference between the output of a sensing pixel and a dummy cell kept in the dark is converted to a voltage with a sense amplifier at the bottom of every column. The differential architecture is advantageous in rejecting any common-mode signals such as ground bounce. Passive pixels are plagued with a signal-dependent parasitic current caused by photogenerated electrons collected by the reverse-biased junction of the column line at each pixel. The combined effect of the charge leakage from 256 cells on the column line can be significant and will appear as a parasitic current at each column line. This parasitic current, which is also present in active pixel arrays, is catastrophic in passive pixels because charge-to-voltage conversion does not occur within the pixel. The parasitic charge of a bright pixel can thus contaminate the output of a dark pixel in other rows on the column line and cause smear in the image.

Two strategies were used to remove the effects of the parasitic current. The first was at the architectural level, where a differential readout between a sensing and a dummy column rejects the parasitic current that is common between the two columns. The second consisted of removing the difference in parasitic currents between adjacent columns. The latter was achieved with a CDS circuit that senses the signal with the parasitic current during the first sample phase and then senses the parasitic current only during the second sample phase. The difference between the outputs of these two sample phases then corresponds purely to the pixel signal and is no longer dependent on the parasitic current.

The improvements achieved with the differential CDS circuit were quantified in terms of column-to-column Fixed-Pattern Noise (FPN). As expected, the dark FPN values are similar at 0.4% with and without CDS. The difference becomes more pronounced as the light intensity increases and the parasitic current mismatches result in a much higher FPN in the absence of the differential CDS circuit. While this improved passive pixel imager addresses many of the problems that have plagued passive pixels in the past, it does not easily scale with increasing array sizes. The main limiting factor is the readout noise, which is proportional to the column line capacitance and inversely proportional to the pixel capacitance. The combined effect of these factors severely limits the intensity resolution that can be achieved for high-density arrays. The latest research efforts indicate that the noise may be suppressed by introducing a high load capacitance on the column amplifier. While requiring higher currents to achieve the original bandwidth, this method may allow passive pixels to remain in the imaging race for high-density arrays.

Characterization Methodology of CMOS Processes for Image Sensor Applications

Personnel C.-C. Wang (C. G. Sodini)

Sponsorship Center for Integrated Circuits and Systems and TSMC

CMOS image sensors have lower power consumption and better circuit integration ability compared to CCD image sensors. However, standard CMOS processes are optimized for circuit applications rather than image sensing. Modifications to standard CMOS processes are often required for better image sensor performance. In order to select the most efficient photodiode structure and diagnose the effects of the modified process parameters, a two-stage characterization methodology has been developed. In the first stage, large photodiodes (~500 µm x 500 µm) with different junction structures, such as NW/Psub and N+/PW photodiodes, are implemented. This allows one to directly measure the fundamental junction properties of the diodes for image sensing. Two major parameters, quantum efficiency and leakage current, are measured and compared. The large area of the diodes assures the measurement accuracy. Furthermore, the layouts of the diodes can be a bulk, strip, or grid shape to study the area, edge, and corner components of the junction properties. In this first stage, the best junction type, NW/Psub, is identified for further investigation at the pixel level.

In the second stage, a small test pixel array (e.g. a 64 x 64 array) is implemented to study the pixel performance. Since this array is a miniature version of an imager, it contains all the characteristics of a large-format imager. Its small size allows one to arrange several arrays with different designs on the same chip to remove wafer-to-wafer variation of the chips. Typical imager characteristics that can be measured with the test arrays include sensitivity, dark current, fixed pattern noise, random noise, and dynamic range. Process parameters, such as thermal treatment and sensor implant conditions, can also be fine-tuned with the test array. This two-stage approach provides a methodology to select the best junction structure and process parameters for a CMOS image sensor process. This approach has been implemented in a 0.25 µm CMOS process.
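The way bulk, strip, and grid layouts separate the area, edge, and corner components in the first characterization stage can be sketched as a small linear system. The model below (leakage = area density x area + edge density x perimeter + per-corner current x corner count) is a common simplification assumed here for illustration; the layout geometries and current values are hypothetical, not measured data.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def split_leakage(layouts, currents):
    """layouts: three (area, perimeter, corner_count) tuples.
    currents: total leakage measured for each layout.
    Returns (area density, edge density, per-corner current) by Cramer's rule."""
    d = det3(layouts)
    if d == 0:
        raise ValueError("layouts are not independent")
    result = []
    for col in range(3):
        m = [list(row) for row in layouts]
        for r in range(3):
            m[r][col] = currents[r]   # replace one column with the measurements
        result.append(det3(m) / d)
    return tuple(result)

# Hypothetical geometries (um^2, um, count) for bulk, strip and grid diodes.
LAYOUTS = [(250000, 2000, 4), (250000, 11000, 40), (160000, 16000, 100)]
TRUE = (1e-3, 2e-2, 0.5)   # assumed component values, arbitrary units
CURRENTS = [a * TRUE[0] + p * TRUE[1] + c * TRUE[2] for a, p, c in LAYOUTS]

j_area, j_edge, i_corner = split_leakage(LAYOUTS, CURRENTS)
```

With three sufficiently different layouts the three components are recovered exactly; in practice more layouts and a least-squares fit would be the natural extension when measurement noise is present.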


Optoelectronic Integrated Circuits for Diffuse Optical Tomography

Personnel W. Giziewicz and C. G. Fonstad, Jr. in collaboration with S. Prasad and D. Brooks

Sponsorship NSF

In August we began a three-year collaborative research program with Professors Dana Brooks and Sheila Prasad of Northeastern University, applying our technologies for monolithic optoelectronic integration to address problems and needs of biomedical research and diagnosis. Specifically, we will be working to monolithically integrate light sources and detectors with complex, high-density, high-performance electronic circuitry to realize a wide variety of sensors and measurement arrays for medical research and diagnostics.

We have identified as an initial vehicle for applying this technology an integrated source/detector array for Diffuse Optical Tomography (DOT). The proposed unit will permit DOT observations with a resolution exceeding that of present techniques and will lead to the use of DOT in procedures and situations in which it is currently unfeasible. Stated in the most general terms, the NSF-supported effort is directed at developing, applying, and making available a technology to monolithically integrate III-V optical emitters and detectors with commercially fabricated, custom-designed integrated circuits to produce high-resolution two-dimensional arrays of individually addressable smart excitor/sensor pixels tailored for biomedical research applications and studies. A representative pixel might measure 250 to 500 microns on a side, and contain, for example, a diode light emitter (LED or laser), one or more light sensors, and a significant amount of electronic signal processing circuitry. This basic unit is a building block from which a wide variety of biomedical optical measurement systems can be realized in a very rugged, compact chip-size format. It promises to lead, in the future, to totally new sensor geometries and measurement procedures. The challenges that the program will face include continuing development of the OEIC technology and adapting this technology for biomedical research; developing suitable signal processing algorithms and designing compact, high-performance signal processing circuit arrays in the relevant electronics technologies to interface with the optoelectronic devices; and suitably packaging the OEIC chips for their biomedical utilization.

The project team will be aided in this effort by its strong links with the Northeastern University Center for Subsurface Sensing and Imaging Systems, the Massachusetts General Hospital NMR Center, the University of Utah NIH/NCRR Center for Bioelectric Field Modeling, Simulations and Visualization, and the MIT Microsystems Technology Laboratory, and by integrated circuit processing support from Vitesse Semiconductor Corporation.

Mixed-Signal Design in Deeply Scaled CMOS Technology

Personnel J. Fiorenza (H.-S. Lee and C. G. Sodini)

Sponsorship Center for Integrated Circuits & Systems and MARCO

There are tremendous challenges in implementing mixed-signal systems on a single substrate in deeply scaled CMOS technologies, primarily due to the negative impact of the technology on analog circuits. Nearly every aspect of scaling except speed goes against analog circuits. Lower power supply voltage severely restricts the signal range, requiring substantially lower circuit noise in order to maintain the signal-to-noise ratio. Small-geometry transistors exhibit far less voltage gain and greater threshold voltage mismatches than their predecessors. Attempts to overcome device gain limitations with conventional techniques such as cascode and regulated cascode aggravate the already slim signal swing. The use of long-channel devices for higher gain inevitably compromises the circuit speed.

In order to overcome these challenges, we are exploring innovative circuit techniques that avoid the shortcomings of deeply scaled technologies, and actually exploit them in mixed-signal systems. As the first step we have been investigating circuit techniques that overcome the device gain limitations without penalizing the signal swing or circuit speed. An innovative approach that we have developed employs two signal paths: the main path and the prediction path. The prediction path processes the signal 1⁄2 clock phase earlier than the main path at a reduced accuracy. The information obtained from the prediction path is used in the main path to compensate for the finite device gain, incomplete settling and other non-idealities. The two-path approach can be applied to many different classes of analog circuits including data converters, filters, instrumentation amplifiers, and many others. As the initial proof-of-concept, we designed a MOS sample-and-hold amplifier in a standard 0.18 µm digital CMOS process. The simulation predicts an accuracy corresponding to 100 dB amplifier gain with no cascading. The chip design will be submitted for fabrication in April.

Substrate Noise Characterization Shaping in Mixed-Signal Systems

Personnel M. S. Peng (H.-S. Lee)

Sponsorship Center for Integrated Circuits & Systems and MARCO

The basic demands of power, speed, and cost drive the ever tighter integration of all circuits in a system onto a single chip, the so-called System on a Chip (SoC). This necessitates the integration of analog circuits with digital circuits. However, in this integration, the acute problem of substrate noise coupling arises. The noisy digital circuits inject noise into the common substrate, which can severely affect sensitive analog circuits. If not properly accounted for, this noise can degrade performance drastically, and in some cases destroy functionality. Up to now, most efforts in addressing this problem have been to ensure that analog circuits are robust enough to withstand the digital noise. These techniques include physical separation, differential architectures, and simulation. Hardly any effort has been placed on reducing the substrate noise itself.

With this in mind, the focus of this research is to investigate the characteristics of the substrate noise as well as ways to cancel the injected substrate noise. We have implemented a test chip that includes a digital circuit as a substrate noise generator and a delta-sigma A/D converter that samples the substrate noise. The digital circuit is operated in such a way that it injects a periodic noise waveform into the substrate. The delta-sigma converter is used as an accurate on-chip sampling scope to map the substrate noise as a function of time. The sampling edge of the delta-sigma A/D converter is moved relative to the digital clock edge. Figure 29 shows an example of measured substrate noise using this technique. In order to reduce the effect of the substrate noise, we propose to cancel and shape the noise in bands of interest with a feedback loop. This type of noise shaping is well suited for oversampling and bandpass type applications.


The substrate noise shaping loop is based on a delta-sigma modulator loop with the substrate noise treated as quantization noise. The feedback D/A is replaced by an array of noise-injecting inverters. This has the advantage of simplicity and low power.

Fig. 30: Substrate noise spectrum without substrate noise (SN) shaping.

Fig. 29: Measured substrate noise with an on-chip sampling scope.

Furthermore, the addition of an independent substrate noise shaping loop to the system requires little effort and causes little loss of performance. The analog and digital circuits can be designed as if there were no substrate noise shaping loop. As proof of concept, a prototype chip has been designed that integrates the substrate noise loop with a 16-bit delta-sigma A/D converter and a complex digital encryption engine on the same substrate. The chip runs at 2.5 V and has been fabricated in 0.25 µm CMOS technology. The chip is currently being tested and characterized. Preliminary measurements show noise cancellation in the desired bands of interest. Figures 30 and 31 show the substrate noise spectra before and after noise shaping. The substrate noise shaping reduces the overall substrate noise by 15 dB in the 0-45 kHz frequency band.
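The delta-sigma principle that the shaping loop borrows can be seen in a textbook first-order modulator. The sketch below is that generic modulator, not the substrate loop itself: the 1-bit quantizer stands in for the error source, and the feedback path here is an ideal value rather than an array of noise-injecting inverters.

```python
def delta_sigma_bitstream(x, n_samples):
    """First-order delta-sigma modulator for a constant input -1 <= x <= 1.
    Returns +1/-1 decisions whose running average tracks x, because the
    integrator pushes the quantization error toward high frequencies."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for _ in range(n_samples):
        integrator += x - feedback                       # accumulate input minus feedback
        feedback = 1.0 if integrator >= 0.0 else -1.0    # 1-bit quantizer
        bits.append(feedback)
    return bits

bits = delta_sigma_bitstream(0.3, 10000)
average = sum(bits) / len(bits)
```

The low-frequency content of the bitstream (here simply its average) reproduces the input even though every individual decision carries a large error, which is the property exploited when substrate noise takes the place of quantization noise.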

Fig. 31: Substrate noise spectrum with substrate noise (SN) shaping.


RF Analog Circuit Design with Scaled CMOS Devices

Personnel T. Sepke (H.-S. Lee and C. G. Sodini)

Sponsorship MARCO C2S2 and Center for Integrated Circuits and Systems

Because of the prospect of low cost and high integration offered by scaled CMOS, much effort is being focused on its use for Radio-Frequency (RF) communication circuits. Low Noise Amplifiers (LNAs) are essential building blocks for the design of communication systems, and some applications would benefit from the use of low-cost CMOS transistor technology to build the LNA. MOS transistors are typically considered to be the noisiest transistor technology, but with the scaling of channel length, MOS transistors have fTs in the tens of gigahertz (where fT is the transistor unity current gain frequency). High fT results in gain out to higher frequencies, and a lower noise figure at those frequencies. While the noise theory of long-channel devices is well understood, and a handful of theories exist for short-channel devices, complete noise characterization of short-channel devices is in short supply. Our approach is to thoroughly understand the long-channel noise theory, examine possible short-channel theories, gather measured data on a set of scaled CMOS transistors, and interpret the results. Equipped with this new information, existing LNA designs will be evaluated for use with scaled MOSFETs and design improvements examined.

The classic long channel noise theory for intrinsic FET transistors proposed by van der Ziel (Proc. of IRE, 1962) defines two noise sources that are present at the device terminals. The first is the drain current noise (id), which originates from the conductance of the channel, and the second is the induced gate current noise (ig), which originates from the charge fluctuations in the channel when the drain current fluctuates. This description implies that the two would be completely correlated, but this is not the case. To be sure, they are dependent on each other, but due to the active nature of the channel, the two are only partially correlated (|c| = 0.395). Together, these sources give a complete description of the device noise suitable for a two-port Y-Parameter model.

In short channel transistors, the physics of the drain current is different from the long channel case, and the expressions for ig, id and c are different from those van der Ziel derived. Based on the model from Pucel et al. (Adv. in Electronics and Elec. Physics, 1975) for MESFETs, as the channel approaches complete velocity saturation, the correlation between the drain current noise and the gate current noise should approach unity. Pucel et al. also considered the more general situation by dividing the channel into two regions, one where the gradual channel approximation holds and the other velocity saturated. Therefore, the results obtained under the assumption of complete velocity saturation represent a limiting case, and the actual performance should lie between this solution and the long channel theory. Device measurements are required to verify any changes in the amount of drain current noise, gate current noise, and their correlation. Fortunately, the procedure to set up a high frequency, noise figure based measurement system is fairly well documented in the literature (Pucel et al., MTT 1992), and commercial systems are available. In order to measure the high frequency performance of the transistors on chip, special coplanar waveguide probes and probing structures must be used. Care must be taken to remove the effects of the measurement equipment and the on-wafer probing structures. The result of these efforts is a complete two-port noise description of the device. These results can then be fit to the ig, id, and c model described above.
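For reference, the ig, id, and c description of a long channel device in saturation is commonly written in the textbook form below. This is a sketch of the standard result rather than a formula taken from this report: the symbols k (Boltzmann's constant), T (temperature), Δf (bandwidth), g_d0 (zero-bias drain conductance), C_gs (gate-source capacitance), γ, and δ are assumptions introduced here, and only the value |c| = 0.395 is quoted in the text.

```latex
\begin{align}
\overline{i_d^2} &= 4kT\,\gamma\,g_{d0}\,\Delta f, & \gamma &= \tfrac{2}{3} \\
\overline{i_g^2} &= 4kT\,\delta\,g_g\,\Delta f, & g_g &= \frac{\omega^2 C_{gs}^2}{5\,g_{d0}}, \quad \delta = \tfrac{4}{3} \\
c &= \frac{\overline{i_g\,i_d^{*}}}{\sqrt{\overline{i_g^2}\;\overline{i_d^2}}} \approx j\,0.395
\end{align}
```

Fitting measured short channel data to this form amounts to extracting effective values of γ, δ, and c, which the velocity-saturation argument suggests may differ from the long channel values shown.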

Applying the results obtained from the measured data to LNA design is the final objective of this work. Depending on the significance of the correlation between the induced gate current noise and the drain current noise, the magnitudes of gate and drain noise currents, and the small-signal performance, variations or changes in the approach of LNA design in scaled CMOS technology will be investigated.

Circuit Design and Technological Limitations of Silicon RFICs

Personnel D. A. Hitko (C. G. Sodini)

Sponsorship SRC and HRL Laboratories, LLC

Wireless products and communications systems have thrived on the increased utility enabled by semiconductor technologies, the demand for which is necessitating ever higher communication channel frequencies to obtain wider bandwidth and to alleviate interference. This places still greater demands on the technologies used to implement the wireless systems; however, for a given application, the market determines the acceptable end product price based on convenience, functionality, and a comparison with substitutes, effectively setting a bound on the technologies that can be used. The limitations of these technologies in part determine the achievable performance, which then in turn may confine the very convenience and functionality being sought through the wireless system.

In this interplay of circuits and systems with technology, where both price and performance are crucial, we are exploring two aspects that can yield significant improvements in the design of wireless systems. The first of these is the optimization of a broad range of RF circuits at the device level. By working through the exercise of designing the components needed to realize a 5.8 GHz receiver, generalities are being sought which link technological parameters with system level performance. Using the receiver application to frame circuit constraints, device level issues are being studied to determine the physical origins of circuit limitations. Design techniques are then being investigated to mitigate these limitations to the extent possible, a procedure that aims to both optimize the circuits and underscore the degree to which the limitations are fundamental. Second, knowledge of the factors that limit circuit performance is used to devise methods of implementing circuits in which the performance (measured in terms of system level parameters) can be dynamically tuned to match real-time conditions in a power efficient manner. This facility provides the mechanism by which energy can be conserved in RF circuits by sensing and using information about the environment, communications channel, and data to be transmitted. Incorporation of adaptability at the circuit and system levels is paramount in expanding capabilities and increasing utilization of wireless communication links, and yet remains a largely untapped resource in this field.

This project entails the application of these concepts to the development of the component-level circuits required in an integrated RF front end for a 1 Gbit/s wireless network operating at 5.8 GHz. The typical components in a receiver, namely Low Noise Amplifiers (LNAs), Voltage-Controlled Oscillators (VCOs), and mixers, are considered in this work. A set of VCOs and LNAs has been designed in a 0.5 µm SiGe BiCMOS technology as data points to illustrate device and circuit optimization trade-offs. Direct comparisons of CMOS versus bipolar, the impact of transistor fT/fMAX, and the implications of design methodologies based upon linear time variant models are some of the issues being explored. The approach that has been developed, along with information gleaned from the experimental circuits, can provide a basis for shaping future integrated circuits, technologies, and system designs for wireless applications.

Superconducting Bandpass Delta-Sigma A/D Converter

Personnel J. F. Bulzacchelli (H.-S. Lee and M. B. Ketchen)

Sponsorship Center for Integrated Circuits and Systems

The direct digitization of RF signals in the GHz range is a challenging application for any circuit technology. Traditionally, flash A/D converters have been used to digitize signal frequencies above 1 GHz, but their resolution and linearity are inadequate for most radio systems, which must handle signals with a large dynamic range. Semiconductor bandpass delta-sigma modulators are used to digitize IF signals with high resolution, but their performance at microwave frequencies is limited by the speed of semiconductor comparators and the low Q of integrated inductors.

Fig. 32:

In this program, we present the design and testing of a superconducting bandpass delta-sigma modulator for direct A/D conversion of GHz RF signals. The schematic of the circuit is shown in Figure 32. The input signal is capacitively coupled to one end of a superconducting microstrip transmission line, which serves as a high quality resonator (loaded Q > 5000). The current flowing out of the other end of the microstrip line is quantized by a clocked comparator comprising two Josephson junctions. If the current is above threshold, the lower junction switches and produces a quantized voltage pulse known as a Single Flux Quantum (SFQ) pulse. If the current is below threshold, the upper junction switches instead. The pattern of voltage pulses generated across the lower Josephson junction represents the digital output code of the delta-sigma modulator. These voltage pulses also inject current back into the microstrip line, providing the necessary "feedback" signal to the resonator. At the quarter-wave resonance of the microstrip line (about 2 GHz in our design), the resonator shunts the lower junction with a very low impedance, the "feedback" current to the resonator is maximized, and the quantization noise is minimized. Because of the high speed of Josephson junctions and the simplicity of the modulator circuit, the maximum sampling rate exceeds 40 GHz.

While such a high sampling rate improves the performance of the delta-sigma modulator, the challenges of high speed testing in a cryogenic environment are formidable. Even in the best cryogenic sample holders, the long cables used to connect the superconducting chip to room-temperature electronics have significant losses at frequencies above 10 GHz. Experimentally, we found two solutions for clocking the circuit at high frequencies. In the first approach (detailed in previous reports), we employ an optoelectronic clocking technique, in which picosecond optical pulses at a 20.6 GHz repetition rate are delivered (via optical fiber) to an on-chip photodetector, the current pulses from which drive a Josephson clock amplifier. In the second approach, the modulator is triggered by an on-chip clock source. An increase in bias current turns the Josephson clock amplifier into an oscillator tunable between 20 and 45 GHz. We found that surprisingly good frequency stability could be achieved with the on-chip clock source with careful adjustment of dc bias currents.
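To give a feel for the signal levels involved, the short sketch below computes the time-integrated voltage of one SFQ pulse (the magnetic flux quantum h/2e) and the resulting average voltage across a junction that switches on a given fraction of clock periods. It is only an illustration: the function name, the `ones_fraction` parameter, and the 50% duty value are assumptions introduced here, not quantities from the report.

```python
# Exact SI values of the Planck constant and the elementary charge.
PLANCK_H = 6.62607015e-34            # J*s
ELEMENTARY_CHARGE = 1.602176634e-19  # C

# Magnetic flux quantum: the time-integrated voltage of one SFQ pulse.
FLUX_QUANTUM = PLANCK_H / (2.0 * ELEMENTARY_CHARGE)  # V*s, about 2.07 mV*ps


def mean_junction_voltage(clock_rate_hz, ones_fraction=1.0):
    """Average voltage across a junction that emits SFQ pulses.

    clock_rate_hz : clock (sampling) rate in Hz
    ones_fraction : fraction of clock periods in which this junction switches
                    (hypothetical parameter, for illustration only)
    """
    return FLUX_QUANTUM * clock_rate_hz * ones_fraction


if __name__ == "__main__":
    print(f"flux quantum = {FLUX_QUANTUM * 1e15:.4f} mV*ps")
    for f_ghz in (20.6, 42.6):
        v = mean_junction_voltage(f_ghz * 1e9, ones_fraction=0.5)
        print(f"{f_ghz} GHz clock, 50% ones: {v * 1e6:.1f} uV average")
```

Average voltages of tens of microvolts are consistent with the need, noted elsewhere in this section, for on-chip drivers that raise the output to the millivolt level before it leaves the cryostat.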

Since the modulator output data rate exceeds the capacity of the interface to room-temperature test equipment, on-chip processing of the data is used to reduce the bandwidth requirements for readout. As explained in the 1998 MTL report, two segments of the modulator's bit stream are captured with a pair of 128-bit shift registers. The number of clock cycles skipped between acquiring the two segments is set by an on-chip programmable counter (from 0 to over 8000). Cross-correlation of the two captured segments is used to provide estimates of the autocorrelation function R[n] of the modulator output, from n=0 up to a large value, such as n=8000. Fourier transformation of R[n] then yields a power spectrum with frequency resolution comparable to an 8K FFT of the original bit stream.

Fig. 33:

Figure 33 shows the block diagram of the modulator test chip. As mentioned above, the bandpass modulator can be clocked either externally by a 20.6 GHz optical source or internally by an on-chip Josephson oscillator. A 1:4 demultiplexer converts the single-bit output of the modulator to 4-bit words at one-fourth the sampling rate. This allows most of the test chip, including the programmable counter and the shift register memory banks, to operate at a reduced clock rate with larger timing margins. Because of the 1:4 demultiplexing, 128-bit memory banks A and B are organized as 4 parallel rows of 32-bit long shift registers. As just discussed, the number of clock cycles skipped between loading the A and B memory banks is set by a programmable counter, which is programmed by external control currents. Once the shift registers have been loaded, a readout controller unloads the stored bits and transfers them to "high-voltage" drivers, which amplify the output signals up to about 2 mV, which is large enough to be detected by room-temperature electronics. The test chip employs over 4000 Josephson junctions and represents one of the most complex circuits ever designed in this technology.

Fig. 34:

The test chip was fabricated at HYPRES, Inc. While the chip has been used with the 20.6 GHz optical clock, higher oversampling ratios and SNRs are attained with the on-chip clock source operating near 40 GHz. In the initial experiments, the programmable counter on the test chip was programmed so that the shift registers captured 256 consecutive bits from the modulator, so that 256-point FFTs could be calculated. The output spectra of the modulator at a sampling rate of 42.6 GHz are plotted in Figure 34. The width (about 500 MHz) of the input tone at 1.7 GHz reflects the low frequency resolution of the 256-point FFTs. The full-scale (FS) input sensitivity is -17.4 dBm (30 mV rms). Quantization noise is suppressed at 2.23 GHz and at higher frequencies corresponding to higher-order microstrip modes. The SNR (49 dB over a 20.8 MHz bandwidth) is limited by the frequency resolution of the measurements but still exceeds the SNRs of semiconductor modulators with comparable center frequencies. Other measurements, based on the correlation technique discussed above, show that the in-band noise over a 19.6 MHz bandwidth is -57 dBFS. The center frequency and sampling rate of the experimental modulator are the highest reported to date for a bandpass delta-sigma modulator in any technology.
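The correlation-based readout used with memory banks A and B can be illustrated in software. The sketch below is a simplified model under stated assumptions, not the procedure actually used on the test chip: it captures two 128-bit segments separated by a programmable skip, bins every product of one bit from each segment by its time lag to estimate R[n], and then evaluates the power spectrum from R[n]. The function names, the averaging over several capture start positions, the absence of a lag window, and the square-wave stand-in for the modulator bit stream are all assumptions made for illustration.

```python
import math

SEGMENT_BITS = 128  # length of each captured segment, as stated in the text


def capture(bits, start, skip):
    """Return two 128-bit segments of `bits` separated by `skip` clock cycles."""
    first = bits[start:start + SEGMENT_BITS]
    second_start = start + SEGMENT_BITS + skip
    second = bits[second_start:second_start + SEGMENT_BITS]
    return first, second


def estimate_autocorrelation(bits, max_lag, skips, starts):
    """Estimate R[0..max_lag] of a +/-1 bit stream from pairs of segments."""
    sums = [0.0] * (max_lag + 1)
    counts = [0] * (max_lag + 1)
    for start in starts:
        for skip in skips:
            first, second = capture(bits, start, skip)
            # Lag 0 comes from correlating the first segment with itself.
            sums[0] += sum(x * x for x in first)
            counts[0] += len(first)
            # Bit i of the first segment and bit j of the second segment
            # are SEGMENT_BITS + skip + j - i clock cycles apart.
            for i, x in enumerate(first):
                for j, y in enumerate(second):
                    lag = SEGMENT_BITS + skip + j - i
                    if lag <= max_lag:
                        sums[lag] += x * y
                        counts[lag] += 1
    # Lags that were never observed are reported as 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def power_spectrum(r, num_bins):
    """Evaluate S(f) = R[0] + 2*sum_n R[n]*cos(2*pi*f*n) at f = k/(2*num_bins)."""
    spectrum = []
    for k in range(num_bins + 1):
        f = k / (2.0 * num_bins)  # cycles per sample, from 0 to 0.5
        total = r[0]
        for n in range(1, len(r)):
            total += 2.0 * r[n] * math.cos(2.0 * math.pi * f * n)
        spectrum.append(total)
    return spectrum


def demo():
    # Stand-in bit stream: a +/-1 square wave with a period of 8 samples,
    # chosen only because its fundamental (1/8 of the clock rate) is known.
    bits = [1 if (t % 8) < 4 else -1 for t in range(512)]
    r = estimate_autocorrelation(bits, max_lag=255, skips=[0], starts=range(8))
    spectrum = power_spectrum(r, num_bins=128)
    peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
    return r, peak_bin / 256.0  # peak frequency in cycles per sample


if __name__ == "__main__":
    r, peak_frequency = demo()
    print("R[0], R[4], R[8] =", r[0], r[4], r[8])
    print("spectral peak at", peak_frequency, "of the clock rate")
```

With a skip of 0, a single pair of segments covers lags 1 through 255; sweeping the skip value extends the lag range, which is how a long autocorrelation record can be assembled from short captures. Because no lag window is applied in this sketch, individual spectrum bins away from the peak can come out slightly negative.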

Radio Frequency Digital-to-Analog Converter

Personnel S. Dacy (H.-S. Lee)

Sponsorship Lucent Fellowship, ABB, and Center for Integrated Circuits & Systems

Dynamic performance of high-speed, high-resolution DACs is limited by distortion at the data switching instants. Inter-Symbol Interference (ISI), imperfect timing synchronization, and clock jitter are all culprits. A DAC output current controlled by an oscillating waveform is proposed to mitigate the effects of the switching distortion. The frequency of the oscillating waveform should be a multiple (k*fs) of the sampling frequency (fs), where k>1. The waveforms can be aligned so that the data switching occurs at the zero crossings of the oscillating current output. This makes the DAC insensitive to switch dynamics and jitter. The architecture has the additional benefit of mixing the DAC impulse response energy to a higher frequency. Instead of the conventional sin(x)/x DAC impulse response roll-off, there is a large high frequency lobe near the control oscillating waveform frequency (k*fs). An image of a low Intermediate Frequency (IF) input signal can therefore be output directly at a high IF or Radio Frequency (RF) for transmit communications applications.

A narrowband sigma-delta DAC with eight unit elements was chosen to implement the RF DAC. A sigma-delta architecture allows the current source transistors to be smaller since mismatch shaping is employed. Smaller current source transistors have a lower drain capacitance, allowing large high frequency output impedance to be achieved without an extra cascode transistor. Elimination of the cascode reduces transistor headroom requirements and allows the DAC to be built with a 1.8 V supply.

The RF DAC is currently being designed in 0.18 µm, 1.8 V CMOS technology. Target specifications are a 17.5 MHz conversion bandwidth centered around 942.5 MHz with 60 dB SNR and 80 dB SFDR.

Oversampled Pipeline A/D Converters with Mismatch Shaping

Personnel A. Shabra (H.-S. Lee)

Sponsorship Center for Integrated Circuits & Systems (CICS)

In recent years, delta-sigma modulators and pipeline converters have been considered as possible realizations of analog-to-digital converters for wide-band signals. In comparing these converters, we recognize a few important attributes. Due to the wide bandwidth of the input signal and limited circuit speed, delta-sigma converters afford only low oversampling ratios, which makes high-resolution conversion extremely difficult. The low oversampling ratio generally nullifies the primary advantage of delta-sigma converters: the tolerance to component mismatches. In this regard, the remaining potential advantages of delta-sigma converters over pipeline converters include only ease of anti-alias filtering and low quantization noise. It must be noted that the ease of anti-aliasing is not inherent to delta-sigma modulation. Rather, it is associated with oversampling. Therefore, pipeline converters can experience the same benefit of easy anti-aliasing by simply operating the converter at a sampling rate higher than the Nyquist rate, i.e., oversampling. As for quantization noise in pipeline converters, it can be made smaller by adding more stages at the end of the pipeline. Since the last stages of the pipeline do not contribute much thermal noise, they can be made extremely small and low power. Therefore, the quantization noise itself can be made arbitrarily small with a negligible increase in area and power. Certainly, doing so will not improve the accuracy or thermal noise. However, the situation is no different in delta-sigma converters with a low oversampling ratio.

Based on the above observation, we can conclude that delta-sigma converters do not possess any fundamental advantage over pipeline converters for wide-band applications that necessitate low oversampling ratios. At such low oversampling ratios, many delta-sigma converters are incapable of providing good enough performance. While there are a few examples of delta-sigma converters with a low oversampling ratio, we believe that a more efficient approach would be to oversample a standard pipeline converter, and shape the distortion due to mismatch out of the signal band, where it will be removed by a subsequent digital filter. Since no attempt is made to shape the quantization noise, there are none of the concerns associated with delta-sigma converters with a low oversampling ratio.

A test chip was fabricated in a 0.35 µm CMOS process to demonstrate a number of mismatch shaping concepts. A 77 dB SFDR and 67 dB SNDR are achieved at an oversampling ratio of 4 and a sampling rate of 60 Msample/s. Mismatch shaping improves the converter SFDR by 12 dB and SNDR by 5 dB.

Low Power Reconfigurable Analog-to-Digital Converter

Personnel K. Gulati (H.-S. Lee)

Sponsorship Center for Integrated Circuits & Systems (CICS)

There are applications which require Analog to Digital Converters (ADCs) that can digitize signals over a wide range of bandwidths at varying resolution with adaptive power consumption. Clearly, a conventional ADC with fixed topology and parameters cannot accomplish this task efficiently. An alternate approach is to employ an array of ADCs, each customized to work at narrow ranges of resolution and input bandwidth, but such a system would occupy a prohibitively large area to achieve optimal power consumption at fine granularity over bandwidth and resolution. A single ADC with reconfigurable parameters and reconfigurable topology would be able to achieve the above goal. Prior reconfigurable ADCs, however, achieve very limited reconfigurability. The proposed ADC is designed to provide a significantly larger reconfigurability space. Its target resolution ranges from 6 to 16 bits and signal bandwidth from 0 to 10 MHz.
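The resolution and SNDR figures quoted for these converters (6 to 16 bits for the reconfigurable ADC; 67 dB SNDR at an oversampling ratio of 4 and 60 Msample/s for the mismatch-shaped pipeline test chip) can be related with the textbook ideal-quantizer formulas. The sketch below is only a back-of-the-envelope aid: it assumes a full-scale sine input, white quantization noise, and no noise shaping, and the function names are introduced here for illustration.

```python
import math


def ideal_sqnr_db(bits, oversampling_ratio=1.0):
    """Ideal signal-to-quantization-noise ratio of an N-bit quantizer.

    Assumes a full-scale sine input and white quantization noise; plain
    oversampling (no noise shaping) adds 10*log10(OSR) dB.
    """
    return 6.02 * bits + 1.76 + 10.0 * math.log10(oversampling_ratio)


def effective_bits(sndr_db):
    """Effective number of bits corresponding to a measured SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def signal_bandwidth_hz(sample_rate_hz, oversampling_ratio):
    """Signal bandwidth implied by a sampling rate and oversampling ratio."""
    return sample_rate_hz / (2.0 * oversampling_ratio)


if __name__ == "__main__":
    for bits in (6, 16):
        print(f"{bits} bits -> {ideal_sqnr_db(bits):.2f} dB ideal SQNR")
    print(f"67 dB SNDR -> {effective_bits(67.0):.2f} effective bits")
    print(f"60 Msample/s at OSR 4 -> {signal_bandwidth_hz(60e6, 4) / 1e6} MHz")
```

Under these assumptions, a 60 Msample/s converter oversampled by 4 covers a 7.5 MHz signal band, and each doubling of the oversampling ratio buys only about 3 dB of quantization-noise improvement, which is why the text treats thermal noise and mismatch as the real limits.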

The concept of this ADC stems from the observation that certain ADC architectures, such as the pipeline, cyclic, and sigma-delta topologies, are composed of the same basic components: opamps, comparators, switches, and capacitors. The sole difference between them, from a network perspective, is the interconnection between these devices. Thus, a converter composed of these basic building blocks in conjunction with a configurable switch matrix can be made to construct these different topologies and work at different resolutions and bandwidths. The reconfigurable ADC consists of several basic building blocks as shown in Figure 35. A user defined 'resolution word' that determines the resolution of the ADC is supplied to the main reconfiguration logic, which then determines the global structure of the ADC and the state of each block. The PLL shown in the figure uses the frequency information in the clock and the resolution information from the main reconfiguration logic to determine the appropriate bias current of the opamps.


The ADC was fabricated in a TSMC 0.6 µm DPTM CMOS process and occupies a total die area of 10.5 x 7.6 mm² (Figure 36). The reconfigurable ADC intrinsically requires an area only slightly larger than a 12-bit ADC; however, the prototype layout is optimized not for area but for testability. The resolution of the ADC can be varied from 6 to 15 bits, while the bias current can be varied over a range of about 3 orders of magnitude, corresponding to a sampling rate range of 20 kHz to 20 MHz. Table 2 provides a summary of representative measured results.

Fig. 35: ADC architecture.

Fig. 36: ADC micro-photograph.

Table 2: Measured results at two performance points.


A CMOS Bandgap Current and Voltage References

Personnel M. C. Guyton (H.-S. Lee)

Sponsorship Center for Integrated Circuits and Systems

Most analog circuits require reference voltages and currents that do not vary with power supply voltage and temperature. Bandgap voltage references with an output voltage around 1.2 volts have been popular for this purpose. However, producing non-integer multiples of the bandgap voltage requires an operational amplifier, which increases the complexity and power consumption. Bandgap current references also require an operational amplifier.

The focus of this research is to develop simple and low power bandgap current and voltage references. We have developed a novel bandgap core circuit that produces a bandgap referenced output current directly without an operational amplifier. This simple circuit can even be operated as a 2-terminal bandgap current source. The same core circuit can also be used to generate arbitrary non-integer multiples of the bandgap voltage.

A prototype 2-terminal bandgap current source has been designed and fabricated employing only 4 MOS transistors and 2 parasitic PNP transistors in a standard 0.35 µm CMOS technology. We are presently evaluating the first silicon.

A Programmable, Wide Dynamic Range CMOS Imager with On-Chip Automatic Exposure Control

Personnel P. M. Acosta Serafini (C. G. Sodini)

Sponsorship Intelligent Transportation System Center and National Semiconductor Fellowship

Machine vision applications that use visual information typically need an image sensor able to capture natural scenes that may have a dynamic range as high as four orders of magnitude. Reported wide dynamic range imagers may suffer from some or all of these problems: large silicon area, high cost, low spatial resolution, small intensity dynamic range, poor pixel sensitivity, small intensity resolution, etc. The primary focus of the proposed research is to develop a single-chip imager for machine vision applications which addresses these problems, but is still able to provide an ultra wide intensity dynamic range by implementing a novel pixel-by-pixel automatic exposure control. The secondary focus of the research is to make the imager programmable, so that its performance (light intensity dynamic range, spatial resolution, light intensity resolution, frame rate, etc.) can be tailored to suit a particular machine vision application.

The image sensing array has pixels which can be independently read and reset. The proposed brightness adaptive algorithm then predictively scales the voltage in photodiodes that would saturate under normal circumstances, based on information gathered in several readout checks. The total integration time is subdivided into several integration times (called integration slots), which are progressively shorter. If, during any of the checks, it is determined that the pixel will saturate by the end of the integration slot, the pixel is reset and allowed to integrate light once more, but for a shorter period of time. Each pixel has a small associated memory location to store an exponent that identifies the actual integration slot used. This information is used to appropriately scale the digitized pixel output.

The prototype ASIC includes a 1/3” VGA (640x480) array (7.5 µm square pixels), 64 cyclic analog-to-digital converters for digital pixel output, an integration controller which implements the described algorithm, 4-bit per pixel SRAM memory for exponent storage, and supporting digital logic.
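The brightness adaptive algorithm can be sketched for a single pixel as follows. This is a minimal model under stated assumptions, not the integration controller implemented on the ASIC: the report says only that the integration slots are progressively shorter, so the binary-weighted slot lengths (T, T/2, T/4, ...), the linear extrapolation used to predict saturation, the constant illumination, and all function and parameter names are assumptions introduced here.

```python
def expose_pixel(photo_slope, frame_time=1.0, v_sat=1.0, num_slots=4):
    """Simulate one pixel under constant illumination.

    photo_slope : rate at which the pixel voltage grows (volts per unit time)
    frame_time  : length of the longest integration slot
    v_sat       : saturation voltage of the pixel
    num_slots   : number of available integration slots (assumed lengths
                  frame_time / 2**e for e = 0 .. num_slots - 1)
    Returns (final_voltage, exponent), where exponent identifies the slot used.
    """
    exponent = 0
    for next_slot in range(1, num_slots):
        current_len = frame_time / 2 ** exponent
        next_len = frame_time / 2 ** next_slot
        # The check happens when the next (shorter) slot could still start.
        elapsed = current_len - next_len
        v_now = min(photo_slope * elapsed, v_sat)
        # Linear prediction of the voltage at the end of the current slot.
        predicted = v_now * current_len / elapsed
        if predicted >= v_sat:
            exponent = next_slot  # reset the pixel and use the shorter slot
        else:
            break  # the pixel will not saturate, so keep integrating
    final_len = frame_time / 2 ** exponent
    final_voltage = min(photo_slope * final_len, v_sat)
    return final_voltage, exponent


def reconstruct(final_voltage, exponent):
    """Scale the pixel output by the stored exponent to recover the intensity."""
    return final_voltage * 2 ** exponent


if __name__ == "__main__":
    for slope in (0.3, 3.0, 100.0):
        voltage, exponent = expose_pixel(slope)
        print(slope, voltage, exponent, reconstruct(voltage, exponent))
```

In this model each extra slot doubles the brightest intensity that can be represented without clipping, so the stored exponent extends the dynamic range in the same way a floating-point exponent extends a fixed-width mantissa. A pixel that still saturates in the shortest slot is clipped, as in any imager.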

Spike-Based Hybrid Computers Project

Personnel M. O’Halloran, A. Mevay, and H. Yang, (R. Sarpeshkar)

Sponsorship ONR

This project attempts to combine the best of analog and digital computation to compute more efficiently than would be possible in either paradigm alone (Rahul Sarpeshkar and Micah O’Halloran, “Scalable Hybrid Computation with Spikes”, in press, Neural Computation, 2002). This project is inspired by the duality of analog spike-time and digital spike-count codes of the brain’s neurons. It is being applied to create low-power time-based analog-to-digital converters, analog memories, and novel event-based control architectures. Several design issues that are important in mixed-signal systems, including good power supply rejection, are being explored. Figure 37 shows the layout of a low power analog-to-digital converter that uses time as a signal variable, rather than the traditional variables of voltage or current, to perform quantization. A technique for achieving good power supply rejection without sacrificing the gain bandwidth product of an amplifier has been reported (Micah O’Halloran and Rahul Sarpeshkar, “A Low Open-Loop Gain High-PSRR Micropower CMOS Amplifier for Mixed-Signal Applications”, paper, IEEE International Symposium on Circuits and Systems, Arizona, May 2002).

Fig. 37: Layout of the Spiking Neuronal Analog-to-Digital Converter. Total area of chip is 4.84 mm² in 1.5 µm technology.


The Visual Motion and Inertial Motion Sensing Project

Personnel M. Tavakoli-Dastjerdi (R. Sarpeshkar)

Sponsorship DARPA

This project maps the distributed feedback loops of biological photoreceptors to silicon to create low-power high-performance silicon photoreceptors. Such photoreceptors are useful as front ends in VLSI motion sensors, which are important in robotic and active-vision applications. An ultra-low-noise MEMS vibration sensor, which provides inertial information to a vibrating visual sensor being built by collaborators at Caltech, has also been built. Figure 38 shows the VLSI layout of a visual motion sensor that yields the speed and direction of a globally moving visual image along the “Y” direction. The array contains both photodiodes and analog VLSI processing circuitry that is inspired by similar circuitry in the housefly. Figure 39 shows the experimental setup for testing a capacitive MEMS vibration sensor with associated ultra low noise offset-compensating electronics on an electronics die, which is wirebonded to the MEMS die. The sensor achieved an electronic noise floor equivalent to 30 µg/√Hz over a 1 Hz to 100 Hz bandwidth, a specification that appears to be 3 times better than any equivalent commercial or research system, in spite of its separate-die solution for mechanical and electrical systems. The system was able to detect a change of 1 part per 5 million in capacitance. The offset-compensating electronics have been briefly described in “A Low-Noise Nonlinear Feedback Technique for Compensating Offset in Analog Multipliers”, Maziar Tavakoli-Dastjerdi and Rahul Sarpeshkar, accepted paper, IEEE International Symposium on Circuits and Systems, Arizona, May 2002.
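The quoted noise floor can be turned into an approximate total rms figure. The sketch below assumes the noise density is flat across the stated band, which the report does not claim, and uses a hypothetical 1 pF nominal sense capacitance purely to illustrate the scale of a 1-part-in-5-million change. The function names are likewise introduced here.

```python
import math


def integrated_noise_rms(density_per_rt_hz, f_low_hz, f_high_hz):
    """Total rms noise for a flat noise density over [f_low_hz, f_high_hz]."""
    return density_per_rt_hz * math.sqrt(f_high_hz - f_low_hz)


def smallest_capacitance_change(c_nominal_farads, parts=5e6):
    """Capacitance change corresponding to 1 part in `parts`."""
    return c_nominal_farads / parts


if __name__ == "__main__":
    rms_g = integrated_noise_rms(30e-6, 1.0, 100.0)
    print(f"rms acceleration noise, 1 Hz to 100 Hz: {rms_g * 1e6:.1f} ug")
    # 1 pF is a hypothetical nominal capacitance, not a value from the report.
    delta_c = smallest_capacitance_change(1e-12)
    print(f"1 part in 5 million of 1 pF: {delta_c * 1e18:.2f} aF")
```

Under the flat-density assumption, a 30 µg/√Hz floor integrates to roughly 0.3 mg rms over the 1 Hz to 100 Hz band.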

Fig. 38: Layout of a (modified Reichardt-like facilitate-and-sample) visual motion sensor.

Fig. 39: The test setup utilized to evaluate the performance of our combined MEMS-and-electronic vibration sensor


Opposite Page: Scanning electron micrograph shows a completed 1x8 array of cell traps. Each trap consists of four extruded gold electrodes fabricated by electroplating. (See Pages 51 and 52) Courtesy of R.A. Braff and J. Voldman (Professors M.L. Gray, M.A. Schmidt, and M. Toner)

Sponsor NSF, Kodak, and NIGMS
