IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 40, NO. 9, SEPTEMBER 2005

1957

A 4.8–6.4-Gb/s Serial Link for Backplane Applications Using Decision Feedback Equalization

Vishnu Balan, Member, IEEE, Joe Caroselli, Member, IEEE, Jenn-Gang Chern, Member, IEEE, Catherine Chow, Member, IEEE, Ratnakar Dadi, Member, IEEE, Chintan Desai, Member, IEEE, Leo Fang, Member, IEEE, David Hsu, Member, IEEE, Pankaj Joshi, Member, IEEE, Hiroshi Kimura, Member, IEEE, Cathy Ye Liu, Member, IEEE, Tzu-Wang Pan, Member, IEEE, Ryan Park, Member, IEEE, Cindy You, Member, IEEE, Yi Zeng, Member, IEEE, Eric Zhang, Member, IEEE, and Freeman Zhong, Member, IEEE

Abstract—This paper describes a serial link design capable of 4.8–6.4-Gb/s binary NRZ signaling across 40 in of FR4 copper backplane traces and two connectors. The transmitter features a programmable two-tap feedforward equalizer, and the receiver uses an adaptive four-tap decision feedback equalizer to compensate for the channel losses at 6.4 Gb/s. The transceiver core is built in LSI's 0.13-µm standard CMOS technology for integration into ASIC designs that require serial links. The transceiver consumes 310 mW per duplex channel at 1.2 V and 6.4 Gb/s under nominal conditions.

Index Terms—Adaptive equalization, backplane transceiver, decision feedback equalization (DFE), SerDes, serial link.

I. INTRODUCTION

The speed of serial links across copper backplanes has risen steadily over the past few years. Backplane serial links must handle the increased channel losses at these higher speeds while still supporting legacy backplanes that were originally designed for 1–3-Gb/s operation. Advanced equalization techniques are required to remove the intersymbol interference (ISI) caused by loss mechanisms in copper traces on PCBs, including skin effect, dielectric loss, and reflections from impedance discontinuities. Equalization techniques that provide high-frequency boost to compensate for channel losses also boost noise and crosstalk, which degrades overall performance.

Traditional decision feedback equalization (DFE) architectures place both the feedback and the feedforward equalizers in the receiver. To ease the design and maintain backward compatibility, the feedforward equalizer is moved to the transmitter here. The transceiver described in this paper therefore has transmit (TX) equalization in the form of a programmable feedforward (FF) de-emphasis filter and receive (RX) equalization in the form of DFE to compensate for channel losses. DFE uses clean decisions on previously received symbols to remove ISI from the current symbol. It does not boost high-frequency noise such as crosstalk or wideband noise to equalize the channel, so it is well suited to backplane environments with a high channel count. DFE is vulnerable to error propagation, because an erroneous decision influences future decisions through the feedback equalizer. However, the target bit error rate (BER) in backplane applications is already very low, so the degradation due to error propagation is acceptable in most cases.

The transceiver consists of three building blocks: the TX, the RX, and the PLL. The PLL block mainly serves as a clock multiplier unit (CMU). It generates multiphase clocks nominally at the 4T clock rate, that is, with a period of four bit times T. The PLL provides a fixed phase to the RX and TX. The TX serializes the data using this clock, while the RX uses the PLL clock as an initial estimate of the incoming data phase and frequency. The exact phase and frequency at the RX are recovered from the data by a digital clock and data recovery (CDR) loop. Each PLL block can drive up to four full-duplex channels, and the multiphase clock is distributed to each channel through a low-skew clock tree. Because the PLL block is common to the TX and RX, it is also a natural place for shared bias circuits, common circuitry for TX phase calibration, and the at-speed loopback and built-in self-test (BiST) circuits.

Manuscript received December 7, 2004; revised February 18, 2005. V. Balan, J. Caroselli, C. Chow, C. Desai, L. Fang, D. Hsu, P. Joshi, C. Y. Liu, T.-W. Pan, R. Park, Y. Zeng, E. Zhang, and F. Zhong are with the Communications & ASIC Technology Department, LSI Logic Corporation, Milpitas, CA 95035 USA (e-mail: [email protected]). J.-G. Chern, R. Dadi, H. Kimura, and C. You are with Link-A-Media Inc., Santa Clara, CA 95051-0951 USA. Digital Object Identifier 10.1109/JSSC.2005.848180

0018-9200/$20.00 © 2005 IEEE

II. ARCHITECTURE

Fig. 1 shows the block diagram of the serial data link. The TX data serializer uses the multiphase clocks from the CMU PLL to serialize the data. The clock phases are calibrated to correct phase mismatches before the serializer uses them. The serialized data then pass through the programmable TX filter before being driven to the output pads. The target jitter at the output of the TX is 0.3 UI, including random and deterministic components, measured at a low target BER. After traversing the channel, the eye at the input of the receiver can be completely closed due to ISI. At the receiver, the ISI is first cancelled by the DFE filter, and the data are then sliced by the comparators of a 2-b ADC. The ADC outputs (Dk, Ek) drive both the CDR loop and the digital DFE coefficient adaptation loop, as shown in Fig. 1. The CDR is a dual-loop architecture. The CMU PLL (loop 1) generates multiphase clocks, and each RX feeds them to phase interpolators that produce the recovered clock (loop 2). The interpolator is driven by a digital CDR loop that is described later.

Fig. 1. Block diagram of a serial link with programmable TX de-emphasis and adaptive RX equalizer.

A modular approach is adopted to facilitate scalable integration of these blocks into ASIC designs. Fig. 2 shows the floor plan of a full-duplex four-channel subsystem. The design consists of a TX core, an RX core, and a PLL core. These can be used as single channels or assembled into subsystems, such as a four- or eight-channel duplex system driven by one PLL block, depending on application requirements. The PLL generates four differential phases of 4T clocks, with a separation of one T between adjacent clocks (where T is the bit time at full rate). Both the TX and RX use these 4T, four-phase clocks to serialize and deserialize the data.

Fig. 2. Floor plan of a four-channel full duplex system, showing the PLL, PLL2, 4TX, 4RX, mini-RX, mini-TX, and the clock tree.

The number of FF taps is selected based on the number of significant precursor ISI samples in the pulse response. Fig. 4(a) shows the measured pulse response for one of the target backplanes. It has only one significant precursor ISI sample, which implies that a two-tap FIR filter should be sufficient for the feedforward equalizer. Fig. 4(b) plots the SNR at the slicer input for the target backplane versus the number of taps employed. The x-axis shows the number of feedback taps, and each curve represents a different number of taps in the FF equalizer at the TX. As expected, going from one to two taps in the FF equalizer gives a significant improvement, but there is virtually no benefit beyond that. Similarly, the SNR improves significantly as the number of feedback taps increases, but the returns diminish beyond four taps. A two-tap FF de-emphasis filter at the TX and a four-tap feedback filter at the RX were therefore chosen as a compromise between performance and complexity.

The TX is far from the RX, and a back channel is not guaranteed to be available. The TX filter coefficients therefore cannot be adapted and are only programmable through register settings. The filter can be programmed to act as either a precursor-type (Type I) or a postcursor-type (Type II) de-emphasis filter. Type-II de-emphasis is the traditional method, in which the signal amplitude overshoots just after a transition and then reduces for longer run lengths.
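The following Python sketch makes the Type-II behavior, and the Type-I variant described next, more concrete. It applies a two-tap FIR to a hypothetical sampled pulse response with one precursor. This is an illustration under stated assumptions, not the authors' circuit: the tap normalization \(|c_{\mathrm{main}}| + |c_{\mathrm{side}}| = 1\), the sample values, and the function names are all invented for the example. Swapping the present and 1T-delayed streams reverses the tap vector, which moves the negative tap from the postcursor side (Type II) to the precursor side (Type I).

```python
# Minimal sketch (not the authors' circuit) of a two-tap TX de-emphasis FIR.
# Assumptions: taps are normalized so |main| + |side| = 1, and the sampled
# pulse response used below is made up for illustration.


def convolve(x, h):
    """Full discrete convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def type2_taps(alpha):
    """Postcursor (Type-II) de-emphasis: y[n] = (1 - alpha) x[n] - alpha x[n-1]."""
    return [1.0 - alpha, -alpha]


def type1_taps(alpha):
    """Precursor (Type-I) de-emphasis: the same weights with the present and
    1T-delayed streams swapped, so the main tap is now the delayed bit and
    the negative tap acts one bit ahead of it."""
    return [-alpha, 1.0 - alpha]


def alpha_nulling_precursor(pre, main):
    """Type-I weight that zeroes a single precursor sample:
    (1 - alpha) * pre - alpha * main = 0."""
    return pre / (pre + main)


if __name__ == "__main__":
    pulse = [0.15, 0.6, 0.3, 0.12, 0.05]  # hypothetical: precursor, cursor, postcursors
    a = alpha_nulling_precursor(pulse[0], pulse[1])
    print("alpha  :", round(a, 3))
    print("Type I :", [round(v, 3) for v in convolve(pulse, type1_taps(a))])
    print("Type II:", [round(v, 3) for v in convolve(pulse, type2_taps(a))])
```

With these made-up samples, the Type-I setting zeroes the precursor and leaves the postcursors for the RX DFE. It also leaves a small sample of about −0.03 two bits ahead of the new cursor. The Type-II setting instead shrinks the first postcursor from 0.3 to 0.12 but leaves a scaled precursor in place. This matches the division of labor argued in the text.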
Type-I de-emphasis differs from Type-II in that the signal undershoots just before the data transition and then reduces for longer run lengths. Type-I pre-emphasis effectively removes precursor ISI, while Type II removes postcursor ISI. The DFE at the RX can only cancel postcursor ISI, so a Type-I FF filter is the more suitable choice at the TX, where it cancels the precursor ISI [as an example, Fig. 4(a) shows a pulse response before and after this type of equalization is performed]. Legacy receivers operate on open eyes or have no RX equalization. For them, a Type-II filter at the TX is more suitable because it cancels postcursor ISI, which dominates the loss mechanism in typical systems.

The RX has a four-tap DFE filter whose coefficients are set adaptively by a sign–sign LMS algorithm [1]. The RX also features a second-order digital CDR loop that acquires the phase and frequency (up to ±200 ppm offset) of the incoming data.

Fig. 3. Block diagram showing the TX core along with the mini-RX block that is used during loopback BiST and TX calibration. The mini-RX is clocked by the PLL or PLL2 during different modes.

The TX uses multiphase clocks to serialize data, so any mismatch in the clock phase spacing translates into periodic deterministic jitter at the TX output. To minimize this effect, the TX has phase calibration logic that can detect and correct the phase mismatch of the clocks. The clock phase calibration at the TX is performed once during power-up, and the phase-mismatch correction values are stored digitally. During calibration, the TX serializes fixed 4T patterns in a specific sequence: 0011, 0110, and 1010, and their complements. Fig. 3 shows a block diagram of the data path shared between the loopback and calibration modes. The output of the TX is looped back to the PLL block, which contains a simplified RX, called the mini-RX, that detects the data. The TX loopback BiST, part of the test strategy for the TX core, uses the same path.

During calibration, the mini-RX is clocked with a scan clock that has a small fixed frequency offset (approximately 2000 ppm) from the TX serializer clock. The scan clock is generated by a second PLL, called PLL2, locked to the same crystal as the CMU. The small frequency offset causes the scan clock to "walk" across the serialized data stream from the TX. The output of the mini-RX is digitally averaged to detect the duty cycle of the pattern. If all serializing phases are ideal, the duty cycle of every pattern is exactly 50%, so the phase mismatch can be detected from the actual duty cycle of each pattern sent from the TX.
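A small behavioral model can illustrate the detection principle. The Python sketch below is an illustration under stated assumptions, not the authors' calibration logic. It models the serializer output for a periodic 4T pattern whose bit boundaries sit at hypothetical phases \(\phi_0\)–\(\phi_3\). These are given as timing errors `err`, in units of T, relative to ideal, with \(\phi_0\) as the reference. The sketch estimates the duty cycle in two ways: exactly, and by averaging one sample per slightly stretched period, which mimics the walking scan clock. The function names and the error values are illustrative.

```python
# Hypothetical model of duty-cycle-based detection of TX clock-phase mismatch.
# Assumptions (not from the paper): time is in units of the bit time T, the
# bit in position k of the 4T pattern starts at k + err[k], and err[0] = 0
# serves as the reference phase. Errors are assumed small (|err| < 0.5).

PERIOD = 4.0  # one 4T pattern period, in units of T


def edges(err):
    """Start time of each of the four serialized bits within one period."""
    return [k + e for k, e in enumerate(err)]


def level(t, pattern, err):
    """Serializer output (0 or 1) at time t for a periodic 4-bit pattern."""
    t %= PERIOD
    bit = 3  # before the first edge we are still in bit 3 of the last period
    for k, start in enumerate(edges(err)):
        if t >= start:
            bit = k
    return int(pattern[bit])


def duty_exact(pattern, err):
    """Fraction of the 4T period during which the output is high."""
    e = edges(err) + [PERIOD + err[0]]
    high = sum(e[k + 1] - e[k] for k in range(4) if pattern[k] == "1")
    return high / PERIOD


def duty_walking_clock(pattern, err, offset_ppm=2000.0, n=5000):
    """Average of one sample per period stretched by offset_ppm, so the
    sampling instant walks across the pattern (equivalent-time sampling)."""
    step = PERIOD * (1.0 + offset_ppm * 1e-6)
    return sum(level(i * step, pattern, err) for i in range(n)) / n


if __name__ == "__main__":
    err = [0.0, 0.0, 0.1, 0.0]  # hypothetical: third phase late by 0.1T
    for p in ("0011", "0110", "0101"):
        print(p, round(duty_exact(p, err), 3), round(duty_walking_clock(p, err), 3))
```

Under these assumptions, a late third phase moves the duty cycles of 0011 and 0101 away from 50% but leaves 0110 unchanged. A late fourth phase affects 0110 but not 0011. This selective sensitivity is what lets individual phase errors be isolated.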
Each of the multiphase TX clocks passes through a 5-b phase interpolator, so each clock can be independently adjusted with an accuracy of T/64. The phase interpolators are driven by the digital calibration logic according to the algorithm described below. In a four-phase system, one of the phases is used as the reference, and three independent adjustments align the remaining phases to it. Which phase mismatches the duty cycle is sensitive to depends on the data being serialized.

For example, assume that \(\phi_0\) serializes the first data bit of the 4T pattern, followed by \(\phi_1\), \(\phi_2\), and \(\phi_3\), in that order, for each of the remaining 3 b. While the pattern "0011" is serialized at the TX, the duty cycle is a function only of the mismatch between \(\phi_0\) and \(\phi_2\). Similarly, the "0110" pattern is sensitive to the mismatch between \(\phi_1\) and \(\phi_3\), while the "0101" pattern is sensitive to all phases. The calibration is done in three sequential steps:

1. \(\phi_0\) and \(\phi_2\) are adjusted to be 180° apart.
2. \(\phi_1\) and \(\phi_3\) are adjusted to be 180° apart.
3. With both pairs now 180° apart, the remaining error during the "0101" pattern is used to correct the mismatch between the pairs \((\phi_0, \phi_2)\) and \((\phi_1, \phi_3)\).

Fig. 4. (a) Pulse response measured at the far end near the RX for a target backplane for this design. (b) SNR at the slicer input for different numbers of taps in the feedforward and feedback equalizers.

To maximize compatibility with most designs, the RX is ac-coupled on chip after the 50-Ω termination resistors. This lets the RX tolerate any common-mode voltage at its input within process limits. It also allows an optimal common mode to be set at the RX input stage for best performance, independent of line conditions. The data latches at the RX have offset calibration to remove the input-referred offset of the latch and thereby improve the sensitivity


Fig. 5. Block diagrams of the 4:1 serializer, predriver, and pad driver in the TX. The inset also shows the timing diagram for the 4:1 serializer.

of the latch. The offset calibration at the RX is done once during power-up, and the offset correction values are stored digitally. The input of each data latch consists of a double differential pair: one pair is driven by the data and the other by an offset DAC. The signed 3-b offset DAC generates offset voltages in the range of ±30 mV, which can be used to cancel the input-referred offset of the latch. During calibration, the RX inputs just after the ac-coupling capacitors are shorted to the common-mode voltage, so external conditions at the RX input pins do not affect the calibration process. The DAC codes are then swept from −7 to +7 while the data are observed at the digital portion of the chip. When the data transition from −1 to +1, the current DAC code represents the input offset of the latch. This value is stored digitally and used during normal operation.

III. TRANSMITTER DESIGN

It is desirable to perform as much of the equalization as possible at the RX, because the feedback equalizer is adaptive and does not enhance high-frequency crosstalk. The feedback equalizer can remove postcursor ISI but not precursor ISI. Consequently, the FF equalizer in the TX is used solely to mitigate the precursor ISI.

Fig. 5 shows the circuit details of the TX. It consists of the following blocks:

- a data rotator that aligns the data for optimal timing at the serializer;
- two 4:1 serializers, one for the data and one for the 1T-delayed data;
- a predriver;
- a CML 50-Ω driver.

The on-chip 50-Ω termination can be trimmed automatically at power-up to track an external 3-kΩ resistor, which removes the process variation of the polysilicon resistors. The 4-b parallel data from the digital side arrive at the 4T clock rate. The data rotator shifts each bit of the 4-b parallel word by an integer multiple of T to align it with the serializing clock edges, as shown in the timing diagram of Fig. 5. It uses the four-phase, 1T-spaced clocks to do this. As the timing diagram also shows, the 4:1 mux forms a 1T-wide pulse by combining the rising edge and the falling edge of two adjacent phases. The data rotator realigns the 4T parallel data so that the 1T-wide sampling pulse for each bit has a 2T setup time and a 1T hold time.

The serialization circuit consists of four pulsing circuits, each active for a period of 1T, that sum currents at a resistor (Fig. 5). Each leg ANDs the data with two adjacent clock phases to produce a 1T data pulse, which in turn drives the gate of the NMOS current source. A data value of "1" pulls current through the resistor, while a "0" produces no current. Combined with the complementary data side, this produces a pseudodifferential CML signal that drives the predriver stage.

Two 4:1 serializers create two data streams that are delayed by 1T with respect to each other. The two data streams drive segmented predriver and driver stages to produce the TX output at the pads. Depending on the desired pre-emphasis setting, each predriver segment is switched to select either the data or the delayed data. For example, Fig. 5 shows four predriver and driver segments, each with its own weight. Suppose segments 2 and 3 select the delayed data while segments 1 and 4 select the data. The TX output is then a de-emphasis waveform in which the present data carry the combined weight of segments 1 and 4 and the delayed data carry the combined weight of segments 2 and 3. Type-II or Type-I de-emphasis can be achieved by simply swapping the data streams at


Fig. 6. The RX uses a four-phase architecture with each slice operating at the 4T clock rate. The blow-up of each slice shows the precalculate operation in the RX.

the output of the rotator. In this example, the roles of the present and delayed data in the output waveform are then interchanged. Different combinations of the segment weights, chosen digitally as above, provide eight de-emphasis settings in the range of 0%–36%.

IV. RECEIVER DESIGN

The data slicer makes a decision every bit period T, and the DFE filter uses these decisions to cancel ISI from the present incoming symbol. To cancel the ISI of the most recent bit from the present bit, the following timing constraint has to be met (Fig. 1):

\[ t_{\mathrm{clk\text{-}q}} + t_{\mathrm{DFE}} + t_{\mathrm{setup}} < T \qquad (1) \]

where \(t_{\mathrm{clk\text{-}q}}\) is the clock-to-data delay of the slicer, \(t_{\mathrm{DFE}}\) is the delay of the DFE filter, and \(t_{\mathrm{setup}}\) is the setup time of the slicer. At 6.4 Gb/s, \(T = 156.25\) ps. Meeting this constraint in 0.13-µm CMOS technology may not be feasible, or may require high power dissipation.

Without a precalculate feature, the biggest bottleneck is the large \(t_{\mathrm{clk\text{-}q}}\). It includes the time to amplify a small signal to CMOS levels and uses a major portion of the budget in (1). To ease the constraint, a precalculate architecture [6] is adopted, as shown in Fig. 6. The ISI due to every possible value of the previous bit (±1 in the case of binary NRZ) is first calculated in parallel, and the correct choice is made once the previous data are known. This resolves the critical path involving the first tap and pushes the new critical path to the second tap of the DFE. The data signal must now pass through the first slice and reach the latch input of the third slice within 2T. Other operations, such as multiplexing or retiming large signals, are much faster than the \(t_{\mathrm{clk\text{-}q}}\) delay of the slicer. The new architecture performs more steps within 2T, but the additional steps are fast, which leaves sufficient time for the slowest step to complete with margin.

The RX uses a four-phase design in which each slice operates at the 4T rate. Previous CDR architectures [3], [4] use 2× or 3× oversampling. Here, a baud-rate sampling scheme instead captures the data (D) and generates an error signal (E) from the center of the data eye; these drive the timing, gain, and coefficient adaptation loops [1]. In traditional high-speed serial link designs, the phase update is performed by a "bang–bang" timing loop [3]. This requires oversampling the received signal by a factor of two relative to the transmit frequency. Typically, two samples are taken: one at the center of the data eye and one at the edge of the eye. In a DFE system, since a feedback filter is equalizing the received signal between data samples, the


Fig. 7. RX front-end circuit with ISI cancellation. The source-follower circuit follows the input signal, and each current leg cancels ISI based on previously received data.

signal in front of the edge sampler may not have settled within the T/2 time. This may cause a data-dependent timing offset and result in suboptimal performance. To avoid the cost of oversampling, this design drives the phase update with baud-rate-spaced samples. The phase update equation (2) has been shown to converge to the center of a symmetric pulse response [5]. This implementation uses a sign–sign version of (2). Every 4T, it performs a majority vote on four consecutive outputs of the update equation, as given by (3).

A second-order timing loop was implemented to accommodate frequency offsets of up to ±200 ppm without severely impacting jitter tolerance performance [7], [8]. The frequency update equation is given by (4), and the sampling position is updated every 4T according to (5). The two loop gains in these equations set the bandwidth of the phase update and the bandwidth of the frequency-offset update, respectively. These updates are implemented in digital logic, which drives a phase interpolator with T/64 resolution to obtain the recovered clock. The interpolator uses the multiphase clocks from the CMU to perform the phase interpolation.

In a typical DFE system, an adaptive FF filter controls the gain of the incoming signal so that the amplitude seen by the ADC is constant (AGC action). Here the FF filter is placed at the TX and is not adapted, so a "gain loop" instead adjusts the size of the LSB in the 2-b data slicer. The incoming amplitude varies from channel to channel and with the TX amplitude settings. As it does, the target internal to the RX, \(A\), changes until it reaches steady state. As shown in Fig. 6, \(A\) is the comparator threshold of the 2-b ADC. A large incoming amplitude causes \(A\) to adapt to a higher value, and vice versa. This can be represented by the following equation:

\[ y_k - \sum_{i=1}^{4} h_i\, d_{k-i} = A\, d_k \qquad (6) \]

where \(y_k\) is the \(k\)th sample of the incoming signal \(y\), \(h_i\) (\(i = 1, \dots, 4\)) are the DFE filter coefficients, \(d_k\) is the detected data, and \(A\) is the amplitude target (the same as the slicer threshold of the 2-b ADC shown in Fig. 6). The coefficient adaptation loop (which adjusts \(h_i\)) and the gain loop (which adjusts \(A\)) work together to satisfy (6) at every sample.

Fig. 7 shows the circuit that subtracts the ISI from the four previous bits from the incoming data signal to determine the present bit. It consists of a source follower whose gate is driven by the input signal. Several legs of current sources, controlled by the previously recovered data, hang off the follower. The adaptation loops control the bias current in each leg and thus the magnitude of the corresponding coefficient \(h_i\). Each tap's magnitude is set by the current its leg draws through the output impedance of the source follower. The source follower is operated in the "small-signal" regime: the design ensures that it remains in its "linear" range even when the maximum ISI is cancelled. Further, the amplitude of the incoming data signal at the RX is restricted in range to keep the source follower in the linear


Fig. 8. PLL schematic showing VCO delay cell and self-bias loop.

region. In the short-channel case, this is done by reducing the launch amplitude at the TX, while in the long-channel case the channel losses attenuate the signal sufficiently. The target voltage, \(A\), is nominally set at 200 mV. Its exact value is set adaptively in the range of 100–300 mV, depending on the actual incoming voltage amplitude. If the input amplitude requires \(A\) to be set outside this range, performance degrades due to suboptimal operation of the loops. In the extreme case, when the amplitude is too large or too small, the receiver fails.

The feedback taps and target level must be properly set before data can be received reliably. This is done during an initialization period in which the feedback and target adaptation loops, along with the timing loop, are allowed to adapt and settle. The TX must therefore transmit a data pattern that the RX can use for adaptation. The only constraint on this pattern is that it be spectrally rich; any PRBS pattern, such as PRBS7 or higher order, is sufficient.

The initialization is accomplished in two steps. First, the timing and target adaptation loops are activated while the feedback adaptation loop remains off. The second-order timing loop tracks out the frequency offset, and the phase locks. However, if postcursor ISI makes the pulse response asymmetric, the phase does not lock to the ideal sampling point. In the second step, the feedback tap loop is switched on as well. As the feedback taps adapt, postcursor ISI is removed and the equalized pulse response becomes more symmetric, so the selected phase moves toward the desired position. To minimize interaction between the loops, the bandwidth of the coefficient loop must be significantly lower than that of the timing loop.

V. PLL DESIGN

The PLL generates multiphase clocks that are distributed through a scalable clock tree to the RX and TX channels. Each PLL can drive up to eight channels of TX or RX. Fig. 8 shows a block diagram of the PLL with details of the VCO delay stage and loop filter. The delay stage consists of two inverters that are cross-coupled at the output by two weak inverters (inset in Fig. 8); the cross-coupling helps keep the two outputs complementary. The four-stage ring VCO runs on a regulated supply, and this inverter supply voltage also serves as the control voltage of the VCO. A self-biased regulator provides the current to the VCO and also improves the PSRR [2]. The regulator has a low-pass characteristic [2]. This acts as the third pole in the PLL loop and helps remove high-frequency noise on the control voltage.

The PLL uses self-bias techniques so that the bandwidth and damping track over PVT variations. To achieve this, the damping resistor in the loop filter tracks the VCO delay, and the charge-pump current tracks the VCO regulator current. The damping resistor is a triode-region pMOS transistor scaled from the pMOS in the delay cell, and the charge pump is biased with a copy of the regulator current. A start-up circuit ensures a minimum charge-pump current during power-up, until the self-bias loop takes over and drives the current toward its optimum value. As a result, the PLL natural frequency is a fixed fraction of the reference frequency. The damping factor is a fixed constant proportional to ratios of capacitors and of transistor W/L's.

A simplified analysis of the self-bias loop is given below. It assumes that the inverter in the delay cell switches completely between the control voltage and ground. For simplicity, it also assumes that the threshold voltage of the devices is 0 V. The delay of one delay cell is then approximately given by (7), in terms of the load capacitance at each VCO node and the equivalent resistance of the transistor in the triode region. Substituting the expression for this resistance, given by (8), into (7),


Fig. 9. Jitter tolerance plot of the RX with sinusoidal jitter added at the TX side. The test setup consists of the TX transmitting across 30 in of FR4 copper trace, two backplane HmZd connectors, and 2 of cable.

we obtain (9), where \(N\) is the number of delay stages. The current through the regulator is given by (10). The charge-pump current is a fraction of the VCO regulator current, while the triode-region damping transistor (shown in Fig. 8) is a scaled fraction of the delay-cell transistor. Substituting these relations into the usual expressions for the natural frequency and damping factor of a second-order PLL gives (11) and (12). Their remaining parameters are the feedback divider count and the loop capacitor. These expressions show that, to first order, the damping factor and natural frequency of the loop are independent of PVT. In practice, some variation remains because of second-order effects and because some assumptions, such as the 0-V threshold, are not strictly valid. Even so, the variations stay within 10%–15% across various conditions.

VI. RESULTS

The transceiver was fabricated as part of a test chip in LSI's 0.13-µm CMOS technology and packaged in a generic four-layer flip-chip PBGA package. The test chip contains several RX and TX channels sharing clocks driven by a PLL core. The TX/RX channels are surrounded by random switching logic, which emulates the digital switching noise present in a real ASIC environment. The experimental setup consists of 30 in of FR4 copper trace, two backplane HmZd connectors, and 2 of cable. The eye diagrams in the inset of Fig. 9 are measured at 6.4 Gb/s, close to the TX and close to the RX, respectively.

The output of the TX shows a jitter of 41 ps peak-to-peak, of which the PLL contributes about 26 ps peak-to-peak. The remaining jitter includes phase mismatch and other deterministic components due to ISI from 5 in of FR4 trace on the evaluation board. The signal then traverses the backplane. There this jitter adds to the ISI from backplane losses, closing the eye at the RX by approximately 0.73 UI (see the inset in Fig. 9). The receiver blind-adapts the DFE taps to the channel response using a PRBS pattern, acquires the timing phase and frequency, and operates error-free for over 24 h. At 6.4 Gb/s, 24 h corresponds to about \(5.5 \times 10^{14}\) bits, so this translates to a BER on the order of \(10^{-15}\).

Fig. 9 also shows the RX jitter tolerance for this setup. The x-axis gives the frequency of the sinusoidal jitter added to the TX clock, and the y-axis gives the extra jitter amplitude added at that frequency. The sinusoidal jitter is added at the output of the TX PLL clock that serializes the TX data. It comes on top of the jitter already present in the normal PLL output. The plot shows a high-frequency jitter tolerance of about 0.12 UI and a corner frequency of about 3 MHz.

Fig. 10 shows the measured RX return loss as a function of frequency. The RX return loss is better than 10 dB for frequencies up to 3.5 GHz, and the TX output return loss is similar.

Fig. 10. RX input return loss plotted versus frequency. The TX return loss is similar to the RX.

The TX calibration routine measures the phase offsets between the multiphase clocks used at the TX and corrects them as described earlier. The standard deviation of the TX eye widths before and


[6] S. Kasturia et al., “Techniques for high-speed implementation of nonlinear cancellation,” IEEE J. Sel. Areas Commun., vol. 9, no. 5, pp. 711–717, Jun. 1991.
[7] V. Stojanovic et al., “Adaptive equalization and data recovery in a dual-mode (PAM2/4) serial link transceiver,” in Proc. VLSI Circuit Symp., 2004.
[8] H. Ng et al., “A second-order semi-digital clock recovery circuit based on injection locking,” IEEE J. Solid-State Circuits, vol. 38, no. 12, p. 2101, Dec. 2003.

Fig. 11. Microphotograph of the test chip showing the RX, TX, and PLL cores.

after calibration are 6 and 2 ps, respectively, as measured over 15 parts. The TX phase calibration therefore helps keep the duty-cycle distortion low at the output of the TX. A microphotograph of the test chip, taken with backside IR-OBIRCH, is shown in Fig. 11. Under nominal conditions (1.2 V, 25 °C ambient), a duplex channel dissipates approximately 310 mW. The TX consumes about 100 mW and the RX about 200 mW. The PLL, which can be shared by up to four full-duplex channels, consumes about 40 mW. The block areas are as follows:

- PLL, together with the BiST, PLL2, and calibration circuits: 0.78 mm² (600 µm × 1300 µm);
- TX: 0.24 mm² (200 µm × 1200 µm);
- RX: about 0.32 mm² (200 µm × 1600 µm).

The modular design and layout of each block make it easy to build subsystems at the chip level.
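The "error-free for over 24 h" result reported earlier in this section can be turned into a BER statement with a short calculation. The sketch below assumes independent bit errors, modeled as a Poisson process, which the paper does not state. Under that assumption, zero errors in \(N\) bits bounds the BER at \(-\ln(1 - \mathrm{CL})/N\) for confidence level CL. The quantity \(1/N\) is the usual rough point figure. The function names are illustrative.

```python
import math

# Hedged sketch: BER bound from an error-free run, assuming independent
# errors (Poisson model). Not a procedure taken from the paper.


def bits_transmitted(bit_rate_bps, seconds):
    """Number of bits sent at a given line rate over a given duration."""
    return bit_rate_bps * seconds


def zero_error_ber_bound(n_bits, confidence=0.95):
    """Upper BER bound after observing zero errors in n_bits:
    P(0 errors) = exp(-BER * n_bits) is set equal to 1 - confidence."""
    return -math.log(1.0 - confidence) / n_bits


if __name__ == "__main__":
    n = bits_transmitted(6.4e9, 24 * 3600)  # 24 h at 6.4 Gb/s
    print(f"bits sent       : {n:.3e}")
    print(f"1/N estimate    : {1.0 / n:.2e}")
    print(f"95% upper bound : {zero_error_ber_bound(n):.2e}")
```

For 24 h at 6.4 Gb/s this gives \(N \approx 5.5 \times 10^{14}\) bits, \(1/N \approx 1.8 \times 10^{-15}\), and a 95% upper bound of about \(5.4 \times 10^{-15}\).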
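The receiver of Section IV adapts its taps and target jointly, as (6) describes. A decision-directed sign–sign LMS loop can illustrate this. The Python model below is a hedged sketch under assumed conditions, not the authors' digital logic. The pulse response, step sizes, noise level, and the names `adapt_dfe` and `sgn` are hypothetical. The error slicer is assumed to compare the ISI-cancelled sample \(z_k = y_k - \sum_i h_i d_{k-i}\) against \(A\,d_k\). Each tap then moves by \(\pm\mu\) in the direction of \(\mathrm{sgn}(e_k)\,d_{k-i}\), and the target moves in the direction of \(\mathrm{sgn}(e_k)\,d_k\).

```python
import random

# Behavioral sketch (an assumption, not the authors' implementation) of
# decision-directed sign-sign LMS adapting four DFE taps h_i and the
# amplitude target A so that  y_k - sum_i h_i d_{k-i} ~= A d_k  on average.


def sgn(x):
    """Comparator-style sign: returns +1 for x >= 0, else -1."""
    return 1 if x >= 0 else -1


def adapt_dfe(pulse, n_taps=4, n=40000, mu=2e-3, mu_a=2e-3, noise=0.02, seed=1):
    """pulse[0] is the main cursor and pulse[1:] are postcursors (no
    precursor: assume the TX filter removed it). Returns (taps, target)."""
    rng = random.Random(seed)
    h = [0.0] * n_taps            # DFE tap weights, start at zero
    target = 0.5                  # amplitude target A, arbitrary start value
    sent = [1] * len(pulse)       # transmitted symbols, newest first
    past = [1] * n_taps           # past decisions d_{k-1}..d_{k-4}, newest first
    for _ in range(n):
        sent = [rng.choice((-1, 1))] + sent[:-1]
        y = sum(p * a for p, a in zip(pulse, sent)) + rng.gauss(0.0, noise)
        z = y - sum(hi * di for hi, di in zip(h, past))  # ISI-cancelled sample
        d = sgn(z)                                       # data slicer (D)
        s = sgn(z - target * d)                          # error slicer (E)
        h = [hi + mu * s * di for hi, di in zip(h, past)]
        target += mu_a * s * d
        past = [d] + past[:-1]
    return h, target


if __name__ == "__main__":
    taps, a_final = adapt_dfe([1.0, 0.4, 0.2, 0.1, 0.05])
    print("taps  :", [round(t, 3) for t in taps])
    print("target:", round(a_final, 3))
```

The made-up channel used here is mild, so the eye is open before equalization and decisions are correct from the start. Under these conditions the taps should settle near the postcursor values and the target near the main cursor. The blind start-up from a closed eye described in Section VI is not modeled.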

VII. CONCLUSION

A high-speed serial link design using a programmable FF filter at the TX and an adaptive four-tap DFE at the RX has been demonstrated. The RX implements baud-rate sampling using a 2-b ADC to recover the data, to generate the error signal for adapting the DFE coefficients, and to acquire the phase of the incoming data stream. A test chip implemented in 0.13-µm standard CMOS technology has been shown to be fully functional and to meet all of the target specifications. The transceivers also support half- and quarter-rate operation for backward compatibility.

Vishnu Balan (S’95–M’96) received the B.S. degree in electronics and communications engineering from the Indian Institute of Technology, Madras, in 1995 and the M.S. degree in electrical engineering from Duke University, Durham, NC, in 1996. In 1997, he joined DataPath Systems, Inc., Santa Clara, CA, where he worked on analog front-end circuits for several generations of hard disk drive read channel design. In 2000, he joined LSI Logic Corporation, Milpitas, CA, where he worked on SerDes design for backplane applications. While at LSI, he led the analog design effort for several serial links at speeds ranging from 1.6 Gb/s to 12.8 Gb/s. In 2005, he joined Teranetics, Inc., Santa Clara, CA, where he is currently working on analog front-end circuits for 10GBASE-T transceivers.

Joe Caroselli (S’97–A’98–M’03) received the B.S. degree in electrical engineering and economics from the California Institute of Technology, Pasadena, in 1992 and the Ph.D. degree in electrical engineering from the University of California, San Diego, in 1998. He has worked as a Systems Architect on read channels for disk and magnetic tape drives at Quantum Corporation, Milpitas, CA, and DataPath Systems, Santa Clara, CA. He joined LSI Logic Corporation, Milpitas, CA, in 2000 as part of the DataPath acquisition and has been working on high-speed SerDes as part of the HyperPhy team since 2001, where he leads the system architecture team.


Jenn-Gang Chern (S’87–M’88) received the M.S.E.E. degree from the University of California, Los Angeles, in 1988. From 1988 to 1994, he was with Silicon Systems as a Design Engineer working on peak detection channels for hard disk drives. From 1994 to 2000, he was with DataPath Systems, Santa Clara, CA, where he was involved with PRML read channel development. In 2000, he joined LSI Logic Corporation, Milpitas, CA, to work on high-speed SerDes and DVD front-end development. In 2004, he joined Link-A-Media Corporation, Santa Clara, CA, to lead HDD read channel SOC analog front-end development.

Catherine Chow (M’78) received the B.S. degree in engineering from the University of Michigan, Ann Arbor, in 1974 and the M.S. and Ph.D. degrees in computer science from the University of Illinois, Urbana, in 1977 and 1981, respectively. She joined LSI Logic Corporation, Milpitas, CA, in 2001 and is currently a Design Manager with the LSI SerDes group. Her prior experience included ASIC methodology development and storage controller design in IBM’s Storage Division, San Jose, CA.


Ratnakar Dadi (S’98–A’00–M’03) was born in W. Godavari District, AP, India. He received the B.Tech degree in electrical engineering from the Indian Institute of Technology, Bombay, in 1998, and the M.S. degree in electrical engineering from the University of Hawaii, Manoa, in 2000. He joined LSI Logic Corporation, Milpitas, CA, in 2000, where he worked on analog circuits for several generations of high-frequency serial-link communication over backplanes. Since 2004, he has been with Link-A-Media Corporation, Santa Clara, CA, working on mixed-signal circuits for hard-disk drive controllers. His current research interests include design of high-performance analog circuits in CMOS and BiCMOS technologies, analog and mixed-signal circuit simulation techniques, and mixed-signal design flow and verification methodologies.

Chintan Desai (M’94) received the B.S.E.E. degree from the Regional Engineering College, Surat, India, in 1988 and the M.S.E.E. degree from Oklahoma State University, Stillwater, in 1992. He has over ten years of SerDes development experience at LSI Logic Corporation, Milpitas, CA, where he is currently the Director of SerDes Development for Telecommunications Applications.

Leo Fang (S’93–M’95) received the B.S. degree in electrical engineering/computer science and material science engineering from the University of California, Berkeley, and the M.S. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA. Presently, he is the Chief Operating Officer with PyX Technologies, San Ramon, CA. Prior to joining PyX Technologies, he was Director of SERDES and USB Development at LSI Logic Corporation, Milpitas, CA. Before LSI Logic, he held a variety of senior design engineering positions within storage-centric companies such as Quantum Corporation, Milpitas, CA, and DataPath Systems, Inc., Santa Clara, CA.

David Hsu (M’95) received the B.S. degree in engineering physics from The Ohio State University, Columbus, in 1993, and the M.S. degree in electrical engineering from Purdue University, West Lafayette, IN, in 1994. He has been with the SerDes group in LSI Logic, Milpitas, CA, since 2001 and is currently a Design Manager with the group. Prior to joining the SerDes group, he was with DataPath Systems, Inc., San Jose, CA, working on DSL and read channel projects, and on telecommunication projects in the Siemens Semiconductor division.

Pankaj Joshi, photograph and biography not available at the time of publication.

Hiroshi Kimura, photograph and biography not available at the time of publication.

Cathy Ye Liu (S’96–M’99) received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 1995 and the M.S. and Ph.D. degrees in electrical engineering from the University of Hawaii, Manoa, in 1997 and 1999, respectively. She joined DataPath Systems, Inc., Santa Clara, CA, in 2000 and is currently with LSI Logic Corporation, Milpitas, CA, where she is the System Architect for high-speed SerDes. Her current interests are high-speed clock and data recovery, adaptive decision feedback equalizers, signal processing, and error correction coding.

Tzu-Wang Pan, photograph and biography not available at the time of publication.

Ryan Park (M’00) was born in Seoul, Korea, in 1976. He received the degree with emphasis on digital VLSI design from the University of California, Berkeley, in 2000. In 2000, he joined DataPath Systems, Inc., Santa Clara, CA, later acquired by LSI Logic Corporation, Milpitas, CA, where he has been involved in the area of high-speed digital design. He is currently with LSI Logic as a Digital VLSI Designer working in SerDes design.

Cindy You (M’00) received the B.S. degree in electrical engineering from National Tsing Hua University, Taipei, Taiwan, in 1996, and the M.S. degree in electrical and computer engineering from the University of Texas, Austin, in 1998. In 1999, she joined DataPath Systems, Inc., Santa Clara, CA, where she worked on continuous-time filters for hard disk drives. From 2000 to 2004, she was with LSI Logic Corporation, Milpitas, CA, working on mixed-signal circuits including CDR, transmitters, and DFE analog front-ends for multi-Gb/s SerDes development. She is now with Link-A-Media Devices Corporation, Santa Clara, CA, where she has been engaged in the development of analog front-ends of disk drive read channels.

Yi Zeng (M’03) received the B.S. degree from Tsinghua University, Beijing, China, in 1997 and the M.S. degree from the University of Hawaii, Manoa, in 2000. In 2001, he joined Ample Communications, Inc., Fremont, CA, where he worked on SONET/SDH Framer analog I/O design. Since December 2002, he has been with LSI Logic Corporation, Milpitas, CA, where he has been working on analog circuits for serial links at speeds ranging from 3.2 to 12.8 Gb/s.


Eric Zhang (M’00) received the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, College Park, in 1992 and 1996, respectively. From 1996 to 2000, he was with Integrated Device Technology, Santa Clara, CA, working on advanced silicon IC device development and modeling, and later on analog and mixed-signal design of SERDES for SONET applications. In 2000, he joined LSI Logic, where he has worked on the design of Ethernet PHYs and high-speed transceivers for backplane applications.


Freeman Zhong (M’00) received the B.S. degree (with high honor) in physics from Guangzhou University, Guangzhou, China, in 1983, the M.S. degree in solid-state physics from Jinan University, Guangzhou, China, in 1986, and the M.S. degree in electrical engineering from San Jose State University, San Jose, CA, in 1995. From 1995 to 1997, he was with National Semiconductor Corporation, Santa Clara, CA, working on analog and mixed-signal circuit designs. From 1997 to 2000, he was with NEC Corporation, Santa Clara, working on mixed-signal circuit and SOC designs for hard-disk controllers. Since 2000, he has been with LSI Logic Corporation, Milpitas, CA, as a Senior Design Manager leading developments of Ethernet PHY and high-speed SerDes.
