SPARTAN 6 Family Overview Xilinx, Inc., Product Specification, DS160 (v2.0) October 25, 2011

Author: Timothy Powell
Document Review – text based description of important elements!

Summary of Features

• Spartan-6 Family
  o Spartan-6 LX FPGA: Logic optimized (our device)
  o Spartan-6 LXT FPGA: High-speed serial connectivity
• Designed for low cost
  o Multiple efficient integrated blocks
  o Optimized selection of I/O standards
  o Staggered pads
  o High-volume plastic wire-bonded packages
• Low static and dynamic power
  o 45 nm process optimized for cost and low power
  o Hibernate power-down mode for zero power
  o Suspend mode maintains state and configuration with multi-pin wake-up, control enhancement
  o High performance 1.2V core voltage (LX and LXT FPGAs, -2, -3, and -3N speed grades)
• Multi-voltage, multi-standard SelectIO™ interface banks
  o Up to 1,080 Mb/s data transfer rate per differential I/O
  o Selectable output drive, up to 24 mA per pin
  o 3.3V to 1.2V I/O standards and protocols
  o Low-cost HSTL and SSTL memory interfaces
  o Hot swap compliance
  o Adjustable I/O slew rates to improve signal integrity
• High-speed GTP serial transceivers in the LXT FPGAs (not our device)
  o Up to 3.2 Gb/s
  o High-speed interfaces including: Serial ATA, Aurora, 1G Ethernet, PCI Express, OBSAI, CPRI, EPON, GPON, DisplayPort, and XAUI
• Integrated Endpoint block for PCI Express designs (LXT)
• Low-cost PCI® technology support compatible with the 33 MHz, 32- and 64-bit specification
• Efficient DSP48A1 slices
  o High-performance arithmetic and signal processing
  o Fast 18 x 18 multiplier and 48-bit accumulator
  o Pipelining and cascading capability
  o Pre-adder to assist filter applications
• Integrated Memory Controller blocks
  o DDR, DDR2, DDR3, and LPDDR support
  o Data rates up to 800 Mb/s (12.8 Gb/s peak bandwidth)
  o Multi-port bus structure with independent FIFO to reduce design timing issues

Summary of Features continued

• Abundant logic resources with increased logic capacity
  o Optional shift register or distributed RAM support
  o Efficient 6-input look-up tables (LUTs) improve performance and minimize power
  o LUT with dual flip-flops for pipeline centric applications
• Block RAM with a wide range of granularity
  o Fast block RAM with byte write enable
  o 18 Kb blocks that can be optionally programmed as two independent 9 Kb block RAMs
• Clock Management Tile (CMT) for enhanced performance
  o Low noise, flexible clocking
  o Digital Clock Managers (DCMs) eliminate clock skew and duty cycle distortion
  o Phase-Locked Loops (PLLs) for low-jitter clocking
  o Frequency synthesis with simultaneous multiplication, division, and phase shifting
  o Sixteen low-skew global clock networks
• Simplified configuration, supports low-cost standards
  o 2-pin auto-detect configuration
  o Broad third-party SPI (up to x4) and NOR flash support
  o Feature rich Xilinx Platform Flash with JTAG
  o MultiBoot support for remote upgrade with multiple bitstreams, using watchdog protection
• Enhanced security for design protection
  o Unique Device DNA identifier for design authentication
  o AES bitstream encryption in the larger devices
• Faster embedded processing with enhanced, low cost, MicroBlaze™ soft processor
• Industry-leading IP and reference designs

Our Device: XC6SLX16

• Configurable Logic Blocks: 2,278 slices (approximately 14,579 logic cells)
• DSP48A1 Slices: 32
• Block RAM Blocks: 32 x 18 Kb
• Clock Management Tiles (CMT): 2
• Memory Controller Blocks: 2
• I/O Banks: 4

Configuration

Spartan-6 FPGAs store the customized configuration data in SRAM-type internal latches.

• The number of configuration bits is between 3 Mb and 33 Mb depending on device size and user-design implementation options.
• The configuration storage is volatile and must be reloaded whenever the FPGA is powered up.
• This storage can also be reloaded at any time by pulling the PROGRAM_B pin Low.
• Several methods and data formats for loading configuration are available.
  o JTAG configuration mode
  o Master Serial/SPI configuration mode (x1, x2, and x4)
  o Slave Serial configuration mode
  o Master SelectMAP/BPI configuration mode (x8 and x16)
  o Slave SelectMAP configuration mode (x8 and x16)

The bitstream configuration information is generated by the ISE® software using a program called BitGen. The configuration process typically executes the following sequence:

• Detects power-up (power-on reset) or PROGRAM_B when Low.
• Clears the whole configuration memory.
• Samples the mode pins to determine the configuration mode: master or slave, bit-serial or parallel.
• Loads the configuration data starting with the bus-width detection pattern followed by a synchronization word, checks for the proper device code, and ends with a cyclic redundancy check (CRC) of the complete bitstream.
• Starts a user-defined sequence of events: releasing the internal reset (or preset) of flip-flops, optionally waiting for the DCMs and/or PLLs to lock, activating the output drivers, and transitioning the DONE pin to High.

JTAG or Master Serial/SPI configuration mode (x1, x2, and x4)

CLBs, Slices, and LUTs

Each configurable logic block (CLB) in Spartan-6 FPGAs consists of two slices, arranged side-by-side as part of two vertical columns. There are three types of CLB slices in the Spartan-6 architecture: SLICEM, SLICEL, and SLICEX. Each slice contains four LUTs, eight flip-flops, and miscellaneous logic. The LUTs are for general-purpose combinatorial and sequential logic support. Synthesis tools take advantage of these highly efficient logic, arithmetic, and memory features. Expert designers can also instantiate them.

SLICEM

One quarter (25%) of Spartan-6 FPGA slices are SLICEMs. Each of the four SLICEM LUTs can be configured as either a 6-input LUT with one output, or as dual 5-input LUTs with identical 5-bit addresses and two independent outputs. These LUTs can also be used as distributed 64-bit RAM with 64 bits or two times 32 bits per LUT, as a single 32-bit shift register (SRL32), or as two 16-bit shift registers (SRL16s) with addressable length. Each LUT output can be registered in a flip-flop within the CLB. For arithmetic operations, a high-speed carry chain propagates carry signals upwards in a column of slices.

SLICEL

One quarter (25%) of Spartan-6 FPGA slices are SLICELs, which contain all the features of the SLICEM except the memory/shift register function.

SLICEX

One half (50%) of Spartan-6 FPGA slices are SLICEXs. The SLICEXs have the same structure as SLICELs except the arithmetic carry option and the wide multiplexers.

Spartan-6 FPGA, Configurable Logic Block User Guide, UG384 (v1.1) February 23, 2010

Look-Up Table (LUT)

The function generators in Spartan-6 FPGAs are implemented as six-input look-up tables (LUTs). There are six independent inputs (A inputs - A1 to A6) and two independent outputs (O5 and O6) for each of the four function generators in a slice (A, B, C, and D). The function generators can implement any arbitrarily defined six-input Boolean function. Each function generator can also implement two arbitrarily defined five-input Boolean functions, as long as these two functions share common inputs.
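A function generator can be pictured as a 64-entry truth table indexed by its six inputs. The Python sketch below illustrates that idea only. The helper names (`truth_table`, `lut6`, `dual_lut5`) are made up for this note, and the convention that O5 reads the lower 32 entries while O6 reads the upper 32 entries in dual 5-input mode is an assumption about bit ordering, not something stated in the text above.

```python
def truth_table(func, n_inputs):
    """Build the table contents for an n-input Boolean function.

    Bit k of the result holds the function value for input pattern k,
    where input 0 is the least significant bit of k.
    """
    init = 0
    for index in range(1 << n_inputs):
        bits = [(index >> k) & 1 for k in range(n_inputs)]
        if func(*bits):
            init |= 1 << index
    return init


def lut6(init, inputs):
    """Single 6-input function: look up one of 64 table entries."""
    return (init >> (inputs & 0x3F)) & 1


def dual_lut5(init, inputs):
    """Two 5-input functions that share the same five inputs.

    Assumed layout: O5 uses entries 0..31, O6 uses entries 32..63.
    """
    index = inputs & 0x1F
    o5 = (init >> index) & 1
    o6 = (init >> (32 + index)) & 1
    return o5, o6


# Example: a 5-input AND on O5 and a 5-input XOR on O6 in one table.
and5 = truth_table(lambda *b: all(b), 5)
xor5 = truth_table(lambda *b: sum(b) % 2, 5)
shared = and5 | (xor5 << 32)
```

Both 5-input functions see the same input pattern, which is the "share common inputs" restriction in software form.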

Storage Elements

Each slice has eight storage elements. Four of them can be configured as either edge-triggered D-type flip-flops or level-sensitive latches. The D input can be driven directly by a LUT output via AFFMUX, BFFMUX, CFFMUX, or DFFMUX, or by the BYPASS slice inputs, bypassing the function generators via the AX, BX, CX, or DX input. When configured as a latch, the latch is transparent when the CLK is Low. In Spartan-6 devices, there are four additional storage elements that can only be configured as edge-triggered D-type flip-flops. Their D input can be driven by the O5 output of the LUT. When the original four storage elements are configured as latches, the four additional storage elements cannot be used.

Distributed RAM and Memory (SLICEM only)

The function generators in SLICEMs add a data input and write enable that allow the function generator to be implemented as distributed RAM. RAM resources are configurable within a SLICEM to implement the distributed RAM shown in Table 5. Multiple LUTs in a SLICEM can be combined in various ways to store more data. Distributed RAM is fast, localized, and ideal for small data buffers, FIFOs, or register files. For larger memory requirements, consider using the 18K block RAM resources.

Distributed RAM resources are synchronous for writes and asynchronous for reads. However, a synchronous read can be implemented with a storage element or a flip-flop in the same slice. Adding this flip-flop improves distributed RAM performance by reducing the output delay to the clock-to-out value of the flip-flop, at the cost of one additional clock cycle of latency. The distributed resources share the same clock input. For a write operation, the Write Enable (WE) input, driven by either the CE or WE pin of a SLICEM, must be set High. Table 5 shows the number of LUTs (four per slice) occupied by each distributed RAM configuration.

If four single-port 64 x 1-bit modules are built, the four RAM64X1S primitives can occupy a SLICEM, as long as they share the same clock, write enable, and shared read and write port address inputs. This configuration equates to 64 x 4-bit single-port distributed RAM.

If two single-port 128 x 1-bit modules are built, the two RAM128X1S primitives can occupy a SLICEM, as long as they share the same clock, write enable, and shared read and write port address inputs. This configuration equates to 128 x 2-bit single-port distributed RAM.
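The synchronous-write, asynchronous-read behavior and the optional output flip-flop can be sketched as a small behavioral model. This is an illustration in Python, not a description of the silicon. The class names are hypothetical, and the choice that the output flip-flop captures the value present just before the write takes effect is a modeling assumption.

```python
class DistributedRAM:
    """Behavioral model: synchronous write, asynchronous read."""

    def __init__(self, depth=64, width=1):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.mem = [0] * depth

    def read(self, addr):
        # Asynchronous read: the output follows the address, no clock needed.
        return self.mem[addr % self.depth]

    def clock(self, addr, data, we):
        # Synchronous write: contents change only on a clock edge with WE High.
        if we:
            self.mem[addr % self.depth] = data & self.mask


class RegisteredRead:
    """Adds an output flip-flop, turning the read into a synchronous one."""

    def __init__(self, ram):
        self.ram = ram
        self.q = 0

    def clock(self, addr, data=0, we=False):
        # Assumption: the flip-flop samples the old contents on the same
        # edge that performs the write, so a write shows up one cycle later.
        self.q = self.ram.read(addr)
        self.ram.clock(addr, data, we)
        return self.q


# width=4 stands in for four RAM64X1S primitives sharing clock, WE, and address.
ram = DistributedRAM(depth=64, width=4)
```

The registered variant trades one cycle of latency for a shorter output path, which is the trade-off described for the synchronous read option.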

Using distributed RAM for memory depths of 64 bits or less is generally more efficient than block RAM in terms of resources, performance, and power. For depths greater than 64 bits but less than or equal to 128 bits, use the following guidelines:

• To conserve LUT resources, use any extra block RAM
• For asynchronous read capability, use distributed RAM
• For widths greater than 16 bits, use block RAM
• For shorter clock-to-out timing and fewer placement restrictions, use registered distributed RAM

Read Only Memory (ROM)

Each function generator can implement a 64 x 1-bit ROM. Three configurations are available: ROM64x1, ROM128x1, and ROM256x1. ROM contents are loaded at each device configuration. Table 6 shows the number of LUTs occupied by each ROM configuration.

Shift Registers (SLICEM only)

A SLICEM function generator can also be configured as a 32-bit shift register without using the flip-flops available in a slice. Used in this way, each LUT can delay serial data anywhere from one to 32 clock cycles. The shiftin D (DI1 LUT pin) and shiftout Q31 (MC31 LUT pin) lines cascade LUTs to form larger shift registers. The four LUTs in a SLICEM are thus cascaded to produce delays up to 128 clock cycles. It is also possible to combine shift registers across more than one SLICEM. Note that there are no direct connections between slices to form longer shift registers, nor is the MC31 output at LUT B/C/D available. The resulting programmable delays can be used to balance the timing of data pipelines.

Figure 19 shows two 16-bit shift registers. The example shown can be implemented in a single LUT.
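The addressable delay and the Q31 cascade can be illustrated with a short behavioral model. The sketch below is in Python; the class and function names are hypothetical, and address 0 is assumed to select the most recently shifted-in bit (a delay of one clock).

```python
class SRL32:
    """One LUT used as a 32-bit shift register with an addressable tap."""

    def __init__(self):
        self.stages = [0] * 32

    def q(self, addr):
        # Addressable output: address n gives a delay of n + 1 clocks.
        return self.stages[addr & 0x1F]

    @property
    def q31(self):
        # Fixed last-stage output used for cascading to the next LUT.
        return self.stages[31]

    def clock(self, d):
        # Shift one position on each clock; the oldest bit falls off the end.
        self.stages = [d & 1] + self.stages[:31]


def clock_chain(luts, d):
    """Clock several cascaded SRL32s at once (Q31 of one feeds D of the next)."""
    # Sample every cascade input before any stage shifts, as hardware would.
    inputs = [d] + [lut.q31 for lut in luts[:-1]]
    for lut, bit in zip(luts, inputs):
        lut.clock(bit)


# Four cascaded LUTs model the 128-cycle delay available in one SLICEM.
chain = [SRL32() for _ in range(4)]
```

Reading `q(addr)` on the last LUT of a shorter chain gives intermediate delays, which is how a pipeline can be balanced without spending flip-flops.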

Multiplexers

Function generators and associated multiplexers in SLICEL or SLICEM can implement the following:

• 4:1 multiplexers using one LUT
• 8:1 multiplexers using two LUTs
• 16:1 multiplexers using four LUTs

Fast Lookahead Carry Logic

In addition to function generators, SLICEM and SLICEL (but not SLICEX) contain dedicated carry logic to perform fast arithmetic addition and subtraction in a slice. A CLB has one carry chain, as shown in Figure 1. The carry chains are cascadable to form wider add/subtract logic, as shown in Figure 2. The carry chain in the Spartan-6 device runs upward and has a height of four bits per slice. For each bit, there is a carry multiplexer (MUXCY) and a dedicated XOR gate for adding or subtracting the operands with the selected carry bit. Typically, the carry logic allows four bits of a counter or other arithmetic function to fit in each slice, independent of the function's total size. The dedicated carry path and carry multiplexer (MUXCY) can also be used to cascade function generators for implementing wide logic functions. Figure 26 illustrates the carry chain with associated logic elements in a slice.
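The per-bit structure of a carry multiplexer plus a dedicated XOR gate can be sketched as a ripple model. The Python below is a simplified illustration: the function names are hypothetical, and the wiring (the LUT produces the propagate term, the multiplexer passes either the incoming carry or an operand bit) is a common textbook arrangement assumed here rather than taken from the figures.

```python
def carry_chain_add(a, b, width, carry_in=0):
    """Add two unsigned values bit by bit, the way a carry chain would."""
    carry = carry_in
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i                 # computed in the LUT
        sum_i = propagate ^ carry             # dedicated XOR gate
        carry = carry if propagate else a_i   # carry multiplexer (MUXCY)
        total |= sum_i << i
    return total, carry


def slices_needed(width):
    """Four carry bits per slice, rounded up."""
    return -(-width // 4)


# Subtraction reuses the same chain: a - b == a + ~b + 1.
difference, _ = carry_chain_add(9, ~5 & 0xFF, 8, carry_in=1)
```

When the propagate term is 0 the two operand bits are equal, so either one can serve as the generated carry; the model picks `a_i` arbitrarily.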

Interconnect Resources

Interconnect is the programmable network of signal pathways between the inputs and outputs of functional elements within the FPGA, such as IOBs, CLBs, DSP slices, and block RAM. Interconnect, also called routing, is segmented for optimal connectivity. The Xilinx Place and Route (PAR) tool within the ISE Design Suite software exploits the rich interconnect array to deliver optimal system performance and the fastest compile times.

Spartan-6 FPGA Interconnect Types

The Spartan-6 FPGA CLBs are arranged in a regular array inside the FPGA. Each connects to a switch matrix for access to the general-routing resources, which run vertically and horizontally between the CLB rows and columns (Figure 29). A similar switch matrix connects other resources, such as the DSP slices and block RAM resources.

The various types of routing in the Spartan-6 architecture are primarily defined by their length (Figure 30). Longer routing elements are faster for longer distances.

• Fast Interconnects: Fast connects route block outputs back to block inputs. Along with the larger size of the CLB, fast connects provide higher performance for simpler functions.
• Single Interconnects: Singles route signals to neighboring tiles, both vertically and horizontally.
• Double Interconnects: Doubles connect to every other tile, both horizontally and vertically, in all four directions, and to the diagonally adjacent tiles.
• Quad Interconnects: Quads connect to one out of every four tiles, horizontally and vertically, and diagonally to tiles two rows and two columns distant. Quad lines provide more flexibility than the single-channel long lines of earlier generations.

Interconnect Delay and Optimization

Interconnect delays vary according to the specific implementation and loading in a design. The type of interconnect, distance required to travel in the device, and number of switch matrices to traverse factor into the total delay. A good estimate of interconnect delay is to use the same value as the block delays in a path. Most timing issues are addressed by examining the block delays and determining the impact of using fewer levels or faster paths. If interconnect delays seem too long, increase PAR effort levels or iterations to improve performance, and make sure that the required timing is in the constraints file.

Nets with critical timing or that are heavily loaded can often be improved by replicating the source of the net. The dual 5-input LUT configuration of the slice simplifies the replication of logic in the same slice, which minimizes any additional loads on the inputs to the source function. Replicating logic in multiple slices gives the software more flexibility to place the sources independently. Interconnect delays are typically improved not by changing the interconnect but by changing the placement. This is the floorplanning process.

Global Controls

In addition to the general-purpose interconnect, Spartan-6 FPGAs have two global logic control signals, as described in Table 8.

Use the GSR control in a design instead of a separate global reset signal to make CLB inputs available, which results in a smaller, more efficient design. The GSR signal must always reinitialize every flip-flop. The GSR signal is asserted automatically during the FPGA configuration process, guaranteeing that the FPGA starts up in a known state. Using GSR and GTS does not use any general-purpose routing resources.

Clock Management Summary

The Spartan-6 FPGA Clock Management Tiles (CMTs) provide very flexible, high-performance clocking. The CMT blocks (Figure 2-1) are located in the center column along the vertical global clock tree. Each CMT block contains two DCMs and one PLL.

Our device has 2 Clock Management Tiles. There are dedicated routes within a CMT to couple together various components. Each block within the tile can be treated separately; however, the dedicated routing between blocks creates restrictions on certain connections. Using these dedicated routes frees up global resources for other design elements. Additionally, the use of local routes within the CMT provides an improved clock path because the route is handled locally, reducing chances for noise coupling.

The CMT diagram (Figure 3-1) shows a high-level view of the connection between the various clock input sources and the DCM-to-PLL and PLL-to-DCM dedicated routing. The six (total) PLL output clocks are MUXed into a single clock signal for use as a reference clock to the DCMs. Two output clocks from the PLL can drive the DCMs. These two clocks are fully independent: PLL output clock 0 could drive DCM1 while PLL output clock 1 could drive DCM2. Each DCM output can be MUXed into a single clock signal for use as a reference clock to the PLL. Only one DCM can be used as the reference clock to the PLL at any given time. A DCM cannot be inserted in the feedback path of the PLL. The PLL and the DCMs of a CMT can each be used separately as stand-alone functions. The outputs from the PLL are not spread spectrum.

Phase Lock Loop (PLL)

Spartan-6 devices contain up to six CMTs. The main purpose of the PLL is to serve as a frequency synthesizer for a wide range of frequencies and as a jitter filter for either external or internal clocks, in conjunction with the DCMs of the CMT. The heart of the PLL is a voltage-controlled oscillator (VCO) with a frequency range of 400 MHz to 1,080 MHz, thus spanning more than one octave. Three sets of programmable frequency dividers (D, M, and O) adapt the VCO to the required application. The PLL block diagram shown in Figure 3-2 provides a general overview of the PLL components.

The pre-divider D (programmable by configuration) reduces the input frequency and feeds one input of the traditional PLL phase comparator. The feedback divider M (programmable by configuration) acts as a multiplier because it divides the VCO output frequency before feeding the other input of the phase comparator. D and M must be chosen appropriately to keep the VCO within its controllable frequency range. The VCO has eight equally spaced outputs (0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°). Each can be selected to drive one of the six output dividers, O0 to O5 (each programmable by configuration to divide by any integer from 1 to 128).
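Because the phase comparator locks the divided input (input frequency / D) to the divided VCO output (VCO frequency / M), the VCO runs at input × M / D, and each output is the VCO frequency divided by its O value. The Python sketch below checks a candidate setting against the 400 MHz to 1,080 MHz VCO range and the 1 to 128 output divider range quoted above. The function name is hypothetical, and the legal ranges of D and M are not modeled because they are not given here.

```python
VCO_MIN_MHZ = 400.0
VCO_MAX_MHZ = 1080.0
MAX_OUTPUTS = 6  # output dividers O0 to O5


def pll_frequencies(f_in_mhz, d, m, output_dividers):
    """Return (VCO frequency, list of output frequencies) in MHz."""
    if d < 1 or m < 1:
        raise ValueError("D and M must be positive integers")
    if len(output_dividers) > MAX_OUTPUTS:
        raise ValueError("at most six output dividers are available")

    # Lock condition: f_in / D == f_vco / M
    f_vco = f_in_mhz * m / d
    if not VCO_MIN_MHZ <= f_vco <= VCO_MAX_MHZ:
        raise ValueError("VCO frequency %.1f MHz is out of range" % f_vco)

    for o in output_dividers:
        if not 1 <= o <= 128:
            raise ValueError("output divider must be between 1 and 128")

    return f_vco, [f_vco / o for o in output_dividers]


# 100 MHz input, D = 1, M = 8: VCO at 800 MHz, outputs at 800, 200, 100 MHz.
f_vco, outputs = pll_frequencies(100.0, 1, 8, [1, 4, 8])
```

A setting such as D = 1, M = 2 with a 100 MHz input is rejected, since it would put the VCO at 200 MHz, below its range.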

DCM Summary

Digital Clock Managers (DCMs) provide advanced clocking capabilities to Spartan-6 FPGA applications. Primarily, DCMs eliminate clock skew, thereby improving system performance. Similarly, a DCM optionally phase shifts the clock output to delay the incoming clock by a fraction of the clock period. DCMs optionally multiply or divide the incoming clock frequency to synthesize a new clock frequency. The DCMs integrate directly with the global low-skew clock distribution network.

DCMs solve a variety of common clocking issues, especially in high-performance, high-frequency applications:

• Eliminate clock skew, either within the device or to external components, to improve overall system performance and to eliminate clock distribution delays.
• Phase shift a clock signal, either by a fixed fraction of a clock period or by incremental amounts.
• Multiply or divide an incoming clock frequency or synthesize a completely new frequency by a mixture of static or dynamic clock multiplication and division.
• Condition a clock, ensuring a clean output clock with a 50% duty cycle.
• Mirror, forward, or rebuffer a clock signal, often to deskew and convert the incoming clock signal to a different I/O standard. For example, forwarding and converting an incoming LVTTL clock to LVDS.
• Clock input jitter filtering
• Free-running oscillator
• Spread-spectrum clock generation

The DCM provides four phases of the input frequency (CLKIN): shifted 0°, 90°, 180°, and 270° (CLK0, CLK90, CLK180, and CLK270). It also provides a doubled frequency CLK2X and its complement CLK2X180. The CLKDV output provides a fractional clock frequency that can be phase-aligned to CLK0. The fraction is programmable as every integer from 2 to 16, as well as 1.5, 2.5, 3.5 . . . 7.5. CLKIN can optionally be divided by 2. The DCM can be a zero-delay clock buffer when a clock signal drives CLKIN, while the CLK0 output is fed back to the CLKFB input.

Frequency Synthesis

Independent of the basic DCM functionality, the frequency synthesis outputs CLKFX and CLKFX180 can be programmed to generate any output frequency that is the DCM input frequency (FIN) multiplied by M and simultaneously divided by D, where M can be any integer from 2 to 32 and D can be any integer from 1 to 32.

Phase Shifting

With CLK0 connected to CLKFB, all nine CLK outputs (CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, and CLKFX180) can be shifted by a common amount, defined as any integer multiple of a fixed delay. A fixed DCM delay value (fraction of the input period) can be established by configuration and can also be incremented or decremented dynamically.

Spread-Spectrum Clocking

The DCM can accept and track typical spread-spectrum clock inputs, provided they abide by the input clock specifications listed in the Spartan-6 FPGA Data Sheet: DC and Switching Characteristics. Spartan-6 FPGAs can generate a spread spectrum clock source from a standard fixed-frequency oscillator.
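Since the CLKFX frequency is FIN × M / D with M from 2 to 32 and D from 1 to 32, picking M and D for a target frequency is a small search problem. The Python sketch below does a brute-force search over those ranges. The function name is hypothetical, and the sketch ignores the minimum and maximum CLKFX frequencies from the data sheet, which a real design would also have to respect.

```python
from fractions import Fraction


def best_clkfx(f_in, f_target):
    """Find (M, D, resulting frequency) closest to f_target.

    Frequencies may be in any unit as long as both use the same one.
    Exact rational arithmetic avoids rounding surprises in the comparison.
    """
    f_in = Fraction(f_in)
    f_target = Fraction(f_target)
    best = None
    for m in range(2, 33):        # M: any integer from 2 to 32
        for d in range(1, 33):    # D: any integer from 1 to 32
            f_out = f_in * m / d
            error = abs(f_out - f_target)
            # Strict comparison keeps the first (smallest M) of equal matches.
            if best is None or error < best[0]:
                best = (error, m, d, f_out)
    _, m, d, f_out = best
    return m, d, f_out


# 50 MHz in, 125 MHz wanted: M = 5, D = 2 gives an exact match.
m, d, f_out = best_clkfx(50, 125)
```

When no exact ratio exists, the function returns the nearest achievable frequency, and the caller can compare `f_out` with the target to decide whether the error is acceptable.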

Clock Distribution

Each Spartan-6 FPGA provides abundant clock lines to address the different clocking requirements of high fanout, short propagation delay, and extremely low skew.

Global Clock Lines

In each Spartan-6 FPGA, 16 global-clock lines have the highest fanout and can reach every flip-flop clock. Global clock lines must be driven by global clock buffers, which can also perform glitchless clock multiplexing and the clock enable function. Global clocks are often driven from the CMTs, which can completely eliminate the basic clock distribution delay.

Global Clocking Infrastructure

The detailed Spartan-6 FPGA global clocking infrastructure is shown in Figure 1-1.

I/O Clocking Infrastructure

I/O clocks are especially fast and serve only the localized input and output delay circuits and the I/O serializer/deserializer (SERDES) circuits, as described in the I/O Logic section.

All SelectIO logic resources (input registers, output registers, IDDR2, ODDR2, ISERDES2, and OSERDES2) must be driven by a clock coming from either a BUFIO2 located within the same BUFIO2 clocking region, one of the BUFPLLs on the same edge of the device, or one of the 16 BUFGs.

Block RAM

Every Spartan-6 FPGA has between 12 and 268 dual-port block RAMs, each storing 18 Kb. Each block RAM has two completely independent ports that share only the stored data.

Synchronous Operation

Each memory access, whether read or write, is controlled by the clock. All inputs, data, address, clock enables, and write enables are registered. The data output is always latched, retaining data until the next operation. An optional output data pipeline register allows higher clock rates at the cost of an extra cycle of latency.

Programmable Data Width

• Each port can be configured as 16K × 1, 8K × 2, 4K × 4, 2K × 9 (or 8), 1K × 18 (or 16), or 512 × 36 (or 32).
• Each block RAM can be divided into two completely independent 9 Kb block RAMs that can each be configured to any aspect ratio from 8K × 1 to 512 × 18, with 256 × 36 supported in simple dual-port mode.
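The aspect ratios all describe the same 18 Kb array viewed at different widths. The short Python check below works through the arithmetic for the full 18 Kb block. Treating the ninth bit of each byte as an optional extra bit (commonly used for parity) is an interpretation of the "(or 8)", "(or 16)", and "(or 32)" notes, and the names are hypothetical.

```python
# (depth, width) pairs for one port of an 18 Kb block RAM
CONFIGS = [
    (16 * 1024, 1),
    (8 * 1024, 2),
    (4 * 1024, 4),
    (2 * 1024, 9),
    (1024, 18),
    (512, 36),
]


def describe(depth, width):
    """Break a configuration into address bits, data bits, and extra bits."""
    extra_bits = width // 9          # 0 for x1/x2/x4, else 1 per 8 data bits
    data_width = width - extra_bits  # 8, 16, or 32 for the wide modes
    return {
        "address_bits": depth.bit_length() - 1,  # depth is a power of two
        "data_width": data_width,
        "extra_bits": extra_bits,
        "data_capacity": depth * data_width,
        "total_capacity": depth * width,
    }


summary = {cfg: describe(*cfg) for cfg in CONFIGS}
```

Every configuration holds 16 Kb of plain data, and the x9, x18, and x36 modes reach the full 18 Kb by using the extra bits.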

Memory Controller Block

Most Spartan-6 devices include dedicated memory controller blocks (MCBs), each targeting a single-chip DRAM (either DDR, DDR2, DDR3, or LPDDR), and supporting access rates of up to 800 Mb/s. The MCB has dedicated routing to predefined FPGA I/Os. If the MCB is not used, these I/Os are available as general purpose FPGA I/Os. The memory controller offers a complete multi-port arbitrated interface to the logic inside the Spartan-6 FPGA. Commands can be pushed, and data can be pushed to and pulled from independent built-in FIFOs, using conventional FIFO control signals. The multi-port memory controller can be configured in many ways. An internal 32-, 64-, or 128-bit data interface provides a simple and reliable interface to the MCB. The MCB can be connected to 4-, 8-, or 16-bit external DRAM. The MCB, in many applications, provides a faster DRAM interface compared to traditional internal data buses, which are wider and are clocked at a lower frequency. The FPGA logic interface can be flexibly configured irrespective of the physical memory device. The MCB functionality is not supported in the -3N speed grade.
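The 12.8 Gb/s peak bandwidth quoted in the feature summary is consistent with 800 Mb/s on each data pin of a 16-bit DRAM interface. The Python lines below spell out that arithmetic. The function name is hypothetical, and treating 800 Mb/s as a per-pin rate is an interpretation of the figures rather than a statement from the text.

```python
SUPPORTED_DRAM_WIDTHS = (4, 8, 16)  # external DRAM widths listed above


def peak_bandwidth_mbps(rate_mbps_per_pin, dram_width_bits):
    """Peak transfer rate in Mb/s for a given per-pin rate and bus width."""
    if dram_width_bits not in SUPPORTED_DRAM_WIDTHS:
        raise ValueError("the MCB connects to 4-, 8-, or 16-bit DRAM")
    return rate_mbps_per_pin * dram_width_bits


# 800 Mb/s x 16 pins = 12,800 Mb/s = 12.8 Gb/s
widest = peak_bandwidth_mbps(800, 16)
```

Narrower memories scale down proportionally: an 8-bit device at the same rate peaks at 6.4 Gb/s.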

Software and Tool Support

The Spartan-6 FPGA MCB is supported by standard software and tool flows like other soft and embedded IP blocks offered by Xilinx. For conventional (i.e., non-embedded) FPGA designs, the MCB can be integrated into a design using the Memory Interface Generator (MIG) tool, available in the CORE Generator tool. The MIG tool is used to generate memory interfaces for all Xilinx FPGAs. It produces the necessary RTL design files, user constraints files (UCFs), and script files for simulation and implementation of memory solutions offered by Xilinx. The Getting Started chapter in UG416, Spartan-6 FPGA Memory Interface Solutions User Guide, contains detailed step-by-step instructions on how to use the MIG tool to implement memory interfaces based on the MCB.

Digital Signal Processing: DSP48A1 Slice

DSP applications use many binary multipliers and accumulators, best implemented in dedicated DSP slices. All Spartan-6 FPGAs have many dedicated, full-custom, low-power DSP slices, combining high speed with small size, while retaining system design flexibility. Each DSP48A1 slice consists of a dedicated 18 × 18 bit two's complement multiplier and a 48-bit accumulator, both capable of operating at up to 390 MHz. The DSP48A1 slice provides extensive pipelining and extension capabilities that enhance speed and efficiency of many applications, even beyond digital signal processing, such as wide dynamic bus shifters, memory address generators, wide bus multiplexers, and memory-mapped I/O register files. The accumulator can also be used as a synchronous up/down counter. The multiplier can perform barrel shifting.
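The multiply-accumulate core of the slice can be illustrated with a fixed-width arithmetic model. The Python sketch below covers only the 18 × 18 two's complement multiply and the 48-bit accumulator. It leaves out the pre-adder, pipeline registers, and cascade paths, and the names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        return value - (1 << bits)
    return value


class MacModel:
    """18 x 18 signed multiply feeding a 48-bit wrapping accumulator."""

    def __init__(self):
        self.acc = 0

    def step(self, a, b):
        # Operands are truncated to 18 bits, as the hardware inputs would be.
        product = to_signed(a, 18) * to_signed(b, 18)
        # The accumulator wraps around at 48 bits instead of saturating.
        self.acc = to_signed(self.acc + product, 48)
        return self.acc


# Dot product of two short vectors, one multiply-accumulate per step.
mac = MacModel()
for x, y in [(-3, 4), (5, 6), (7, -2)]:
    mac.step(x, y)
```

The 36-bit product leaves 12 bits of headroom in the accumulator, so many products can be summed before wrap-around becomes a concern.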

Our Device: 32 DSP48A1 slices in 2 columns

Input/Output

The number of I/O pins varies from 102 to 576, depending on device and package size. Each I/O pin is configurable and can comply with a large number of standards, using up to 3.3V. The Spartan-6 FPGA SelectIO Resources User Guide describes the I/O compatibilities of the various I/O options. With the exception of supply pins and a few dedicated configuration pins, all other package pins have the same I/O capabilities, constrained only by certain banking rules. All user I/O is bidirectional; there are no input-only pins. All I/O pins are organized in banks, with four banks on the smaller devices and six banks on the larger devices. Each bank has several common VCCO output supply-voltage pins, which also power certain input buffers. Some single-ended input buffers require an externally applied reference voltage (VREF). There are several dual-purpose VREF-I/O pins in each bank. In a given bank, when an I/O standard calls for a VREF voltage, each VREF pin in that bank must be connected to the same voltage rail and cannot be used as an I/O pin.

I/O Electrical Characteristics

Single-ended outputs use a conventional CMOS push/pull output structure, driving High towards VCCO or Low towards ground, and can be put into high-Z state. Many I/O features are available to the system designer to optionally invoke in each I/O in their design, such as weak internal pull-up and pull-down resistors, strong internal split-termination input resistors, adjustable output drive-strengths and slew-rates, and differential termination resistors. See the Spartan-6 FPGA SelectIO Resources User Guide for more details on available options for each I/O standard.

I/O Logic

Input and Output Delay

This section describes the available logic resources connected to the I/O interfaces. All inputs and outputs can be configured as either combinatorial or registered. Double data rate (DDR) is supported by all inputs and outputs. Any input or output can be individually delayed by up to 256 increments (except in the -1L speed grade). This is implemented as IODELAY2. The identical delay value is available either for data input or output. For a bidirectional data line, the transfer from input to output delay is automatic. The number of delay steps can be set by configuration and can also be incremented or decremented while in use. Because these tap delays vary with supply voltage, process, and temperature, an optional calibration mechanism is built into each IODELAY2:

• For source synchronous designs where more accuracy is required, the calibration mechanism can (optionally) determine dynamically how many taps are needed to delay data by one full I/O clock cycle, and then programs the IODELAY2 with 50% of that value, thus centering the I/O clock in the middle of the data eye.
• A special mode is available only for differential inputs, which uses a phase-detector mechanism to determine whether the incoming data signal is being accurately sampled in the middle of the eye. The results from the phase-detector logic can be used to either increment or decrement the input delay, one tap at a time, to ensure error-free operation at very high bit rates.
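The two calibration ideas, measuring one clock period in taps and then using half of it, and nudging the delay one tap at a time from phase-detector feedback, can be sketched in a few lines. The Python below is illustrative only. The tap delay is a made-up parameter (the real value varies with voltage, process, and temperature), the 0 to 255 setting range is an assumed reading of "up to 256 increments", and the direction in which an "early" indication moves the delay is also an assumption.

```python
MAX_TAP_SETTING = 255  # assumed: 256 settings numbered 0..255


def calibrate_taps(io_clock_period_ps, tap_delay_ps):
    """Return the tap count that delays data by half an I/O clock period."""
    taps_per_period = round(io_clock_period_ps / tap_delay_ps)
    if taps_per_period > MAX_TAP_SETTING + 1:
        raise ValueError("clock period is longer than the delay line")
    # Program 50% of the full-period value to sample mid-eye.
    return taps_per_period // 2


def phase_detector_step(current_taps, sampling_early):
    """Move the delay by one tap based on phase-detector feedback."""
    step = 1 if sampling_early else -1
    return max(0, min(MAX_TAP_SETTING, current_taps + step))


# Hypothetical numbers: 1000 ps I/O clock period, 50 ps per tap.
taps = calibrate_taps(1000, 50)
```

Calling `phase_detector_step` repeatedly while the link is running lets the setting track slow drift in the tap delay.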

ISERDES and OSERDES

Many applications combine high-speed bit-serial I/O with slower parallel operation inside the device. This requires a serializer and deserializer (SerDes) inside the I/O structure. Each input has access to its own deserializer (serial-to-parallel converter) with programmable parallel width of 2, 3, or 4 bits. Where differential inputs are used, the two deserializers can be cascaded to provide parallel widths of 5, 6, 7, or 8 bits. Each output has access to its own serializer (parallel-to-serial converter) with programmable parallel width of 2, 3, or 4 bits. Two serializers can be cascaded when a differential driver is used to give access to bus widths of 5, 6, 7, or 8 bits. When distributing a double data rate clock, all SerDes data is actually clocked in/out at single data rate to eliminate the possibility of bit errors due to duty cycle distortion. This faster single data rate clock is either derived via frequency multiplication in a PLL, or doubled locally in each IOB by differentiating both clock edges when the incoming clock uses double data rate.