VPX for Array Signal Processing System

Full Paper NNGT Int. J. on Signal Processing and Imaging Engineering, Vol. 1, July 2014 VPX for Array Signal Processing System Xiangyang Li, Jinhua Z...

Author: Monica Lewis


Full Paper NNGT Int. J. on Signal Processing and Imaging Engineering, Vol. 1, July 2014

VPX for Array Signal Processing System Xiangyang Li, Jinhua Zhou, Siyang Liu Department of Computer Science Guangdong AIB Polytechnic College Guangzhou, China {xyli, jhzhou, syliu}@gdaib.edu.cn

Abstract—Traditional multiprocessor systems cannot meet the growing demand for a computing platform that combines high performance, high bandwidth, and resistance to harsh environments. This paper proposes a new high-speed parallel system based on the versatile protocol switch (VPX), multiple DSPs, and FPGAs. It introduces the VPX roadmap and architecture and discusses the system design in detail, including the overall system architecture, the DSP subsystem, and the FPGA subsystem. The performance of the system is also evaluated. The system, based on the TMS320C6678, is applicable to voice processing, sonar, radar, communication, real-time image processing, and other high-speed signal processing fields, and it has undergone long-duration testing in different environments. The test results show that the platform offers high performance, high data bandwidth, excellent stability, strong anti-interference capability, and easy maintenance.

Keywords—parallel processing; high-speed interconnection; VPX; SRIO; TMS320C6678; management software

I. INTRODUCTION

Since the 1980s, array signal processing systems based on the VME bus have been widely used in defense, aviation, military, and other fields for their reliability, stability, and scalability. In recent years, radar [1], sonar [2,3], and other high-performance array signal processing systems have developed rapidly. Adaptive beamforming, signal detection, and other complex signal processing algorithms have sharply increased the demands on processing speed, bus bandwidth, data throughput, and environmental tolerance. These algorithms require far more computation and are far more complicated than their predecessors, so a modern array signal processing system needs more computing power, more flexibility, and higher data throughput to meet its real-time goals, and VME-based array signal processing systems are stretched to their limits. Traditional parallel buses such as PCI and VME are gradually fading into history, while peer-to-peer, differential-signal serial interconnects such as PCI Express, Serial RapidIO, and HyperTransport have become the technology trend for the next generation.

In recent years, the CPCI bus and the VME (Versa Module Eurocard) bus have competed with each other and developed side by side. The VME bus has held a dominant position in high-performance, real-time industrial applications. Thanks to its open architecture, common operating system support, and other advantages, CPCI caught up from behind and has been widely used in PCs, IPCs, military equipment, and other fields. In an era of massive data bandwidth, however, both buses have proved inadequate. Therefore, VITA (VMEbus International Trade Association) introduced the VPX (versatile protocol switch) technology standard (VITA 46) and the REDI ruggedized enhanced mechanical design specification (VITA 48) in 2006 [4]. These specifications not only reach gigabyte-class transmission bandwidth but also offer a good solution for ruggedization, high-speed interconnect, and system management, and they can be widely used in aviation, aerospace, radar, underwater exploration, communications, and other harsh-environment fields.

© N&N Global Technology 2014 DOI : 01.IJSPIE.2014.1.1

II. VPX ROADMAP

VPX is a series of standards developed by the VITA organization for next-generation advanced computing platforms that must deliver high reliability and high bandwidth in harsh environments. As an upgrade of the VME bus, VPX is compatible with XMC, Fibre Channel, PCI Express, RapidIO, HyperTransport, and other high-speed serial bus protocols.

Born in 1981, the VME bus is a universal computer bus with an open architecture that combines the electrical specifications of Motorola's Versa bus with the Eurocard mechanical packaging standard. As electronic technology developed, VME was upgraded repeatedly, through VME64, VME64x, and VME320 to VPX. VPX (VITA 46) was started in March 2003 and ratified in 2007 [4-8]. OpenVPX (VITA 65) was started in January 2009 (outside of VITA) and brought into VITA at the end of 2009. The first revision of VITA 65 was released in 2010, and revision 2 (the current revision) was released in 2012 [9].

VME64 extends the data width from 32 bits to 64 bits and the connector from 3 rows with 96 pins to 5 rows with 160 pins, with a bandwidth of up to 80 MB/s. In addition, VME64 supports hot swapping. VME64x adds an additional connector, and its data rate is up to 160 MB/s. VME320 uses the dual-edge source-synchronous transfer protocol (2eSST), and its theoretical bandwidth can reach 320 MB/s. Users are still not satisfied with the bandwidth of VME, however. In addition, the substantial increase in performance brings a rapid increase in heat, which reduces reliability. In recent years, to meet the requirements for higher bandwidth and greater cooling capacity, VITA has launched VXS (VITA 41), VPX (VITA 46), REDI (VITA 48), and a series of related standards. The MultiGig RT2 connector [10] used throughout VPX offers tight mating, low insertion loss, and a low bit error rate. Combined with the REDI standard, the VPX-REDI platform [6, 7] can meet the requirements of harsh environments and high bandwidth.

High-speed serial data transmission offers wide bandwidth, low error rates, and high flexibility, which makes it the mainstream bus technology for high-performance array signal processing platforms. Traditional shared parallel bus standards such as PCI and VME can no longer meet the high-precision, high-resolution, real-time requirements of array signal processing systems. VPX is one of the new bus standards for array signal processors. Its main features are:

• It uses the MultiGig RT2 connector. This high-speed differential connector has a wafer structure (similar to the gold-finger contacts of a memory module) with the advantages of tight mating, low insertion loss, and a low error rate. The wafer has an ESD grounding layer and a contact layer to protect against accidental discharge during operation.
• The backplane provides a multi-board communication interconnect structure that carries PCI Express, 10G Ethernet, Serial RapidIO (SRIO), and other interconnect specifications and supports both fully interconnected and centrally switched traffic.
• Each board supports up to 200 W of power, with ruggedized cooling standards.
• It defines air-cooled, conduction-cooled, water-cooled, and ruggedized thermal structures.
• It defines centrally switched and distributed switching backplane structures, standard backplane interconnect modules for analog and optical signals, standard power supplies, and IPMI-based intelligent management.

III. RELATED WORK

Apart from the official VITA standard specifications [4-9], numerous technical articles introduce the reader to the standard [11] or compare it with other popular interconnect technologies for embedded systems [12]. Several works target the implementation of VMEbus [13], SRIO [14], and PCIe [15] in VPX. Some articles compare the performance of 10 Gb Ethernet, Serial RapidIO, and InfiniBand in VPX [16]. A few technical reports introduce applications of VPX [17].

In our work, driven by the requirements of an actual project, a scalable array signal parallel processing system based on the VPX architecture is designed and implemented. The calculation daughter board (processing node) uses two TMS320C6678 (C6678) high-performance network multimedia processors and a Xilinx Virtex-6 series FPGA (XC6VLX240T-FF1759) to build an on-board pipelined structure. Each processing node has the following characteristics:

• Two C6678 processors with 16 cores in total and a combined floating-point capability of up to 320 GFLOPS.
• Each TMS320C6678 connects to 4 Gbit of DDR3 SDRAM and supports NOR Flash, an Ethernet interface, and HyperLink between the DSPs.
• The FPGA supports 2.5 Gbit/s fiber input and output, 2 × DDR3 SDRAM of 256 MB (32-bit width), a PCIe bus interface, a GMII Ethernet interface, a NOR Flash interface, and four RS422 interfaces.
• The DSPs and the FPGA are directly interconnected through Serial RapidIO (SRIO) ×4, TSIP0, I2C, SPI, and UART interfaces.
• The front panel provides 8 SFP fiber channels, 1 Ethernet interface, a reset button, and indicator lights.
• An SRIO switch connects to all processors, with a data bandwidth of up to 20 Gbit/s per processor. Four ×4 SRIO ports connect to the backplane, for a total switch data transmission bandwidth of 80 Gbit/s.

The number of processing nodes can be increased or decreased according to the sensor array and the required processing speed. Adding processing nodes increases the signal processing capability of the system roughly in proportion to the number of nodes.

IV. SYSTEM ARCHITECTURE

The array signal parallel processing system complies with the VITA 46 standard. The host board and all calculation daughter boards are assembled in a chassis equipped with a VPX backplane. Each daughter board in the VPX chassis has seven RT2 connectors, numbered P0 to P6. Connectors P1 to P6 define 32 pairs of differential signals that are used as high-speed serial communication paths. The connectors are assigned as follows:

• P0 carries the power supply, reference clock, system reset, system management, and debug signals. It provides +5 V and +12 V power, the 25 MHz system reference clock, a JTAG interface, and an I2C-based system management and monitoring interface.
• P1 uses 16 pairs of differential signals for the 4-lane SRIO high-speed channels, each channel with a bandwidth of up to 20 Gbit/s, for fully interconnected or centrally switched communication with other boards.
• P2 provides a single-channel, 2-lane PCIe interface.
• P3 carries the GTX high-speed serial interface of the FPGA.
• P4 provides 4-way GMII Ethernet and 4-way UART for Ethernet interconnect and debugging.
• P5 and P6 are reserved for user-defined interfaces.

We consider an OpenVPX DIS06 backplane [18] that provides five payload (processor) slots and one switch slot (Figure 1). This uniform backplane topology includes four kinds of planes: the data plane, the control plane, the management plane (IPMI), and the utility plane. OpenVPX predefines six kinds of pipes: UTP (Ultra Thin Pipe), TP (Thin Pipe), FP (Fat Pipe), DFP (Double Fat Pipe), QFP (Quad Fat Pipe), and OFP (Octal Fat Pipe).

The utility plane is a bus topology that distributes the various power rails and provides reset and configuration functions. The IPMI plane implements a redundant bus topology with two buses: the sender transmits the same data on both channels simultaneously, and the receiver monitors the receive state. In normal operation IPMI uses only one channel; when a delay occurs or the link is broken, the receiver automatically switches to the other channel. The data plane implements a full mesh topology across slots 1 to 5 for high-speed data transmission, in which each slot, as a node, has four full-duplex connections to the other four slots. The control plane implements a redundant dual-star topology through the sixth slot (the switch slot). The switch board acts as the central switching unit: two Gigabit Ethernet channels connect each payload (processing) slot to the central switching unit, which connects each channel to a switching matrix to form the dual-star topology.
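The data-plane and control-plane topologies described above can be checked with a short sketch. The Python below is illustrative only: the slot numbering and the function names `full_mesh_links` and `dual_star_links` are assumptions made for this example and are not taken from the OpenVPX specification.

```python
from itertools import combinations

def full_mesh_links(slots):
    """Return every unordered pair of slots: one full-duplex link per pair."""
    return list(combinations(slots, 2))

def dual_star_links(payload_slots, switch_slot, channels_per_slot=2):
    """Return (payload, switch, channel) tuples for a redundant star topology."""
    return [(p, switch_slot, ch)
            for p in payload_slots
            for ch in range(channels_per_slot)]

# Assumed numbering: payload slots 1 to 5, switch slot 6 (as in the text).
payload_slots = [1, 2, 3, 4, 5]
switch_slot = 6

mesh = full_mesh_links(payload_slots)
# Number of data-plane links that terminate at each payload slot.
degree = {s: sum(s in link for link in mesh) for s in payload_slots}
star = dual_star_links(payload_slots, switch_slot)

print("data-plane links:", len(mesh))
print("links per slot:", degree)
print("control-plane channels:", len(star))
```

Under these assumptions, five payload slots need ten point-to-point data-plane links, with four links at every slot, and the dual-star control plane needs ten Gigabit Ethernet channels to the switch slot.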

Figure 1. OpenVPX DIS06 backplane topology

The trend in array signal processing systems is toward high precision and high resolution. Such systems are characterized by hard real-time constraints and large volumes of data, so an array signal processing platform must be capable of high-performance computing and high-speed data transmission. The current technology trend is to build the array signal processing system from multiprocessor parallel technology and high-speed serial data transmission technology, as shown in Figure 1. Because the array signal parallel processing system complies with the VITA 46 specification, the host and all computing daughter boards are assembled in a chassis equipped with a VPX backplane to constitute a distributed parallel processing system. Each calculation daughter board uses two DSP chips to form a cluster that serves as one processing node in the parallel processing system. The calculation daughter board is connected to the backplane through the on-board VPX connectors J0 to J6. The whole array signal parallel processing system consists of a backplane, a control host board, a function board, and several calculation daughter boards.

For system adaptability and fault tolerance, all daughter boards share a common structure and unified electrical characteristics. Each calculation daughter board can both access other boards and be accessed by them in multiple directions. The calculation daughter boards and the backplane form a crossbar switch network through the SRIO interconnection. All the calculation daughter boards, the host, and the receiving sensor array together form a ring network. The calculation daughter board complies with the VITA 46 standard. It is a 6U board with a high-speed serial switched structure. Within the constraints of power and size, each daughter board carries two C6678 processors clocked at 1.25 GHz and organized in a pipeline structure to form a processing node called a cluster. Each DSP's external memory includes 64-bit DDR3 SDRAM and 128-bit NOR Flash for self-booting. The daughter board is inserted into a backplane slot. Its signal formats, mechanical structure, thermal design, and so on follow the VITA 46 standard to suit a wide range of embedded computing applications.

V. PARALLEL PIPELINED DAUGHTER BOARD DESIGN

A. Design Methodology
Complex array signal processing algorithms require so much computation that a single DSP cannot complete all the computation and control tasks alone without violating the system's real-time and reliability requirements. In keeping with the modular design of the system, the hardware uses a DSP-plus-FPGA structure. Because the FPGA has abundant logic resources and is flexible to program, it acts as a co-processor responsible for pre-processing and control, while the DSPs are mainly responsible for the computing tasks. This division improves the flexibility of the whole system. As shown in Figure 2, the computing daughter board has two C6678 chips connected by the HyperLink bus [19], which the C6678 provides as a companion-device interface. HyperLink is a four-lane SerDes interface designed to operate at up to 12.5 Gbaud per lane [20]. The supported data rates are 1.25 Gbaud, 3.125 Gbaud, 6.25 Gbaud, 10 Gbaud, and 12.5 Gbaud. The interface is generally used to connect to external accelerators.
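As a quick check of the HyperLink figures quoted above, the following sketch multiplies each supported per-lane rate by the four lanes of the interface. The constant and function names are assumptions for illustration. The result is an aggregate line rate, not a payload throughput, because encoding and protocol overhead are not modeled here.

```python
# Figures taken from the text: four lanes, five supported per-lane rates.
HYPERLINK_LANES = 4
SUPPORTED_GBAUD = [1.25, 3.125, 6.25, 10.0, 12.5]

def aggregate_gbaud(per_lane_gbaud, lanes=HYPERLINK_LANES):
    """Aggregate line rate across all lanes, in Gbaud."""
    return per_lane_gbaud * lanes

rates = {rate: aggregate_gbaud(rate) for rate in SUPPORTED_GBAUD}

for per_lane, total in rates.items():
    print(f"{per_lane} Gbaud per lane -> {total} Gbaud aggregate")
```

At the highest rate the four lanes together reach 50 Gbaud.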

B. TMS320C6678
The calculation is done mainly by the DSPs, so DSP performance determines the performance of the whole system. The C6678 is a multicore fixed- and floating-point digital signal processor with eight C66x™ DSP core subsystems (C66x CorePacs), each with a 1.0 GHz or 1.25 GHz C66x fixed/floating-point CPU core, 32 KB of L1P memory, 32 KB of L1D memory, and 512 KB of local L2 memory. Its Multicore Shared Memory Controller (MSMC) includes 4096 KB of MSM SRAM shared by the eight C66x CorePacs and a memory protection unit for both the MSM SRAM and DDR3_EMIF. The Multicore Navigator includes 8192 multipurpose hardware queues with a queue manager and packet-based DMA for zero-overhead transfers. The network coprocessor's packet accelerator supports transport plane IPsec, GTP-U, SCTP, PDCP, and L2 user plane PDCP (RoHC, air ciphering), with 1 Gbps wire-speed throughput at 1.5 M packets per second.

The peripheral interfaces of the TMS320C6678 include four lanes of SRIO 2.1, PCIe Gen2, HyperLink, a Gigabit Ethernet switch subsystem, a 64-bit DDR3 interface (DDR3-1600), a 16-bit EMIF, two telecom serial ports (TSIP), UART, I2C, 16 GPIO pins, SPI, a semaphore module, sixteen 64-bit timers, and three on-chip PLLs [21-22]:

• SRIO 2.1: four lanes supporting 1.25/2.5/3.125/5 Gbaud operation per lane, direct I/O, and message passing, in four 1x, two 2x, one 4x, or two 1x + one 2x link configurations.
• PCIe Gen2: a single port supporting 1 or 2 lanes at up to 5 Gbaud per lane.
• HyperLink: connections to other KeyStone architecture devices for resource scalability, at up to 50 Gbaud.
• GbE switch subsystem: two SGMII ports with 10/100/1000 Mbps operation.
• 16-bit EMIF: up to 256 MB of NAND flash, 16 MB of NOR flash, and 1 MB of asynchronous SRAM.
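The on-chip memory sizes quoted above for the C6678 can be totalled with a few lines of Python. This is only arithmetic on the figures in the text. The function name and the dictionary keys are assumptions for this example.

```python
def on_chip_memory_kb(cores=8, l1p_kb=32, l1d_kb=32, l2_kb=512, msm_kb=4096):
    """Sum the per-core and shared on-chip memory sizes, in KB."""
    per_core = l1p_kb + l1d_kb + l2_kb      # private memory of one CorePac
    all_cores = cores * per_core            # private memory of all CorePacs
    return {
        "per_core_kb": per_core,
        "all_cores_kb": all_cores,
        "shared_msm_kb": msm_kb,
        "total_kb": all_cores + msm_kb,
    }

budget = on_chip_memory_kb()
for name, size in budget.items():
    print(f"{name}: {size}")
```

With the default values, each CorePac has 576 KB of private memory, and the device has 8704 KB (8.5 MB) of on-chip memory in total, of which 4096 KB is shared.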

Figure 2. The hardware block diagram of the computing daughter board with one FPGA chip and two DSP chips
[Block diagram labels: two TI TMS320C6678 DSPs, each with DDR3 SDRAM, EMIF, NOR Flash, EEPROM, and PHY, linked by HyperLink; a Xilinx Virtex-6 LX240T (FF1759) FPGA with DDR3 SDRAM and Flash; an IDT CPS-1616 SRIO switch; a Gigabit Ethernet switch; 4xSRIO and PCIe links at 5 Gbaud; I2C/SPI/UART; JTAG; LEDs; backplane connectors J0 to J6.]

C. Virtex-6 FPGA
Virtex-6 FPGAs [23] are the programmable silicon foundation for Targeted Design Platforms, which deliver integrated software and hardware components so that designers can focus on innovation from the start of the development cycle. In addition to the high-performance logic fabric, Virtex-6 FPGAs contain many built-in system-level blocks that allow logic designers to build high levels of performance and functionality into FPGA-based systems. Built on a 40 nm copper process technology, Virtex-6 FPGAs are a programmable alternative to custom ASIC technology. They address the needs of high-performance logic designers, high-performance DSP designers, and high-performance embedded systems designers with their logic, DSP, connectivity, and soft microprocessor capabilities.

D. SRIO: Interconnection between DSPs and FPGA
The external interfaces of general-purpose high-speed buses include IEEE 1394, USB, Gigabit Ethernet, and so on. Because these interfaces are constrained by versatility and cost, the interconnection speed between devices is limited, which has become a technical bottleneck for high-speed processing systems. Selecting a suitable interconnect between processors is therefore very important. To ensure system performance, a higher-bandwidth connection technology should be chosen, while the complexity and cost of the system design are also taken into account: among the schemes that meet the bandwidth requirements of the system, priority goes to those with low complexity, mature technology, and low cost. On these considerations, the system uses TI's high-performance network multimedia processor C6678 and the Xilinx Virtex-6 FPGA XC6VLX240T-FF1759. The former integrates an SRIO module, and the latter can use its internal logic resources and GTX transceivers to build an SRIO module. The data transmission rate between DSP and FPGA is up to 3.125 Gbit/s, and the payload data bandwidth is 2 Gbit/s or more [24-25].

The SRIO port on the C6678 is a high-performance, low-pin-count interconnect aimed at embedded markets. Each processor has two SRIO connections; each node is connected via Gen 2 Serial RapidIO (SRIO) to an on-board SRIO switch, which in turn has four SRIO connections to the backplane, as shown in Figure 3.

Figure 3. SRIO board connectivity
[Diagram labels: CPU 1 and CPU 2, each connected by SRIO to the payload switch, which connects by SRIO to the VPX data plane.]

Using the RapidIO interconnect in the daughter board design creates a homogeneous interconnect environment that provides more connectivity and control among the components. RapidIO is based on the memory and device addressing concepts of processor buses, and its transaction processing is managed entirely in hardware. This lets the RapidIO interconnect lower system cost by providing lower latency, reduced packet-processing overhead, and higher system bandwidth. According to the SRIO specification, 1x mode uses two independent differential pairs: one for sending data and one for receiving data. The transmitting device is connected to the receiving device through an AC-coupled connection, with a 0.01 µF capacitor added near the receiving end for DC isolation [25]. The SRIO module of the C6678 supports full-duplex operation at 1.25/2.5/3.125 Gbit/s. The total data rate is up to 25 Gbit/s. It facilitates cascading multiple DSP chips and transferring data between DSP and FPGA.
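The SRIO rates quoted in this subsection follow from the lane count and the 8b/10b line code, in which every 8 data bits travel as 10 line bits. The sketch below reproduces that arithmetic. The function names are assumptions for illustration, and packet header overhead (which reduces the usable payload further) is not modeled.

```python
def payload_gbps(gbaud_per_lane, lanes=1):
    """Data rate after 8b/10b encoding, in Gbit/s (one direction)."""
    return gbaud_per_lane * lanes * 8 / 10

def full_duplex_line_rate_gbps(gbaud_per_lane, lanes):
    """Line rate summed over both directions, in Gbit/s."""
    return gbaud_per_lane * lanes * 2

one_lane = payload_gbps(3.125)                        # single lane at 3.125 Gbaud
four_lane_gen2 = payload_gbps(5.0, lanes=4)           # x4 link at 5 Gbaud
duplex_total = full_duplex_line_rate_gbps(3.125, 4)   # x4 link, both directions

print(one_lane, four_lane_gen2, duplex_total)
```

A single 3.125 Gbaud lane therefore carries 2.5 Gbit/s after encoding, which is consistent with the stated payload bandwidth of 2 Gbit/s or more once packet overhead is subtracted. Four lanes at 3.125 Gbaud in both directions give the 25 Gbit/s total quoted for the C6678 SRIO module.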


VI. MANAGEMENT SOFTWARE

The VPX standard for system management (VITA 46.11) leverages the IPMI infrastructure for administrative functions at every level of a VPX system, down to the individual mezzanines installed on the modules. These functions are essential for avoiding manual intervention, which would be prohibitive in embedded systems and large-scale installations. In a flexible configuration, a system controller can automatically retrieve the identity of all installed modules and dynamically configure the software accordingly. In a static system, it can verify the presence and revision of the required modules and retrieve their self-test status and alarm history. Dynamic information such as temperature, voltage, and current monitoring can also be reported.

The block diagram [26] of the platform management software is shown in Figure 4. By function, from top to bottom, the VPX system management software is divided into three layers: the system management layer, the platform management layer, and the system driver layer. The system management layer is the upper-level management software, which provides users with a system management interface. The platform management layer, located in the middle of the software architecture, is the core of the VPX system management software; it is responsible for managing the entire platform and for communicating with the System Manager and the VMC controller. The system driver layer includes the operating system and drivers.

For Intelligent Platform Management control, each board includes a separate processing module that implements the IPMI management functions for the entire chassis. This module is independent of the main processing module and implements functions such as automatic temperature control and module power-up and power-down.
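To make the idea of automatic temperature control concrete, here is a minimal sketch of a threshold check of the kind a management module might run on each sensor reading. It is not the paper's implementation and not an IPMI API: the class `Thresholds`, the function `management_action`, the action names, and the threshold values are all hypothetical. Only the notion of graded upper thresholds is borrowed from IPMI threshold sensors.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    """Hypothetical upper temperature thresholds in degrees Celsius."""
    upper_non_critical: float
    upper_critical: float

def management_action(temp_c, thresholds):
    """Map one temperature reading to a (hypothetical) management action."""
    if temp_c >= thresholds.upper_critical:
        return "power_down"          # protect the module
    if temp_c >= thresholds.upper_non_critical:
        return "increase_cooling"    # e.g. raise fan speed
    return "normal"

# Example values chosen for illustration only.
limits = Thresholds(upper_non_critical=70.0, upper_critical=85.0)
actions = [management_action(t, limits) for t in (25.0, 70.0, 90.0)]
print(actions)
```

Separating the thresholds from the decision function keeps the policy easy to change per board, which fits the per-board management module described above.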

Figure 4. Architecture of VPX system management software

The core of the software is made up of a Message Handler and a number of management function modules. These modules fall into two groups: Chassis Management and Board Management. Chassis Management includes SEL, PEF, FRU Discover & Control, Cooling Management, and Power Supply Management. Board Management includes SDR Management, FRU Information Management, FRU State Management, and Sensor Management (FRU Health Monitor, Threshold Sensor Polling, and Self & Payload Test). To facilitate debugging and management, an on-board CLI is provided via the CHMC serial port for simple queries and operations.

VII. PERFORMANCE EVALUATION

System performance consists mainly of the data transmission bandwidth and the computing power. The available computing power of the whole system is determined by the performance of the DSPs used on the computing daughter boards and by the number of DSPs. Assume that the system mounts 5 daughter boards, each with two C6678 devices. Each board then contributes the 320 GFLOPS of its two C6678 devices, and the overall computing capacity is 5 × 320 GFLOPS = 1600 GFLOPS.

Next, we analyze the data transmission bandwidth of the whole system in detail. We assume that the switches are non-blocking and that Gen2 SRIO runs at 5 Gbaud per lane, as with the chipsets commonly in use. A 4-lane connection, after the overhead of 8b/10b encoding, yields a rate of 4 × 5 Gbps × 0.8 = 16 Gbps.

A. Expanded System Architecture
Consider a system built from 5 such boards in an OpenVPX chassis with a backplane that conforms to the BKP6-DIS06-11.2.10-n profile. This supports 5 payload (i.e., processing) boards and 1 central switch board and yields the nominal interconnect diagram shown in Figure 5 for the SRIO case.

Figure 5. SRIO system topology – simple model
[Diagram labels: CPU 1 to CPU 10 in pairs, each pair connected by SRIO to a payload switch; the payload switches connected by SRIO to the central switch.]

B. System Bandwidth Calculations
When evaluating network architectures, two dataflow models are commonly considered: the all-to-all case and the pipeline case.

1) All-to-all Case
An all-to-all exchange of data is interesting because it represents a common problem in embedded processing systems: a distributed corner turn of matrix data. This is a core function in synthetic aperture radar, for instance. It arises when the processing algorithm calls for a two-dimensional (or higher) array to be subjected to a two-dimensional (or higher) Fast Fourier Transform (FFT). To meet system time constraints, the transform is often distributed across many processor nodes. Between the row FFTs and the column FFTs, the data must be exchanged between nodes. This requires an all-to-all exchange of data, which can tax the available bandwidth of a system.

A simple analysis of this topology might make the following assumptions: there are links between nodes on each board via the on-board switch, there are links to nodes on adjacent cards via links between the on-board switches, and there are 22 connections made via the central switch. Under this approach, the overall performance of an all-to-all exchange can be assumed to be set by the lowest aggregate bandwidth of these three connection types, that is, the bandwidth of a single link divided by the number of connections sharing it. This equates to 4 lanes × 5 Gbps × 0.8 encoding / 22 connections = 0.73 Gbps.

2) Pipeline Case
Another commonly considered dataflow model is a pipeline, in which data streams from node to node in a linear manner. For example, a job is composed of Task 1, Task 2, and Task 3, and data streams from Task 1 to Task 2 and then to Task 3. When designing such a dataflow, it is normal to map the tasks and the flow onto the system in an optimal manner, which can include using different fabric connections for different parts of the flow. For simplicity, we assume that the input and output data sizes at each processing stage are the same, so the rate of the slowest link dictates the overall achievable performance. If Task 1 is mapped to CPU 1, Task 2 to CPU 2, and Task 3 to CPU 3, the available paths in the SRIO system between CPU 1 and CPU 2 and between CPU 2 and CPU 3 are as shown in Figures 6 and 7. In both cases, two separate links are available, so 32 Gbps is available for each leg.
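The bandwidth figures in this section can be reproduced with a few lines of arithmetic. The sketch below uses the values given in the text (4 lanes, 5 Gbaud, 8b/10b encoding, 22 connections sharing the limiting link, 2 parallel links per pipeline leg). The function names are assumptions for illustration, and the model is the same simple one the text describes, not a simulation of the fabric.

```python
def link_rate_gbps(lanes=4, gbaud_per_lane=5.0):
    """Rate of one SRIO link after 8b/10b encoding, in Gbit/s."""
    return lanes * gbaud_per_lane * 8 / 10

def all_to_all_share_gbps(link_gbps, connections=22):
    """Bandwidth per connection when one link is shared equally."""
    return link_gbps / connections

def pipeline_leg_gbps(link_gbps, parallel_links=2):
    """Bandwidth of one pipeline leg that can use several links at once."""
    return link_gbps * parallel_links

link = link_rate_gbps()
share = all_to_all_share_gbps(link)
leg = pipeline_leg_gbps(link)

print(f"x4 link: {link:.0f} Gbps")
print(f"all-to-all share: {share:.2f} Gbps")
print(f"pipeline leg: {leg:.0f} Gbps")
```

The gap between the two cases, roughly 0.73 Gbps per connection against 32 Gbps per leg, shows how strongly the achievable bandwidth depends on the dataflow model and not only on the link speed.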

Figure 6. SRIO pipeline CPU1 to CPU2

Figure 7. SRIO pipeline CPU2 to CPU3

Therefore the limiting bandwidth for the pipeline case is 32 Gbps.

VIII. CONCLUSION

In this paper, a new high-performance processing system based on the OpenVPX standard is designed. The system uses high-speed Serial RapidIO as the high-speed data channel between modules and nodes. The double-star switching network structure ensures both high transmission bandwidth and high reliability. The software can dynamically change the routes and data flow paths, which greatly improves design flexibility. The system has many potential applications in voice processing, sonar, radar, communication, real-time image processing, and other high-speed signal processing fields.

ACKNOWLEDGMENT

This work is sponsored by the Chinese National Spark Plan, 2013GA780004, and by Guangdong Agency of Education grant 20120202029.

REFERENCES

[1] Wang Dechun. Wideband phased array radar system analysis. Modern Radar, 2008, 30(3): 1-6. (In Chinese).
[2] Li Qihu. Sonar signal processing introduction (Second Edition). Beijing: Beijing Ocean Press, 2003: 12-15. (In Chinese).
[3] David Boulinguez. 3-D underwater object recognition. Oceanic Engineering, 2002, 27(4): 814-827.
[4] Mercury Computer Systems Inc. Technology Overview: VITA 46 (VPX). Mercury Computer Systems Inc, 2007.
[5] American National Standard for VPX Baseline Standard. 46.0-2007 ANSI/VITA[S]. America: VITA, 2007: 18-48.
[6] American National Standard for Environments, Design and Construction, Safety, and Quality for Plug-In Units Standard. 47-2005 ANSI/VITA. America: VITA, 2007: 12-20.
[7] American National Standard for VITA Mechanical Specification for Microcomputers Using REDI Air Cooling 48.1-2007. America: VITA, 2007.
[8] American National Standard for VITA 48.2, Mechanical Specification for Microcomputers Using REDI Air Cooling Applied to VITA 46. America: VITA.
[9] American National Standard for VITA 65, Open VPX System Specification Revision 1.08. America: VITA, 2010.
[10] Tyco Electronics Corporation. Application specification of MultiGig RT signal connectors. 2010.
[11] ELMA Your Solution Partner. Overview VITA Standards & Technology. www.elmachina.com.
[12] GE Fanuc Embedded Systems, Inc. VPX: VMEbus for the 21st Century. 2007.
[13] Mercury Computer Systems Inc. VITA 46.1 VMEbus Signal Mapping on VPX, 2007.
[14] Mercury Computer Systems Inc. VITA 46.3 Serial RapidIO™ on VPX, 2007.
[15] Mercury Computer Systems, Inc. VITA 46.4 PCI Express on VPX Fabric Connector, 2007.
[16] GE Intelligent. OpenVPX System Bandwidth: A Comparison of 10Gb Ethernet Performance, Serial Rapid IO, and InfiniBand. 2012. www.ge-ip.com.
[17] Creative Electronic System. VPX for High-Performance Avionic Computers. 2012. www.ces.com.
[18] ELMA BUSTRONIC. 6U OpenVPX 6-Slot BKP6-DIS06-11.2.10-n Backplane, October 2011. www.elmabustronic.com.
[19] Texas Instruments. KeyStone Architecture HyperLink User Guide. USA: Texas, October 2012.
[20] Texas Instruments. Application Report: SerDes Implementation Guide for KeyStone I Devices. USA: Texas, October 2012.
[21] Texas Instruments. TMS320C6678 Multicore Fixed and Floating-Point Digital Signal Processor Data Manual. USA: Texas, 2012.
[22] Texas Instruments. Application Report: Throughput Performance Guide for C66x KeyStone Devices. USA: Texas, July 2012.
[23] Xilinx. Product Specification: Virtex-6 Family Overview. USA, 2012.
[24] Xilinx Inc. Virtex-6 FPGA GTX transceivers. USA, 2010, UG366: 217-255.
[25] RapidIO Trade Association. RapidIO™ Interconnect Specification. USA, 2008.
[26] Intelligent Platform Management Interface Specification v1.5. America: Intel, Hewlett-Packard, NEC, Dell, 2002: 40-68.
