Maximizing Application Acceleration with FPGAs

Shreyas Shah
Xilinx, Inc.
Santa Clara, CA

Agenda
▪ Market background and Data Center trends
▪ Change in compute architectures after 75 years!
▪ Workload specific acceleration: Xilinx FPGA
▪ Increased Ethernet network performance: Xilinx FPGA
▪ Summary

Markets and Data Center Trends
▪ Software Defined Data Center
▪ Evolving standards
▪ Need for workload-specific acceleration: mentioned publicly by companies such as Baidu, Microsoft, JPMorgan and others
[Figure: Physical / Virtual / Cloud; exponential growth in servers, storage, network and acceleration. Source: ONS2014 Keynote, Microsoft / Azure]

Big Data: Big Impact on DC Infrastructure
▪ Video analytics, speech to text, targeted advertisements, OCR
▪ NVDIMM, SSD, in-memory DB, acceleration in SSD
▪ Key-value store, DNN
▪ Acceleration in appliances: NIC card, storage HBA, ToR switch and core network acceleration
▪ Acceleration in NAS servers: flash as a cache, FileIO acceleration
▪ Acceleration in flash-based SSD: caching, de-dup, compression/decompression, big data analytics

Evolving Architectures
▪ Power/thermal density is limiting Fmax scaling
• End of Dennard scaling ⇨ End of Moore's law
▪ CPU performance scaling problematic
• Difficulties in exploiting task-level parallelism with multicore ⇨ Dark silicon
▪ Heterogeneous computing ⇨ Best of both worlds
• Higher performance and lower power
• Increased compute density
[Figure: transistor count (Tr), Fmax and power (P) trends. Source: Intel, Wikipedia]
[Figure: CPU / GPU. Source: ISCA2011]
[Figure: Xilinx FPGA. Source: Xilinx]
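The link between power density and Fmax on this slide can be made concrete with the standard first-order model of CMOS dynamic power. The symbols below are textbook notation and do not appear on the slide: α is the activity factor, C the switched capacitance, V the supply voltage, f the clock frequency, A the die area and κ > 1 the per-generation scaling factor. Leakage power is ignored.

```latex
P_{\mathrm{dyn}} = \alpha\, C\, V^{2} f
% Ideal Dennard scaling by a factor \kappa per generation:
C \to C/\kappa,\qquad V \to V/\kappa,\qquad f \to \kappa f,\qquad A \to A/\kappa^{2}
\frac{P'}{A'} = \frac{\alpha\,(C/\kappa)\,(V/\kappa)^{2}\,(\kappa f)}{A/\kappa^{2}} = \frac{\alpha\, C\, V^{2} f}{A}
% Once V can no longer be lowered (V' = V):
\frac{P'}{A'} = \frac{\alpha\,(C/\kappa)\,V^{2}\,(\kappa f)}{A/\kappa^{2}} = \kappa^{2}\,\frac{\alpha\, C\, V^{2} f}{A}
```

Under ideal scaling, power density stays flat while frequency rises. Once supply voltage stops scaling, the same shrink multiplies power density by κ². Holding f constant still leaves a factor of κ. This is one way to see why clock rates plateaued and why not every transistor can be switched on at once (dark silicon).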

Application Acceleration in Data Center
▪ Cloud computing and Big data analytics
▪ TCO optimized processor architectures are emerging
▪ Clusters of workload specific computing
  • HPC in cloud
    – Personalized medicine
    – Oil and gas exploration
  • Big data analytics, event stream processing, data mining
    – Database acceleration (in-memory databases)
    – Personalized advertisements
  • Other applications include
    – Video analytics & image processing
    – Ticker symbol processing
    – Machine learning and analytics
      – Image recognition, speech recognition
      – Neural networks and deep learning
▪ IoT
▪ C-RAN

Traditional & Emerging Computer Architectures
▪ Computer architectures
• Main processor
• Bridge chip (South, North, IO bridges)
• IO slots for graphics
• IO controllers
• DRAM memory DIMMs
• Hard disk
▪ Emerging architectures evolution
• Main processor: SoC
  – Processor with integrated bridge chips
  – IO controllers
  – Graphics processing units
• Memory
  – Processors in memories
  – Memory appliances with a large amount of DRAM
  – DRAM memory module with Flash
• Flash replaces hard disk
▪ Application Acceleration with FPGA: Hottest New Trend

Workload specific acceleration: Xilinx FPGA
▪ Main processor with FPGA: two models emerging
• Inline model (inline acceleration, pre-processing)
• Co-processor model
[Figure: block diagrams of the inline and co-processor models, showing Processor, Mem, FPGA, IO controller and IO connected over PCIe]

Application Acceleration: Xilinx FPGA
▪ Co-processor model
• IO bus based
  – DMA based programming model
[Figure: Processor and FPGA, each with its own Mem, connected over PCIe with DMA]
▪ Co-processor model
• Cache coherent interface (CCI)
  – Load/store programming model
[Figure: Processor and FPGA, each with its own Mem, connected over CCI (Cache Coherent Interface)]
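The two co-processor models differ mainly in how data reaches the accelerator. The Python sketch below simulates both in software under stated assumptions. `accel_kernel`, `DmaDevice` and `coherent_offload` are hypothetical names, and no real driver, DMA engine or CCI API is used. The "kernel" simply increments each byte. The sketch only illustrates that the DMA model copies data into separate device memory and back, while the load/store model operates on the host buffer in place.

```python
"""Toy, software-only contrast of the two co-processor programming models.

All names are hypothetical; no real FPGA driver, DMA or CCI API is used.
"""


def accel_kernel(buf: memoryview) -> None:
    """Stand-in for an FPGA kernel: increment every byte in place (mod 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] + 1) % 256


class DmaDevice:
    """IO-bus (PCIe) model: the accelerator owns separate memory; data moves by DMA."""

    def __init__(self, size: int) -> None:
        self.device_mem = bytearray(size)  # simulated FPGA-attached DRAM
        self.bytes_copied = 0              # simulated DMA traffic, both directions

    def dma_to_device(self, host_buf: bytes) -> None:
        if len(host_buf) > len(self.device_mem):
            raise ValueError("buffer larger than device memory")
        self.device_mem[: len(host_buf)] = host_buf
        self.bytes_copied += len(host_buf)

    def run(self, n: int) -> None:
        # The kernel only ever touches device memory, never the host buffer.
        accel_kernel(memoryview(self.device_mem)[:n])

    def dma_from_device(self, n: int) -> bytes:
        self.bytes_copied += n
        return bytes(self.device_mem[:n])


def coherent_offload(shared: bytearray) -> None:
    """Cache-coherent (CCI) model: the accelerator loads/stores the host buffer directly."""
    accel_kernel(memoryview(shared))


if __name__ == "__main__":
    host = bytearray(b"\x00\x01\xff")
    dev = DmaDevice(16)
    dev.dma_to_device(host)
    dev.run(len(host))
    print(dev.dma_from_device(len(host)), dev.bytes_copied)

    shared = bytearray(b"\x00\x01\xff")
    coherent_offload(shared)
    print(bytes(shared))
```

Both paths compute the same result. In the DMA path, the host buffer is left untouched and every byte crosses the bus twice. In the load/store path, the accelerator works on the shared buffer directly. Real coherency, caching and bus latency are not modeled here.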

Network Acceleration: Xilinx FPGA
▪ ASSP with FPGA:
• Inline model (inline FPGA with network acceleration, pre-processing)
▪ ASSP with FPGA:
• Co-processor model: ASSP with FPGA on a side interface
[Figure: Processor with Mem, connected over PCIe to a NIC ASSP and FPGA with Ethernet IO, shown in both the inline and co-processor arrangements]
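In the inline model, the FPGA sits in the data path between the Ethernet ports and the host. This lets it discard or reduce traffic before it crosses PCIe. Below is a minimal sketch of that idea. It assumes a hypothetical `Packet` type and uses a destination-port filter as the pre-processing step; a real inline function would be implemented in FPGA logic, not Python.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record: destination port plus payload bytes."""
    dst_port: int
    payload: bytes


def inline_filter(packets: Iterable[Packet], wanted_ports: FrozenSet[int]) -> Iterator[Packet]:
    """Inline pre-processing stage: forward only packets the host cares about."""
    for pkt in packets:
        if pkt.dst_port in wanted_ports:
            yield pkt


def bytes_to_host(packets: Iterable[Packet]) -> int:
    """Payload bytes that would still have to cross PCIe to the processor."""
    return sum(len(pkt.payload) for pkt in packets)


if __name__ == "__main__":
    traffic = [Packet(80, b"GET /"), Packet(22, b"ssh-data"), Packet(443, b"tls")]
    forwarded = list(inline_filter(traffic, frozenset({80, 443})))
    print([p.dst_port for p in forwarded], bytes_to_host(traffic), bytes_to_host(forwarded))
```

The filter is a generator, so packets stream through one at a time. This is closer to how an inline pipeline behaves than buffering the whole input first.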

Xilinx Chips Used in Data Centers
▪ COMPUTE
• Graph processing: 10-100x perf/W
• String/pattern matching: 10-20x perf/W
• Image/signal processing: 50x throughput
• DNN
▪ STORAGE
• Hybrid memory: latency hiding, 10x power saving
• Key-value stores: 36x RPS/W (requests per second per watt), 10x-100x latency reduction
• Compression/encryption: customized algorithms, sub-5 µs latency, 10x encryption rate
▪ NETWORKING
• Secure socket: sub-5 µs latency, 10x encryption rate
• TCP endpoint: sub-2 µs latency, 10x virtual circuits
• Packet switch: sub-100 ns latency, protocol choices
Source: Xilinx
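The table mixes metrics (perf/W, RPS/W, throughput, latency), and a perf/W multiple can come from higher throughput, lower power, or both. Below is a small sketch of the arithmetic, assuming throughput and power are measured on the same workload. The function names are hypothetical. The numbers in the example are purely illustrative and are not the measurements behind the table.

```python
def perf_per_watt(throughput: float, power_w: float) -> float:
    """Throughput (e.g. requests/s or edges/s) delivered per watt."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return throughput / power_w


def perf_per_watt_gain(accel_tput: float, accel_w: float,
                       base_tput: float, base_w: float) -> float:
    """How many times better the accelerated system's perf/W is than the baseline's."""
    return perf_per_watt(accel_tput, accel_w) / perf_per_watt(base_tput, base_w)


if __name__ == "__main__":
    # Purely illustrative numbers, not measurements.
    gain = perf_per_watt_gain(500_000, 25.0, 200_000, 100.0)
    throughput_gain = 500_000 / 200_000
    print(f"perf/W gain {gain:.1f}x, throughput gain only {throughput_gain:.1f}x")
```

In this illustrative case, a 10x perf/W gain corresponds to only a 2.5x throughput gain. So a perf/W figure and a throughput figure in the table are not interchangeable.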

Application acceleration with Xilinx FPGA
▪ FPGA value proposition:
• High speed IO and serdes (33 Gbps)
• High speed memory connectivity
  – DRAM, QDR SRAM, RL3
  – Graphics memory
  – HBM
  – HMC, Mosys/GSI BE
• Large amount of on-chip memory
• Flash interfaces with error correction
  – ONFi, Toggle, eMMC, SAS, SATA
• PCIe IO bus: G1/G2/G3/G4 and future G4 Overclocked
▪ FPGA value proposition (cont'd)
• Ethernet connectivity: 100 Mbps to 400 Gbps
• Interlaken: 150 Gbps to 600 Gbps
• Processor blocks for optimized applications
• Large pool of DSPs
• Support for higher level abstractions: C/C++/OpenCL
• Application library components to serve the acceleration market
• Variety of other protocol support
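The PCIe generations listed above differ in per-lane transfer rate and line encoding. As a worked example, the sketch below computes per-direction link bandwidth after encoding overhead: 8b/10b for Gen1/Gen2 and 128b/130b for Gen3/Gen4. It deliberately ignores transaction- and data-link-layer packet overhead, so real payload throughput is lower. The overclocked Gen4 variant mentioned on the slide is not modeled, and `link_bandwidth_mb_s` is a hypothetical helper name.

```python
from fractions import Fraction

# PCIe generation -> (transfer rate in GT/s per lane, line-encoding efficiency)
PCIE_GENS = {
    1: (Fraction(5, 2), Fraction(8, 10)),    # 2.5 GT/s, 8b/10b
    2: (Fraction(5), Fraction(8, 10)),       # 5.0 GT/s, 8b/10b
    3: (Fraction(8), Fraction(128, 130)),    # 8.0 GT/s, 128b/130b
    4: (Fraction(16), Fraction(128, 130)),   # 16.0 GT/s, 128b/130b
}


def link_bandwidth_mb_s(gen: int, lanes: int) -> float:
    """Per-direction bandwidth in MB/s (10**6 bytes/s) after line encoding only."""
    if lanes <= 0:
        raise ValueError("lane count must be positive")
    rate_gt_s, efficiency = PCIE_GENS[gen]
    bits_per_s = rate_gt_s * 10**9 * efficiency * lanes
    return float(bits_per_s / 8 / 10**6)


if __name__ == "__main__":
    for gen in sorted(PCIE_GENS):
        print(f"Gen{gen}: x1 {link_bandwidth_mb_s(gen, 1):8.1f} MB/s, "
              f"x16 {link_bandwidth_mb_s(gen, 16):9.1f} MB/s")
```

Exact fractions keep the 128b/130b factor from accumulating rounding error. For example, a Gen3 x16 link works out to roughly 15.75 GB/s per direction before protocol overhead.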

Software Defined Development Environments
✓ SDAccel for OpenCL, C and C++ enables up to 25x better performance per watt
✓ Compiles C/C++/OpenCL programs to FPGA bit files
✓ SDSoC: ASSP-like programming experience
✓ SDNet allows creation of ‘Softly’ Defined Networks
✓ Higher level language to define the “Fields” of interest for packet processing tasks

Expand users to a broad community of software and systems engineers
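SDNet describes packet processing in terms of the header fields a design cares about. The Python sketch below is only a software analogue of that idea, not SDNet syntax or output. The hypothetical function `parse_fields` pulls the Ethernet addresses and EtherType from a raw frame and, for IPv4, the protocol number and IP addresses. It uses only the standard `struct` and `ipaddress` modules.

```python
import ipaddress
import struct


def parse_fields(frame: bytes) -> dict:
    """Extract a few Ethernet/IPv4 header fields of interest from a raw frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    fields = {"dst_mac": dst.hex(":"), "src_mac": src.hex(":"), "ethertype": ethertype}
    if ethertype == 0x0800:  # IPv4; VLAN tags are not handled in this sketch
        ip = frame[14:]
        if len(ip) < 20 or ip[0] >> 4 != 4:
            raise ValueError("truncated or non-IPv4 header")
        fields["protocol"] = ip[9]  # e.g. 6 = TCP, 17 = UDP
        fields["src_ip"] = str(ipaddress.IPv4Address(ip[12:16]))
        fields["dst_ip"] = str(ipaddress.IPv4Address(ip[16:20]))
    return fields


if __name__ == "__main__":
    eth = bytes.fromhex("ffffffffffff001122334455") + struct.pack("!H", 0x0800)
    ipv4 = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
                       bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    print(parse_fields(eth + ipv4))
```

In hardware, the same field offsets would drive a parser pipeline rather than byte slicing, but the set of "fields of interest" plays the same role.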

Summary
▪ Processors hitting the wall: increasing performance at reasonable power
• Clock speeds stuck between 2 and 3 GHz for more than a decade
▪ Specific applications require specialized hardware blocks to be optimal:
• Power, performance and scalability ⇨ Xilinx FPGA is the answer
▪ TCO (OPEX) is the main focus of the data center for profitability
• TCO optimized architectures:
  – CPU + FPGA on a PCIe or cache coherent bus will drive application acceleration
• NIC ASSP + FPGA architectures evolving
  – Addressing the performance challenges of servers and Ethernet network connectivity
• Ethernet switch ASSP + FPGA evolving to solve network performance

Thank you for your attention!
