Technische Universität München
A Network Interface Card Architecture for I/O Virtualization in Embedded Systems
Holm Rauchfuss, Thomas Wild, Andreas Herkersdorf
Institute for Integrated Systems
Theresienstr. 90 D-8290 Munich, Germany www.lis.ei.tum.de
Outline
• Motivation
• State of the Art for I/O virtualization
• Specific requirements for embedded systems
• Proposed architecture – ES-VNIC
  – Concept overview
  – Exemplary Rx packet processing
  – Preliminary performance estimation
  – Key components: Queue-Allocation and Management
• Future work and summary
H. Rauchfuss – A Network Interface Card Architecture for I/O Virtualization in Embedded Systems
Motivation
• Virtualization is mainstream in High Performance Computing / Data Center
  – XEN, KVM, VMWare, Intel, AMD, …
  – I/O virtualization (IOV) is under research
• Virtualization is an emerging topic for embedded systems (ES)
  – Multiprocessor System-on-Chips (MPSoCs)
  – Consolidation of different, dynamic workloads on a shared platform, e.g., automotive head unit
Do current concepts for I/O virtualization fit embedded systems?
State of the Art for I/O virtualization – Overview
• SW solutions
  – Virtual Machine Monitor
  – Driver domain
• Extensions in Network Interface Card (NIC)
  – Multi-queue network cards
  – Self-virtualized network cards (VNICs)
State of the Art – Virtual Machine Monitor and driver domain
• Hypervisor itself (VMWare)
  – Increased complexity and trusted computing base
  – Needs own drivers
• Driver domain (XEN)
  – Driver domain in critical data path
  – Latency and complex scheduling
  – Needs dedicated resources
[Figure [1]: Xen architecture. The driver domain (dom0: management apps and central I/O, backend device driver) and the guest domains (domU: apps, frontend device drivers) run on the Xen Virtual Machine Monitor (Control-IF, Secure HW-IF, Event-Channel, Virtual CPU, Virtual MMU), which runs on the hardware (CPUs, memory, I/O devices).]
State of the Art – Multi-queue NICs and VNICs
• Multi-queue NIC (VMDq):
  – Fixed number of queue pairs
  – Relying on driver domain
  – Rx scheduled by packet arrival (head-of-line blocking), Tx as round-robin
• VNICs (RiceNIC, SV-VNIC)
  – Embedded systems on their own (IXP2400, RiceNIC)
  – NIC-CPU/SW centric (RiceNIC)
  – More memory on NIC for each additional interface
[Figures [2], [3], [4]: multi-queue NIC and VNIC block diagrams. CPUs and system memory (P/C lists, Rx/Tx rings, packets) on the system bus; NIC with DMA, signaling, NIC-CPU (DMA management, header parsing, queueing, scheduling), NIC internal/instruction memory, management, and MAC Rx/Tx.]
Specific requirements for a NIC regarding ES and IOV
Extend the goal beyond maximum throughput alone:
• Low latency
• Real-time processing (for certain domains)
• Differentiated service levels with signaling
  – Prioritization of packets and interfaces
  – Bandwidth guarantees
• Limited HW extensions on the NIC for I/O virtualization, with a reasonable size compared to the actual embedded system
• I/O virtualization offloaded from VMM and domains, i.e., saving CPU processing power
New concepts/architectures are needed for I/O virtualization in embedded systems
ES-VNIC – Proposed Architecture
[Figure: ES-VNIC block diagram. CPUs and system memory (P/C lists, Rx/Tx rings, packets) on the system bus; NIC with DMA, signaling, FSMs (Header-Parsing, Management, Queue-Alloc, Scheduling), contexts, local cache for contexts, P/C lists, and Rx/Tx queues, NIC buffer, and MAC Rx/Tx.]
• Pipelined, (re-)configurable and multithreaded FSMs for packet processing
• System memory is the primary storage for configuration and data; cache on NIC
Scalable and flexible resource sharing between interfaces
Header parsing and buffering
• Parsing of packet header to determine receiving domain (minimum: MAC, VLAN)
• Buffering packet on NIC as a whole
• Arbitrary access to any packet to allow out-of-order processing, e.g., for high-priority packets
[Figure: ES-VNIC block diagram (same components as in the proposed architecture).]
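The header-parsing step can be illustrated in software. The following is a minimal sketch, assuming standard Ethernet II framing with an optional IEEE 802.1Q tag; the function names, the lookup table layout, and the domain names are hypothetical and not taken from the ES-VNIC design, which performs this step in hardware FSMs.

```python
import struct

VLAN_TPID = 0x8100  # IEEE 802.1Q tag protocol identifier


def parse_header(frame: bytes):
    """Return (destination MAC, VLAN ID or None) of an Ethernet frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst_mac = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    vlan_id = None
    if ethertype == VLAN_TPID:
        if len(frame) < 18:
            raise ValueError("truncated VLAN tag")
        (tci,) = struct.unpack("!H", frame[14:16])
        vlan_id = tci & 0x0FFF  # lower 12 bits of the tag control information
    return dst_mac, vlan_id


def receiving_domain(frame: bytes, domain_table: dict, default=None):
    """Look up the receiving domain by (MAC, VLAN); table layout is hypothetical."""
    return domain_table.get(parse_header(frame), default)


# Hypothetical mapping of (MAC, VLAN) to domains
table = {
    (bytes.fromhex("020000000001"), None): "domU1",
    (bytes.fromhex("020000000002"), 10): "domU2",
}
untagged = bytes.fromhex("020000000001" "0200000000ff" "0800") + bytes(46)
tagged = bytes.fromhex("020000000002" "0200000000ff" "8100" "000a" "0800") + bytes(46)

print(receiving_domain(untagged, table))  # domU1
print(receiving_domain(tagged, table))    # domU2
```

Only the first 14 to 18 bytes are inspected, which matches the slide's point that header parsing depends on header size while buffering depends on the whole packet.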
(Re-)Configuration via management
• Contexts define an interface (handling, priority, base addresses, …)
• Stored in system memory and cached for active interfaces on the ES-VNIC
• Pinning for critical interfaces
• Parallel handling of packets to decrease stalling
[Figure: ES-VNIC block diagram (same components as in the proposed architecture).]
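Context caching with pinning can be modeled as follows. This is a minimal sketch under stated assumptions: the slides do not name a replacement policy, so least-recently-used eviction is an assumption, and the class name, method names, and context fields are hypothetical.

```python
from collections import OrderedDict


class ContextCache:
    """LRU cache for interface contexts; pinned entries are never evicted."""

    def __init__(self, capacity, system_memory):
        self.capacity = capacity
        self.system_memory = system_memory  # dict: interface id -> context
        self.entries = OrderedDict()        # cached contexts, least recent first
        self.pinned = set()

    def pin(self, if_id):
        """Load a context and protect it from eviction (critical interface)."""
        self.lookup(if_id)
        self.pinned.add(if_id)

    def lookup(self, if_id):
        """Return (context, hit); a miss models a fetch over the system bus."""
        if if_id in self.entries:
            self.entries.move_to_end(if_id)
            return self.entries[if_id], True
        if len(self.entries) >= self.capacity:
            self._evict()
        ctx = self.system_memory[if_id]
        self.entries[if_id] = ctx
        return ctx, False

    def _evict(self):
        victim = next((k for k in self.entries if k not in self.pinned), None)
        if victim is None:
            raise RuntimeError("all cache entries are pinned")
        del self.entries[victim]


# Hypothetical contexts held in system memory
memory = {i: {"priority": i} for i in range(4)}
cache = ContextCache(capacity=2, system_memory=memory)
cache.pin(0)     # critical interface stays cached
cache.lookup(1)  # miss, fetched from system memory
cache.lookup(2)  # miss, evicts interface 1 because interface 0 is pinned
```

In this model a pinned interface always hits, which is the property the slide asks for in critical interfaces.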
Queue-Allocation and scheduling
• DMA descriptors for packets stored in system memory and cached on ES-VNIC
• Transfer of packet scheduled based on active packets and priority (from context)
[Figure: ES-VNIC block diagram (same components as in the proposed architecture).]
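Priority-based transfer scheduling can be sketched with a heap. This is an illustration only: the slides do not specify the scheduling discipline, so strict priority with arrival order as tie-breaker is an assumption, as are the class name, the packet identifiers, and the convention that a lower number means higher priority.

```python
import heapq
import itertools


class RxScheduler:
    """Pick the next buffered packet to transfer by priority, then arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def enqueue(self, packet_id, priority):
        # Assumption: lower number means higher priority (taken from the context)
        heapq.heappush(self._heap, (priority, next(self._arrival), packet_id))

    def next_transfer(self):
        """Return the packet to DMA next, or None if no packet is active."""
        if not self._heap:
            return None
        _, _, packet_id = heapq.heappop(self._heap)
        return packet_id


sched = RxScheduler()
sched.enqueue("bulk-0", priority=5)
sched.enqueue("rt-0", priority=0)  # arrives later, but is transferred first
sched.enqueue("bulk-1", priority=5)
order = [sched.next_transfer() for _ in range(3)]
print(order)  # ['rt-0', 'bulk-0', 'bulk-1']
```

Because the NIC buffer allows access to any packet, the high-priority packet overtakes earlier arrivals instead of waiting behind them.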
DMA packet to system memory
• Transfer of packet to system memory with DMA
[Figure: ES-VNIC block diagram (same components as in the proposed architecture).]
Preliminary performance estimation
[Figure: block diagrams of a NIC-CPU based VNIC (NIC-CPU firmware for DMA management, header parsing, queueing, and scheduling, with NIC internal/instruction memory) and of the ES-VNIC (FSMs for header parsing, management, queue allocation, and scheduling, with local cache and NIC buffer).]
• Firmware on NIC-CPU with sequential chain of tasks
  – Potential bottleneck
• Data cache (re-)loading and instruction fetching not optimal for packet-processing tasks
  – FSMs better suited
• Pipelined architecture with several stages for ES-VNIC
  – Same throughput with lower frequency
Preliminary performance estimation (cont.)
[Figure: Rx pipeline stages with their processing times $T_{\text{NIC Buffer}}$, $T_{\text{Header-Parsing}}$, $T_{\text{Management}}$, $T_{\text{Scheduling}}$, $T_{\text{Queue-Alloc}}$, $T_{\text{DMA}}$.]
$$T_{\text{DelayRX}} = \max(T_{\text{NIC Buffer}}, T_{\text{Header-Parsing}}) + T_{\text{Management}} + \max(T_{\text{Scheduling}}, T_{\text{Queue-Alloc}}) + T_{\text{DMA}}$$
$$T_{\text{DeltaRX}} = \max\bigl(\max(T_{\text{NIC Buffer}}, T_{\text{Header-Parsing}}),\ T_{\text{Management}},\ \max(T_{\text{Scheduling}}, T_{\text{Queue-Alloc}}),\ T_{\text{DMA}}\bigr)$$
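The two expressions can be evaluated directly: the delay is the sum over pipeline stages, and the delta is the slowest stage, which bounds the packet rate. The sketch below uses hypothetical cycle counts chosen only for illustration; they are not measurements from the ES-VNIC.

```python
def rx_delay(t):
    """Latency of one packet through the Rx pipeline (sum of stage times)."""
    return (max(t["nic_buffer"], t["header_parsing"])
            + t["management"]
            + max(t["scheduling"], t["queue_alloc"])
            + t["dma"])


def rx_delta(t):
    """Minimum spacing between packets (time of the slowest stage)."""
    return max(max(t["nic_buffer"], t["header_parsing"]),
               t["management"],
               max(t["scheduling"], t["queue_alloc"]),
               t["dma"])


# Hypothetical stage times in clock cycles (assumptions, not measured values)
stages = {
    "nic_buffer": 64,
    "header_parsing": 18,
    "management": 4,
    "scheduling": 3,
    "queue_alloc": 4,
    "dma": 40,
}
print(rx_delay(stages))  # 112
print(rx_delta(stages))  # 64
```

With these numbers a packet needs 112 cycles end to end, but a new packet can enter every 64 cycles, which is why pipelining reaches the same throughput at a lower clock frequency than a sequential implementation.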
Preliminary performance estimation (cont.)
$$T_{\text{DeltaRX}} = \max\bigl(\max(T_{\text{NIC Buffer}}, T_{\text{Header-Parsing}}),\ T_{\text{Management}},\ \max(T_{\text{Scheduling}}, T_{\text{Queue-Alloc}}),\ T_{\text{DMA}}\bigr)$$
Worst case: 64-byte packets back-to-back for 1 Gb Ethernet
$$\frac{(64 + 20) \cdot 8\,\text{bit}}{1\,\text{Gbit/s}} = 672\,\text{ns}, \qquad 672\,\text{ns} \cdot 125\,\text{MHz} = 84\ \text{cycles}$$
• $T_{\text{Header-Parsing}}$: Depends on header size
• $T_{\text{NIC Buffer}}$: Depends on packet size, dominant over $T_{\text{Header-Parsing}}$
• $T_{\text{Management}}$: Depends on cache hit and system bus/memory (only a few cycles if cached)
• $T_{\text{Queue-Alloc}}$: Depends on cache hit and system bus/memory (only a few cycles if cached)
• $T_{\text{Scheduling}}$: Only a few cycles
• $T_{\text{DMA}}$: Depends on packet size and system bus/memory
ES-VNIC scales with system (bus/memory)
Pinning/Prefetch for real-time interfaces
$T_{\text{Queue-Alloc}}$ and $T_{\text{Management}}$ are the most critical elements
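The worst-case cycle budget can be recomputed for other packet sizes and clock rates. The sketch below reproduces the arithmetic of the slide; the function name and parameters are hypothetical, and the 20 bytes of per-packet overhead are assumed to be the Ethernet preamble with start-of-frame delimiter (8 bytes) plus the inter-frame gap (12 bytes).

```python
from fractions import Fraction


def cycle_budget(packet_bytes, overhead_bytes, link_bps, clock_hz):
    """Return (slot time in seconds, clock cycles available per packet)."""
    bits = (packet_bytes + overhead_bytes) * 8
    slot_time = Fraction(bits, link_bps)  # exact arithmetic, no rounding
    return slot_time, slot_time * clock_hz


# Worst case from the slide: 64-byte packets on 1 Gbit/s at 125 MHz
slot, cycles = cycle_budget(64, 20, 10**9, 125 * 10**6)
print(slot * 10**9)  # 672 (nanoseconds)
print(cycles)        # 84

# Largest standard frame (1518 bytes) for comparison
_, cycles_max = cycle_budget(1518, 20, 10**9, 125 * 10**6)
print(cycles_max)    # 1538
```

Every pipeline stage, including a context or descriptor fetch on a cache miss, has to fit into the 84-cycle slot to sustain line rate with minimum-size packets.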
ES-VNIC – Queue-Allocation
[Figure: Queue-Alloc block with assignable queues on the NIC, fed from the P/C lists and connected to Scheduling; Rx rings and Tx rings reside in system memory.]
• Rx/Tx rings with DMA descriptors are held in system memory
• Assignable queues provide a cache for DMA descriptors of active interfaces
• Sharing of queues between interfaces and reserving of queues for real-time interfaces
• Unequal numbers of Rx and Tx interfaces possible for broadcast and service differentiation
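The sharing and reserving of assignable queues can be sketched as a small allocator. This is a simplified model under stated assumptions: the class, its methods, and the behavior when no queue is free (returning `None`, so the interface waits) are hypothetical and not described in the slides.

```python
class QueueAllocator:
    """Hand out on-NIC descriptor queues; real-time interfaces keep theirs."""

    def __init__(self, num_queues):
        self.free = list(range(num_queues))
        self.reserved = {}  # interface -> queue reserved for real-time use
        self.assigned = {}  # interface -> queue shared among active interfaces

    def reserve(self, if_id):
        """Permanently bind a queue to a real-time interface."""
        if not self.free:
            raise RuntimeError("no free queue to reserve")
        self.reserved[if_id] = self.free.pop()
        return self.reserved[if_id]

    def acquire(self, if_id):
        """Return the queue for an active interface, or None if none is free."""
        if if_id in self.reserved:
            return self.reserved[if_id]
        if if_id in self.assigned:
            return self.assigned[if_id]
        if not self.free:
            return None
        self.assigned[if_id] = self.free.pop()
        return self.assigned[if_id]

    def release(self, if_id):
        """Give a shared queue back once the interface is no longer active."""
        queue = self.assigned.pop(if_id, None)
        if queue is not None:
            self.free.append(queue)


alloc = QueueAllocator(num_queues=2)
rt_queue = alloc.reserve("rt")  # real-time interface gets a dedicated queue
a_queue = alloc.acquire("a")    # best-effort interface takes the shared queue
b_queue = alloc.acquire("b")    # None: no queue left, "b" has to wait
alloc.release("a")
b_retry = alloc.acquire("b")    # now receives the queue released by "a"
```

The reserved queue is never returned to the free list, so the real-time interface is unaffected by how many best-effort interfaces become active.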
ES-VNIC – Management
[Figure: Management block with multithreaded FSMs and a local cache of contexts; input from Header-Parsing, outputs to Queue-Alloc, Scheduling, and P/C lists; contexts reside in system memory.]
• Contexts are held in system memory
• Fetched/cached on NIC for active interfaces
• Pinning for real-time interfaces
• Multithreaded FSM at the core
Multithreaded, (re-)configurable FSMs
• Fast switching for handling different packets/interfaces
• Based on memory-based FSMs
• Extended for changing by context
  – Default behavior
  – Adding/removing states and transitions, e.g., polling instead of signaling
• Multiple sets of input/output registers for holding information even when context has been removed
[Figure: memory-based FSM configured by contexts, with multiple sets of input registers and output registers.]
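A memory-based FSM keeps its transitions in a table, so behavior can be changed by rewriting table entries rather than logic. The sketch below models this together with one state register per thread. All state names, events, and outputs are hypothetical, and the per-context override dictionary is only one possible way to model "changing by context".

```python
# Default transition table: (state, event) -> (next state, output action)
DEFAULT_TABLE = {
    ("idle", "packet"): ("fetch_context", "load_context"),
    ("fetch_context", "ready"): ("transfer", "start_dma"),
    ("transfer", "done"): ("notify", "raise_interrupt"),
    ("notify", "ack"): ("idle", None),
}


class MemoryFSM:
    """Table-driven FSM with one state register per thread."""

    def __init__(self, table, num_threads):
        self.default = dict(table)
        self.state = ["idle"] * num_threads  # register set per thread

    def step(self, thread, event, overrides=None):
        """Advance one thread; overrides model context-specific transitions."""
        table = {**self.default, **overrides} if overrides else self.default
        next_state, output = table[(self.state[thread], event)]
        self.state[thread] = next_state
        return output


# Hypothetical context that uses polling instead of signaling
polling = {("transfer", "done"): ("idle", "set_status_flag")}

fsm = MemoryFSM(DEFAULT_TABLE, num_threads=2)
fsm.step(0, "packet")            # thread 0 handles a signaling interface
fsm.step(1, "packet", polling)   # thread 1 handles a polling interface
fsm.step(0, "ready")
fsm.step(1, "ready", polling)
out_signaling = fsm.step(0, "done")
out_polling = fsm.step(1, "done", polling)
print(out_signaling, out_polling)  # raise_interrupt set_status_flag
```

Each thread keeps its own state while the other advances, which models the fast switching between packets without saving and restoring state.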
Future work and summary
• ES-VNIC addresses I/O virtualization requirements for embedded systems
  – Work in progress
• Validate architecture by simulation of key components
  – Scenarios with high dynamics (contexts, DMA descriptors, …)
  – Dimensioning of cache size, packet buffers, queues
  – Number of multithreaded FSMs and their functional verification
• Implement as part of an MPSoC demonstrator in an FPGA
Discussion
References for pictures
[1] M. Malhalingam, I/O Virtualization (IOV) for Dummies, VMWorld 2007
[2] Intel Virtualization Technology for Connectivity, IDF 2008
[3] H. Raj, K. Schwan, High Performance and Scalable I/O Virtualization via Self-Virtualized Devices, HPDC 2007
[4] J. Shafer, S. Rixner, A Reconfigurable and Programmable Gigabit Ethernet Network Interface Card, Rice University Electrical and Computer Engineering Technical Report TREE0611