Performance of Packet Capturing Systems
Hardware Selection for Monitoring
Fabian Schneider
[email protected]
Technische Universität Berlin, Deutsche Telekom Laboratories
11.12.2006
Fabian Schneider (TU Berlin/DT Labs)
Performance of Packet Capturing Systems
11.12.2006
1 / 34
Introduction
Motivation
• high speed networks → high data and packet rate
• network security tools need to capture this traffic
• 2 choices:
  • expensive special hardware
  • cheap commodity systems
⇒ Is it feasible to capture the traffic with commodity hardware?
Introduction
Outline
1 Monitoring 10 Gigabit
2 Measurement Setup
  Systems under Test
  Topology
  Procedure
  Profiling
3 Workload
  Workload Generation
  Packet Size Distribution
  Output
4 Results
  Using multiple processors?
  Increasing the buffer size
  Additional filtering
  Additional copy operations
  mmapped pcap (Linux)
  write to disk
  Further Results
5 Conclusion
  Summary
  Future Work
  Resources
Monitoring 10 Gigabit
• Monitoring 10 Gbit/s of traffic requires approx. 2500 MBytes/s (both directions).
• No recent bus or disk system can handle this!
• The traffic needs to be split up:
  • use a switch, e.g. the link bundling feature (Cisco: EtherChannel)
  • use specialized hardware
• But be careful: do not split up data that belongs together.
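The two points above, the required throughput and the warning not to split up data that belongs together, can be illustrated with a small sketch. It assumes that "belonging together" means both directions of one connection. The function names and the use of CRC32 as a hash are illustrative assumptions, not part of the measured setup.

```python
import zlib


def required_mbytes_per_s(link_gbit: float, directions: int = 2) -> float:
    """Throughput a monitor must sustain, in MBytes/s (1 MByte = 10**6 bytes)."""
    return link_gbit * 1000 / 8 * directions


def monitor_for_flow(src_ip: str, src_port: int,
                     dst_ip: str, dst_port: int,
                     n_monitors: int) -> int:
    """Pick a monitor index so that both directions of a flow agree.

    Sorting the two endpoints makes the hash key independent of the
    direction in which a packet travels (hypothetical splitting rule).
    """
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}".encode()
    return zlib.crc32(key) % n_monitors
```

With this rule, `required_mbytes_per_s(10)` gives the 2500 MBytes/s stated above, and a request packet and its reply are always assigned to the same monitor.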
Measurement Setup
Systems under Test
Opterons: 2x AMD Opteron 244 (1 MB cache, 1.8 GHz), 2 GB RAM, Intel 82544EI optical GigE. Disk system: ATA RAID on a 3ware 7000 controller.
Xeons: 2x Intel Xeon (512 kB cache, 3.06 GHz), 2 GB RAM, Intel 82544EI optical GigE. Disk system: ATA RAID on a 3ware 7000 controller.
Dual-Core Opterons: 2x2 AMD Opteron 270 (1 MB cache, 2.0 GHz), 2 GB RAM, Intel 82544EI optical GigE. Disk system: SCSI RAID on a Compaq Smart Array 64xx and an external RAID (easyRAID, SATA based) attached via SCSI.
Two instances of each system: one installed with Linux, the other with FreeBSD.
Measurement Setup
Topology
[Figure: measurement topology. Labeled elements: the workload generator gen (interfaces eth0, eth1, eth2), a Cisco C3500XL switch carrying the workload, SNMP interface counter queries, a splitter, the four sniffers swan, moorhen, flamingo, and snipe, and a control network.]
Measurement Setup
Procedure
Measurement categories:
• Capturing rate
• System load
Measurement sequence:
1. Log in to the four sniffers → start the capturing and profiling applications (save the process IDs).
2. Log in to gen → read the SNMP packet counters of the switch.
3. Log in to gen → start packet generation.
4. Log in to gen → read the SNMP packet counters of the switch.
5. Log in to the four sniffers → stop the applications (via the saved process IDs).
See also: Measurement Specification
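The sequence above reads the switch counters before and after packet generation, so the capturing rate can be derived from the counter difference and the number of packets a sniffer recorded. The following is a minimal sketch of that arithmetic. The function names are hypothetical, and treating the counter as a 32-bit SNMP counter that may wrap once is an assumption not stated in the slides.

```python
COUNTER32_MODULUS = 2 ** 32


def packets_sent(counter_before: int, counter_after: int) -> int:
    """Packets forwarded between two reads of a 32-bit counter.

    The modulo handles a single wrap-around of the counter.
    """
    return (counter_after - counter_before) % COUNTER32_MODULUS


def capturing_rate_percent(counter_before: int, counter_after: int,
                           captured: int) -> float:
    """Share of the generated packets that a sniffer actually captured."""
    sent = packets_sent(counter_before, counter_after)
    if sent == 0:
        raise ValueError("no packets were generated during the run")
    return 100.0 * captured / sent
```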
Measurement Setup
Profiling
Goal: record CPU usage while capturing
• based on the mechanisms used by top
• CPU accounting information (user, system, idle, interrupt, ...) written to a file twice per second
• additional identification of minimum, maximum, and average
• the "under load" condition and the resulting averages are identified by an awk script
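As a rough illustration of this kind of profiling, the sketch below derives CPU usage percentages from two samples of the aggregate `cpu` line in Linux's `/proc/stat`. It is not the cpusage tool itself. The helper names are hypothetical, only the first seven counters are used, and the approach is Linux-specific (FreeBSD would need a different data source).

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def usage_percent(before: dict, after: dict) -> dict:
    """Percentage of time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()
```

Sampling twice per second would then mean calling `read_cpu_line()` in a loop with `time.sleep(0.5)` between reads and writing each `usage_percent` result to a file.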
Workload
Workload Generation
• Requirements:
  Speed: line speed (1 Gbit/s) is desired
  Reproducibility: of the load, and to avoid unrepeatable failures
  Realism: at least the packet sizes should match
• Checked different existing tools → none was sufficient
• Best candidate: the Linux Kernel Packet Generator, which can only generate packets of fixed size
⇒ Generation of different packet sizes had to be added → identify the distributions.
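To make the reproducibility and realism requirements concrete, the sketch below draws packet sizes from a weighted table with a fixed seed, so that the same sequence can be regenerated for every run. This is a user-space illustration only, not the patched kernel packet generator. The weights in `SIZE_WEIGHTS` are made up, and `make_size_generator` is a hypothetical name.

```python
import random

# Hypothetical size -> weight table; real weights would come from a trace.
SIZE_WEIGHTS = {40: 40, 52: 15, 576: 5, 1420: 10, 1500: 30}


def make_size_generator(size_weights: dict, seed: int):
    """Return a function that yields reproducible packet size sequences."""
    rng = random.Random(seed)          # private generator, fixed seed
    sizes = list(size_weights)
    weights = list(size_weights.values())

    def next_sizes(count: int) -> list:
        # Weighted sampling with replacement from the size table.
        return rng.choices(sizes, weights=weights, k=count)

    return next_sizes
```

Two generators created with the same seed produce identical sequences, which is what makes a measurement repeatable.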
Workload
Packet Size Distribution
Observed Packet Size Distribution
75 % of all packets in the 13 most frequent sizes!
Only few frequent sizes
[Figure: two plots. (1) Number of packets per packet size in a 24 h trace (logarithmic scale) over packet sizes from 0 to 1500, with the sizes 40, 52, and 1500 labeled. (2) Percentage and cumulated percentage of packets per size, sorted by percentage descending; sizes on the axis include 40, 44, 48, 52, 57, 60, 64, 552, 576, 1300, 1400, 1420, 1440, 1452, 1454, 1460, 1470, 1480, 1492, 1500, and "rest".]
See also: Implementation
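A statement such as "75 % of all packets in the 13 most frequent sizes" can be checked against a size histogram by accumulating the counts in descending order. The sketch below shows one way to do this; the function name is hypothetical, and the histogram used with it would have to come from an actual trace.

```python
from collections import Counter


def sizes_needed_for_coverage(histogram: dict, target_percent: float) -> int:
    """Number of most frequent packet sizes needed to cover target_percent
    of all packets in the histogram (size -> packet count)."""
    total = sum(histogram.values())
    covered = 0
    ranked = Counter(histogram).most_common()   # sorted by count, descending
    for rank, (_size, count) in enumerate(ranked, start=1):
        covered += count
        if 100.0 * covered / total >= target_percent:
            return rank
    return len(histogram)
```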
Workload
Output
Output: Packet Size Distribution
[Figure: number of packets (logarithmic scale) for the originally captured and the generated traffic, with packets classified by size and sorted descending by quantity of packets.]
Output: Data and Packet Rate
[Figure: Generator: median generation rate (with min/max errors). Packet rate (kpps) and data rate (Mbit/s) for the minimum packet size (40 bytes), the packet size distribution, and the maximum packet size (1500 bytes).]
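Packet rate and data rate are linked through the packet size, which is why small packets stress the packet rate and large packets stress the data rate. The sketch below spells out this relation. The function names are hypothetical, and `overhead_bytes` is an optional assumption for per-packet framing overhead that the slides do not specify.

```python
def data_rate_mbit(packet_rate_kpps: float, packet_size_bytes: int) -> float:
    """Data rate in Mbit/s for a given packet rate (kpps) and packet size."""
    return packet_rate_kpps * packet_size_bytes * 8 / 1000


def max_packet_rate_kpps(link_mbit: float, packet_size_bytes: int,
                         overhead_bytes: int = 0) -> float:
    """Upper bound on the packet rate (kpps) a link can carry.

    overhead_bytes models per-packet bytes on the wire that are not
    counted in packet_size_bytes (assumed to be 0 unless stated).
    """
    bits_per_packet = (packet_size_bytes + overhead_bytes) * 8
    return link_mbit * 1000 / bits_per_packet
```

For example, 125 kpps of 1000-byte packets amount to 1000 Mbit/s, while the same link filled with 40-byte packets would need several thousand kpps.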
Results
Using multiple processors?
[Figure: Only one processor. Measurement (32) no-improvement: no SMP, no HT, 1 app, traffic: generated, no filter, no load. Capturing rate [%] and CPU usage [%] over data rate [Mbit/s] (50 to 950) for Linux/AMD (swan), Linux/Intel (snipe), FreeBSD/AMD (moorhen), and FreeBSD/Intel (flamingo).]
[Figure: Multiprocessor (SMP). Measurement (31) no-improvement: SMP, no HT, 1 app, traffic: generated, no filter, no load. Capturing rate [%] and CPU usage [%] over data rate [Mbit/s] (50 to 950) for Linux/AMD (swan), Linux/Intel (snipe), FreeBSD/AMD (moorhen), and FreeBSD/Intel (flamingo).]
Results
Increasing the buffer size
[Figure: Increased buffers. Measurement (17) increased-buffers: SMP, no HT, 1 app, traffic: generated, no filter, no load. Capturing rate [%] and CPU usage [%] over data rate [Mbit/s] (50 to 950) for Linux/AMD (swan), Linux/Intel (snipe), FreeBSD/AMD (moorhen), and FreeBSD/Intel (flamingo).]
Results
Additional filtering
[Figure: Additional filtering (BPF/LSF). Measurement (21) filter: SMP, no HT, 1 app, traffic: generated, 50 BPF instr., no load. Capturing rate [%] and CPU usage [%] over data rate [Mbit/s] (50 to 950) for Linux/AMD (swan), Linux/Intel (snipe), FreeBSD/AMD (moorhen), and FreeBSD/Intel (flamingo).]
Results
Additional copy operations
[Figure: 50 additional copy operations. Measurement (27) memcpy-50: SMP, no HT, 1 app, traffic: generated, no filter, no load. Capturing rate [%] and CPU usage [%] over data rate [Mbit/s] (50 to 950) for Linux/AMD (swan), Linux/Intel (snipe), FreeBSD/AMD (moorhen), and FreeBSD/Intel (flamingo).]
Results
mmapped pcap (Linux)
[Figure: mmap patch (Linux only). Measurement (19) mmapped pcap: SMP, no HT, 1 app, traffic: generated, no filter, no load. Capturing rate [%] and CPU usage [%] over data rate [Mbit/s] (50 to 950) for Linux/AMD with mmap (swan), Linux/AMD old (swan), Linux/Intel with mmap (snipe), and Linux/Intel old (snipe).]
Results
write to disk
[Figure: Dual core: writing to disk. Measurement (2-8) write to disk: SMP, 1 app, traffic: generated, no filter, no load. Capturing rate [%] and CPU usage [%] over data rate [Mbit/s] (50 to 950) for 32-bit FreeBSD/Opteron, 64-bit FreeBSD/Opteron, 32-bit Linux/Opteron, and 64-bit Linux/Opteron.]
Results
Further Results
• Running multiple capturing applications concurrently leads to bad performance.
• Measurements with additional compression show some advantage for Intel systems.
• Intel Hyper-Threading does not change the performance.
• FreeBSD 5.4 performs better than FreeBSD 5.2.1 (no comparable measurements for FreeBSD 6 at the moment).
• Using 4 processors (2x dual core) is only marginally better than using 2 processors (dual core).
Conclusion
Summary
• The FreeBSD/AMD Opteron combination performs best in general.
• Choosing the right buffer size is important.
• Filtering is cheap with respect to its benefit.
• Using the memory-map patch from Phil Woods does help.
• 64-bit systems drop more packets.
• Capturing full traces to disk is feasible up to about 600 Mbit/s.
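To put the 600 Mbit/s figure into storage terms, the sketch below converts a data rate into the sustained disk write rate and the trace size it implies. The function names are hypothetical, and the calculation ignores any per-packet headers added by the trace file format.

```python
def disk_rate_mbytes_per_s(data_rate_mbit: float) -> float:
    """Sustained write rate in MBytes/s needed to store all captured data."""
    return data_rate_mbit / 8


def trace_size_gbytes(data_rate_mbit: float, seconds: float) -> float:
    """Size of a full trace in GBytes (10**9 bytes) after the given duration."""
    return disk_rate_mbytes_per_s(data_rate_mbit) * seconds / 1000
```

At 600 Mbit/s this comes to 75 MBytes/s, or 270 GBytes for one hour of traffic.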
Conclusion
Future Work
• 10 Gigabit Ethernet
• future operating system versions / direct comparison of different versions on the same machine (e.g. FreeBSD 4.x, 5.x, 6.x)
• new Intel I/O Acceleration Technology
• implement mmapped packet reception for FreeBSD
• (Windows platform)
Conclusion
End
Questions?
Thanks for the attention!
Conclusion
Resources
Software
Profiling
• cpusage: available at http://www.net.in.tum.de/~schneifa/sources/cpusage-0.2.tar.gz
• trimusage.awk script: http://www.net.in.tum.de/~schneifa/sources/trimusage.awk
Capturing
• createDist: available at http://www.net.in.tum.de/~schneifa/sources/createDist-0.1.tar.gz
• tcpdump: available at www.tcpdump.org
Workload
• patched LKPG: available at http://www.net.in.tum.de/~schneifa/pktgen-lkpg-dist-0.1.tar.gz
Conclusion
Resources
Further Reading
F. Schneider. Best Packet Capture System. http://www.net.t-labs.tu-berlin.de/research/bpcs/
F. Schneider. Performance evaluation of packet capturing systems for high-speed networks. Diploma thesis. http://www.net.in.tum.de/~schneifa/papers/da.ps
Measurement Setup
Measurement Specification
• seven similar measurements → to avoid errors
• a million packets per run
• 26 different inter-packet gaps per measurement → increasing data and packet rate
• average of the different runs, with error bars for the min and max values
• no filter, in order to capture all packets
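The aggregation described above, an average over repeated runs with error bars for the minimum and maximum, can be sketched as follows. The function names and the sample values are hypothetical; the slides do not show the evaluation scripts.

```python
from statistics import mean


def summarize_runs(rates: list) -> dict:
    """Average, minimum, and maximum of the capturing rates of several runs."""
    if not rates:
        raise ValueError("at least one run is required")
    return {"avg": mean(rates), "min": min(rates), "max": max(rates)}


def error_bars(summary: dict) -> tuple:
    """Lengths of the lower and upper error bar around the average."""
    return (summary["avg"] - summary["min"], summary["max"] - summary["avg"])
```

One such summary would be computed for each of the 26 inter-packet gaps, giving one data point with error bars per data rate.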
Workload
Implementation
Workload – Implementation
OS insights
FreeBSD
Packet Reception in FreeBSD
• interrupt context
• double buffer as interface to userspace
• one buffer pair per capturing session
• 3 packet copy operations
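The double buffer can be pictured with a simplified model: the kernel side fills one buffer while userspace drains the other, and packets are dropped when both are occupied. The class below is only a sketch of that idea, not the FreeBSD implementation. The class name is hypothetical, capacity is counted in packets rather than bytes, and `read` never blocks.

```python
class DoubleBuffer:
    """Simplified model of a store/hold buffer pair for one capturing session."""

    def __init__(self, capacity: int):
        self.capacity = capacity   # packets per buffer (simplification)
        self.store = []            # buffer currently filled by the kernel side
        self.hold = None           # full buffer waiting for userspace, or None
        self.dropped = 0

    def receive(self, packet: bytes) -> bool:
        """Kernel side: store a packet, swapping buffers when store is full."""
        if len(self.store) >= self.capacity:
            if self.hold is not None:
                # Both buffers are occupied: the packet is lost.
                self.dropped += 1
                return False
            self.hold, self.store = self.store, []
        self.store.append(packet)
        return True

    def read(self) -> list:
        """Userspace side: take the hold buffer, or swap if none is waiting."""
        if self.hold is None:
            self.hold, self.store = self.store, []
        packets, self.hold = self.hold, None
        return packets
```

The model shows why the buffer size matters: if userspace does not read the hold buffer before the store buffer fills up again, every further packet is dropped.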
OS insights
Linux
Packet Reception in Linux
• soft-interrupts used
• central memory block for all packets handled in kernel
• pointer queue as interface to userspace
• 2 packet copy operations
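The pointer queue can be modeled in the same simplified way: packets stay in one shared place, each capturing session only queues references to them, and the data is copied when userspace fetches a packet. The class below is a sketch under these assumptions, not the Linux implementation. The class name and the byte limit per queue are hypothetical.

```python
from collections import deque


class PointerQueue:
    """Simplified model of a per-session queue of packet references."""

    def __init__(self, limit_bytes: int):
        self.limit_bytes = limit_bytes   # budget of queued packet bytes
        self.queued_bytes = 0
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, packet: bytes) -> bool:
        """Kernel side: queue a reference to the packet, no copy is made."""
        if self.queued_bytes + len(packet) > self.limit_bytes:
            self.dropped += 1
            return False
        self.queue.append(packet)
        self.queued_bytes += len(packet)
        return True

    def dequeue(self) -> bytearray:
        """Userspace side: remove the oldest reference and copy the data out."""
        packet = self.queue.popleft()
        self.queued_bytes -= len(packet)
        return bytearray(packet)
```

Two sessions that enqueue the same packet share one object in this model, whereas the double-buffer model needs a separate buffer pair per session.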