Syslog Connector performance tuning

Author: Rodney Dawson
Girish Mantry, Moehadi Liang
Technical Solutions Consultants

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Agenda

In this session we will take a look at
• Syslog connector variants
• Connector components and operation
• Stages in the event flow
• Performance bottlenecks and tuning at each stage
• Out of memory problems and tuning
• Customer cases
• General recommendations

Syslog Connector variants, components, operation and event flow

Syslog Connector variants

Network listeners
• Syslog Daemon: UDP, Raw TCP; default port 514
• Syslog NG Daemon: UDP, Raw TCP, TLS; default port 1999
• ArcSight CEF Encrypted Syslog (UDP): UDP with symmetric key encryption; default port 514; CEF format only
• Supported on all platforms
• Configurable interfaces and ports

File readers
• Syslog Pipe: Unix pipe
• Syslog File: regular file
• Supported only on Unix platforms
• Work in conjunction with the native Syslog Daemon

Syslog Connector components

[Figure: component diagram. Labels: Device type 1 ... Device type N; Raw events; Main flow (Queue, Subagents, C1, C2); Parsed events; Processed events; Destination flows (C1, C2, Cache, ESM Transport or Logger Transport); ESM; Logger]

Note: Queuing applies only to network listeners, not to file readers

Event flow

Event reception
• Receives network packets on UDP/TCP sockets
• Extracts human-readable raw syslog events from the network packets

Event queuing
• Raw events are written to a queue of files on the file system in the order in which they are received

Event parsing
• Raw events are picked up from the file queue in FIFO order and parsed using regular expressions
• Information from device log formats is normalized into the ArcSight event format

Event processing
• Normalized events are categorized and processed in many ways useful for correlation and asset modeling
• Events are batched, filtered or aggregated as required for efficiency

Event transport
• Enriched ArcSight events are sent to the ESM/Logger destination
• Events are cached when destinations are down and resent when they are back up

Performance bottlenecks in the event flow and tuning

Event reception

Choice of transport protocol
– UDP performs better on reliable networks
– Use Raw TCP on unreliable networks
– Use TLS for encrypted transport with Syslog NG

Bottleneck (when using Raw TCP or TLS)
– Java applications do not know when a client closes the connection with a FIN
– Connections remain idle in the CLOSE_WAIT state until closed explicitly by the application
– Idle connections can accumulate over time and exceed the connector limit or the OS limit
– This happens faster with a large number of devices or with devices that create new connections frequently

Tuning

Parameter | Default | Recommendation
tcppeerclosedchecktimeout | -1 | Set it to 30000 msec or higher so that the connector proactively checks for connections closed by the peer and closes them on the connector side as well
tcpmaxsockets | 1000 | Increase it as required to accommodate simultaneous connections from a large number of devices
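One way to watch for the CLOSE_WAIT build-up described above is to count connection states in the text output of `netstat -an`. The following is a minimal sketch, not part of the connector: the function names and the sample output are illustrative assumptions, and it only assumes that each TCP row starts with `tcp`/`TCP` and ends with the connection state.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connection states in `netstat -an` style text output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumption: TCP rows start with "tcp"/"tcp6"/"TCP" and the state is the last column.
        if len(fields) >= 4 and fields[0].lower().startswith("tcp"):
            states[fields[-1]] += 1
    return states


def close_wait_usage(netstat_output: str, tcpmaxsockets: int = 1000) -> float:
    """Fraction of the tcpmaxsockets limit taken up by CLOSE_WAIT connections."""
    return count_tcp_states(netstat_output)["CLOSE_WAIT"] / tcpmaxsockets


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address   Foreign Address   State
tcp        0      0 10.0.0.5:514    10.0.0.21:40112   ESTABLISHED
tcp        1      0 10.0.0.5:514    10.0.0.22:40990   CLOSE_WAIT
tcp        1      0 10.0.0.5:514    10.0.0.23:41733   CLOSE_WAIT
udp        0      0 0.0.0.0:514     0.0.0.0:*
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
    print(close_wait_usage(SAMPLE))
```

A CLOSE_WAIT count that keeps growing toward tcpmaxsockets suggests that tcppeerclosedchecktimeout should be set.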

Event queuing

Raw events received over the network are written to a file queue consisting of a certain number of files of fixed size

Bottleneck
– With high event volumes, the file queue can build up quickly, leading to significant delays
– When the file queue becomes full, the connector starts dropping events

Tuning
– Enable syslog parser multithreading (a memory increase may be needed as a follow-up)
– Increase the file queue size

Parameter | Default | Recommendation
syslog.parser.multithreading.enabled | false | Set it to 'true' to enable multithreading
syslog.parser.threadcount | -1 | Set it to a specific number on a single-processor machine. You can do the same on a multiprocessor machine, or leave it for the connector to decide based on the number of processors
syslog.parser.threadsperprocessor | 1 | Takes effect only when threadcount is set to -1. Leave it at 1 or increase it as required. Total number of threads = number of processors * threadsperprocessor
filequeuemaxfilecount | 100 | Increase this parameter to increase the number of files in the file queue
filequeuemaxfilesize | 100000 | Specified in bytes. Increase this parameter to increase the size of each file in the file queue
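The two formulas implied by the table (parser thread count and total file queue capacity) can be sketched as follows. The function names are illustrative assumptions, and 10 MB is taken as 10,000,000 bytes so that the figure matches the 20 GB quoted in customer case 2.

```python
def parser_thread_count(threadcount: int, threads_per_processor: int, processors: int) -> int:
    """Effective number of parser threads.

    threadsperprocessor only takes effect when threadcount is -1.
    """
    if threadcount != -1:
        return threadcount
    return processors * threads_per_processor


def file_queue_capacity_bytes(max_file_count: int, max_file_size: int) -> int:
    """Total file queue capacity = number of files * size of each file (bytes)."""
    return max_file_count * max_file_size


if __name__ == "__main__":
    # Values listed as defaults in the table above.
    print(file_queue_capacity_bytes(100, 100_000))
    # Values used in customer case 2: 2000 files of 10 MB each.
    print(file_queue_capacity_bytes(2000, 10_000_000))
    # threadcount left at -1 on a hypothetical 8-processor machine.
    print(parser_thread_count(-1, 1, 8))
```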

Event parsing

Inspection and device type detection
– There are multiple subagents, one per device type, each with a parser whose regex matches something unique in that device's log
– Subagent parsers are ordered so that specific regexes come ahead of generic ones, to detect device types accurately
– The connector inspects messages from each sender, applying the regexes in order to detect the device type, and associates the subagent with the sender when a match is found. A single sender can be associated with multiple device types and subagents
– The associated subagent parsers are then used to parse messages from that sender; inspection is not repeated unless a message from a new device type arrives from the same sender
– Syslog senders and their associated subagent types can be seen in current/user/agent/syslog.properties

Bottleneck
– Inspection involves regex matching and can be expensive because the connector has more than 100 subagents

Tuning
– If you are sure of the device types in your environment, you can restrict the subagent list with the following properties

Parameter | Default | Recommendation
usecustomsubagentlist | false | Set it to true to make the connector use the customized subagent list
customsubagentlist | List of subagents (>100) | Set it to the restricted subagent list based on the device types in your environment. Preserve the original relative order of the subagents so that the accuracy of subagent detection is not affected
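The requirement to preserve the original relative order can be met by filtering the default list instead of typing a new one. A minimal sketch, assuming the default list has already been read into a Python list; the subagent names below are hypothetical placeholders, not real connector subagent names.

```python
from typing import Iterable, List


def restrict_subagents(default_order: List[str], wanted: Iterable[str]) -> List[str]:
    """Keep only the wanted subagents, in the order they appear in the default list."""
    wanted_set = set(wanted)
    unknown = wanted_set - set(default_order)
    if unknown:
        # Fail loudly on typos instead of silently dropping a device type.
        raise ValueError(f"not in the default subagent list: {sorted(unknown)}")
    return [name for name in default_order if name in wanted_set]


# Hypothetical names: specific parsers come first, the generic one last.
DEFAULT_ORDER = ["vendor_a_firewall", "vendor_b_router", "vendor_c_proxy", "generic_syslog"]

if __name__ == "__main__":
    # The caller's order does not matter; the default order wins.
    print(restrict_subagents(DEFAULT_ORDER, ["generic_syslog", "vendor_b_router"]))
```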

Event parsing – continued

• Regular expressions in parsers
  – Bottleneck: a badly written regular expression in the parser can be a big performance hit on the connector
  – Optimization: for supported device types, development has already optimized the regular expressions in the respective parsers. If you are authoring your own syslog flex connector parsers, consider the following guidelines
    • Make your regexes only as generic as needed. Specific regular expressions perform better than generic ones
    • Use generic greedy expressions like .* and .+ only at the end of a regular expression, not at the beginning or in the middle. Elsewhere, replace them with non-greedy equivalents like .*? and .+? followed by a clear character or token marking the boundary
    • Greedy quantifiers on more specific characters or metacharacters are fine, e.g. \s+ for a run of whitespace characters, \d+ for a run of digits, or \w+ for a run of alphanumeric characters
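The guidelines above can be illustrated with a small comparison. This sketch uses Python's `re` module for convenience; connector parsers run on Java regular expressions, but greedy, non-greedy and character-class quantifiers behave the same way for these patterns. The log line and pattern names are hypothetical.

```python
import re

# Hypothetical log line in which the message text repeats the field names.
LINE = 'src=10.1.1.5 dst=10.2.2.9 msg="blocked src=10.9.9.9 dst=10.8.8.8"'

# Generic greedy groups in the middle: the first .* runs to the end of the line
# and backtracks to the LAST " dst=", so it captures far too much.
GREEDY = re.compile(r"src=(.*) dst=(.*)")

# Non-greedy groups, each followed by a clear boundary token.
NON_GREEDY = re.compile(r"src=(.*?) dst=(.*?) ")

# Specific pattern: only digits and dots can match, so there is little to backtrack over.
SPECIFIC = re.compile(r"src=(\d+\.\d+\.\d+\.\d+)\s+dst=(\d+\.\d+\.\d+\.\d+)")


def extract(pattern: "re.Pattern[str]", line: str):
    """Return the captured groups of the first match, or None if nothing matches."""
    match = pattern.search(line)
    return match.groups() if match else None


if __name__ == "__main__":
    for name, pattern in (("greedy", GREEDY), ("non-greedy", NON_GREEDY), ("specific", SPECIFIC)):
        print(name, extract(pattern, LINE))
```

The greedy version is both slower on long lines and wrong here, while the other two return the intended source and destination addresses.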

• Maximum number of devices
  – Bottleneck: the connector allows up to a maximum of 5000 devices and does not process events from newer devices once this limit is reached
  – Tuning

Parameter | Default | Recommendation
syslog.max.device.count | 5000 | Increase it as required to match the number of devices in your environment

Event processing

• Agent batching
  – Batch size controls how many events travel together from component to component in the event flow and eventually to the destination
  – Doubling or tripling the default size of 100 could help improve performance internally as well as over networks with latency
  – Do not increase it beyond that, because larger batches increase the memory needed to hold them and can hurt performance

• Categorization
  – Categorization files for different device types are loaded into memory, and some of them can be big
  – Connector base memory usage can be high when dealing with a large number of device types
  – Java heap space may need to be increased

• External map file processing
  – The external map file query is executed for every batch of events
  – If you are using this feature, make sure the query is simple and returns fast

• Connector filtering
  – Make sure that the filter condition is optimized and not extremely complex

Event processing – continued

• Field-based aggregation
  – Groups events with the same values in specified fields into buckets and produces aggregated events when the time interval expires or the event threshold is reached
  – Restrict the field set to the minimum required and choose an optimal event threshold value to keep the number of event field comparisons low
  – Choose a time interval that does not block the event flow for too long
  – Avoid using the 'preserve common fields' setting in a high event volume environment

• Name resolution
  – Name resolutions are done in background threads and the event flow is not normally blocked while waiting for the answers
  – If the 'Wait For Name Resolution' feature is enabled, the event flow is blocked for a certain timeout period while waiting for the answers
  – Do not enable the 'Wait For Name Resolution' feature in an environment requiring frequent resolutions
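The bucketing behaviour of field-based aggregation can be sketched as below. This is an illustration of the idea only, not the connector's implementation: the class, the `count` field and the event dictionaries are assumptions, and time interval expiry is modelled by an explicit `flush()` call.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


class FieldAggregator:
    """Group events by the values of selected fields and emit one aggregated event per bucket."""

    def __init__(self, fields: Sequence[str], threshold: int):
        self.fields = tuple(fields)
        self.threshold = threshold
        self.buckets: Dict[tuple, int] = defaultdict(int)  # bucket key -> event count

    def add(self, event: dict) -> Optional[dict]:
        """Add an event; return an aggregated event if its bucket reached the threshold."""
        # Fewer aggregation fields means fewer comparisons per event.
        key = tuple(event.get(field) for field in self.fields)
        self.buckets[key] += 1
        if self.buckets[key] >= self.threshold:
            return self._emit(key)
        return None

    def flush(self) -> List[dict]:
        """Emit all open buckets, as would happen on time interval expiry."""
        return [self._emit(key) for key in list(self.buckets)]

    def _emit(self, key: tuple) -> dict:
        count = self.buckets.pop(key)
        aggregated = dict(zip(self.fields, key))
        aggregated["count"] = count  # hypothetical field name for the number of merged events
        return aggregated


if __name__ == "__main__":
    aggregator = FieldAggregator(["src", "dst"], threshold=3)
    for _ in range(3):
        result = aggregator.add({"src": "a", "dst": "b"})
    print(result)
    aggregator.add({"src": "c", "dst": "d"})
    print(aggregator.flush())
```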

Event transport

Event caching can occur for a number of reasons; network latency and problems in the destination are the most common

Bottleneck
– Excessive caching can cause delays in events reaching the destination
– When the cache becomes full, the connector starts dropping events

Tuning
– Enable transport multithreading only when network latency is the cause of caching
– Do not enable multithreading for other reasons, as it can make caching even worse
– Increase the cache size to hold events for longer in the cache and prevent loss of events

Parameter | Default | Recommendation
http.transport.threadcount | 1 | Applies only to the ESM transport. Increase it by small increments as required
transport.loggersecure.threads | 1 | Applies only to the logger secure transport. Increase it by small increments as required
Cache Size | 1GB | Increase it as required up to a limit of 50GB. This is a destination setting which can be configured using the ESM console, the connector appliance GUI or the local connector setup wizard

Out of memory problems and tuning

Java process memory and management

Memory allocated to a Java process consists of heap space and native memory

• Connectors ship with a 32-bit JRE even on 64-bit systems
• Memory size of a 32-bit process
  – Total addressable space is 4 GB; kernel space ranges from 1 GB to 2 GB depending on the OS
  – User space available to the process is 2 GB to 3 GB depending on the OS
• Heap space is allocated as instructed by Java runtime parameters
  – -Xms (initial heap size), -Xmx (maximum heap size); 256 MB by default on connectors
  – Due to several factors, including OS-related ones, limits exist on the maximum heap space: 1 GB (connector appliances), 1.5 GB (Windows), 2 GB (Unix)
• Native memory size = process memory size - size of heap space
• Garbage collection reclaims the memory of unused objects
  – Minor collections (GC) reclaim memory in the young generation and move survivors into the old generation
  – Major collections (Full GC) reclaim memory in all of the heap space and take much longer
  – The JVM stops the application threads during GC or Full GC
  – Frequent Full GCs affect application performance severely and are a clear indicator of the need to increase the maximum heap size
• Out of memory errors can happen in any of these memory areas

[Figure: process memory layout. Heap space: young generation (newly created objects), old generation (old objects surviving minor GCs), permanent generation (classes, methods, etc.). Native memory: code generation, socket buffers, thread stacks, direct memory space, JNI code, garbage collection, JNI allocated memory]
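The relationship "native memory = process memory - heap space" and the per-platform heap limits quoted above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function names and platform keys are illustrative, and the 2 GB and 3 GB user-space figures are the bounds given on this slide.

```python
# Maximum heap sizes quoted on the slide, in MB.
HEAP_LIMIT_MB = {"appliance": 1024, "windows": 1536, "unix": 2048}


def native_memory_mb(user_space_mb: int, max_heap_mb: int) -> int:
    """Memory left for thread stacks, socket buffers, JNI, etc. in a 32-bit process."""
    if max_heap_mb > user_space_mb:
        raise ValueError("heap cannot exceed the user space of the process")
    return user_space_mb - max_heap_mb


def heap_within_limit(platform: str, max_heap_mb: int) -> bool:
    """True if the requested -Xmx value is within the limit quoted for the platform."""
    return max_heap_mb <= HEAP_LIMIT_MB[platform]


if __name__ == "__main__":
    # Default 256 MB heap in a 2 GB user space leaves plenty of native memory.
    print(native_memory_mb(2048, 256))
    # A 1.5 GB heap in the same user space leaves only 512 MB for everything else.
    print(native_memory_mb(2048, 1536))
    print(heap_within_limit("windows", 2048))
```

The second case shows why raising the heap too far can trade heap errors for native memory errors.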

Dealing with Java out-of-memory errors

Error:
  java.lang.OutOfMemoryError: Java heap space
  java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Root cause and recommendation: garbage collection is unable to free up more space, and memory could not be allocated for new objects
• Increase the maximum heap size using the -Xmx option in increments as required, up to the limit
• If this still does not help, there could be a memory leak or a bug; open a support incident supplying the logs and heap dumps

Error:
  java.lang.OutOfMemoryError: PermGen space
Root cause and recommendation: the permanent generation area has become full due to loading many classes statically, creating dynamic classes, or creating too many interned strings
• The default maximum size of PermGen space is 64 MB. Increase it in small increments using the -XX:MaxPermSize option

Error:
  java.lang.OutOfMemoryError: Unable to create a new native thread
Root cause and recommendation: the JVM is low on native memory and unable to create a new VM thread. Make more native memory available by
• Reducing the heap space using the -Xms and -Xmx options
• Reducing the stack space using the -Xss option

Error:
  Out of Memory Error (allocation.cpp:211), pid=16950, tid=1855142800
Root cause and recommendation: displayed in the fatal error logs when the JVM crashes due to a malloc failure. The system could be out of physical RAM or swap space, or the process size limit was hit on a 32-bit system. Take one or more of the following actions
• Reduce the memory load on the system, or increase physical memory or swap space
• Decrease the number of application threads, and reduce the Java heap space and stack space
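The mapping from error message to recommendation above lends itself to a small log-scanning helper. A minimal sketch: the function and constant names are assumptions, the advice strings paraphrase this slide, and the native-thread pattern accepts both the wording shown here and the JVM's usual "unable to create new native thread" wording.

```python
import re
from typing import Optional

HEAP_ADVICE = "Increase -Xmx in increments up to the limit; if that fails, open a support incident with logs and heap dumps"
PERMGEN_ADVICE = "Increase -XX:MaxPermSize in small increments"
NATIVE_THREAD_ADVICE = "Free native memory: reduce heap (-Xms/-Xmx) or stack size (-Xss)"
MALLOC_ADVICE = "Reduce system memory load or add RAM/swap; reduce threads, heap and stack space"

# Ordered list of (pattern, advice) pairs taken from the slide.
RULES = [
    (re.compile(r"OutOfMemoryError: Java heap space"), HEAP_ADVICE),
    (re.compile(r"OutOfMemoryError: Requested array size exceeds VM limit"), HEAP_ADVICE),
    (re.compile(r"OutOfMemoryError: PermGen space"), PERMGEN_ADVICE),
    (re.compile(r"OutOfMemoryError: unable to create (a )?new native thread", re.IGNORECASE), NATIVE_THREAD_ADVICE),
    (re.compile(r"Out of Memory Error \("), MALLOC_ADVICE),
]


def advise(log_line: str) -> Optional[str]:
    """Return the recommendation for the first matching out-of-memory pattern, if any."""
    for pattern, advice in RULES:
        if pattern.search(log_line):
            return advice
    return None


if __name__ == "__main__":
    print(advise("java.lang.OutOfMemoryError: PermGen space"))
    print(advise("Out of Memory Error (allocation.cpp:211), pid=16950, tid=1855142800"))
```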

Adjusting memory options

On a software connector, add or edit settings in a file under the current/user/agent folder
• agent.wrapper.conf when running as a service
• setmem.sh (Unix) or setmem.bat (Windows) when running as a standalone application. This file may have to be created if it does not already exist.
    set ARCSIGHT_MEMORY_OPTIONS="-Xms256m -Xmx256m"   (example only; add or remove options as required inside the double quotes)
    export ARCSIGHT_MEMORY_OPTIONS   (only on Unix)

On a connector appliance
• Only heap space can be changed using the container command 'Configure Memory Settings'
• Other settings can be changed using SSH or the diagnostic tools file editor, using the same mechanisms as for a software connector

Memory type | Running as a service | Running as a standalone application
Heap space | wrapper.java.initmemory=256 (initial heap size), wrapper.java.maxmemory=256 (maximum heap size) | "-Xms256m -Xmx1024m". It is recommended to increase only the max heap size
Perm gen space | Add additional Java parameters with adjusted indexes: wrapper.java.additional.7=-XX:PermSize=64m, wrapper.java.additional.8=-XX:MaxPermSize=128m | -XX:PermSize=64m -XX:MaxPermSize=128m. It is recommended to increase only the max perm size
Stack space | Add an additional Java parameter with adjusted index: wrapper.java.additional.9=-Xss64k | -Xss64k. The default stack size is OS dependent. Adjust and observe. Too low a value can cause StackOverflowError
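Because the same settings are written differently for the service and standalone cases, a small generator helps keep them consistent. This is a sketch only: the function names are assumptions, and `next_index` must be set to the first unused wrapper.java.additional index in your own agent.wrapper.conf (7 is just the example value used on this slide).

```python
from typing import List, Optional


def jvm_options(max_perm_mb: Optional[int] = None, stack_kb: Optional[int] = None) -> List[str]:
    """Optional JVM flags beyond the heap settings."""
    options = []
    if max_perm_mb is not None:
        options.append(f"-XX:MaxPermSize={max_perm_mb}m")
    if stack_kb is not None:
        options.append(f"-Xss{stack_kb}k")  # no '=' between -Xss and the size
    return options


def standalone_options(xms_mb: int, xmx_mb: int, **extra) -> str:
    """Value for ARCSIGHT_MEMORY_OPTIONS when running as a standalone application."""
    return " ".join([f"-Xms{xms_mb}m", f"-Xmx{xmx_mb}m"] + jvm_options(**extra))


def service_lines(xms_mb: int, xmx_mb: int, next_index: int, **extra) -> List[str]:
    """Lines for agent.wrapper.conf when running as a service."""
    lines = [f"wrapper.java.initmemory={xms_mb}", f"wrapper.java.maxmemory={xmx_mb}"]
    for offset, option in enumerate(jvm_options(**extra)):
        lines.append(f"wrapper.java.additional.{next_index + offset}={option}")
    return lines


if __name__ == "__main__":
    print(standalone_options(256, 1024, max_perm_mb=128))
    print("\n".join(service_lines(256, 1024, next_index=7, max_perm_mb=128, stack_kb=64)))
```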

Customer cases

Customer case 1 – problem

The CEF Syslog TLS destination was caching at only 200 eps, while the ESM and Logger destinations did not cache at the same event rate

[Figure: Source Connector with ESM, Logger and CEF Syslog TLS destinations; Syslog NG Connector receiving TLS from the source connector, Syslog NG Source 1 and Syslog NG Source 2]

Troubleshooting
• Changing the transport protocol to UDP or Raw TCP did not help
• The problem could not be reproduced in house
• The customer captured packets with tcpdump and analyzed them using Wireshark
  – There was a large number of "TCP Window Full" messages
  – SEQ/ACK analysis showed that at times more than 10 KB of data was in flight, indicating that the receiver was too slow to process the incoming flood of packets
  – TCP receive buffer and window sizes were reduced over time, which contributed to the slow reception
  – Further enquiries revealed that the Syslog NG connector was receiving TLS data from 2 other sources
  – With this new information about the customer environment, the problem could also be reproduced in house
  – High memory usage was observed and the heap space was increased to 1024 MB, but it did not help

Root cause
• The destination Syslog NG connector did not close TCP connections when the sources closed them
• The growing number of TCP connections forced the receive buffer size to be reduced, causing slower reception

Solution
• Set the 'tcppeerclosedchecktimeout' parameter to 30000 msec (half a minute)
• This parameter tells the connector to proactively check for and close any TCP sockets closed by the peer

Customer case 2 – problem

A huge difference in event counts was found between the Fortigate Firewall and the Logger receiving its events via the Syslog connector

[Figure: Fortigate Firewall, Syslog Connector (queuing, caching), Logger]

Observations
• The incoming event rate was much higher than the processing rate, and the connector was queuing heavily
• During peak hours, queuing exceeded the size limit and a huge number of events were dropped
• Caching was observed during peak hours, and some events were dropped when the cache size limit was exceeded
• High memory usage and frequent Full GCs were observed, affecting the performance of the connector

[Charts: Queue Rate(SLC) vs Events/Sec(SLC); Queue Drop Count; Events/Sec(SLC) vs Throughput(SLC); Cache Size and Current Drop count; Memory usage (Total vs Used)]

Frequent Full GCs
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:35:29 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:37:08 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:38:52 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:40:30 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:42:06 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:43:47 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:45:32 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:47:10 | [Full GC
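How frequent the Full GCs in this case were can be measured directly from the wrapper log timestamps. A minimal sketch, assuming the "| jvm 1 | date time | [Full GC" line layout seen in this excerpt; the function name is illustrative.

```python
import re
from datetime import datetime
from typing import List

# Matches the timestamp column that precedes a "[Full GC" entry in the wrapper log.
FULL_GC = re.compile(r"\|\s*(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|\s*\[Full GC")

# Timestamps from the customer case 2 log excerpt.
SAMPLE_LOG = """\
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:35:29 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:37:08 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:38:52 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:40:30 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:42:06 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:43:47 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:45:32 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:47:10 | [Full GC
"""


def full_gc_intervals(log_text: str) -> List[float]:
    """Seconds between consecutive Full GC entries in the log text."""
    stamps = [
        datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        for match in FULL_GC.finditer(log_text)
    ]
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    intervals = full_gc_intervals(SAMPLE_LOG)
    print(intervals)
    print(sum(intervals) / len(intervals))
```

For this excerpt the intervals are all between 96 and 105 seconds, i.e. a Full GC roughly every 100 seconds.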

Customer case 2 – solution

The machines hosting the connectors were very powerful (64-bit Linux, 48-core CPU, 128 GB RAM, 600 GB hard disk)

Actions taken
• Increased the Java heap size to 2048 MB to reduce the frequency of Full GCs
• Enabled syslog parser multithreading to keep up with the queuing rate
• Increased the file queue from 100 files to 2000 files of 10 MB each (20 GB in total) to prevent events being dropped from the file queue
• Increased the cache size from 1 GB to 10 GB to prevent events being dropped from the cache during peak hours
• The above measures improved the performance of the connector significantly
  – Where they did not solve the problem completely, we asked the customer to split the event volume among multiple syslog connectors

Customer case 3 – problem

The customer had loggers in the UK and the USA forwarding events to an ESM manager in the USA. Only the UK loggers were experiencing caching and event loss

[Figure: Logger in UK (onboard connector, caching) and Logger in USA (onboard connector) forwarding to ESM in USA. USA Logger: Time per Batch ~ 40 msec. UK Logger: Time per Batch > 500 msec]

Observations
• Time per Batch = round-trip time for a batch of events to travel from the logger to ESM and for the acknowledgment of the batch to come back from ESM to the logger
• The US logger took an average of 40 msec/batch and the UK logger took an average of over 500 msec/batch
• This large difference in round-trip time is indicative of network latency due to geographical distance and is the root cause of caching in the UK logger

Solution
• Enabled multithreading on the ESM transport with a thread count of 2; this showed an improvement in throughput
• Increased the thread count to 7 (number of processors in the CPU) and caching went away completely
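The effect of latency and transport threads in this case can be estimated with simple arithmetic. This is a simplified model, not a description of the transport internals: it assumes each thread sends one batch and waits for its acknowledgment before sending the next, and it uses the default batch size of 100 mentioned under agent batching. The function name is illustrative.

```python
def max_events_per_second(batch_size: int, time_per_batch_ms: float, threads: int = 1) -> float:
    """Upper bound on transport throughput when each thread waits for every batch to be acknowledged."""
    if time_per_batch_ms <= 0:
        raise ValueError("time per batch must be positive")
    return threads * batch_size * 1000 / time_per_batch_ms


if __name__ == "__main__":
    print(max_events_per_second(100, 40))              # US logger, 1 thread
    print(max_events_per_second(100, 500))             # UK logger, 1 thread
    print(max_events_per_second(100, 500, threads=2))  # UK logger, 2 threads
    print(max_events_per_second(100, 500, threads=7))  # UK logger, 7 threads
```

Under these assumptions a single thread at 500 msec per batch cannot exceed 200 events/sec, which is consistent with caching appearing only on the UK loggers and disappearing once more threads were added.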

Recommendations

Some recommendations
• Evaluate the number of devices, device types and event volume early in your deployment cycle
• Split the load among multiple connectors when the incoming event rate exceeds 2000 events/sec
• When splitting the load, consider grouping devices of the same type on one connector and devices of another type on a different connector
• Evaluate the total capacity of your machine and the other processes running on it to determine the number of connectors to install on a single machine
• The cumulative heap size allocated to connectors and other Java processes should be well below the total memory available on the system
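The 2000 events/sec guideline translates into a quick sizing estimate. A minimal sketch, assuming the load can be split evenly; the function name is illustrative, and real sizing also depends on device types and machine capacity as noted above.

```python
def connectors_needed(events_per_second: int, per_connector_limit: int = 2000) -> int:
    """Smallest number of connectors that keeps each one at or below the limit."""
    if events_per_second < 0 or per_connector_limit <= 0:
        raise ValueError("rates must be non-negative and the limit positive")
    # Integer ceiling division; at least one connector is always needed.
    return max(1, -(-events_per_second // per_connector_limit))


if __name__ == "__main__":
    for rate in (200, 2000, 2001, 9000):
        print(rate, connectors_needed(rate))
```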

Thank you
