Tuning SCHED_ULE on FreeBSD
George Neville-Neil
[email protected]
May 7, 2009
Introduction
Outline
- BSD Scheduler History
- SCHED_ULE
- Tuning Hooks
- Testing Methodology
- Effects
BSD Scheduler History
- BSD was written for uniprocessor machines
  - No SMP
  - No HTT
  - No multicores
- Up through FreeBSD 5 the scheduler was only modified, not rewritten wholesale
Why SCHED_ULE?
- SMP and multi-core
- SMP is NOT multi-core
- Cache effects
Why keep SCHED_4BSD?
- One size does not fit all
- There are still uniprocessors
- Embedded systems
- A baseline to compare against
Scheduler Responsibilities and Goals
- Arbitrate amongst competing processes
- Adhere to the will of the administrator
- Stay out of the way
Why tune the scheduler?
- Can change overall performance of the system
- Favor one type of job over another
- Not all workloads are interactive
Don’t Panic
- The scheduler is one of the most important components of the kernel
- You (probably) cannot destroy your system via scheduler tuning
- Proceed with caution
- Measure, modify, measure, modify
- All of the tunables can simply be turned off if they cause trouble
Tuning Hooks
Interactivity Tunables
name      Name of scheduler, ULE or 4BSD
interact  Interactivity score threshold
slice     Time slice for timeshare threads (100 ms)
SCHED_ULE Tuning Hooks
steal_thresh  Minimum load on a remote CPU before we’ll steal work.
steal_idle    Attempt to steal work from other CPUs before this CPU goes idle.
steal_htt     Steal work from another core on idle.
Stealing
- Stealing in SCHED_ULE can be virtuous
- Cores can steal work from each other
- It is a way of balancing work in an SMP/multi-core system
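The idea of an idle core taking work from a loaded one can be illustrated with a toy model. This is only a sketch: the per-CPU queues, the `steal_work` function and its victim selection rule are hypothetical and far simpler than what sched_ule.c does. The one idea borrowed from the tunables is that a remote CPU must carry a minimum load (here `steal_thresh`) before work is taken from it.

```python
from collections import deque


def steal_work(queues, idle_cpu, steal_thresh=2):
    """Move one task to idle_cpu from the busiest queue that meets steal_thresh.

    Returns the index of the victim CPU, or None if nothing was stolen.
    """
    victim = None
    for cpu, queue in enumerate(queues):
        # Never steal from ourselves or from a CPU below the load threshold.
        if cpu == idle_cpu or len(queue) < steal_thresh:
            continue
        if victim is None or len(queue) > len(queues[victim]):
            victim = cpu
    if victim is None:
        return None
    # Take from the tail so the victim keeps the task it would run next.
    queues[idle_cpu].append(queues[victim].pop())
    return victim


# Three CPUs: CPU 0 is loaded, CPU 1 has one task, CPU 2 is idle.
queues = [deque(["a", "b", "c"]), deque(["d"]), deque()]
victim = steal_work(queues, idle_cpu=2, steal_thresh=2)
print(victim, [list(q) for q in queues])
```

Raising `steal_thresh` in this model makes stealing rarer, which keeps tasks on the core where their cache contents already live at the cost of leaving some cores idle for longer.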
SCHED_ULE Tuning Hooks
balance           Enable the long term load balancer.
balance_interval  Average frequency, in stathz ticks, at which the long term load balancer runs.
affinity          Number of ticks to keep a thread from changing CPU.
SCHED_ULE Tuning Hooks
idlespinthresh  Threshold before idle spinning can occur.
idlespins       Number of times the idle thread will spin waiting for new work.
static_boost    Assign static priorities to sleeping threads.
preempt_thresh  Minimum priority for preemption; lower priorities are more likely to be picked.
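The tunables listed on these slides are read and set with sysctl(8). The sketch below builds the command lines rather than hard-coding them. It assumes the knobs live under the `kern.sched.` prefix, which the slides do not state, and the helper names `sysctl_argv` and `run_sysctl` are made up for illustration. Setting a value requires root, and the helper only executes anything when run on FreeBSD.

```python
import subprocess
import sys

# Assumption: the scheduler knobs are exposed under this sysctl prefix.
SCHED_PREFIX = "kern.sched."


def sysctl_argv(name, value=None):
    """Build the sysctl(8) argument list to read or set one scheduler knob."""
    oid = SCHED_PREFIX + name
    if value is None:
        return ["sysctl", "-n", oid]      # -n prints only the value
    return ["sysctl", f"{oid}={value}"]


def run_sysctl(name, value=None):
    """Run sysctl on FreeBSD only; elsewhere just return the command text."""
    argv = sysctl_argv(name, value)
    if not sys.platform.startswith("freebsd"):
        return " ".join(argv)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(" ".join(sysctl_argv("name")))            # which scheduler is running
print(" ".join(sysctl_argv("preempt_thresh")))  # read one tunable
print(" ".join(sysctl_argv("balance", 0)))      # turn the balancer off
```

Reading every knob before changing any of them gives a record of the defaults to return to if a change causes trouble.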
Testing
Testing Methodology
- We introduce a dummy load on the system
- Read data from another process
- Do some math in a loop
- Should have few or no voluntary context switches
- Wish to reduce involuntary context switches
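A rough user-space analogue of such a dummy load is sketched below. It is not the dummy program used in the talk: it only covers the "math in a loop" part, leaves out reading from another process, and the function names are invented for illustration. The voluntary and involuntary context switch counters come from getrusage(2), which Python exposes through the standard `resource` module.

```python
import resource


def busy_math(iterations):
    """Do some math in a loop without ever blocking."""
    total = 0
    for i in range(iterations):
        total += (i * i) % 7
    return total


def measure_switches(work, *args):
    """Run work(*args); return (result, voluntary, involuntary) switch deltas."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = work(*args)
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (result,
            after.ru_nvcsw - before.ru_nvcsw,     # voluntary switches
            after.ru_nivcsw - before.ru_nivcsw)   # involuntary switches


result, vcsw, ivcsw = measure_switches(busy_math, 200_000)
print(f"voluntary={vcsw} involuntary={ivcsw}")
```

Because the loop never blocks, the voluntary count should stay at or near zero, so any switches that do show up are ones the scheduler imposed.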
Context Switching
- Changing the process which is executing on a core

Voluntary                       The process takes an action that blocks or calls sched_yield().
Involuntary with Preemption     On exiting a critical section or interrupt service routine, a process may be preempted.
Involuntary without Preemption
The output of top(1)
last pid:  1023;  load averages:  0.96,  0.53,  0.25    up 0+00:08:21  14:40:28
100 processes: 10 running, 58 sleeping, 32 waiting
CPU: 12.5% user,  0.0% nice,  0.0% system,  0.0% interrupt, 87.5% idle
Mem: 17M Active, 9848K Inact, 106M Wired, 68K Cache, 16M Buf, 7785M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
 1019 gnn          0     21      0      0      0      0   0.00% dummy2
  982 gnn          0      0      0      0      0      0   0.00% tcsh
 1015 gnn          0      0      0      0      0      0   0.00% dummy1
 1011 gnn          0      0      0      0      0      0   0.00% usdlogd
Tuning Tests
- Turn off balancing
- Change the time slice
- Test system has eight cores total
- Each test was run for 15 minutes while observing top.
Turn off Balancing
- The CPU balancer runs every 133 ticks
- In a system that is being hand tuned, why run the balancer?
- What’s the effect of turning off the balancer?
With Balancing
last pid:  1023;  load averages:  0.96,  0.53,  0.25    up 0+00:08:21  14:40:28
100 processes: 10 running, 58 sleeping, 32 waiting
CPU: 12.5% user,  0.0% nice,  0.0% system,  0.0% interrupt, 87.5% idle
Mem: 17M Active, 9848K Inact, 106M Wired, 68K Cache, 16M Buf, 7785M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
 1019 gnn          0     21      0      0      0      0   0.00% dummy2
  982 gnn          0      0      0      0      0      0   0.00% tcsh
 1015 gnn          0      0      0      0      0      0   0.00% dummy1
 1011 gnn          0      0      0      0      0      0   0.00% usdlogd
Without Balancing
last pid:  1024;  load averages:  0.98,  0.61,  0.30    up 0+00:09:21  14:41:28
100 processes: 10 running, 58 sleeping, 32 waiting
CPU: 12.4% user,  0.0% nice,  0.1% system,  0.0% interrupt, 87.5% idle
Mem: 17M Active, 9852K Inact, 106M Wired, 68K Cache, 16M Buf, 7785M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
 1019 gnn          0     20      0      0      0      0   0.00% dummy2
  982 gnn          0      0      0      0      0      0   0.00% tcsh
 1015 gnn          0      0      0      0      0      0   0.00% dummy1
 1011 gnn          0      0      0      0      0      0   0.00% usdlogd
Balancing Results
- A slight increase in load average (0.96 to 0.99)
- The load average remains slightly higher
- The number of involuntary context switches does not change
Time Slice
- The default time slice is 13 ticks
- Increase the time slice to 64, 128, and 256 ticks
- At each level run for 15 minutes
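To relate the tick counts above to wall-clock time, the conversion can be sketched as follows. The tick rate is an assumption: the slides do not give stathz, so 133 Hz is used here because it makes the default of 13 ticks come out close to the 100 ms quoted for the slice tunable. Check the actual rate on the system under test before relying on these numbers.

```python
# Assumption: tick rate in Hz; not stated in the slides.
ASSUMED_STATHZ = 133


def ticks_to_ms(ticks, stathz=ASSUMED_STATHZ):
    """Convert a count of scheduler ticks to milliseconds."""
    return ticks * 1000.0 / stathz


# The default slice followed by the three tested values.
for ticks in (13, 64, 128, 256):
    print(f"{ticks:3d} ticks = {ticks_to_ms(ticks):7.1f} ms")
```

Under this assumption the largest tested slice, 256 ticks, lets a timeshare thread run for close to two seconds before its slice expires.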
Time Slice Evaluation
[Figure: IVCSW (y-axis, 0 to 22) plotted against Time Slice in ticks (x-axis, 0 to 260)]
How long does a switch take?
- A set of scheduler stats is available
- Need to build the kernel with SCHED_STATS
- Locally added calls to rdtsc in mi_switch
- Store the difference between these values on each switch
- Crude but effective
- Reading the sysctl every 3 seconds
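The measurement idea, taking a timestamp on the way in and on the way out and keeping the difference, can be mimicked in user space. The sketch below is not the kernel patch described above: `time.perf_counter_ns` stands in for rdtsc, the `SwitchTimer` class is invented for illustration, and the 2 GHz clock used to convert cycles to time is a hypothetical figure, not the speed of the test machine.

```python
import time


class SwitchTimer:
    """Accumulate elapsed-time deltas from paired timestamp reads."""

    def __init__(self, clock=time.perf_counter_ns):
        self.clock = clock
        self.samples = []
        self._start = None

    def enter(self):
        self._start = self.clock()

    def leave(self):
        # Store the difference between the two reads, as on each switch.
        self.samples.append(self.clock() - self._start)

    def mean(self):
        return sum(self.samples) / len(self.samples)


def cycles_to_us(cycles, cpu_hz):
    """Convert a cycle count to microseconds for a given clock rate."""
    return cycles * 1e6 / cpu_hz


timer = SwitchTimer()
for _ in range(5):
    timer.enter()
    sum(range(1000))  # stand-in for the work done inside a switch
    timer.leave()

print(f"mean delta: {timer.mean():.0f} ns over {len(timer.samples)} samples")
print(f"20000 cycles at 2 GHz = {cycles_to_us(20_000, 2e9):.1f} us")
```

A raw cycle count only becomes a duration once the clock rate is known, so record the CPU frequency alongside the samples.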
Switch Timing Results
[Figure: Cycles (y-axis, 0 to 28,000) plotted against Time (Samples) (x-axis, 0 to 14); labeled series: Slice 13 (Default), Slice 256, Using cpuset]
Concluding Remarks
Scheduler Statistics
preempt       Preemptions anywhere in the system
owepreempt    We were in a critical section and should have preempted
turnstile     Switches due to mutex contention
sleepq        Switches due to sleep
relinquish    Called a yield function
needresched   Preemption of user processes on exit from the kernel
Turning All This Off
- Sometimes you know what must be done
- Assigning processes to cores is also possible
- See the cpuset(1) man page
- See also Brooks Davis’ presentation
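Pinning an already running process to specific cores is done on FreeBSD with the cpuset(1) utility, using `-l` for the CPU list and `-p` for the process ID. The sketch below only builds the command line; the helper name `cpuset_argv` is invented, and the PIDs are the dummy1 and dummy2 processes from the top(1) output shown earlier, used purely as examples.

```python
def cpuset_argv(pid, cpus):
    """Build a cpuset(1) command that restricts pid to the given CPU numbers."""
    # Deduplicate and sort so the CPU list is stable and readable.
    cpu_list = ",".join(str(c) for c in sorted(set(cpus)))
    return ["cpuset", "-l", cpu_list, "-p", str(pid)]


print(" ".join(cpuset_argv(1019, [2])))
print(" ".join(cpuset_argv(1015, [3, 1, 3])))
```

Giving each dummy process its own core takes the placement decision away from the scheduler, which is the point of this slide.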
Further Reading
- /usr/src/sys/kern/sched_ule.c
- /usr/src/sys/kern/sched_switch.c
- “ULE: A Modern Scheduler for FreeBSD”, by Jeff Roberson
- “The Design and Implementation of the FreeBSD Operating System”, by McKusick and Neville-Neil
- “The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling”, by R. Jain
Questions?