Tuning SCHED_ULE on FreeBSD

George Neville-Neil ([email protected])

May 7, 2009

Introduction

Outline

- BSD Scheduler History
- SCHED_ULE
- Tuning Hooks
- Testing Methodology
- Effects

BSD Scheduler History

- BSD written for uni-processor machines
  - No SMP
  - No HTT
  - No multicores
- Up through FreeBSD 5, the scheduler was only modified, not rewritten wholesale

Why SCHED_ULE?

- SMP and multi-core
- SMP is NOT multi-core
- Cache effects

Why keep SCHED_BSD?

- One size does not fit all
- There are still uniprocessors
- Embedded systems
- A baseline to compare against

Scheduler Responsibilities and Goals

- Arbitrate amongst competing processes
- Adhere to the will of the administrator
- Stay out of the way

Why tune the scheduler?

- Can change overall performance of the system
- Favor one type of job over another
- Not all workloads are interactive

Don’t Panic

- The scheduler is one of the most important components of the kernel
- You (probably) cannot destroy your system via scheduler tuning
- Proceed with caution
- Measure, modify, measure, modify
- All of the tunables can simply be turned off if they cause trouble

Tuning Hooks

Interactivity Tunables

name      Name of scheduler, ULE or 4BSD
interact  Interactivity score threshold
slice     Time slice for timeshare threads (100ms)
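As a rough illustration of how these tunables might be inspected, the sketch below parses text in the `name: value` form that sysctl(8) prints. FreeBSD exposes the scheduler knobs under the `kern.sched.` prefix. The `parse_sysctl` helper and the sample values are illustrative assumptions, not output from the talk's test machine.

```python
def parse_sysctl(text):
    """Parse `name: value` lines, as printed by sysctl(8), into a dict."""
    tunables = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines that are not in `name: value` form
            tunables[name.strip()] = value.strip()
    return tunables


# Illustrative sample only; real values depend on the kernel and machine.
SAMPLE = """\
kern.sched.name: ULE
kern.sched.interact: 30
kern.sched.slice: 13
"""

tunables = parse_sysctl(SAMPLE)
print(tunables["kern.sched.name"], tunables["kern.sched.slice"])
```

On a live system the same function could be fed the output of `sysctl kern.sched`, which makes it easy to record the tunables before and after each experiment.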

SCHED_ULE Tuning Hooks

steal_thresh  Minimum load on a remote CPU before we’ll steal work.
steal_idle    Attempt to steal idle work from other CPUs before this CPU goes idle.
steal_htt     Steals work from another core on idle.
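To make the role of steal_thresh concrete, here is a deliberately simplified toy model, not the code in sched_ule.c. It assumes each CPU's load is just a count of runnable threads and ignores CPU topology; the `pick_victim` function and its arguments are hypothetical names for illustration.

```python
def pick_victim(loads, self_cpu, steal_thresh):
    """Return the index of the busiest remote CPU worth stealing from, or None.

    loads[i] is the number of runnable threads queued on CPU i (toy model).
    A remote CPU only qualifies if its load is at least steal_thresh.
    """
    best = None
    for cpu, load in enumerate(loads):
        if cpu == self_cpu or load < steal_thresh:
            continue
        if best is None or load > loads[best]:
            best = cpu
    return best


# CPU 0 is about to go idle and looks for work on the other CPUs.
print(pick_victim([0, 3, 1, 2], self_cpu=0, steal_thresh=2))
print(pick_victim([0, 1, 1, 1], self_cpu=0, steal_thresh=2))
```

In this model, raising the threshold makes an idle CPU less eager to pull threads away from their current CPU, which is the trade-off between balance and cache warmth that the tunable controls.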

Stealing

- Stealing in SCHED_ULE can be virtuous
- Cores can steal work from each other
- It is a way of balancing work in an SMP/multi-core system

SCHED_ULE Tuning Hooks

balance           Enable the long term load balancer.
balance_interval  Average frequency in stathz ticks to run the long term load balancer (below).
affinity          Number of ticks to keep a thread from changing CPU.
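Tick counts are easier to reason about as wall-clock time. The sketch below converts stathz ticks to milliseconds. The stathz value of 133 is an assumption: it is consistent with the 133-tick balancer interval and the 100ms, 13-tick time slice quoted elsewhere in this talk, but the real value should be read from `kern.clockrate` on the system under test.

```python
ASSUMED_STATHZ = 133  # assumption; read kern.clockrate for the real value


def ticks_to_ms(ticks, stathz=ASSUMED_STATHZ):
    """Convert a count of stathz ticks to milliseconds."""
    return ticks * 1000.0 / stathz


print(f"balance_interval of 133 ticks = {ticks_to_ms(133):.0f} ms")
print(f"time slice of 13 ticks = {ticks_to_ms(13):.1f} ms")
```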

SCHED_ULE Tuning Hooks

idlespinthresh  Threshold before idle spinning can occur
idlespins       Number of times the idle thread will spin waiting for new work
static_boost    Assign static priorities to sleeping threads
preempt_thresh  Minimum priority for preemption; lower priorities are more likely to be picked

Testing

Testing Methodology

- We introduce a dummy load on the system
- Read data from another process
- Do some math in a loop
- Should have few or no voluntary context switches
- Wish to reduce involuntary context switches
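The talk's dummy load programs are not shown, so the following is only a stand-in sketch of the idea: do some math in a loop and compare the process's own context switch counters before and after. It uses `resource.getrusage`, whose `ru_nvcsw` and `ru_nivcsw` fields report voluntary and involuntary context switches; the `busy_loop` function and the iteration count are arbitrary choices for illustration.

```python
import resource


def context_switches():
    """Return (voluntary, involuntary) context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def busy_loop(iterations):
    """CPU-bound work that never blocks, so it should not switch voluntarily."""
    total = 0.0
    for i in range(iterations):
        total += (i * i) % 7
    return total


before = context_switches()
busy_loop(200_000)
after = context_switches()

voluntary = after[0] - before[0]
involuntary = after[1] - before[1]
print(f"voluntary: {voluntary}, involuntary: {involuntary}")
```

The counts vary from run to run and from machine to machine, which is why the talk observes them over a fixed interval rather than relying on a single reading.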

Context Switching

- Changing the process which is executing on a core

Voluntary
    Process takes an action that blocks or calls sched_yield()
Involuntary with Preemption
    On exiting a critical section or interrupt service routine, a process may be pre-empted
Involuntary without Preemption

The output of top(1)

last pid:  1023;  load averages:  0.96,  0.53,  0.25    up 0+00:08:21  14:40:28
100 processes: 10 running, 58 sleeping, 32 waiting
CPU: 12.5% user,  0.0% nice,  0.0% system,  0.0% interrupt, 87.5% idle
Mem: 17M Active, 9848K Inact, 106M Wired, 68K Cache, 16M Buf, 7785M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  VCSW IVCSW  READ WRITE FAULT TOTAL PERCENT COMMAND
 1019 gnn          0    21     0     0     0     0   0.00% dummy2
  982 gnn          0     0     0     0     0     0   0.00% tcsh
 1015 gnn          0     0     0     0     0     0   0.00% dummy1
 1011 gnn          0     0     0     0     0     0   0.00% usdlogd

Tuning Tests

- Turn off balancing
- Change the time slice
- Test system has eight cores total
- Each test was run for 15 minutes while observing top.

Turn off Balancing

- The CPU balancer runs every 133 ticks
- In a system that is being hand tuned, why run the balancer?
- What’s the effect of turning off the balancer?

With Balancing

last pid:  1023;  load averages:  0.96,  0.53,  0.25    up 0+00:08:21  14:40:28
100 processes: 10 running, 58 sleeping, 32 waiting
CPU: 12.5% user,  0.0% nice,  0.0% system,  0.0% interrupt, 87.5% idle
Mem: 17M Active, 9848K Inact, 106M Wired, 68K Cache, 16M Buf, 7785M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  VCSW IVCSW  READ WRITE FAULT TOTAL PERCENT COMMAND
 1019 gnn          0    21     0     0     0     0   0.00% dummy2
  982 gnn          0     0     0     0     0     0   0.00% tcsh
 1015 gnn          0     0     0     0     0     0   0.00% dummy1
 1011 gnn          0     0     0     0     0     0   0.00% usdlogd

Without Balancing

last pid:  1024;  load averages:  0.98,  0.61,  0.30    up 0+00:09:21  14:41:28
100 processes: 10 running, 58 sleeping, 32 waiting
CPU: 12.4% user,  0.0% nice,  0.1% system,  0.0% interrupt, 87.5% idle
Mem: 17M Active, 9852K Inact, 106M Wired, 68K Cache, 16M Buf, 7785M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  VCSW IVCSW  READ WRITE FAULT TOTAL PERCENT COMMAND
 1019 gnn          0    20     0     0     0     0   0.00% dummy2
  982 gnn          0     0     0     0     0     0   0.00% tcsh
 1015 gnn          0     0     0     0     0     0   0.00% dummy1
 1011 gnn          0     0     0     0     0     0   0.00% usdlogd

Balancing Results

- A slight increase in load average (0.96 to 0.99)
- The load average remains slightly higher
- The number of involuntary context switches does not change

Time Slice

- The default time slice is 13 ticks
- Increase the time slice to 64, 128, and 256 ticks
- At each level run for 15 minutes
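The procedure above can be written down as a small test plan. The sketch below only generates the commands and durations; it does not run anything. It assumes the time slice is set through the `kern.sched.slice` sysctl, and `slice_plan` is a hypothetical helper name.

```python
def slice_plan(levels, minutes_per_level):
    """Return (command, seconds) pairs, one per time slice level."""
    return [
        (f"sysctl kern.sched.slice={ticks}", minutes_per_level * 60)
        for ticks in levels
    ]


# Default slice first as a baseline, then the three larger values.
for command, seconds in slice_plan([13, 64, 128, 256], 15):
    print(f"{command}  # then observe top for {seconds} s")
```

Writing the plan out before starting keeps each level's run time identical, so the context switch counts from different levels can be compared directly.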

Time Slice Evaluation

[Figure: involuntary context switches (IVCSW, 0 to 22) plotted against time slice in ticks (0 to 260)]

How long does a switch take?

- A set of scheduler stats is available
- Need to build the kernel with SCHED_STATS
- Locally added calls to rdtsc in mi_switch
- Store the difference between these values on each switch
- Crude but effective
- Reading the sysctl every 3 seconds
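Cycle counts taken with rdtsc only become meaningful once they are summarized and related to the clock rate. The sketch below averages a list of per-switch cycle deltas and converts the mean to microseconds. The sample values and the 2 GHz clock are made-up assumptions for illustration; they are not measurements from the talk.

```python
from statistics import mean


def summarize_switch_cycles(samples, cpu_hz):
    """Return (mean cycles, mean microseconds) for a list of cycle deltas."""
    mean_cycles = mean(samples)
    mean_us = mean_cycles / cpu_hz * 1e6
    return mean_cycles, mean_us


# Hypothetical readings, one per 3-second sample of the sysctl.
samples = [20000, 22000, 24000]
ASSUMED_CPU_HZ = 2.0e9  # assumption: a 2 GHz time stamp counter

cycles, micros = summarize_switch_cycles(samples, ASSUMED_CPU_HZ)
print(f"mean switch cost: {cycles} cycles, about {micros:.1f} us")
```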

Switch Timing Results

[Figure: context switch cost in cycles (0 to 28,000) over time in samples (0 to 14), with three series: Slice 13 (Default), Slice 256, and Using cpuset]

Concluding Remarks

Scheduler Statistics

preempt      Pre-emptions anywhere in the system
owepreempt   Were in a critical section and should have pre-empted
turnstile    Switches due to mutex contention
sleepq       Switches due to sleep
relinquish   Called a yield function
needresched  Pre-emption of user processes on exit from the kernel

Turning All This Off

- Sometimes you know what must be done
- Assigning processes to cores is also possible
- See cpuset(4) man page
- See also Brooks Davis’ presentation
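As a small sketch of pinning a process to cores, the helper below builds an argument list for the cpuset(1) utility using its `-l cpu-list` and `-p pid` options. It only constructs the command and does not execute it. The `cpuset_argv` name is hypothetical, and the PID is simply the dummy2 process from the top output shown earlier.

```python
def cpuset_argv(cpus, pid):
    """Build a cpuset(1) command that restricts `pid` to the given CPUs."""
    cpu_list = ",".join(str(cpu) for cpu in sorted(cpus))
    return ["cpuset", "-l", cpu_list, "-p", str(pid)]


# Restrict the dummy load (PID 1019 in the earlier top output) to CPUs 2 and 3.
print(" ".join(cpuset_argv([3, 2], 1019)))
```

Returning a list rather than a string means the result can be handed to `subprocess.run` on a FreeBSD host without any shell quoting.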

Further Reading

- /usr/src/sys/kern/sched_ule.c
- /usr/src/sys/kern/sched_switch.c
- “ULE: A Modern Scheduler for FreeBSD”, by Jeff Roberson
- “The Design and Implementation of the FreeBSD Operating System”, by McKusick and Neville-Neil
- “The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling”, by R. Jain

Questions?
