Empirical Virtual Machine Models for Performance Guarantees


Andrew Turner, Akkarit Sangpetch, Hyong S. Kim Carnegie Mellon University

LISA 2010, 11 November 2010


Introduction
• We want good application performance
• Hosts run multiple VMs
• Performance bottlenecks depend on resource allocation levels
• Goal: set resource allocation levels automatically

Overview
• Current problems
• Our approach
• Method
• Results
• Conclusion

Current Problems

[Figure: example deployment — Proxy 1, Web 1, DB 1, Web 2, DB 2, and Proxy 2 VMs spread across hosts]

• Multiple application tiers run on different hosts
• Resource needs: what function F gives F(resource allocation) = performance?
• Needs change throughout the day
• Over-provisioning wastes energy and resources
• Unhappy users

Our approach
• Observe performance
• Create an online model
• Calculate required resources

Our approach

[Figure: control loop — the SLO target and measured error feed our system, which sets resource allocations on the physical machine; host and application monitoring feeds measured application performance back]

The control loop constantly checks performance and recalibrates resource allocation levels.
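One iteration of the loop above can be sketched as follows. This is a minimal illustration, not the paper's model-based calculation: the proportional adjustment rule and the 5–100% clamp are invented assumptions, and in the real system the new allocation would come from the empirical model.

```python
def control_step(measured_rt, slo_target_rt, alloc_pct, gain=0.5):
    """One control-loop iteration: turn the measured error against the
    SLO target into a new allocation level (simple proportional rule)."""
    error = (measured_rt - slo_target_rt) / slo_target_rt  # relative error
    new_alloc = alloc_pct * (1.0 + gain * error)  # grow allocation if slow
    return min(100.0, max(5.0, new_alloc))        # clamp to a sane range

# If we measure 200 ms against a 100 ms target at 40% allocation, the step
# raises the allocation; if we measure 50 ms, it lowers it.
```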

Benefits of our system
• Automatically identifies performance bottlenecks
• Automatically sets resource allocation levels
• Provides more performance per resource allocated
• Reduces energy and hardware usage
• Allows SLOs to be met

Assumptions
• We can monitor application performance
• We can control resource access or scheduling
• Application performance is a convex function of resource allocation

Data used
T – SLO target
E – probability that T is achieved
R – real performance
C – contention level
W – workload level
A – resource allocation
M – performance model

Find A and guarantee that:

P(R ≥ T) ≥ E
P(R ≥ T) = P(M(W, C, A) ≥ T)
P(R ≥ T) = ∫∫ P(W = w) P(C = c) P(M(w, c, A) ≥ T) dw dc, integrating w and c over 0..100%
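The double integral above can be approximated by discretizing the workload and contention distributions. In this sketch the model M and the two distributions are illustrative placeholders; the paper builds M online from observed data.

```python
# A hedged sketch of evaluating the guarantee numerically.

def M(w, c, A):
    # Hypothetical model: a performance score that falls with workload w
    # and contention c (fractions in 0..1) and rises with allocation A.
    return A * (1.0 - 0.5 * w) * (1.0 - 0.5 * c)

def prob_meets_target(T, A, workload_dist, contention_dist):
    """Approximate P(R >= T) by discretizing the double integral:
    sum over w, c of P(W=w) * P(C=c) * [M(w, c, A) >= T]."""
    total = 0.0
    for w, pw in workload_dist:
        for c, pc in contention_dist:
            if M(w, c, A) >= T:
                total += pw * pc
    return total

# Toy distributions given as (level, probability) pairs.
workload = [(0.1, 0.25), (0.3, 0.25), (0.5, 0.25), (0.7, 0.25)]
contention = [(0.1, 0.5), (0.4, 0.3), (0.8, 0.2)]

p = prob_meets_target(T=40.0, A=100.0, workload_dist=workload,
                      contention_dist=contention)  # ≈ 0.95 here
```

With these toy numbers only the heaviest workload under the heaviest contention misses the target; the controller would then search for the smallest A whose probability reaches the required E.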

Creating the model
• Created online
• Uses previously observed data
• Curve fit fills in unobserved areas

[Figure: CPU contention effect on response time — response time (0–1400 ms) vs CPU contention (0–100%)]
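A minimal sketch of the curve-fitting step. The observation points below are invented, and the quadratic shape is one choice that respects the convexity assumption; the real system fits whatever the observed data supports.

```python
import numpy as np

# Sparse online observations: (CPU contention fraction, response time in ms).
obs_contention = np.array([0.0, 0.2, 0.5, 0.9])
obs_rt = np.array([120.0, 150.0, 320.0, 1100.0])

# Least-squares quadratic fit fills in the unobserved contention levels.
coeffs = np.polyfit(obs_contention, obs_rt, deg=2)

def predicted_rt(contention):
    """Interpolated/extrapolated response time at a contention level."""
    return float(np.polyval(coeffs, contention))
```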

Deciding resource assignment
• Place a hyperplane at the target performance
• Choose an allocation that crosses the plane

[Figure: TPC-W response time (50–400 ms) vs web server CPU allocation (10–100%) at 10%, 20%, 30%, and 40% contention, with the target line marked]
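The "choose the allocation that crosses the plane" step can be sketched as a scan over allocation levels. Here `model_rt` is an invented toy curve standing in for the fitted model's prediction at the current contention level; the real system queries the empirical model.

```python
def model_rt(alloc_pct, contention_pct):
    # Illustrative: response time falls with allocation, rises with contention.
    return 2000.0 * (1.0 + contention_pct / 100.0) / alloc_pct

def min_allocation_for_target(target_ms, contention_pct, step=5):
    """Return the smallest allocation (10..100%, in `step` increments)
    whose predicted response time sits at or below the target."""
    for alloc in range(10, 101, step):
        if model_rt(alloc, contention_pct) <= target_ms:
            return alloc
    return None  # the target is unreachable at this contention level
```

Higher contention pushes the crossing point right, so a larger allocation is needed — matching the family of curves in the figure.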

Deciding resource assignment
• A resource allocation is a vector of allocations
• E.g. (proxy = 65%, web = 55%) or (proxy = 80%, web = 35%)

[Figure: 3D surface of TPC-W response time while changing proxy and web server CPU shares, with the target plane marked]

Deciding resource assignment
• A resource allocation is a vector of allocations
• E.g. (proxy = 65%, web = 55%) or (proxy = 80%, web = 35%)

[Figure: contour plot of TPC-W response time at 40% contention over proxy and web server shares, with the target contour marked]

Deciding resource assignment
• A resource allocation is a vector of allocations

[Figure: contour plot of TPC-W response time at 30% contention over proxy and web server allocations, with the target contour marked]

Deciding resource assignment
A – the potential resource allocations
X – the chosen resource allocations (0/1 per candidate)
Q – priority level of each application

Candidate          Host 1  Host 2  Host 3    Q   Start X  End X
App 1 solution 1     30      0      60     0.8      ?       0
App 1 solution 2     50      0      50     0.8      ?       0
App 1 solution 3     60      0      30     0.8      ?       1
App 2 solution 1     70     20      50     1.0      ?       0
App 2 solution 2     20     20      90     1.0      ?       0
App 2 solution 3     40     80      50     1.0      ?       1

Minimize: XᵀA / 1ᵀQ
Subject to: Xᵀ1 = 1 (one solution selected per application), X ≥ 0
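The selection step can be sketched as a search over one-candidate-per-application combinations. The exact objective on the slide did not survive extraction cleanly, so this sketch uses one plausible reading — total allocation weighted by inverse priority, under per-host capacity — with the candidate numbers from the table.

```python
from itertools import product

# Candidate allocations (Host 1, Host 2, Host 3 shares) per application,
# read off the table above; Q holds each application's priority.
app1 = [(30, 0, 60), (50, 0, 50), (60, 0, 30)]
app2 = [(70, 20, 50), (20, 20, 90), (40, 80, 50)]
Q = {"app1": 0.8, "app2": 1.0}

def pick_allocations(candidates, priorities, capacity=100):
    """Pick one candidate per application, minimizing total allocation
    weighted by inverse priority, with every host within capacity."""
    best, best_cost = None, float("inf")
    for combo in product(*candidates.values()):
        # Each host's combined share must fit on the physical machine.
        if any(sum(shares) > capacity for shares in zip(*combo)):
            continue
        cost = sum(sum(alloc) / q
                   for alloc, q in zip(combo, priorities.values()))
        if cost < best_cost:
            best, best_cost = combo, cost
    return best

chosen = pick_allocations({"app1": app1, "app2": app2}, Q)
```

With these numbers the third candidate of each application is the cheapest feasible pair, which agrees with the End X column above.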

Reducing model dimensions
• Which resources are important to model?
• Use regression to find the impact of each resource

Time   CPU contention   Disk contention   Performance
 1          10%              10%             130 ms
 2          40%              12%             180 ms
 3          14%              90%             135 ms
 4          12%              50%             132 ms
 5          30%              75%             160 ms
 6          10%              40%             130 ms

Disk contention has no effect; CPU contention does.
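The regression step can be reproduced on the table's six samples with an ordinary least-squares fit; the CPU coefficient dominates while the disk coefficient comes out near zero, so the disk dimension can be dropped from the model.

```python
import numpy as np

# The six samples from the table above (contention in %, performance in ms).
cpu = np.array([10, 40, 14, 12, 30, 10], dtype=float)
disk = np.array([10, 12, 90, 50, 75, 40], dtype=float)
perf = np.array([130, 180, 135, 132, 160, 130], dtype=float)

# Ordinary least squares: perf ~ b0 + b_cpu * cpu + b_disk * disk.
X = np.column_stack([np.ones_like(cpu), cpu, disk])
(b0, b_cpu, b_disk), *_ = np.linalg.lstsq(X, perf, rcond=None)
```

Here `b_cpu` lands around 1.7 ms per contention point and `b_disk` close to zero.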

Experimental Evaluation
• We test TPC-W and a dynamic web page
• We measure response time

[Figure: testbed — each of Hosts 1–3 runs an Apache server VM alongside one TPC-W tier VM: the proxy on Host 1, the web server on Host 2, and the DB on Host 3]

Experimental Evaluation

[Figure: response time (ms) over time (0–500) under SLO 100ms, SLO 150ms, 50% static allocation, and 10% static allocation, plus the injected SQL, proxy, and web CPU contention levels]

• The system keeps response time close to the target

Experimental Evaluation
• Dynamic resource assignment helps meet SLOs
• It uses fewer resources than static allocation

Test                       RT average   Resource allocation average   Apache VM average
SLO = 100ms                   89 ms               48%                     125 ms
SLO = 150ms                  127 ms               35%                     107 ms
50% resource allocation      150 ms               50%                     120 ms
10% resource allocation      355 ms               10%                      83 ms

Experimental Evaluation

[Figure: total TPC-W resource allocation (%) over time (0–500) under SLO 100ms and SLO 150ms, plus the injected SQL, proxy, and web CPU contention levels]

• No increase in resource allocation, as the DB is not the bottleneck

Experimental Evaluation

[Figure: response time (ms) over time (0–180) under SLO 100ms, SLO 150ms, 50% static allocation, and 10% static allocation, plus the number of users (0–400)]

• Meets the target time despite changes in workload

Conclusion
• We automatically calculate required resources
• Works on generic multi-tier applications
• Helps meet SLOs
• Better performance per resource assigned
• Simplifies resource management