Progress and Challenges for Real-Time Virtualization*
Chris Gill
Professor of Computer Science and Engineering, Washington University, St. Louis, MO, USA
[email protected]
VtRES Workshop Keynote at RTCSA 2013
National Taiwan University, Taipei, Taiwan, Wed Aug 21, 2013
*Our research described in this talk is supported in part by the US NSF and the US ONR, and has been driven by numerous contributions from Sisu Xi, Justin Wilson, Chong Li, and Chenyang Lu (Washington University in St. Louis) and from Jaewoo Lee, Sanjian Chen, Linh Phan, Insup Lee, and Oleg Sokolsky (University of Pennsylvania)
Two Key Uses of Virtualization
• Use fewer computing resources and/or platforms to (consolidate or) integrate systems via virtualization
• Provide elastic cloud services on-demand and at scale to multiple tenants via virtualization
Challenges for Real-Time Virtualization
• Real-Time System Integration
– How to schedule resources feasibly among competing domains
– How to maintain timing guarantees as different components and systems are composed
– How to preserve guarantees across multiple shared resources
• Real-Time Cloud Services
– How to analyze timing and provide guarantees in the face of resource elasticity and multi-tenancy
RT Virtualization for System Integration
• Some key challenges for real-time (especially safety-critical) systems
– Temporal isolation as dedicated cores become shared ones
– Preserving isolation as components and systems are composed
– Maintaining end-to-end timing guarantees as networked communication becomes inter-domain communication spanning both computation and communication resources
[Figure: legacy systems consolidated as domains atop a virtualization platform's hypervisor]
A Brief Survey of Other Related Work (please see our publications for references)
• Improving VMM scheduling (Credit, SEDF) and Domain 0 in Xen
– Often helps with isolation, predictability, etc., but without real-time guarantees
• Improving inter-domain communication in Xen
– E.g., XWAY, XenLoop, XenSocket: involve modifying guest OS or applications
• Approaches targeting other virtualization architectures
– Cucinotta et al. [COMPSAC 2009] applied hierarchical real-time scheduling to KVM, e.g., towards supporting Real-Time Service Oriented Architectures
– Fiasco and L4 (TU Dresden) offer precise virtualization capabilities for systems ranging from small embedded systems to large complex systems
Traditional Virtualization in Xen
• Good for system integration, cost reduction, etc.
• But while applications and guest OSes may be real-time aware, the hypervisor is not: domains are scheduled round-robin, with NO prioritization of OS instances
[Figure: apps and OSes in domains atop the Xen hypervisor and hardware; a timeline shows round-robin scheduling of the domains]
Problem: some RT applications CANNOT benefit from this kind of virtualization
RT Virtualization I: Real-Time Scheduling of Domains in RT-Xen
Basic solution: incorporate hierarchical scheduling into Xen
• The Xen scheduler becomes a root scheduler over per-domain (leaf) schedulers
• Leaves are implemented as servers with parameters (Period, Budget, Priority)
[Figure: apps scheduled by per-OS (leaf) schedulers inside each domain, with the domains in turn scheduled by the root scheduler in Xen]
Basic Server Design (Deferrable & Periodic)
• Servers have 3 parameters: (Period, Budget, Priority)
• Example: server S1 (5, 3, 1) with two tasks, T1 (10, 3) and T2 (10, 3)
– Deferrable server: unused budget is retained while the server is idle, so work arriving later can be served back-to-back
– Periodic server: budget is consumed (idled away) even when no task is ready
[Figure: timelines over t = 0..15 showing actual execution and remaining budget in S1 for each design; under the deferrable server, tasks released at t = 2 execute back-to-back, while the periodic server idles part of its budget away]
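The two timelines can be mimicked with a minimal discrete-time sketch (an illustrative model, not RT-Xen code), assuming a single top-priority server S1 = (period 5, budget 3) serving 6 units of work released at t = 2:

```python
def simulate(kind, period=5, budget=3, release=2, work=6, horizon=20):
    """Single top-priority server whose budget is replenished each period.
    kind: "deferrable" keeps unused budget while idle;
          "periodic" idles it away.
    Returns the completion time of `work` units released at `release`."""
    budget_left = 0
    done = 0
    for t in range(horizon):
        if t % period == 0:
            budget_left = budget               # replenish at each period boundary
        if t >= release and done < work and budget_left > 0:
            done += 1                          # serve one unit of pending work
            budget_left -= 1
            if done == work:
                return t + 1
        elif kind == "periodic" and budget_left > 0:
            budget_left -= 1                   # periodic server burns budget while idle
    return None
```

Under the deferrable server the work finishes at t = 8 (the budget saved during t = 0..2 allows back-to-back execution across the period boundary), while the periodic server idles part of each budget away and finishes at t = 12.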
Evaluation Setup
• RT-Xen scheduling algorithms: Deferrable, Polling, Periodic, Sporadic servers
• Each domain is configured with (Period, Budget, Priority)
• Rate monotonic scheduling is used within each domain: for each task, shorter period -> higher priority
[Figure: Dom0 plus guest domains Dom1..Dom5, each with a VCPU; the RT-Xen schedulers run the guest VCPUs on Core 1 while Dom0 runs on Core 0]
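The rate monotonic policy used within each domain can be paired with the classic Liu & Layland utilization bound as a quick (sufficient, but not necessary) feasibility check; this helper is illustrative, not part of the evaluation code:

```python
def rm_feasible_by_utilization(tasks):
    """Sufficient rate monotonic test for tasks = [(period, wcet), ...].
    Under RM (shorter period -> higher priority), an implicit-deadline
    task set is schedulable if total utilization <= n * (2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(wcet / period for period, wcet in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)
```

For the two example tasks T1 (10, 3) and T2 (10, 3), utilization is 0.6, below the two-task bound of about 0.828, so the set passes the test.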
Xen Credit vs. Real-Time VM Scheduling
[Figure: deadline miss ratio vs. total CPU load (50% to 100%) for the Deferrable, Sporadic, Polling, and Periodic servers and for the stock Credit and SEDF schedulers]
• The Credit scheduler shows poor real-time performance
• Real-time VM scheduling helps!
“RT-Xen: Towards Real-Time Hypervisor Scheduling in Xen”, ACM International Conference on Embedded Software (EMSOFT), 2011
RT Virtualization II: Incorporating Compositional Scheduling
• Compositional Scheduling Framework (CSF)
– Provides temporal isolation and real-time guarantees
– Computes components' minimum-bandwidth resource model
• Mind the gap between CSF theory and system implementation
– Realizing CSF through virtualization can bridge that gap
[Figure: child components, each with a workload of periodic tasks, a rate monotonic scheduler, and a periodic resource model (period, budget), composed under a parent component's scheduler and resource model]
Compositional Scheduling in RT-Xen
• Component <-> domain; Periodic Resource Model (PRM) <-> Periodic Server (PS)
• Task model: independent, CPU-intensive, periodic tasks
– Scheduling algorithm: rate monotonic
[Figure: compositional scheduling (theoretical framework) — tasks in components with PRMs under a root component — mapped onto CSF in RT-Xen (system implementation) — apps in domains backed by periodic servers atop the hypervisor and hardware]
First Need to Extend CSF to Deal with Quantum-Based Scheduling Platforms
• Find the minimum-bandwidth resource model for workload W
• The real-number-based resource model must be adapted to a quantum-based one
• Since bandwidths above 1 are infeasible, a necessary condition for schedulability, together with the non-decreasing budget/period relation, bounds the range of periods to search for the min-bandwidth resource model
[Figure: bandwidth of the resource model vs. period P, comparing the real-number-based and quantum-based minimum-bandwidth resource models]
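Shin and Lee's periodic resource model (Π, Θ) admits the linear supply bound function lsbf(t) = (Θ/Π)(t − 2(Π − Θ)), clamped at 0. A minimal sketch of a quantum-based minimum-budget search for a given period, using an RM time-demand test against lsbf (the function names, unit quantum, and implicit-deadline assumption are illustrative, not the talk's actual algorithm):

```python
import math

def lsbf(pi, theta, t):
    """Linear lower bound on the supply of a periodic resource model (pi, theta)."""
    return max(0.0, (theta / pi) * (t - 2 * (pi - theta)))

def min_budget(pi, tasks):
    """Smallest integer budget for period `pi` such that every task in
    tasks = [(period, wcet), ...] passes an RM time-demand test against lsbf
    (implicit deadlines assumed). Returns None if no budget <= pi suffices."""
    tasks = sorted(tasks)                      # RM: shorter period -> higher priority
    for theta in range(1, pi + 1):             # search budgets quantum by quantum
        ok = True
        for i, (p_i, c_i) in enumerate(tasks):
            # schedulable if demand <= supply at some t <= the task's deadline
            feasible = False
            for t in range(1, p_i + 1):
                demand = c_i + sum(math.ceil(t / p_j) * c_j
                                   for p_j, c_j in tasks[:i])
                if demand <= lsbf(pi, theta, t):
                    feasible = True
                    break
            if not feasible:
                ok = False
                break
        if ok:
            return theta
    return None
```

For the example workload T1 (10, 3) and T2 (10, 3) at period 5, the search returns budget 4, i.e., bandwidth 0.8 — above the workload's 0.6 utilization, illustrating the abstraction overhead of the min-bandwidth resource model.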
Then Can Improve Periodic Server Design
• Purely Time-driven Periodic Server (PTPS)
– If the currently scheduled domain is idle, its budget is wasted
– Not work-conserving
[Figure: timeline with task release/completion marks; while high-priority domain DH idles, its budget drains away and low-priority domain DL's tasks must wait]
Periodic Server Re-Design I
• Work-Conserving Periodic Server (WCPS)
– If the currently scheduled domain is idle, the hypervisor picks a lower-priority domain that has tasks to execute
– Early execution of the lower-priority domain during the idle period does not affect schedulability
[Figure: timeline in which tasks of lower-priority domain DL execute during DH's idle time]
Periodic Server Re-Design II
• Capacity Reclaiming Periodic Server (CRPS)
– If the currently scheduled domain is idle, we can re-assign this idled budget to any other domain that has tasks to execute
– Early execution of the other domain during the idle period does not affect schedulability
[Figure: timeline in which DH's idle budget is donated to DL, whose tasks execute on the reclaimed capacity]
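The three designs differ only in where a budget quantum goes when the currently scheduled domain has nothing to run; a hypothetical single-quantum decision function (names and return strings are illustrative, not RT-Xen code):

```python
def spend_quantum(current_has_work, other_has_work, policy):
    """Where one budget quantum of the currently scheduled (budgeted) domain
    goes under each server design; policy in {"PTPS", "WCPS", "CRPS"}."""
    if current_has_work:
        return "current domain"          # all three designs run the budgeted domain
    if not other_has_work:
        return "wasted"                  # nothing runnable anywhere
    if policy == "PTPS":
        return "wasted"                  # purely time-driven: idle budget is lost
    if policy == "WCPS":
        return "lower-priority domain"   # hypervisor picks a lower-priority domain
    if policy == "CRPS":
        return "any other domain"        # idle budget is re-assigned (reclaimed)
    raise ValueError(policy)
```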
Interface Overhead: Synthetic Workload
• Setup: UW: 90.4%, URM: 114.3%, Dom5: (22, 1)
[Figure: CDF of response time / deadline for Dom5 under the three server designs]
• Performance ordering: CRPS ≥ WCPS ≥ PTPS, with deadline miss ratios for Dom5 of 0.0622 (CRPS), 60.5 (WCPS), and 100 (PTPS)
“Realizing Compositional Scheduling Through Virtualization”, IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2012
RT Virtualization III: Inter-Domain Communication
• Setup: Dom 1 sends a packet every 10 ms (5,000 data points); domains Dom 3 through Dom 10 each consume 100% CPU; Dom 0 runs Linux 3.4.2
[Figure: CDF of IDC latency (microseconds) comparing the RT-Xen and Credit VMM schedulers, both with the original Dom 0]
• When Domain 0 is not busy, the VMM scheduler dominates the IDC performance for higher-priority domains (i.e., adding real-time scheduling already helps)
But, is Real-Time Scheduling Enough???
[Figure: CDF of IDC latency under RT-Xen with the original Dom 0; with interfering domains driving Dom 0 to 100% CPU, latencies stretch to thousands of microseconds]
• When Domain 0 itself is busy, real-time VMM scheduling alone no longer bounds IDC latency
A Little Background on Xen's Domain 0
[Figure: netfront drivers in guest domains connect to netif interfaces in Domain 0; netback[0] { rx_action(); tx_action(); } moves packets between the TX and RX paths through softnet_data]
• Packets are fetched in a round-robin order across domains
• Packets share one queue in softnet_data
RTCA: Refining Domain 0 for Real-Time IDC
[Figure: the same Domain 0 path, but netback[0] { rx_action(); tx_action(); } now serves per-priority netif queues]
• Packets are fetched by priority, up to a batch size
• Queues are separated by priority in softnet_data
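The two bullets above can be sketched as a priority-ordered, batched fetch loop (an illustrative model of the idea, not the actual RTCA kernel code; names and data structures are assumptions):

```python
from collections import deque

def fetch_batch(queues, batch_size):
    """queues: dict mapping priority (lower number = higher priority) to a
    deque of pending packets. Drains at most batch_size packets per pass,
    always starting from the highest priority."""
    batch = []
    for prio in sorted(queues):                 # highest priority first
        q = queues[prio]
        while q and len(batch) < batch_size:
            batch.append((prio, q.popleft()))
        if len(batch) >= batch_size:
            break                               # small batches bound the time any
    return batch                                # lower-priority work can delay a pass
```

A small batch size means a newly arrived high-priority packet waits at most one short pass before being fetched, which is how RTCA bounds priority inversion in Domain 0.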
Effects on IDC Latency
[Table: IDC latency (us) between Domain 1 and Domain 2 in the presence of low-priority IDC]
• By reducing priority inversion in Domain 0, RTCA mitigates the impact of low-priority IDC on the latency of high-priority IDC
“Prioritizing Local Inter-Domain Communication in Xen”, ACM/IEEE International Symposium on Quality of Service (IWQoS), 2013
Preserving Domain 0 Throughput in RTCA
[Figure: iPerf throughput (Gbits/s) between Dom 1 and Dom 2 for RTCA with batch sizes 1, 64, and 238 vs. the original Dom 0, under Base, Light, Medium, and Heavy interfering workloads]
• A small batch size leads to significant reduction in high-priority IDC latency and improved IDC throughput under interfering traffic
What Next? Time to Shift Gears
Real-Time System Integration is clearly important, but Real-Time Cloud Computing may prove even more so
Towards Real-Time Cloud Services
• Key challenge: how to analyze timing and provide guarantees in the face of resource elasticity and multi-tenancy
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” – Leslie Lamport
A virtualized system is one in which the failure of a computer that doesn't actually exist can render your entire application unusable.
How to Address this Issue?
• Need to shift our assumptions about system design to give precise real-time semantics within resource elasticity and multi-tenancy
“… the Java platform's promise of ‘Write Once, Run Anywhere,’ … offer[s] far greater cost-savings potential in the real-time (and more broadly, the embedded) domain than in the desktop and server domains.” … “The real-time Java platform's necessarily qualified promise of ‘Write Once Carefully, Run Anywhere Conditionally’ is nevertheless the best prospective opportunity for application re-usability.” – Foreword to the Real-Time Specification for Java
Clouds are not Real-Time Today
• Virtualization technology underlying clouds is not real-time
– Xen: the virtual machine monitor for Amazon EC2
– CPU: proportional-share scheduling, with no prioritization of real-time VMs
[Figure: real-time apps in VMs atop a virtual machine monitor and hardware that are not real-time aware]
• If anything, I/O is worse
– Vague “performance indicators”: low/medium/large
– Or you can pay a lot to get dedicated physical network resources
Motivation to Make Clouds Real-Time
• Hard to provide timing guarantees
– Simple interface -> no timing information
– Consolidation ratio keeps increasing -> more competition
– Live migration without notification -> unstable performance
• Why are timing guarantees important?
– If the steal time exceeds a given threshold, Netflix shuts down the virtual machine and restarts it elsewhere [sciencelogic], [scout]
– “Xbox One may offload computations to cloud…” [Microsoft Blog]
– “Energy efficient GPS sensing with cloud offloading” [SenSys ’12]
– … also, smart grids, earthquake early warning, etc. in CPS
Towards Improving the Current State of the Art
• Functions of the cloud management system are an essential focus
– Interface to the end users
– VM initial placement
– VM live migration (load balancing, host maintenance, etc.)
• Commercial management systems are mostly closed-source: Amazon EC2 (Xen), Google Compute Engine (KVM), Microsoft Azure (Hyper-V), VMware vCenter (vSphere), XenCenter (XenServer)
• Open-source alternatives
– OpenStack (HP Cloud, RackSpace, etc.), CloudStack, OpenNebula, …
– All compatible with XenServer, vSphere, KVM, etc.
Limitations and Opportunities – Interface
• VMware vCenter
– Reservation: minimum guaranteed resources, in MHz
– Limit: upper bound for resources, in MHz
– Share: relative importance of the VM
• OpenStack
– # of VCPUs
Limitations and Opportunities – Initial VM Placement
• Filtering
– VM-VM affinity / anti-affinity, VM-Host affinity / anti-affinity, etc.
– When is a host ‘full’?
• VMware vCenter: based on reservations of VMs
• OpenStack: pre-configured ratio (default is 16)
• Ranking
– VMware vCenter: try each host; turn on stand-by hosts
– OpenStack: spread and packed
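The OpenStack-style "host is full" check above amounts to a simple overcommit filter; a minimal sketch (the function name and parameters are illustrative, not OpenStack's actual scheduler API):

```python
def host_passes(physical_cores, allocated_vcpus, requested_vcpus,
                cpu_allocation_ratio=16):
    """A host is 'full' when allocated vCPUs would exceed
    cpu_allocation_ratio * physical cores (default ratio 16,
    as noted on the slide)."""
    capacity = cpu_allocation_ratio * physical_cores
    return allocated_vcpus + requested_vcpus <= capacity
```

Note what this filter ignores: it counts vCPUs, not timing demands, which is exactly the kind of interface limitation the slide is pointing at.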
Limitations and Opportunities – VM Load Balancing
• Open-source alternatives: no load balancing by default
• VMware vCenter
– Distributed Resource Scheduler (DRS)
• Triggered every 5 min; calculates normalized host utilization
• Minimizes cluster-wide imbalance (standard deviation over all hosts)
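The DRS imbalance metric described above can be sketched as follows (an assumed form of the metric for illustration, not VMware's actual formula):

```python
import statistics

def cluster_imbalance(hosts):
    """hosts: list of (demand, capacity) pairs. Normalized utilization per
    host is demand / capacity; cluster-wide imbalance is the (population)
    standard deviation of those utilizations across all hosts."""
    utilizations = [demand / capacity for demand, capacity in hosts]
    return statistics.pstdev(utilizations)
```

A balancer in this style would propose migrations only while the imbalance exceeds a threshold, again with no notion of the timing guarantees the migrated VMs might need.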
Concluding Remarks
• Much has been accomplished already
– RT-Xen, CSF, and RTCA support real-time in open-source Xen
– Other approaches have focused on other virtualization architectures and platforms (e.g., L4), mechanisms, etc.
• Much remains to be done
– Especially as we move towards larger and more complex real-time systems and systems-of-systems
– Gains made in real-time virtualization can be extended to offer (and define) new capabilities for real-time clouds
Thank You!
All source code is available at http://sites.google.com/site/realtimexen/