Progress and Challenges for Real-Time Virtualization*
Chris Gill
Professor of Computer Science and Engineering, Washington University, St. Louis, MO, USA
[email protected]
VtRES Workshop Keynote at RTCSA 2013
National Taiwan University, Taipei, Taiwan, Wed Aug 21, 2013
*Our research described in this talk is supported in part by the US NSF and the US ONR, and has been driven by numerous contributions from Sisu Xi, Justin Wilson, Chong Li, and Chenyang Lu (Washington University in St. Louis) and from Jaewoo Lee, Sanjian Chen, Linh Phan, Insup Lee, and Oleg Sokolsky (University of Pennsylvania)
Two Key Uses of Virtualization
• Use fewer computing resources and/or platforms to (consolidate or) integrate systems via virtualization
• Provide elastic cloud services on-demand and at scale to multiple tenants via virtualization
Challenges for Real-Time Virtualization
• Real-Time System Integration
– How to schedule resources feasibly among competing domains
– How to maintain timing guarantees as different components and systems are composed
– How to preserve guarantees across multiple shared resources
• Real-Time Cloud Services
– How to analyze timing and provide guarantees in the face of resource elasticity and multi-tenancy
RT Virtualization for System Integration
• Some key challenges for real-time (especially safety-critical) systems
– Temporal isolation as dedicated cores become shared ones
– Preserving isolation as components and systems are composed
– Maintaining end-to-end timing guarantees as networked communication becomes inter-domain communication spanning both computation and communication resources
[Figure: legacy systems consolidated as domains atop a virtualization platform's hypervisor]
A Brief Survey of Other Related Work (please see our publications for references)
• Improving VMM scheduling (Credit, SEDF) and Domain 0 in Xen
– Often helps with isolation, predictability, etc., but without real-time guarantees
• Improving inter-domain communication in Xen
– E.g., XWAY, XenLoop, XenSocket: involve modifying guest OS or applications
• Approaches targeting other virtualization architectures
– Cucinotta et al. [COMPSAC 2009] applied hierarchical real-time scheduling to KVM, e.g., towards supporting Real-Time Service Oriented Architectures
– Fiasco and L4 (TU Dresden) offer precise virtualization capabilities for systems ranging from small embedded systems to large complex systems
Traditional Virtualization in Xen
• Good for system integration, cost reduction, etc.
• But while applications and guest OSes may be real-time aware, the hypervisor is not: domains are scheduled round-robin, with NO prioritization of OS instances
[Figure: apps and OSes in domains atop the Xen hypervisor and hardware; a timeline shows round-robin scheduling of the domains]
Problem: some RT applications CANNOT benefit from this kind of virtualization
RT Virtualization I: Real-Time Scheduling of Domains in RT-Xen
Basic solution: incorporate hierarchical scheduling into Xen
• The Xen scheduler becomes a root scheduler over per-domain (leaf) schedulers
• Leaves are implemented as servers with parameters (Period, Budget, Priority)
[Figure: apps scheduled by per-OS (leaf) schedulers inside each domain, with the domains in turn scheduled by the root scheduler in Xen]
Basic Server Design (Deferrable & Periodic)
• Servers have 3 parameters: (Period, Budget, Priority)
• Example: server S1 (5, 3, 1) with two tasks, T1 (10, 3) and T2 (10, 3)
– Deferrable server: unused budget is retained while the server is idle, so work arriving later can be served back-to-back
– Periodic server: budget is consumed (idled away) even when no task is ready
[Figure: timelines over t = 0..15 showing actual execution and remaining budget in S1 for each design; under the deferrable server, tasks released at t = 2 execute back-to-back, while the periodic server idles part of its budget away]
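The two timelines can be mimicked with a minimal discrete-time sketch (an illustrative model, not RT-Xen code), assuming a single top-priority server S1 = (period 5, budget 3) serving 6 units of work released at t = 2:

```python
def simulate(kind, period=5, budget=3, release=2, work=6, horizon=20):
    """Single top-priority server whose budget is replenished each period.
    kind: "deferrable" keeps unused budget while idle;
          "periodic" idles it away.
    Returns the completion time of `work` units released at `release`."""
    budget_left = 0
    done = 0
    for t in range(horizon):
        if t % period == 0:
            budget_left = budget               # replenish at each period boundary
        if t >= release and done < work and budget_left > 0:
            done += 1                          # serve one unit of pending work
            budget_left -= 1
            if done == work:
                return t + 1
        elif kind == "periodic" and budget_left > 0:
            budget_left -= 1                   # periodic server burns budget while idle
    return None
```

Under the deferrable server the work finishes at t = 8 (the budget saved during t = 0..2 allows back-to-back execution across the period boundary), while the periodic server idles part of each budget away and finishes at t = 12.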
Evaluation Setup
• RT-Xen scheduling algorithms: Deferrable, Polling, Periodic, Sporadic servers
• Each domain is configured with (Period, Budget, Priority)
• Rate monotonic scheduling is used within each domain: for each task, shorter period -> higher priority
[Figure: Dom0 plus guest domains Dom1..Dom5, each with a VCPU; the RT-Xen schedulers run the guest VCPUs on Core 1 while Dom0 runs on Core 0]
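The rate monotonic policy used within each domain can be paired with the classic Liu & Layland utilization bound as a quick (sufficient, but not necessary) feasibility check; this helper is illustrative, not part of the evaluation code:

```python
def rm_feasible_by_utilization(tasks):
    """Sufficient rate monotonic test for tasks = [(period, wcet), ...].
    Under RM (shorter period -> higher priority), an implicit-deadline
    task set is schedulable if total utilization <= n * (2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(wcet / period for period, wcet in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)
```

For the two example tasks T1 (10, 3) and T2 (10, 3), utilization is 0.6, below the two-task bound of about 0.828, so the set passes the test.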
Xen Credit vs. Real-Time VM Scheduling
[Figure: deadline miss ratio vs. total CPU load (50% to 100%) for the Deferrable, Sporadic, Polling, and Periodic servers and for the stock Credit and SEDF schedulers]
• The Credit scheduler shows poor real-time performance
• Real-time VM scheduling helps!
“RT-Xen: Towards Real-Time Hypervisor Scheduling in Xen”, ACM International Conference on Embedded Software (EMSOFT), 2011
RT Virtualization II: Incorporating Compositional Scheduling
• Compositional Scheduling Framework (CSF)
– Provides temporal isolation and real-time guarantees
– Computes components' minimum-bandwidth resource model
• Mind the gap between CSF theory and system implementation
– Realizing CSF through virtualization can bridge that gap
[Figure: child components, each with a workload of periodic tasks, a rate monotonic scheduler, and a periodic resource model (period, budget), composed under a parent component's scheduler and resource model]
Compositional Scheduling in RT-Xen
• Component <-> domain; Periodic Resource Model (PRM) <-> Periodic Server (PS)
• Task model: independent, CPU-intensive, periodic tasks
– Scheduling algorithm: rate monotonic
[Figure: compositional scheduling (theoretical framework) — tasks in components with PRMs under a root component — mapped onto CSF in RT-Xen (system implementation) — apps in domains backed by periodic servers atop the hypervisor and hardware]
First Need to Extend CSF to Deal with Quantum-Based Scheduling Platforms
• Find the minimum-bandwidth resource model for workload W
• The real-number-based resource model must be adapted to a quantum-based one
• Since bandwidths above 1 are infeasible, a necessary condition for schedulability, together with the non-decreasing budget/period relation, bounds the range of periods to search for the min-bandwidth resource model
[Figure: bandwidth of the resource model vs. period P, comparing the real-number-based and quantum-based minimum-bandwidth resource models]
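Shin and Lee's periodic resource model (Π, Θ) admits the linear supply bound function lsbf(t) = (Θ/Π)(t − 2(Π − Θ)), clamped at 0. A minimal sketch of a quantum-based minimum-budget search for a given period, using an RM time-demand test against lsbf (the function names, unit quantum, and implicit-deadline assumption are illustrative, not the talk's actual algorithm):

```python
import math

def lsbf(pi, theta, t):
    """Linear lower bound on the supply of a periodic resource model (pi, theta)."""
    return max(0.0, (theta / pi) * (t - 2 * (pi - theta)))

def min_budget(pi, tasks):
    """Smallest integer budget for period `pi` such that every task in
    tasks = [(period, wcet), ...] passes an RM time-demand test against lsbf
    (implicit deadlines assumed). Returns None if no budget <= pi suffices."""
    tasks = sorted(tasks)                      # RM: shorter period -> higher priority
    for theta in range(1, pi + 1):             # search budgets quantum by quantum
        ok = True
        for i, (p_i, c_i) in enumerate(tasks):
            # schedulable if demand <= supply at some t <= the task's deadline
            feasible = False
            for t in range(1, p_i + 1):
                demand = c_i + sum(math.ceil(t / p_j) * c_j
                                   for p_j, c_j in tasks[:i])
                if demand <= lsbf(pi, theta, t):
                    feasible = True
                    break
            if not feasible:
                ok = False
                break
        if ok:
            return theta
    return None
```

For the example workload T1 (10, 3) and T2 (10, 3) at period 5, the search returns budget 4, i.e., bandwidth 0.8 — above the workload's 0.6 utilization, illustrating the abstraction overhead of the min-bandwidth resource model.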
Then Can Improve Periodic Server Design
• Purely Time-driven Periodic Server (PTPS)
– If the currently scheduled domain is idle, its budget is wasted
– Not work-conserving
[Figure: timeline with task release/completion marks; while high-priority domain DH idles, its budget drains away and low-priority domain DL's tasks must wait]
Periodic Server Re-Design I
• Work-Conserving Periodic Server (WCPS)
– If the currently scheduled domain is idle, the hypervisor picks a lower-priority domain that has tasks to execute
– Early execution of the lower-priority domain during the idle period does not affect schedulability
[Figure: timeline in which tasks of lower-priority domain DL execute during DH's idle time]
Periodic Server Re-Design II
• Capacity Reclaiming Periodic Server (CRPS)
– If the currently scheduled domain is idle, we can re-assign this idled budget to any other domain that has tasks to execute
– Early execution of the other domain during the idle period does not affect schedulability
[Figure: timeline in which DH's idle budget is donated to DL, whose tasks execute on the reclaimed capacity]
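The three designs differ only in where a budget quantum goes when the currently scheduled domain has nothing to run; a hypothetical single-quantum decision function (names and return strings are illustrative, not RT-Xen code):

```python
def spend_quantum(current_has_work, other_has_work, policy):
    """Where one budget quantum of the currently scheduled (budgeted) domain
    goes under each server design; policy in {"PTPS", "WCPS", "CRPS"}."""
    if current_has_work:
        return "current domain"          # all three designs run the budgeted domain
    if not other_has_work:
        return "wasted"                  # nothing runnable anywhere
    if policy == "PTPS":
        return "wasted"                  # purely time-driven: idle budget is lost
    if policy == "WCPS":
        return "lower-priority domain"   # hypervisor picks a lower-priority domain
    if policy == "CRPS":
        return "any other domain"        # idle budget is re-assigned (reclaimed)
    raise ValueError(policy)
```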
Interface Overhead: Synthetic Workload
• Setup: UW: 90.4%, URM: 114.3%, Dom5: (22, 1)
[Figure: CDF of response time / deadline for Dom5 under the three server designs]
• Performance ordering: CRPS ≥ WCPS ≥ PTPS, with deadline miss ratios for Dom5 of 0.0622 (CRPS), 60.5 (WCPS), and 100 (PTPS)
“Realizing Compositional Scheduling Through Virtualization”, IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2012
RT Virtualization III: Inter-Domain Communication
• Setup: Dom 1 sends a packet every 10 ms (5,000 data points); domains Dom 3 through Dom 10 each consume 100% CPU; Dom 0 runs Linux 3.4.2
[Figure: CDF of IDC latency (microseconds) comparing the RT-Xen and Credit VMM schedulers, both with the original Dom 0]
• When Domain 0 is not busy, the VMM scheduler dominates the IDC performance for higher-priority domains (i.e., adding real-time scheduling already helps)
But, is Real-Time Scheduling Enough???
[Figure: CDF of IDC latency under RT-Xen with the original Dom 0; with interfering domains driving Dom 0 to 100% CPU, latencies stretch to thousands of microseconds]
• When Domain 0 itself is busy, real-time VMM scheduling alone no longer bounds IDC latency
A Little Background on Xen's Domain 0
[Figure: netfront drivers in guest domains connect to netif interfaces in Domain 0; netback[0] { rx_action(); tx_action(); } moves packets between the TX and RX paths through softnet_data]
• Packets are fetched in a round-robin order across domains
• Packets share one queue in softnet_data
RTCA: Refining Domain 0 for Real-Time IDC
[Figure: the same Domain 0 path, but netback[0] { rx_action(); tx_action(); } now serves per-priority netif queues]
• Packets are fetched by priority, up to a batch size
• Queues are separated by priority in softnet_data
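The two bullets above can be sketched as a priority-ordered, batched fetch loop (an illustrative model of the idea, not the actual RTCA kernel code; names and data structures are assumptions):

```python
from collections import deque

def fetch_batch(queues, batch_size):
    """queues: dict mapping priority (lower number = higher priority) to a
    deque of pending packets. Drains at most batch_size packets per pass,
    always starting from the highest priority."""
    batch = []
    for prio in sorted(queues):                 # highest priority first
        q = queues[prio]
        while q and len(batch) < batch_size:
            batch.append((prio, q.popleft()))
        if len(batch) >= batch_size:
            break                               # small batches bound the time any
    return batch                                # lower-priority work can delay a pass
```

A small batch size means a newly arrived high-priority packet waits at most one short pass before being fetched, which is how RTCA bounds priority inversion in Domain 0.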
Effects on IDC Latency
[Table: IDC latency (us) between Domain 1 and Domain 2 in the presence of low-priority IDC]
• By reducing priority inversion in Domain 0, RTCA mitigates the impact of low-priority IDC on the latency of high-priority IDC
“Prioritizing Local Inter-Domain Communication in Xen”, ACM/IEEE International Symposium on Quality of Service (IWQoS), 2013
Preserving Domain 0 Throughput in RTCA
[Figure: iPerf throughput (Gbits/s) between Dom 1 and Dom 2 for RTCA with batch sizes 1, 64, and 238 vs. the original Dom 0, under Base, Light, Medium, and Heavy interfering workloads]
• A small batch size leads to significant reduction in high-priority IDC latency and improved IDC throughput under interfering traffic
What Next? Time to Shift Gears
Real-Time System Integration is clearly important, but Real-Time Cloud Computing may prove even more so
Towards Real-Time Cloud Services
• Key challenge: how to analyze timing and provide guarantees in the face of resource elasticity and multi-tenancy
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” – Leslie Lamport
A virtualized system is one in which the failure of a computer that doesn't actually exist can render your entire application unusable.
How to Address this Issue?
• Need to shift our assumptions about system design to give precise real-time semantics within resource elasticity and multi-tenancy
“… the Java platform's promise of ‘Write Once, Run Anywhere,’ … offer[s] far greater cost-savings potential in the real-time (and more broadly, the embedded) domain than in the desktop and server domains.” … “The real-time Java platform's necessarily qualified promise of ‘Write Once Carefully, Run Anywhere Conditionally’ is nevertheless the best prospective opportunity for application re-usability.” – Foreword to the Real-Time Specification for Java
Clouds are not Real-Time Today
• Virtualization technology underlying clouds is not real-time
– Xen: the virtual machine monitor for Amazon EC2
– CPU: proportional-share scheduling, with no prioritization of real-time VMs
[Figure: real-time apps in VMs atop a virtual machine monitor and hardware that are not real-time aware]
• If anything, I/O is worse
– Vague “performance indicators”: low/medium/large
– Or you can pay a lot to get dedicated physical network resources
Motivation to Make Clouds Real-Time
• Hard to provide timing guarantees
– Simple interface -> no timing information
– Consolidation ratio keeps increasing -> more competition
– Live migration without notification -> unstable performance
• Why are timing guarantees important?
– If the steal time exceeds a given threshold, Netflix shuts down the virtual machine and restarts it elsewhere [sciencelogic], [scout]
– “Xbox One may offload computations to cloud…” [Microsoft Blog]
– “Energy efficient GPS sensing with cloud offloading” [SenSys ’12]
– … also, smart grids, earthquake early warning, etc. in CPS
Towards Improving the Current State of the Art
• Functions of the cloud management system are an essential focus
– Interface to the end users
– VM initial placement
– VM live migration (load balancing, host maintenance, etc.)
• Commercial management systems are mostly closed-source: Amazon EC2 (Xen), Google Compute Engine (KVM), Microsoft Azure (Hyper-V), VMware vCenter (vSphere), XenCenter (XenServer)
• Open-source alternatives
– OpenStack (HP Cloud, RackSpace, etc.), CloudStack, OpenNebula, …
– All compatible with XenServer, vSphere, KVM, etc.
Limitations and Opportunities – Interface
• VMware vCenter
– Reservation: minimum guaranteed resources, in MHz
– Limit: upper bound for resources, in MHz
– Share: relative importance of the VM
• OpenStack
– # of VCPUs
Limitations and Opportunities – Initial VM Placement
• Filtering
– VM-VM affinity / anti-affinity, VM-Host affinity / anti-affinity, etc.
– When is a host ‘full’?
• VMware vCenter: based on reservations of VMs
• OpenStack: pre-configured ratio (default is 16)
• Ranking
– VMware vCenter: try each host; turn on stand-by hosts
– OpenStack: spread and packed
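The OpenStack-style "host is full" check above amounts to a simple overcommit filter; a minimal sketch (the function name and parameters are illustrative, not OpenStack's actual scheduler API):

```python
def host_passes(physical_cores, allocated_vcpus, requested_vcpus,
                cpu_allocation_ratio=16):
    """A host is 'full' when allocated vCPUs would exceed
    cpu_allocation_ratio * physical cores (default ratio 16,
    as noted on the slide)."""
    capacity = cpu_allocation_ratio * physical_cores
    return allocated_vcpus + requested_vcpus <= capacity
```

Note what this filter ignores: it counts vCPUs, not timing demands, which is exactly the kind of interface limitation the slide is pointing at.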
Limitations and Opportunities – VM Load Balancing
• Open-source alternatives: no load balancing by default
• VMware vCenter
– Distributed Resource Scheduler (DRS)
• Triggered every 5 min; calculates normalized host utilization
• Minimizes cluster-wide imbalance (standard deviation over all hosts)
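The DRS imbalance metric described above can be sketched as follows (an assumed form of the metric for illustration, not VMware's actual formula):

```python
import statistics

def cluster_imbalance(hosts):
    """hosts: list of (demand, capacity) pairs. Normalized utilization per
    host is demand / capacity; cluster-wide imbalance is the (population)
    standard deviation of those utilizations across all hosts."""
    utilizations = [demand / capacity for demand, capacity in hosts]
    return statistics.pstdev(utilizations)
```

A balancer in this style would propose migrations only while the imbalance exceeds a threshold, again with no notion of the timing guarantees the migrated VMs might need.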
Concluding Remarks
• Much has been accomplished already
– RT-Xen, CSF, and RTCA support real-time in open-source Xen
– Other approaches have focused on other virtualization architectures and platforms (e.g., L4), mechanisms, etc.
• Much remains to be done
– Especially as we move towards larger and more complex real-time systems and systems-of-systems
– Gains made in real-time virtualization can be extended to offer (and define) new capabilities for real-time clouds
Thank You!
All source code is available at http://sites.google.com/site/realtimexen/