Parallax. Dutch Meyer. University of British Columbia


The Plan
- Virtual Machines and Storage
- Parallax Feature Overview
- Technical Design
- System Evaluation
- Conclusion

Parallax is a Storage Service

Observations on Naïve Storage
- Virtual machines can be created and destroyed easily; storage can't
- VM encapsulation makes capturing whole-machine state attractive, but capturing a whole disk image is slow
- Giving similar VMs similar disk images results in wasted space

Our Research Questions
- How do we make volume provisioning agile enough to match VM creation?
- Can we capture whole-disk state at near-continuous granularity?
- How much data redundancy can we eliminate?
- How much overhead does it take to do all of this well?

Parallax Storage System
- Use snapshots as a unifying tool for
  - Provisioning new volumes
  - Data sharing
  - Low-overhead state capture and backup
- Allow block-level layout optimization
- Allow disconnected/degraded operation
- Compatibility due to VM-based architecture operating at the block level

Snapshots
- Data protection
  - Low granularity (e.g. days)
    - "What if" configuration and testing
    - Backup
  - High granularity (e.g. ms)
    - Legal compliance
    - Paranoia
- Time travel: by capturing whole-machine state at high frequency, we can revisit previous machine states

Provisioning via Gold Mastering
- Use snapshots to create a copy of some reference volume, which can be further specialized
- Requirements include
  - Global availability
  - Efficient operation
  - No hard limits on the number of volumes

Data Sharing
- Commonly derived disks can share common data
- Sharing is read-only, with copy-on-write (COW) when data is modified
- We can further eliminate redundancy by detecting duplicate blocks and deduplicating them (current focus)
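The duplicate-block detection mentioned above can be illustrated with a small content-addressed store. This is only a sketch of the general technique, not Parallax's design: the `DedupStore` class, the `write_image` helper, the 4 KB block size, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the slides do not state one


class DedupStore:
    """Toy content-addressed block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}    # digest -> block contents, stored once
        self.refcount = {}  # digest -> number of references from disk images

    def put(self, data: bytes) -> str:
        """Store a block and return its digest; identical blocks are stored once."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return self.blocks[digest]


def write_image(store: DedupStore, image: bytes) -> list:
    """Split an image into blocks and return the list of block digests."""
    return [store.put(image[i:i + BLOCK_SIZE])
            for i in range(0, len(image), BLOCK_SIZE)]


store = DedupStore()
base = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
derived = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # differs in the second block only
disk1 = write_image(store, base)
disk2 = write_image(store, derived)
```

Here the two images reference four blocks in total, but only three distinct blocks are stored, because the first block of each image hashes to the same digest.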

Parallax Implementation
- Building Virtual Disks
- Locking and Synchronization
- Storage Services

System Review
- Parallax engine is a user-mode tapdisk driver for block management
- Provides services to any VM sharing the same physical machine
- Federates across multiple physical machines to share a single volume of storage

Building Virtual Disks
- Flexibility in block placement is essential to providing disk isolation
- Parallax uses a radix tree to facilitate this
  - Fixed height
  - Root is linked to a disk image
  - Nodes are disk blocks, containing an array of pointers
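A fixed-height radix tree of this kind can be sketched in a few lines. This is a minimal in-memory illustration, assuming a fanout of 4 pointers per node and a height of 3; the `RadixTree` class and its method names are hypothetical, and a real node would be a disk block holding many more pointers.

```python
FANOUT_BITS = 2  # assumed: 4 pointers per node, chosen to keep the example small
HEIGHT = 3       # fixed tree height, so every lookup follows exactly HEIGHT pointers


class RadixTree:
    """Maps virtual block numbers to physical block numbers (illustrative only)."""

    def __init__(self):
        self.root = [None] * (1 << FANOUT_BITS)

    def _indices(self, vblock):
        # Split the virtual block number into HEIGHT groups of FANOUT_BITS bits,
        # most significant group first; each group indexes one level of the tree.
        if not 0 <= vblock < 1 << (FANOUT_BITS * HEIGHT):
            raise ValueError("virtual block number out of range")
        mask = (1 << FANOUT_BITS) - 1
        return [(vblock >> (FANOUT_BITS * level)) & mask
                for level in reversed(range(HEIGHT))]

    def map(self, vblock, pblock):
        """Record that virtual block vblock lives at physical block pblock."""
        node = self.root
        *inner, leaf = self._indices(vblock)
        for i in inner:
            if node[i] is None:
                node[i] = [None] * (1 << FANOUT_BITS)  # allocate interior node lazily
            node = node[i]
        node[leaf] = pblock

    def lookup(self, vblock):
        """Return the physical block for vblock, or None if it is unmapped."""
        node = self.root
        for i in self._indices(vblock):
            if node is None:
                return None
            node = node[i]
        return node


tree = RadixTree()
tree.map(0b011100, 42)  # index path: 01, 11, 00
```

Because the physical block number is just a value stored in a leaf slot, any virtual block can be placed anywhere on the shared disk, which is the placement flexibility the slide refers to.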

Radix Nodes and Trees

Taking A Snapshot
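One common way to snapshot a tree-structured block map is path copying: a snapshot keeps the old root, and later writes copy only the nodes on the path to the modified block. The sketch below shows that generic technique under assumed parameters (fanout 4, height 3, hypothetical `lookup` and `write` functions); it is not taken from the slides and may differ from how Parallax marks shared nodes.

```python
FANOUT = 4  # assumed pointers per node
HEIGHT = 3  # assumed fixed tree height


def _indices(vblock):
    # Base-FANOUT digits of the virtual block number, most significant first.
    return [(vblock // FANOUT ** level) % FANOUT
            for level in reversed(range(HEIGHT))]


def lookup(root, vblock):
    """Follow the index path from root; return the physical block or None."""
    node = root
    for i in _indices(vblock):
        if node is None:
            return None
        node = node[i]
    return node


def write(root, vblock, pblock):
    """Return a new root. Nodes on the path are copied; all others are shared."""
    def rec(node, idxs):
        new = list(node) if node is not None else [None] * FANOUT
        head, rest = idxs[0], idxs[1:]
        new[head] = rec(new[head], rest) if rest else pblock
        return new
    return rec(root, _indices(vblock))


live = write(None, 5, 100)
live = write(live, 21, 101)
snapshot = live             # taking a snapshot is just retaining the current root
live = write(live, 5, 200)  # copy-on-write: only the path to block 5 is copied
```

After the last write, `snapshot` still resolves block 5 to its old location, while the subtree holding block 21 is the same object in both trees. For simplicity this sketch copies the path on every write, even when no snapshot exists.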

IO Batching
- Parallax follows the semantics of a physical disk
  - Simultaneous requests may be completed in any order
  - Must retain "crash consistency"
- Updating radix trees can involve several IO operations
  - Batching becomes essential to maintaining performance
  - Ordering constraints are imposed for crash consistency
- We use a dependency tracking system to issue writes in the correct order
- Writes are aggressively pipelined, similar to instruction scheduling
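Dependency-ordered write issue can be sketched with a topological sort that releases writes in batches. This is an illustration only: the `schedule` function and the write names are hypothetical, and the specific ordering rule used here (data blocks before the radix nodes that point to them, root last) is an assumption about what crash consistency requires rather than something the slides state.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule(deps):
    """Group writes into batches that respect ordering constraints.

    deps maps each write to the set of writes that must complete first.
    Writes within one batch have no ordering between them, so they can be
    issued together and may complete in any order.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output stable
        batches.append(ready)
        sorter.done(*ready)  # a real system would call this as each IO completes
    return batches


# Assumed ordering: data first, then each radix node bottom-up, root last.
deps = {
    "leaf_node": {"data_a", "data_b"},
    "inner_node": {"leaf_node"},
    "root": {"inner_node"},
}
batches = schedule(deps)
```

With these constraints the two data writes form the first batch and the root update is issued last, so a crash at any point leaves the old root pointing at a consistent tree.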


Federating Physical Machines
- All machines share a single disk
- Some synchronization is required between physical machines
- Data plane is protected through long-lived, coarse-grained allocation
- Control plane requires a lock manager

Lock Management
- Current system has 3 contentious locks
  - Creating a virtual disk
  - Claiming a virtual disk
  - Requesting a new extent
- In practice these locks are taken very infrequently
- It is possible to further limit contention in our design


Degraded Operation

Evaluation: Performance
- System throughput
- Per-request latency

Evaluation: Snapshots
- Snapshot overhead
- Storage overheads

Conclusion
- We can use VM-based encapsulation to extend the services normally provided in a storage stack
- Despite using several potentially high-overhead techniques, Parallax achieves reasonable performance

Future Work
- Working on deduplication and layout optimization
- Expose features to aware file systems
- More storage services for VMs: caching, encryption, etc.
- General release

End of Presentation
- Thanks!
- Questions?

Extents
- We wish to minimize contention for the shared disk
- The simple approach is to partition the disk into large extents which can be given exclusively to individuals
- We currently use a 2GB extent size
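Extent-based partitioning can be sketched as a small allocator that hands out fixed-size regions of the shared disk. This is a single-process illustration under stated assumptions: the `ExtentAllocator` class and host names are hypothetical, 2GB is taken as 2 * 1024^3 bytes, and a local `threading.Lock` stands in for whatever cluster-wide lock a real deployment would use.

```python
import threading

EXTENT_SIZE = 2 * 1024**3  # 2GB extent size from the slide, assumed to mean 2 GiB


class ExtentAllocator:
    """Hands out whole extents of a shared disk to individual hosts (illustrative)."""

    def __init__(self, disk_size):
        self.total = disk_size // EXTENT_SIZE  # number of whole extents on the disk
        self.owner = {}                        # extent index -> owning host
        self.lock = threading.Lock()           # stand-in for a cluster-wide lock

    def request_extent(self, host):
        """Assign the first free extent to host and return its byte offset."""
        with self.lock:
            for idx in range(self.total):
                if idx not in self.owner:
                    self.owner[idx] = host
                    return idx * EXTENT_SIZE
        raise RuntimeError("no free extents")


alloc = ExtentAllocator(10 * 1024**3)     # a 10 GiB disk holds 5 extents
offset_a = alloc.request_extent("host-a")
offset_b = alloc.request_extent("host-b")
```

Once a host owns an extent it can allocate blocks inside it without further coordination, so the lock is only taken when a new extent is requested.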

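Translating: Virtual to Physical
[Figure: a virtual address is translated through a radix tree, from the Root through nodes labeled A, B, and C; the bit-string labels in the diagram are not recoverable.]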