BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications
Bogdan Nicolae, Diana Moise, Gabriel Antoniu, Luc Bougé, Matthieu Dorier
Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium, Atlanta, USA, April 2010.
Presented by:
Cristina-Iulia BUCUR
Vrije Universiteit Amsterdam, 20th February 2012
Motivation
20.02.12
http://www.thebiggertruth.com/2012/01/big-data-a-better-definition/
Outline
● Introduction
● Specialized File Systems for Map/Reduce:
    ● Requirements for the storage layer
    ● Dedicated Map/Reduce file systems
● BlobSeer
    ● Key Design & Algorithms
    ● System Architecture
    ● Integration with Hadoop
    ● I/O Operations
● Evaluation
● Conclusions
Hadoop
● Software framework for distributed data-intensive applications
● Works on commodity hardware
● Major contributor: Yahoo!
● Developer: Apache Software Foundation
● Open source Java project
MapReduce
● Parallel programming model for data clusters
● Performs computations on massive amounts of data
● Key component: storage layer
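
To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, using word counting as the example. It is not Hadoop code: the function names are hypothetical, and a real framework would run the map and reduce calls in parallel across a cluster and read its input from the storage layer.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["a b a", "b c"])` counts each word across both documents. Every phase reads or writes large amounts of data, which is why the storage layer is the key component.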
Hadoop Distributed File System
Requirements for the storage layer
● Fine-grain access to files
● High sustained throughput
● Heavy access concurrency
● Data-location awareness
● Versioning
Dedicated File Systems for MapReduce
GFS = Google File System
HDFS = Hadoop Distributed File System
S3 = Amazon Simple Storage Service
BlobSeer in a Nutshell
● Concurrency-optimized file system for Hadoop
● Support for data-intensive distributed applications
● Version-oriented design
● Data structure: BLOB (Binary Large Object)
● Operations:
    ● read (unique ID, version number)
    ● write/append (unique ID, offset, size)
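
The version-oriented interface can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not BlobSeer's API: the class name `ToyBlobStore` and the exact method signatures are hypothetical, and each snapshot is kept as a full copy for brevity, whereas BlobSeer stores only the differences between versions.

```python
class ToyBlobStore:
    """In-memory stand-in for a versioned BLOB store (illustrative only)."""

    def __init__(self):
        self._snapshots = {}  # blob_id -> list of snapshots, index = version
        self._next_id = 0

    def create(self):
        """Create an empty BLOB and return its unique ID."""
        blob_id = self._next_id
        self._next_id += 1
        self._snapshots[blob_id] = [b""]  # version 0 is the empty BLOB
        return blob_id

    def write(self, blob_id, offset, data):
        """Overwrite bytes at `offset`; return the new version number."""
        latest = self._snapshots[blob_id][-1]
        if offset > len(latest):
            raise ValueError("write would leave a gap in the BLOB")
        new = latest[:offset] + data + latest[offset + len(data):]
        self._snapshots[blob_id].append(new)
        return len(self._snapshots[blob_id]) - 1

    def append(self, blob_id, data):
        """Append at the end of the latest snapshot; return the new version."""
        return self.write(blob_id, len(self._snapshots[blob_id][-1]), data)

    def read(self, blob_id, version, offset=0, size=None):
        """Read from a specific snapshot, which later writes never change."""
        snapshot = self._snapshots[blob_id][version]
        end = len(snapshot) if size is None else offset + size
        return snapshot[offset:end]
```

After `v1 = store.append(b, b"hello world")` and `v2 = store.write(b, 0, b"HELLO")`, a read of `v1` still returns the original bytes, while a read of `v2` sees the overwrite.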
BlobSeer: Key Design & Algorithms
1. Data striping
2. Distributed metadata
3. Versioning access interface
4. Lock-free synchronization
1. Data Striping
● Splitting large objects into smaller chunks
● Data chunks: pages of up to 64 MB
● Blocks distributed among storage nodes
● Load balancing strategy
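
A minimal sketch of striping follows. The slide does not say which load balancing strategy is used, so round-robin placement is an assumption made here purely for illustration; the function names and provider labels are hypothetical as well.

```python
def stripe(data, page_size):
    """Split a byte string into pages of at most `page_size` bytes."""
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def place_round_robin(pages, providers):
    """Assign page indices to providers in turn (assumed strategy)."""
    placement = {provider: [] for provider in providers}
    for index, _page in enumerate(pages):
        placement[providers[index % len(providers)]].append(index)
    return placement
```

Because consecutive pages land on different providers, clients that access different parts of the same object can be served by different nodes at the same time.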
2. Distributed Metadata
[Figure panels: writing four pages to an empty BLOB; overwriting the first two pages of the BLOB]
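
The two scenarios named on this slide can be reproduced with a small sketch. It assumes that the metadata of a snapshot is a binary tree over page ranges and that a new snapshot creates nodes only on the paths to the pages it writes, referencing the previous snapshot's subtrees everywhere else. The `Node` layout, the `build` and `lookup` helpers and the page identifiers are hypothetical, and the sketch ignores how tree nodes are distributed among metadata providers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    version: int                # snapshot that created this node
    lo: int                     # first page covered (inclusive)
    hi: int                     # end of the covered range (exclusive)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    page: Optional[str] = None  # leaves only: hypothetical page identifier


def build(version, lo, hi, old, written):
    """Return the metadata tree for pages [lo, hi) of snapshot `version`.

    `written` maps page index -> new page identifier. A subtree that
    contains no written page is reused from `old`, not copied.
    """
    if not any(lo <= p < hi for p in written):
        return old
    if hi - lo == 1:
        return Node(version, lo, hi, page=written[lo])
    mid = (lo + hi) // 2
    old_left = old.left if old is not None else None
    old_right = old.right if old is not None else None
    return Node(version, lo, hi,
                left=build(version, lo, mid, old_left, written),
                right=build(version, mid, hi, old_right, written))


def lookup(node, index):
    """Descend from the root of a snapshot to the page at `index`."""
    while node.page is None:
        mid = (node.lo + node.hi) // 2
        node = node.left if index < mid else node.right
    return node.page


# Snapshot 1: write four pages to an empty BLOB.
v1 = build(1, 0, 4, None, {0: "a1", 1: "b1", 2: "c1", 3: "d1"})
# Snapshot 2: overwrite the first two pages.
v2 = build(2, 0, 4, v1, {0: "a2", 1: "b2"})
```

In this sketch `v2.right is v1.right` holds: the subtree covering the two untouched pages is shared between both snapshots, while snapshot 1 stays fully readable through its own root.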
3. Versioning Access Interface
● Handled entirely on the client side
● Each write/append generates a new snapshot
● Only differential patches of data and metadata are stored
● Older versions remain accessible for an undefined period of time
4. Lock-Free Synchronization
● Version-based concurrency control
● No DELETE operation
● Avoid synchronization as much as possible:
    1. write data
    2. generate version number & store new metadata
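
The two steps above can be sketched with threads standing in for concurrent clients. This is an illustration under assumptions, not BlobSeer's implementation: `ToyVersionManager`, `append_chunk` and `run_writers` are hypothetical names, and the small mutex inside the manager merely models the single point where version numbers are assigned. The data write itself takes no lock, because every writer stores its chunk under a fresh key that no one else touches.

```python
import threading
import uuid


class ToyVersionManager:
    """Assigns version numbers; the only serialized step in this sketch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._published = []  # index i holds the metadata of version i + 1

    def publish(self, metadata):
        """Assign the next version number and store its metadata."""
        with self._lock:
            self._published.append(metadata)
            return len(self._published)

    def metadata(self, version):
        return self._published[version - 1]


def append_chunk(pages, manager, data):
    # Step 1: write the data under a fresh key, without coordination.
    key = uuid.uuid4().hex
    pages[key] = data
    # Step 2: generate a version number and store the new metadata.
    return manager.publish(key)


def run_writers(n):
    """Start n concurrent writers and return the resulting state."""
    pages, manager, results = {}, ToyVersionManager(), {}

    def worker(i):
        results[i] = append_chunk(pages, manager, f"chunk-{i}".encode())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pages, manager, results
```

Each writer receives a distinct version number, and the metadata of that version points at exactly the chunk the writer stored. Since nothing is ever deleted or modified in place, readers of older versions are never disturbed.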
BlobSeer: System Architecture
Integrating BlobSeer with Hadoop
1. HDFS API for BlobSeer
    ● Implement versioning mechanism
    ● Concurrent append operation
2. BlobSeer File System (BSFS) as storage backend
    ● File system namespace: namespace manager
    ● Data prefetching: caching mechanism
    ● Affinity scheduling: data distribution
I/O Operations
[Figure panels: READ, WRITE]
Heavy Access Concurrency
Evaluation (1)
Single writer, single file
Evaluation (2)
Concurrent reads, shared file
Evaluation (3)
Concurrent clients, appends to the same file
Evaluation (4)
Higher-level MapReduce applications
Conclusions
● BlobSeer extends the functionality of Hadoop:
    ● Concurrent appends & versioning
● Efficiency depends directly on the storage layer
Backup slides
BlobSeer File System - BSFS
1. File system namespace: namespace manager
    ● Maintaining file system namespace
    ● Mapping files to BLOBs
2. Data prefetching: caching mechanism
    ● Prefetch data for reads
    ● Delay committing data for writes
3. Affinity scheduling: computation close to data
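
Delayed commits on the write path can be sketched as a client-side buffer. The assumption here is that small writes are accumulated until a whole block is available and only then passed on to the storage layer; `WriteBehindBuffer`, its `commit` callback and the block size are hypothetical names and values chosen for the example.

```python
class WriteBehindBuffer:
    """Accumulates small writes and commits them one full block at a time."""

    def __init__(self, block_size, commit):
        self._block_size = block_size
        self._commit = commit          # hypothetical callback: commit(block)
        self._pending = bytearray()

    def write(self, data):
        """Buffer `data`; commit every complete block that becomes available."""
        self._pending += data
        while len(self._pending) >= self._block_size:
            self._commit(bytes(self._pending[:self._block_size]))
            del self._pending[:self._block_size]

    def flush(self):
        """Commit whatever is left, for example when the file is closed."""
        if self._pending:
            self._commit(bytes(self._pending))
            self._pending.clear()
```

With a block size of 4, writing `b"ab"` commits nothing, a following `b"cdefg"` commits `b"abcd"`, and `flush()` commits the remaining `b"efg"`. Fewer, larger commits mean fewer interactions with the storage layer per byte written.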
Microbenchmarks Setup (1)
● Grid'5000 testbed
● Clusters: Sophia-Antipolis, Orsay, Lille
● Nodes: x86_64 CPUs
● RAM: 2 GB (Orsay), 4 GB (Sophia, Lille)
● Intracluster bandwidth: 1 Gbit/s
● Intracluster latency: 0.1 ms
Microbenchmarks Setup (2)
● 270 nodes from the same cluster on Grid'5000
● HDFS:
    ● One namenode on a dedicated machine
    ● One datanode on each machine
● BSFS:
    ● One node each for version manager, provider manager and namespace manager
    ● 20 metadata providers
    ● Remaining nodes as data providers