MapReduce Jobs For Video Conversion

MapReduce Jobs For Video Conversion

Ankur Gupta, Harish Kumar Narware, Harsh Agrawal, Sourabh Gupta

11/23/09


Agenda

• Motivation
• Introduction
• Why MapReduce?
• What is FFmpeg?
• Project Description
• Challenges Faced
• Load Balancing (Optimization)
• Practical Use


Motivation

• MapReduce is a software framework introduced by Google. It supports distributed computing on large datasets across clusters of computers.
• The framework is inspired by the map and reduce functions commonly used in functional programming.
• For example, MapReduce can sort a petabyte of data in only a few hours.

Introduction

• In this project we have to convert a huge number of video files from one format to another.
• We are using the MapReduce framework.
• We are also using the open-source video converter FFmpeg.
• The data will be retrieved from and stored on HDFS.

Why MapReduce?

• We need MapReduce because the number of video files to be converted is huge.
• Using the parallelism provided by MapReduce, we can complete the task in less time.
• Distributed computing also provides better utilization of resources.

What is FFmpeg?

• FFmpeg is a complete, cross-platform solution to record, convert and stream audio and video.
• FFmpeg is free software and is licensed under the LGPL or GPL.
• FFmpeg can be installed by downloading the source via SVN from:
http://ffmpeg.org/download.html

Project Description

• Video files in a particular format, say AVI, will be stored in HDFS.
• We accept an input file containing the locations of the video files in HDFS and the format to which they have to be converted.
• The video format conversion is done in the Map phase.
• In the Map phase, we first download the input video file from HDFS to the local system using the FileSystem API (copyToLocalFile()).

Project Description (cont.)

• We use FFmpeg to convert this file into the given format.
• The new file is then uploaded back into HDFS using copyFromLocalFile(), into the same directory with the same name but with the extension of the new video format.
• The HDFS path of the new file is then returned as the output of the Map task.
• Reduce is not needed.
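The rename step (same directory, same name, new extension) can be sketched with a small helper. withNewExtension is an illustrative name chosen here, not part of the Hadoop API:

```java
public class ExtensionUtil {
    // Replaces the extension of a path string, e.g.
    // "/videos/clip.avi" + "mp4" -> "/videos/clip.mp4".
    // Sketch only: assumes the last '.' belongs to the file name,
    // not to a directory component.
    static String withNewExtension(String path, String newExt) {
        int dot = path.lastIndexOf('.');
        String base = (dot >= 0) ? path.substring(0, dot) : path;
        return base + "." + newExt;
    }

    public static void main(String[] args) {
        System.out.println(withNewExtension("/videos/clip.avi", "mp4")); // /videos/clip.mp4
    }
}
```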


Commands

FileSystem hdfs = FileSystem.get(config);
hdfs.copyToLocalFile(srcPath, dstPath);

• copyToLocalFile copies the file from srcPath in HDFS to dstPath on the local system.

hdfs.copyFromLocalFile(srcPath, dstPath);

• copyFromLocalFile copies the file at srcPath on the local system to dstPath in HDFS.
• No file should already be present at dstPath in HDFS.

Challenges we Faced!

• An interesting problem we encountered: we were not able to get the whole converted file when running FFmpeg commands in the Map task.
• The reason is that when we run a command from a Java program, it executes in a child process, and our program was exiting before the child process could complete.
• Therefore only a partial file was being converted.

How we solved the problem

• We open a stream on the standard output of the running FFmpeg command.
• We read from this stream in a while loop that breaks only when the stream returns null, i.e. when the conversion is complete.
• In this way, the Map task waits for the child process to finish the conversion.
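A minimal sketch of this wait-for-the-child pattern. The ffmpeg command line in the comment is only an illustration; the demo in main runs a harmless echo instead, and runAndDrain is a name chosen here:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class RunAndDrain {
    // Runs a command, drains its combined stdout/stderr until the stream
    // returns null (the child has closed its output), then returns the
    // exit code, or -1 on error.
    static int runAndDrain(String... cmd) {
        try {
            ProcessBuilder pb = new ProcessBuilder(cmd);
            pb.redirectErrorStream(true); // merge stderr into stdout
            Process p = pb.start();
            BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()));
            while (r.readLine() != null) {
                // Draining: this loop exits only when the child process
                // finishes, i.e. when the conversion is complete.
            }
            return p.waitFor();
        } catch (IOException | InterruptedException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // In the Map task cmd would be something like:
        //   "ffmpeg", "-i", "in.avi", "out.mp4"   (hypothetical arguments)
        System.out.println(runAndDrain("echo", "done"));
    }
}
```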


Challenges we Faced!

• The main challenge was to distribute the input splits properly.
• Each input split should contain the paths of files to be converted such that the total amount of video data per split remains approximately the same.
• For example, there should be no input split containing the paths of all the large video files; such a split would unbalance the load.

Load Balancing (Optimization)

• Load balancing between Map tasks is crucial.
• One approach: we sorted the records in the input file by file size in HDFS, using MapReduce.
• We then rewrote the input file by taking one file from the top and one from the bottom of the sorted list, then the second file and the second-last file, and so on.
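The rewrite step above amounts to interleaving the size-sorted list, alternating small and large files. A sketch (interleave is an illustrative name):

```java
import java.util.ArrayList;
import java.util.List;

public class Interleave {
    // Pairs the smallest remaining record with the largest remaining one,
    // so consecutive records alternate between small and large files and
    // equal-count chunks carry roughly equal total size.
    static <T> List<T> interleave(List<T> sortedBySize) {
        List<T> out = new ArrayList<>();
        int i = 0, j = sortedBySize.size() - 1;
        while (i <= j) {
            out.add(sortedBySize.get(i++));              // smallest remaining
            if (i <= j) out.add(sortedBySize.get(j--));  // largest remaining
        }
        return out;
    }

    public static void main(String[] args) {
        // sizes sorted ascending -> alternating small/large
        System.out.println(interleave(List.of(1, 2, 3, 4, 5))); // [1, 5, 2, 4, 3]
    }
}
```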


Load Balancing (continued)

• So, when an equal number of video files is given to each Map task, the total amount of video data converted per Map task is somewhat balanced.
• But this is still not the best method.

Load Balancing (continued)

• MapReduce provides a function to set the number of Map tasks for a given job:
job.setNumMapTasks(x);
• The parameter is only a hint for the number of Map tasks.
• The actual value depends on the implementation of the getSplits() method of the custom InputFormat class.
• A lower bound on the split size can be set via mapred.min.split.size.

Input Format

• Validates the input specification of the job.
• Splits up the input file(s) into logical InputSplit instances, each of which is then assigned to an individual Mapper.
• Provides the RecordReader implementation used to glean input records from the logical InputSplit for processing by the Mapper.
• The default implementation splits the input into logical InputSplit instances based on the total size, in bytes, of the input files.

New Approach!

• We provide our own InputFormat implementation for the Map task.
• In the getSplits() method of this InputFormat class, we divide the input into splits based on the total size of the video files per Map task.
• In this method, we check the size of each input file in HDFS; when the total size of the files for an InputSplit exceeds a certain limit, we start a new InputSplit.
• An InputSplit is logical and consists of the path of the input file, a start offset and an end offset.
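The size-limited grouping inside getSplits() can be sketched as follows. groupBySize and maxBytes are illustrative names, not the Hadoop API; real splits would carry paths and offsets, not bare sizes:

```java
import java.util.ArrayList;
import java.util.List;

public class SizeSplitter {
    // Greedily packs file sizes into groups whose running total stays at
    // or below maxBytes; each group would become one logical InputSplit.
    static List<List<Long>> groupBySize(long[] fileSizes, long maxBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long total = 0;
        for (long size : fileSizes) {
            if (!current.isEmpty() && total + size > maxBytes) {
                splits.add(current);          // close the current split
                current = new ArrayList<>();
                total = 0;
            }
            current.add(size);
            total += size;
        }
        if (!current.isEmpty()) splits.add(current);
        return splits;
    }

    public static void main(String[] args) {
        // 40+30 fits under the 70 limit; 50 starts a new split; 50+20 fits
        System.out.println(groupBySize(new long[]{40, 30, 50, 20}, 70));
    }
}
```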


New Approach! (cont.)

• Here, we can define exactly the number of Map tasks and the input for each Map task.
• Set the InputFormat for the job:
job.setInputFormat(CustomInputFormat.class);

Practical Use

• There are many websites which convert video files from one format to another online. They could use this project to do so.
• Most of these websites do not use MapReduce right now.
• Examples of such sites are:
http://www.zamzar.com/
http://www.any-video-converter.com/products/for_video_free/
http://www.getafreelancer.com/projects/PHP-Python/Youtube-API-video-conversion-website.html
http://vixy.net/

Questions?