Author: Meghan Preston

BIG DATA – APACHE HADOOP ADMINISTRATION | amron

Training Details

Course Duration: 40 hours of training, plus assignments and case studies based on an actual project.

Training Materials: All attendees will receive:
- An assignment after each module
- A video recording of every session
- Notes and study material for the examples covered
- Access to the training blog and repository of materials

Training Format: This course is delivered as a highly interactive session with extensive live examples. It is live, instructor-led online training delivered using the Cisco WebEx Meeting Center web and audio conferencing tool.

Timing: Weekdays and weekends, after work hours.

Course Objective: This training aims to give participants a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster, from installation and configuration through load balancing and tuning. Participants will learn how to install a complete Hadoop cluster, understand the basic and advanced concepts of MapReduce, and learn the best practices for Apache Hadoop development as experienced by the developers and architects of core Apache Hadoop. Through hands-on exercises, participants will learn the following topics during the course:

- The internals of MapReduce and HDFS, and how to build a Hadoop architecture
- Proper cluster configuration and deployment to integrate with the systems and hardware in the data center
- How to load data into the cluster from dynamically generated files using Flume, and from an RDBMS using Sqoop
- Configuring the Fair Scheduler to provide service-level agreements for multiple users of a cluster
- Installing and implementing Kerberos-based security for your cluster
- Best practices for preparing and maintaining Apache Hadoop in production
- Troubleshooting, diagnosing, tuning, and solving Hadoop issues

Note: The course consists of 40% theoretical discussion and 60% hands-on practice.


Audience & Prerequisites:

This course is designed for Systems Administrators and IT Managers who have basic Linux experience. No prior knowledge of Apache Hadoop is required.

Project Work and Case Study: Details and Time Spent

We will provide a case study based on a real-time project, which takes 4 weeks to develop. Specifications and guidance will be given for the case study, and participants need to develop the solution and present the result.

Who should plan on joining? 

Students, DBAs, System Administrators, Software Architects, Data Warehouse Professionals, IT Managers, and Software Developers interested in learning Hadoop cluster administration should take this course.

Training Highlights         


- Focus on hands-on training
- 40 hours of assignments and live case studies
- Video recordings of sessions provided
- One problem statement discussed across the whole training program
- Introduction to Hadoop and Big Data
- Hadoop Admin certification guidance
- 100% practical, with mentor guidance
- Resume preparation and interview questions provided
- Covers all important Hadoop ecosystem products




Modules Covered in this Training

In this training, attendees learn:
1. What is Big Data
2. The Case for Apache Hadoop
3. The Hadoop Distributed File System
4. MapReduce
5. An Overview of the Hadoop Ecosystem
6. Planning your Hadoop Cluster
7. Hadoop Installation
8. Advanced Configuration
9. Hadoop Security
10. Managing and Scheduling Jobs
11. Cluster Maintenance
12. Cluster Monitoring and Troubleshooting
13. Installing and Managing Other Hadoop Projects

Attendees also learn:
1. Resume Preparation Guidelines and Tips
2. Mock Interviews and Interview Preparation Tips

Topics Covered

What is Big Data?

- Need for a different technique for data storage
- Need for a different paradigm for data analysis
- The 3 V’s of Big Data
- Different distributions of Hadoop

The Case for Apache Hadoop    

- A Brief History of Hadoop
- Core Hadoop Components
- Fundamental Concepts
- Hadoop Ecosystem Overview

The Hadoop Distributed File System     


- HDFS Features
- HDFS Design Assumptions
- Overview of HDFS Architecture
- Writing and Reading Files
- Hands-On Exercise


MapReduce       

- What Is MapReduce?
- Features of MapReduce
- Basic MapReduce Concepts
- Architectural Overview
- What Is a Combiner?
- What Is a Partitioner?
- Hands-On Exercise

An Overview of the Hadoop Ecosystem    

- What Is the Hadoop Ecosystem?
- Integration Tools
- Analysis Tools
- Data Storage and Retrieval Tools

Planning your Hadoop Cluster    

- General Planning Considerations
- Choosing the Right Hardware
- Network Considerations
- Configuring Nodes

Hadoop Installation     

- Deployment Types
- Installing Hadoop
- Basic Configuration Parameters
- Hands-On Exercise on a Pseudo-Distributed Cluster
- Hands-On Exercise on a Multi-Node Cluster

Advanced Configuration     

- Advanced Parameters
- core-site.xml parameters
- mapred-site.xml parameters
- hdfs-site.xml parameters
- Configuring Rack Awareness

Hadoop Security    


- Why Hadoop Security Is Important
- Hadoop’s Security System Concepts
- What Kerberos Is and How It Works
- Integrating a Secure Cluster with Other Systems


Managing and Scheduling Jobs        

- Managing Running Jobs
- Hands-On Exercise
- The FIFO Scheduler
- The Fair Scheduler
- The Capacity Scheduler
- Configuring the Fair Scheduler
- Evaluating the Different Schedulers
- Hands-On Exercise

Cluster Maintenance        

- Checking HDFS Status
- Hands-On Exercise
- Copying Data Between Clusters
- Adding and Removing Cluster Nodes
- Rebalancing the Cluster
- Hands-On Exercise
- NameNode Metadata Backup
- Cluster Upgrading

Cluster Monitoring and Troubleshooting       

- General System Monitoring
- Managing Hadoop’s Log Files
- Using the NameNode and JobTracker Web UIs
- Hands-On Exercise
- Cluster Monitoring with Ganglia
- Common Troubleshooting Issues
- Benchmarking Your Cluster

Installing and Managing Other Hadoop Projects    

- Hive
- Pig
- HBase
- Oozie

CASE STUDY # 1 – “Healthcare System”

Healthcare System Application: As the Product Manager for Inner Expressions, you are asked to provide one of your largest clients with additional features in the EMR (Electronic Medical Records Management) System. The client has requested an integrated Referral Management System that tracks patients from Primary Care into the Specialist departments. Appointments are created either by the Primary Care Physicians themselves or by other clinical staff, such as Nurse Practitioners or Clinical Assistants. Each appointment must go through the appropriate checks, including whether the patient has active insurance with the client, whether the insurance program covers the patient’s condition, the patient’s preferred location and timings, and the availability of the Specialist doctor. Some appointments may have to be reviewed by the Specialists themselves before they can be approved. The administrator of the facility (hospital) must be able to configure each appointment type either as directly bookable by the Primary Care Staff or as requiring review by the Specialist.

The system should also allow the Primary Care Staff and the Specialist departments to exchange notes and comments about a particular appointment. If the Specialist department requests tests or reports as mandatory for the appointment, the system must ensure that the patient has them available on the date of the appointment. The system shall also allow users to track the status of patients’ appointments and must store the entire clinical history of each patient. The hospital will use this history for two main purposes. First, the Specialist and the primary care providers will have access to the patient’s complete medical history before the patient walks in for the appointment, allowing for better patient care. Second, the hospital stores this data (without Protected Health Information) in a general data warehouse to run analytics on it and develop local disease management programs for the area. This is aligned with the hospital’s mission of providing top-quality preventive medical care. The hospital sets about 300 appointments per day and must support about 50 concurrent users. The existing EMR system is based on Java and an Oracle database system.

Tasks
- Identify Actors, Use Cases, and Relationships; draw Use Case Diagrams
- Identify Ideal, Alternate, and Exception Flows
- Write a Business Requirements Document

CASE STUDY # 2 – “Asset Management System”

Asset Management Application: An e-Examination system (also known as e-Pariksha or Online Examination Scheduler) is an intelligent web application that automates pre-examination scheduling for academic institutions, universities, colleges, and schools. The primary aim of this automation is to protect the environment by saving the tons of paper involved in conducting examinations. All examination communication between students and the institution is handled via email. An examination usually starts with Exam Registration, which is connected to Subject Creation, Exam Room Management, Room Allotment, the Examination Hall Diary, and Absentee Information (a variety of reports required by the university). The web app has two sides: a client-side application and a server-side application. The client side lets students fill in their examination registration forms online and check their examination details, such as the start day, the complete timetable, day-wise exam details, and each candidate’s daily seating details (room name, seat number, subject, date, and time). The server side processes each candidate’s exam registration form through a workflow of Subject Loader, Room Management, Seating Manager, Room Allotment, Room Diaries, Absentee Marking, and Rich Crystal Reports to meet various data needs. The web app admin records new chattel into the database, deletes archaic ones, and revises any information related to the examination. “User”. All users are known to the system by their USN, ID and their

The asset management system keeps track of a number of assets that can be borrowed, along with their ownership, their availability, their current location, the current borrower, and the asset history. Assets include books, software, computers, and peripherals. Assets are entered in the database when acquired and deleted from the database when disposed of. An asset’s availability is updated whenever it is borrowed or returned. When a borrower fails to return an asset on time, the asset management system sends a reminder to the borrower and informs the asset owner. The administrator enters new assets in the database, deletes obsolete ones, and updates any information related to assets. Borrowers search for assets in the database to determine their availability, and they borrow and return assets. The asset owner loans assets to borrowers. Each system has exactly one administrator, one or more asset owners, and one or more borrowers. When referring to any of the above actors, we use the term “user”. All users are known to the system by their name and email address. The system may keep track of other attributes, such as the owner’s telephone number, title, address, and position in the organization. The system should support at least 200 borrowers and 2000 assets, and it should be extensible to other types of assets.

The system should checkpoint the state of the database every day so that it can be recovered in case of data loss. Owners and the administrator are authenticated using a user/password combination. Actors interact with the system via a web browser capable of rendering HTML over HTTP, without support for JavaScript or Java. Persistent storage is realized using an SQL database. The business logic is realized using the WebObjects runtime system. The system includes:

TASKS
- Identify Actors, Use Cases, and Relationships; draw Use Case Diagrams
- Identify Ideal, Alternate, and Exception Flows
- Write a Business Requirements Document

OTHER CASE STUDIES: Social Networking, Cruise Management System, Collegiate Sporting System


Copyright © 2014 Amron IT Solutions & Resource Management. All Rights Reserved. No part of this document or website may be reproduced without Amron IT Solutions & Resource Management’s express consent. www.amronitsolutions.com
