Big Data Administration

This course is a comprehensive study of Big Data administration using Hadoop. Topics include an introduction to Hadoop and its architecture, HDFS, MapReduce, and the MapReduce abstraction. The course further covers best practices for configuring, deploying, administering, maintaining, monitoring, and troubleshooting a Hadoop cluster. It also covers job scheduling in Hadoop, multi-node cluster setup on Amazon EC2, and a brief introduction to YARN. Become a Big Data administrator by learning the concepts of Hadoop and implementing advanced operations on Hadoop clusters.



Apache Web UI

  • NameNode architecture (EditLog, FsImage, location of replicas)
  • Secondary NameNode architecture
  • DataNode architecture

MapReduce Architecture

  • Exploring JobTracker/TaskTracker
  • How a client submits a MapReduce job
  • Exploring Mapper/Reducer/Combiner
  • Shuffle: Sort & Partition
  • Input/output formats
  • Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)
  • Exploring the Apache MapReduce Web UI
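
As a rough illustration of how the Mapper, Combiner, Shuffle (Sort & Partition), and Reducer topics above fit together, here is a minimal single-process sketch of a word count. It does not use the Hadoop API. The function names (mapper, combiner, partition, reducer, run_job) are hypothetical, each input line is assumed to stand in for one map task's split, and crc32 is only a stand-in for a hash partitioner.

```python
from collections import defaultdict
from zlib import crc32


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate one mapper's output locally to reduce shuffle traffic."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Choose a reducer for a key (crc32 is an assumed stand-in hash)."""
    return crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Sum all partial counts that arrived for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # Shuffle: route each combined pair to its partition and group by key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line is treated as one map task's input
        for key, value in combiner(mapper(line)):
            partitions[partition(key, num_reducers)][key].append(value)

    # Sort and reduce: each reducer processes its keys in sorted order.
    results = {}
    for part in partitions:
        for key in sorted(part):
            word, total = reducer(key, part[key])
            results[word] = total
    return results


counts = run_job(["big data big cluster", "data node data"])
```

With this input, counts maps "big" to 2 and "data" to 3. In a real cluster the same stages run on separate nodes, with the partitioned map output moved over the network between them.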

Hadoop Administrative Tasks

  • Routine Administrative Procedures
  • Understanding dfsadmin and mradmin
  • Block Scanner, Balancer
  • Health Check & Safe mode
  • Monitoring and Debugging on a production cluster
  • NameNode Backup and Recovery
  • DataNode commissioning/decommissioning
  • ACL (Access Control List)
  • Upgrading Hadoop

Hive Architecture

  • Introduction to Hive

Pig Architecture

  • Introduction to Pig

Sqoop Architecture

  • Introduction to Sqoop
  • Installation of Sqoop
  • Import data from RDBMS to HDFS
  • Hands-On Exercise

Oozie Architecture

  • Introduction to Oozie
  • Installation