Prerequisites
To apply for the Hadoop Administration Training, you should meet the following prerequisites:
- You need to know at least one programming language, such as Java, Python, or R, to learn big data analytics tools.
- You must also have basic knowledge of databases and SQL to retrieve and manipulate data.
- You need basic knowledge of statistics (for example, regression and distributions) and mathematical skills such as linear algebra and calculus.
Course Curriculum
Module 1: Hadoop Administration Fundamentals
- Introduction to big data
- Limitations of existing solutions
- Common Big Data domain scenarios
- Hadoop Architecture
- Hadoop Components and Ecosystem
- Loading data into and reading data from HDFS
- Replication Rules
- Rack Awareness theory
- Hadoop cluster administrator: roles and responsibilities
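As a rough illustration of the replication rules and rack awareness topics above, the sketch below mimics the default HDFS placement for a replication factor of 3: the first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The function name `place_replicas`, the `topology` dictionary, and the rack and node names are hypothetical; the real policy runs inside the NameNode and handles many more cases (full disks, writers outside the cluster, higher replication factors).

```python
import random


def place_replicas(writer, topology, rng=random):
    """Pick three nodes for one block, following the default 3-replica rule.

    writer   -- name of the DataNode that writes the block (assumed to be in the cluster)
    topology -- hypothetical mapping of rack name -> list of node names
    rng      -- object with a choice() method (the random module by default)
    """
    # Reverse lookup: which rack does each node belong to?
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer]

    # Replica 1: the writer's own node.
    first = writer

    # Replicas 2 and 3 share one remote rack, so that rack needs at least two nodes.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need a second rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)

    # Replica 2: any node on the remote rack.
    second = rng.choice(topology[remote_rack])
    # Replica 3: a different node on the same remote rack.
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]


topology = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}
print(place_replicas("node1", topology))
```

With this layout, a block survives the loss of either rack while only one copy has to cross the rack boundary during the write.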
Module 2: Hadoop clusters and Architecture
- Working of HDFS and its internals
- Hadoop Server roles and their usage
- Hadoop Installation and Initial configuration
- Different Modes of Hadoop Cluster
- Deploying Hadoop in a Pseudo-distributed mode
- Deploying a Multi-node Hadoop cluster
- Installing Hadoop Clients
- Understanding the working of HDFS and resolving simulated problems
- Hadoop 1 and its Core Components
- Hadoop 2 and its Core Components
Module 3: Hadoop cluster administration and processing frameworks
- Properties of NameNode, DataNode and Secondary NameNode
- OS Tuning for Hadoop Performance
- Understanding Secondary NameNode
- Log Files in Hadoop
- Working with Hadoop distributed cluster
- Commissioning and decommissioning of nodes
- Different Processing Frameworks
- Understanding MapReduce
- Spark and its Features
- Application Workflow in YARN
- YARN Metrics
- YARN Capacity Scheduler and Fair Scheduler
- Understanding Schedulers and enabling them
Module 4: Hadoop cluster administration and maintenance
- NameNode Federation in Hadoop
- HDFS Balancer
- High Availability in Hadoop
- Enabling Trash Functionality
- Checkpointing in Hadoop
- DistCp and Disk Balancer
Module 5: Planning and management
- Planning a Hadoop 2.0 cluster
- Cluster sizing
- Hardware, Network and Software considerations
- Popular Hadoop distributions
- Workload and usage patterns
- Industry recommendations
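Cluster sizing typically begins with back-of-the-envelope arithmetic. The sketch below is one simplified way to estimate a DataNode count from the amount of data to store. The function name `estimate_datanodes`, the 25% headroom for temporary and intermediate data, and the 48 TB of usable disk per node are illustrative assumptions, not recommendations; the replication factor of 3 is the HDFS default.

```python
import math


def estimate_datanodes(data_tb, replication=3, headroom=0.25, disk_tb_per_node=48):
    """Estimate how many DataNodes are needed to hold data_tb terabytes.

    data_tb          -- logical data size in TB (before replication)
    replication      -- HDFS replication factor (3 is the HDFS default)
    headroom         -- assumed fraction of raw disk kept free for temp/intermediate data
    disk_tb_per_node -- assumed usable disk per DataNode, in TB
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    # Raw capacity = replicated data, scaled up so that `headroom` stays free.
    raw_tb = data_tb * replication / (1 - headroom)
    # Round up: a fraction of a node still means one more machine.
    return math.ceil(raw_tb / disk_tb_per_node)


# 100 TB of data -> 300 TB replicated -> 400 TB raw with 25% headroom.
print(estimate_datanodes(100))
```

A real plan would also account for data growth, compression, and the CPU and memory needs of the expected workload, which this estimate ignores.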
Module 6: Backup and recovery
- Key admin commands such as dfsadmin
- Safe Mode
- Importing a checkpoint
- The metasave command
- Data backup and recovery
- Backup vs Disaster recovery
- Namespace (name count) quota and space quota
- Manual failover and metadata recovery
Module 7: Hadoop cluster monitoring and security
- Monitoring Hadoop Clusters
- Authentication & Authorization
- Nagios and Ganglia
- Hadoop Security System Concepts
- Securing a Hadoop Cluster With Kerberos
- Common Misconfigurations
- Overview of Kerberos
- Checking log files to troubleshoot Hadoop clusters
Module 8: Hadoop with HA and upgrading
- Configuring Hadoop 2 with high availability
- Upgrading to Hadoop 2
- Working with Sqoop
- Understanding Oozie
- Working with Hive
- Working with Pig
Module 9: Setting up the cluster
- Cloudera Manager and cluster setup
- Hive administration
- HBase architecture
- HBase setup
- Hadoop/Hive/HBase performance optimization
- Pig setup and working with the Grunt shell
Module 10: Conclusion
- Summary of all the points discussed