Prerequisites
To apply for the Big Data Hadoop Training, you should meet the following prerequisites:
- To learn big data analytics tools, you need to know at least one programming language such as Java, Python, or R.
- You must also have basic knowledge of databases and SQL to retrieve and manipulate data.
- You need knowledge of basic statistics, such as regression and distributions, and mathematical skills such as linear algebra and calculus.
Course Curriculum
Hadoop installation and setup
Topics:
- Introduction to Hadoop
- Hadoop Architecture overview
- Overview of high availability and federation
- Different shell commands available in Hadoop
- Procedure to set up a production cluster
- Overview of configuration files in Hadoop
- Single node cluster installation
- Understanding Spark, Flume, Pig, Scala, and Sqoop
Learning outcome: Upon the completion of this module, you will gain hands-on experience in Hadoop installation, shell commands, cluster installation, etc.
Overview of Big Data Hadoop and Introduction to MapReduce and HDFS
Topics:
- Overview of Big data Hadoop
- Big data and the role of Hadoop
- Components of Hadoop ecosystem
- Distributed file system replication
- Secondary NameNode, block size, and high availability
- YARN: NodeManager and ResourceManager
Learning Outcome: Upon the completion of this chapter, you will understand the data replication process, how HDFS works, and how block size is decided, and gain knowledge of the DataNode and NameNode.
Detailed explanation of MapReduce
Topics:
- Introduction to MapReduce
- Learning the working procedure of MapReduce
- Understanding the Map and Reduce concepts
- Stages in MapReduce
- The terminology used in MapReduce, such as Shuffle, Sort, Combiners, Partitioners, Input Format, and Output Format
Learning Outcome: Upon the completion of this chapter, you will learn the procedure to write a word count program, gain knowledge of the MapReduce Combiner, write a custom partitioner, deploy unit tests, use a local job runner and a tool runner, join data sets, etc.
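The word count program mentioned above is the canonical MapReduce example. As a sketch of the idea without a Hadoop cluster, the three phases can be mimicked in plain Python (the sample lines are the classic illustrative input, not real course data):

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(mapped):
    # Shuffle/sort: group all emitted values by key,
    # as the framework does between the Map and Reduce phases
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word
    return {word: sum(values) for word, values in groups.items()}

lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'deer': 2, 'bear': 2, 'river': 2, 'car': 3}
```

In real Hadoop the same logic is split across a Mapper class, the framework's shuffle, and a Reducer class; a Combiner would apply the reduce function locally on each mapper's output before the shuffle.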
Introduction to Hive
Topics:
- Overview of Hadoop Hive
- Understanding the architecture of Hive
- Comparison between Hive, RDBMS, and Pig
- Creation of database
- Working with Hive Query Language
- Different Hive tables
- Group By and other clauses
- Storing Hive results
- HCatalog and Hive tables
- Hive partitioning and buckets
Learning outcome: By the completion of this module, you will learn the process to create a database in Hive, create Hive tables, drop a database, customize a table, write Hive queries to pull data, and use Hive table partitioning and the Group By clause.
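HiveQL is deliberately close to standard SQL, so a Group By query can be sketched with Python's built-in sqlite3 module standing in for Hive (the `sales` table and its rows are made-up example data; the GROUP BY syntax shown is the same you would write in HiveQL):

```python
import sqlite3

# An in-memory SQLite database stands in for a Hive warehouse here;
# Hive would store the table in HDFS and run the query as a distributed job.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("west", 200), ("east", 50)])

# Equivalent HiveQL: SELECT region, SUM(amount) FROM sales GROUP BY region;
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 150), ('west', 200)]
```

The difference in Hive is where the work happens, not how the query is written: the same statement is compiled into distributed jobs over data stored in HDFS.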
Advanced Hive and Impala
Topics:
- Indexes in Hive
- Hive Map side join
- User-defined functions in Hive
- Working with complex data types
- Overview of Impala
- Difference between Impala and Hive
- Architecture of Impala
Learning Outcome: This chapter will give you complete knowledge of Hive queries, joining tables, deploying sequence-file tables, writing indexes, and storing data in different tables.
Introduction to Pig
Topics:
- Introduction to Apache Pig
- Pig features
- Schema and various data types in Pig
- Tuples and Fields
- Available functions in Pig, and Pig bags
Learning outcome: By the completion of this chapter you will gain the knowledge to work with Pig: loading data, storing data into files, limiting output to 4 rows, and working with Filter By, Group By, Split, Distinct, and Cross in Pig.
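A Pig relation is a bag of tuples, and the operators named above are relational transformations over it. As an illustration of what FILTER, DISTINCT, LIMIT, and GROUP do (the `students` tuples are invented sample data; real Pig Latin scripts run these operators as MapReduce jobs):

```python
from itertools import islice

# A list of tuples stands in for a Pig relation (a bag of tuples)
students = [("joe", 21), ("amy", 22), ("joe", 21), ("sam", 24), ("kim", 22)]

# FILTER students BY age > 21;
adults = [t for t in students if t[1] > 21]

# DISTINCT students;
distinct = sorted(set(students))

# LIMIT students 4;   -- restricting output to 4 rows
first_four = list(islice(students, 4))

# GROUP students BY age;  -- each group keyed by age
grouped = {}
for name, age in students:
    grouped.setdefault(age, []).append(name)

print(adults)  # [('amy', 22), ('sam', 24), ('kim', 22)]
```

Each Pig operator maps onto one of these set-style transformations, which is why Pig scripts tend to read as short pipelines rather than full programs.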
Sqoop and Flume
Topics:
- Introduction to Apache Sqoop
- Importing and exporting data
- Sqoop Limitations
- Performance improvement with Sqoop
- Flume overview
- Flume Architecture
- The CAP theorem and an introduction to HBase
Learning Outcome: Upon the completion of this module you will be able to generate sequence numbers, consume Twitter data using Flume, create Hive tables with AVRO, create tables in HBase, use AVRO with Pig, and scan, enable, and disable HBase tables.
Writing Spark applications using Scala
Topics:
- Introduction to Spark
- Procedure to write Spark applications with Scala
- Overview of object-oriented programming
- A detailed study of Scala
- Scala Uses
- Executing Scala code
- Scala class constructs such as getters, setters, constructors, abstract classes, extending objects, and overriding methods
- Scala and Java interoperability
- Bobsrockets package
- Anonymous functions and functional programming
- Comparison between mutable and immutable collections
- Control structures in Scala
- Scala REPL and lazy values
- Directed Acyclic Graph (DAG)
- Spark in Hadoop ecosystem and Spark UI
- Developing Spark application using SBT/Eclipse
Learning Outcome: Upon the completion of this module you will gain the knowledge to write Spark applications using Scala and understand how Scala suits Spark's real-time analytics operations.
Spark framework
Topics:
- Introduction to Apache Spark
- Features of Spark
- Spark components
- Comparison between Spark and Hadoop
- Introduction to Scala and RDDs
- Integrating HDFS with Spark
Learning Outcome: Upon the completion of this chapter, you will learn the importance of RDD in Spark and how it makes big data processes faster.
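A key reason RDDs make big data processing faster is lazy evaluation: transformations only describe the computation, and nothing runs until an action requests a result. The same idea can be sketched with Python generators (the function names mirror the RDD API loosely and are illustrative, not Spark's actual signatures):

```python
# A lazy pipeline of generators: like RDD transformations, nothing executes
# until an action (here, sum) pulls data through the whole chain.
def parallelize(data):
    return iter(data)

def rdd_map(rdd, f):
    # Transformation: returns a new lazy recipe, does no work yet
    return (f(x) for x in rdd)

def rdd_filter(rdd, pred):
    return (x for x in rdd if pred(x))

numbers = parallelize(range(1, 11))
squares = rdd_map(numbers, lambda x: x * x)           # still lazy
even_squares = rdd_filter(squares, lambda x: x % 2 == 0)  # still lazy

total = sum(even_squares)   # the "action": evaluation happens only here
print(total)  # 4 + 16 + 36 + 64 + 100 = 220
```

In Spark the deferred recipe is the DAG of transformations, which the scheduler can optimize and distribute across the cluster before any data moves.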
Data Frames and Spark SQL
Topics:
- Introduction to Spark SQL
- Importance of SQL in Spark
- Spark SQL JSON support
- Structured data processing
- Working with parquet files and XML data
- Procedure to read a JDBC table
- Writing a DataFrame to Hive
- Hive context creation
- Role of Spark Dataframe
- Overview of manual schema inference
- JDBC table reading
- Working with CSV files
- Data transformation from DataFrame to JDBC
- Shared variables and accumulators
- User-defined functions in Spark SQL
- Query and Transform data in data frames
- Configuration of Hive on Spark as an execution engine
- Dataframe benefits
Learning Outcome: After finishing this chapter you will gain knowledge to use data frames to query and transform data and get an overview of advantages that arise out of using data frames.
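Spark SQL's JSON support reads line-delimited JSON into a DataFrame whose rows can then be queried and transformed. The shape of that workflow can be sketched with the standard json module (the people and ages below are invented sample records, and the dict comprehension stands in for the DataFrame `select`/`where` calls):

```python
import json

# Line-delimited JSON, the layout spark.read.json() consumes (sample data)
raw = '''{"name": "alice", "age": 34}
{"name": "bob", "age": 28}
{"name": "carol", "age": 41}'''

# Parse each line into a row; Spark would also infer a schema at this step
rows = [json.loads(line) for line in raw.splitlines()]

# Roughly: df.where(df.age > 30).select("name") expressed over plain dicts
names = [r["name"] for r in rows if r["age"] > 30]
print(names)  # ['alice', 'carol']
```

The payoff of the real DataFrame API is that the same filter-and-project logic is planned and optimized by Spark's query engine instead of executing row by row in the driver.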
Machine Learning Using Spark (MLlib)
Topics:
- Overview of Spark MLlib
- Introduction to different algorithms
- Graph processing analysis in Spark
- Understanding iterative algorithms in Spark
- ML algorithms supported by MLlib
- Introduction to machine learning
- Introduction to accumulators
- Overview of Decision Trees, Logistic Regression, and Linear Regression
- Building a recommendation engine
- K-means clustering techniques
Learning Outcome: Upon the completion of this module you will gain hands-on experience in building a recommendation engine.
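K-means, one of the clustering techniques listed above, alternates between assigning points to their nearest centroid and moving each centroid to its cluster's mean. A minimal from-scratch sketch in one dimension (a stand-in for MLlib's distributed KMeans; the points and starting centroids are made-up toy data):

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain 1-D k-means: repeat assignment and update steps."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [sum(members) / len(members) if members else c
                     for c, members in clusters.items()]
    return sorted(centroids)

points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
print(kmeans_1d(points, centroids=[1.0, 10.0]))  # [1.5, 10.5]
```

MLlib runs the same two steps over partitioned data, which is why k-means is a natural fit for Spark's strength with iterative algorithms.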
Integration of Apache Kafka and Apache Flume
Topics:
- Introduction to Kafka
- Use of Kafka
- Kafka workflow
- Kafka architecture
- Basic operations
- Configuring a Kafka cluster
- Integration of Apache Kafka and Apache Flume
- Producing and consuming messages
- Kafka monitoring tools
Learning Outcome: Upon the completion of this module, you will gain hands-on exposure in the configuration of Single Node Multi Broker Cluster, Single Node Single Broker Cluster, and integration of Apache Flume and Kafka.
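The producer/consumer workflow at Kafka's core can be illustrated with a bounded in-memory queue (a deliberately simplified stand-in: a real Kafka topic adds durable storage, partitions, consumer offsets, and replication across brokers; the message strings are invented):

```python
import queue

# A bounded in-memory queue stands in for one Kafka topic partition
topic = queue.Queue(maxsize=100)

def produce(messages):
    # Producer: append messages to the end of the topic, in order
    for msg in messages:
        topic.put(msg)

def consume(n):
    # Consumer: read the next n messages in the order they were produced
    return [topic.get() for _ in range(n)]

produce(["event-1", "event-2", "event-3"])
received = consume(3)
print(received)  # ['event-1', 'event-2', 'event-3']
```

In the Flume integration covered by this module, Flume plays the producer role, shipping collected events into a Kafka topic for downstream consumers.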
Spark Streaming
Topics:
- Introduction to Spark Streaming
- Working with Spark streaming
- Spark Streaming Architecture
- Data processing using Spark streaming
- Requesting count and DStream
- Features of Spark Streaming
- Working with advanced data sources
- Sliding window and multi-batch operations
- Discretized Streams (DStreams)
- Spark Streaming workflow
- Output operations on DStreams
- Windowed operators and their use
- Stateful operators
Learning Outcome: After finishing this module you will learn to execute Twitter sentiment analysis, Kafka-Spark Streaming, streaming using a Netcat server, and Spark-Flume Streaming.
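The sliding-window idea behind DStream windowed operators can be sketched with a fixed-length deque: each incoming batch enters the window, the oldest batch falls out, and an aggregate is computed over whatever the window currently holds (the window length of 3 batches and the letter batches are illustrative, not Spark defaults):

```python
from collections import deque

def windowed_counts(batches, window=3):
    """Count records across a sliding window of the last `window` batches."""
    recent = deque(maxlen=window)   # deque drops the oldest batch automatically
    results = []
    for batch in batches:
        recent.append(batch)
        # Aggregate over every batch currently inside the window
        results.append(sum(len(b) for b in recent))
    return results

batches = [["a", "b"], ["c"], ["d", "e", "f"], ["g"]]
print(windowed_counts(batches))  # [2, 3, 6, 5]
```

In Spark Streaming the equivalent is a windowed operator parameterized by window length and slide interval, both multiples of the batch interval.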
Hadoop Administration: Configuration of a cluster
Topics:
- Introduction to Hadoop configuration
- Various parameters to be followed in the configuration process
- Importance of Hadoop configuration file
- Hadoop environment setup
- MapReduce parameters
- HDFS parameters
- The process to include and exclude data nodes
- Data node directory structures
- Overview of the file system image
- Understanding the edit log
Learning Outcome: In this chapter, you will gain hands-on exposure in executing performance tuning in MapReduce.
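The configuration files discussed above are plain XML property lists. A minimal sketch of two commonly edited ones (the hostname and values are illustrative examples, not recommended settings):

```xml
<!-- core-site.xml: tells clients where to find the NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master-node:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: HDFS parameters such as the replication factor -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

Every Hadoop configuration file follows this same `<property>` name/value layout, which is why tuning MapReduce and HDFS parameters is largely a matter of knowing which property names to set.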
Hadoop Administration: Using an Amazon EC2 instance to set up a multi-node cluster
Topics:
- Setting up a 4-node cluster
- Running MapReduce code
- Running MapReduce jobs
- Working with cloud manager setup
Learning Outcome: By the completion of this chapter you will gain hands-on expertise in building a multi-node Hadoop cluster and working knowledge of cloud managers.
Hadoop Administration: Management, Monitoring and Troubleshooting
Topics:
- Basics of checkpoint procedure
- NameNode failure
- Procedure to recover a failed node
- Metadata and data backup
- Safe mode
- Various problems and their solutions
- Adding and removing nodes
Learning Outcome: Upon the completion of this chapter, you will learn the process to recover the MapReduce file system, Hadoop cluster monitoring, the usage of a job scheduler to schedule jobs, the Fair Scheduler and its configuration, the FIFO scheduler, and the MapReduce job submission flow.