Prerequisites
To apply for the PySpark Training Certification, you should meet the following prerequisites:
- Basic knowledge of Big Data.
- Basic Python programming skills are beneficial.
- Basic Data Analytics skills are an added advantage.
Course Curriculum
Module 1: Python
- Environment Setup
- Decision Making
- Loops and Numbers
- Strings
- Lists
- Tuples
- Dictionary
- Date and Time
- Regex
- Functions
- OOP (Object-Oriented Programming)
- Files I/O
- Exceptions
- Sets
- Lambda
- Map and filter
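A few of these Python topics come together in one minimal sketch, using invented sample data, since lambda, map, and filter reappear constantly in PySpark code later in the course:

```python
# Lambda, map, and filter on a plain Python list.
nums = [1, 2, 3, 4, 5]

squares = list(map(lambda n: n * n, nums))        # lambda + map
evens = list(filter(lambda n: n % 2 == 0, nums))  # lambda + filter

print(squares)  # [1, 4, 9, 16, 25]
print(evens)    # [2, 4]
```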
Module 2: Hadoop Distributed File System (HDFS)
- What is HDFS?
- How is data stored in HDFS?
- What is a block?
- Replication factor in HDFS
- Commands in HDFS (see the sketch after this module)
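The HDFS commands covered here can also be driven from Python; a minimal sketch, assuming a cluster where the `hdfs` CLI is on the PATH (the paths and file names below are hypothetical):

```python
import subprocess

# List the contents of an HDFS directory (hypothetical path).
subprocess.run(["hdfs", "dfs", "-ls", "/user/demo"], check=True)

# Copy a local file into HDFS; HDFS splits it into blocks
# (128 MB by default) and replicates each block (3x by default).
subprocess.run(["hdfs", "dfs", "-put", "data.csv", "/user/demo/"], check=True)

# Print the file back from HDFS.
subprocess.run(["hdfs", "dfs", "-cat", "/user/demo/data.csv"], check=True)
```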
Module 3: PySpark
- What is the Hadoop platform?
- Why the Hadoop platform?
- What is Spark?
- Why Spark?
- Evolution of Spark
- Hadoop vs Spark (Spark benefits)
- Architecture of Spark
- Spark components
- Lazy evaluation
- spark-shell and spark-submit
- Setting up memory (driver memory, executor memory)
- Setting up cores (executor cores)
- Running Spark in local mode
- Hadoop MapReduce vs Spark RDD
- Benefits of RDDs over Hadoop MapReduce
- RDD overview; transformations and actions in the context of RDDs
- Demonstration of each RDD API with real-time examples (e.g., cache, unpersist, count, filter, map); see the RDD sketch after this module
- Magic with DataFrames
- Overview of DataFrames
- Reading CSV/Excel files and creating a DataFrame
- Cache/uncache operations on DataFrames
- Persist/unpersist operations on DataFrames
- Partition and repartition concepts for DataFrames
- foreachPartition on DataFrames
- Programming with DataFrames; how to use the DataFrame APIs effectively
- A small project: a Spark job using DataFrame concepts (see the DataFrame sketch after this module)
- Defining a schema for a DataFrame; performing SQL operations on a DataFrame
- Checkpointing a DataFrame
- StructType and ArrayType in DataFrames
- Complex data structures in DataFrames
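A minimal sketch of the RDD APIs named above (cache, count, filter, map), using a locally created RDD; the data is invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

# Transformations (filter, map) are lazy; nothing runs yet.
numbers = sc.parallelize(range(1, 101))
evens = numbers.filter(lambda n: n % 2 == 0)
squares = evens.map(lambda n: n * n)

# cache() keeps the RDD in memory after the first action computes it.
squares.cache()

# Actions trigger execution.
print(squares.count())   # 50
print(squares.take(5))   # [4, 16, 36, 64, 100]

# unpersist() releases the cached partitions.
squares.unpersist()
```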
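And a DataFrame counterpart covering the read, cache, repartition, schema, and SQL topics from this module; the file name and column schema are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.master("local[*]").appName("df-demo").getOrCreate()

# Define an explicit schema instead of inferring it (hypothetical columns).
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])

# Read a CSV file into a DataFrame (hypothetical path).
df = spark.read.csv("people.csv", header=True, schema=schema)

df = df.repartition(4)  # change the number of partitions
df.cache()              # keep in memory across actions

# Run SQL against the DataFrame via a temporary view.
df.createOrReplaceTempView("people")
adults = spark.sql("SELECT name FROM people WHERE age >= 18")
adults.show()

df.unpersist()
```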
Module 4: Various data sources
- CSV files
- Excel files
- JSON files
- Parquet files
- Benefits of Parquet files
- Text files
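A minimal sketch of reading and writing these formats with the built-in DataFrame readers (file paths are hypothetical; Excel needs a third-party connector such as spark-excel, so it is omitted here):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("sources").getOrCreate()

# CSV, JSON, and text are built-in sources (hypothetical paths).
csv_df = spark.read.csv("input.csv", header=True, inferSchema=True)
json_df = spark.read.json("input.json")
text_df = spark.read.text("input.txt")

# Parquet is columnar and stores the schema with the data,
# which is why it reads faster and compresses better than CSV.
csv_df.write.mode("overwrite").parquet("output.parquet")
parquet_df = spark.read.parquet("output.parquet")
parquet_df.printSchema()
```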
Module 5: Various levels of persistence
- MEMORY_ONLY
- MEMORY_ONLY_SER
- MEMORY_AND_DISK
- MEMORY_AND_DISK_SER
- DISK_ONLY
- OFF_HEAP
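These storage levels are selected through `persist()`; a minimal sketch using a generated DataFrame:

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("persist").getOrCreate()
df = spark.range(1_000_000)

# MEMORY_AND_DISK: keep partitions in memory, spill to disk when full.
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()              # first action materializes the cache

print(df.storageLevel)  # shows the active storage level
df.unpersist()
```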
Module 6: User-Defined Functions (UDFs)
- Benefits of UDFs over SQL
- Writing UDFs and applying them to a DataFrame
- Complex UDFs
- Data cleaning using UDFs (see the sketch after this module)
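A minimal sketch of writing a UDF and using it for light data cleaning; the column names and sample data are invented for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.master("local[*]").appName("udf-demo").getOrCreate()

df = spark.createDataFrame(
    [(" Alice ",), ("BOB",), (None,)], ["raw_name"]
)

# A Python function wrapped as a UDF; runs row by row on the executors.
@udf(returnType=StringType())
def clean_name(name):
    return name.strip().title() if name else "unknown"

cleaned = df.withColumn("name", clean_name(col("raw_name")))
cleaned.show()
```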
Module 7: Connecting Spark With S3
- Connecting Spark with S3
- Reading a file from S3 and performing transformations
- Writing a file to S3
- Preparing and closing the connection while writing a file to S3 (see the sketch after this module)
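A minimal sketch of the S3 round trip, assuming the hadoop-aws connector is available (the connector version, bucket names, column name, and credentials below are all placeholders):

```python
from pyspark.sql import SparkSession

# The hadoop-aws package provides the s3a:// filesystem; the version
# must match your Hadoop build (3.3.4 here is an assumption).
spark = (
    SparkSession.builder.appName("s3-demo")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")  # placeholder
    .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")  # placeholder
    .getOrCreate()
)

# Read from a hypothetical bucket, transform, write back as Parquet.
df = spark.read.csv("s3a://my-bucket/input/data.csv", header=True)
df = df.filter(df["status"] == "active")  # hypothetical column
df.write.mode("overwrite").parquet("s3a://my-bucket/output/")
```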
Module 8: PostgreSQL
- Overview of PostgreSQL
- How to connect Spark with PostgreSQL
- Collection concepts in PostgreSQL
- Performing operations in Spark
- Writing data from Spark to PostgreSQL
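A minimal sketch of reading a PostgreSQL table over JDBC; the host, database, table, credentials, and driver version are all hypothetical, and the PostgreSQL JDBC driver must be on the classpath:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("pg-demo")
    # Pull the PostgreSQL JDBC driver (version is an assumption).
    .config("spark.jars.packages", "org.postgresql:postgresql:42.7.3")
    .getOrCreate()
)

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/mydb")  # hypothetical
    .option("dbtable", "public.customers")                   # hypothetical
    .option("user", "postgres")
    .option("password", "secret")                            # placeholder
    .option("driver", "org.postgresql.Driver")
    .load()
)
df.show(5)
```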
Module 9: MySQL Database
- Overview of the MySQL database and its benefits
- Partition key and collection concepts in MySQL
- Connecting MySQL with Spark
- Reading a table from MySQL and performing transformations
- Writing millions of rows to a MySQL table (see the sketch after this module)
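A minimal sketch of the MySQL write path, with the option that matters most when the table holds millions of rows; the connection details and driver version are hypothetical, and MySQL Connector/J must be on the classpath:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("mysql-demo")
    .config("spark.jars.packages", "com.mysql:mysql-connector-j:8.4.0")  # assumption
    .getOrCreate()
)

df = spark.range(5_000_000).withColumnRenamed("id", "order_id")

(
    df.write.format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/shop")  # hypothetical
    .option("dbtable", "orders")                        # hypothetical
    .option("user", "root")
    .option("password", "secret")                       # placeholder
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("batchsize", 10000)  # rows per JDBC batch insert
    .mode("append")
    .save()
)
```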
Module 10: Spark SQL
- Overview of Spark SQL.
- How to write SQL in Spark.
- Various types of clauses in Spark SQL
- Using UDFs inside Spark SQL (see the sketch after this module)
- SQL fine-tuning in Spark
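A minimal sketch of registering a UDF for use inside Spark SQL; the table and column names are invented:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.master("local[*]").appName("sql-demo").getOrCreate()

df = spark.createDataFrame([("alice", 34), ("bob", 17)], ["name", "age"])
df.createOrReplaceTempView("people")

# Register a Python function so SQL statements can call it by name.
spark.udf.register("age_bucket", lambda age: age // 10, IntegerType())

spark.sql("""
    SELECT name, age_bucket(age) AS bucket
    FROM people
    WHERE age IS NOT NULL
    ORDER BY bucket
""").show()
```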
Module 11: Data cleaning
- What are the column data types?
- How many fields match the data type?
- How many fields are mismatched?
- Which fields match?
- Which fields are mismatched?
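One way to answer these questions is to cast each field and count where the cast fails; a minimal sketch with invented data:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.master("local[*]").appName("clean-demo").getOrCreate()

# "age" arrives as strings; some values do not match the intended int type.
df = spark.createDataFrame(
    [("alice", "34"), ("bob", "n/a"), ("carol", "29")], ["name", "age"]
)

# cast() returns NULL when a value cannot be converted, which lets us
# count matches vs. mismatches per field.
casted = df.withColumn("age_int", col("age").cast("int"))
matches = casted.filter(col("age_int").isNotNull()).count()
mismatches = casted.filter(col("age").isNotNull() & col("age_int").isNull())

print(f"matching fields: {matches}")  # 2
mismatches.show()                     # the row with "n/a"
```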
Module 12: PySpark Hive Connectivity
- Reading a Hive table with PySpark
- Writing a Hive table with PySpark
- Checkpointing with PySpark and Hive
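A minimal sketch of Hive connectivity, assuming Spark was built with Hive support and can reach a Hive metastore; the database and table names are hypothetical:

```python
from pyspark.sql import SparkSession

# enableHiveSupport() connects the session to the Hive metastore.
spark = (
    SparkSession.builder.appName("hive-demo")
    .enableHiveSupport()
    .getOrCreate()
)

# Read an existing Hive table (hypothetical name).
sales = spark.table("analytics.sales")

# Transform and write back as a managed Hive table.
summary = sales.groupBy("region").count()
summary.write.mode("overwrite").saveAsTable("analytics.sales_by_region")
```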
Module 13: PySpark Broadcast and Accumulator
- PySpark broadcast
- PySpark accumulator
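A minimal sketch of both shared-variable types; the lookup data is invented:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("shared").getOrCreate()
sc = spark.sparkContext

# Broadcast: ship a read-only lookup table to every executor once.
country_codes = sc.broadcast({"US": "United States", "IN": "India"})

# Accumulator: a write-only counter the executors add to.
unknown = sc.accumulator(0)

def expand(code):
    if code not in country_codes.value:
        unknown.add(1)
        return "unknown"
    return country_codes.value[code]

rdd = sc.parallelize(["US", "IN", "FR"])
print(rdd.map(expand).collect())  # ['United States', 'India', 'unknown']
print(unknown.value)              # 1 (reliable only after an action runs)
```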
Module 14: PySpark ArrayType Columns and Operations
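A minimal sketch of common ArrayType column operations (size, array_contains, explode), with invented data:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, size, array_contains, col

spark = SparkSession.builder.master("local[*]").appName("arrays").getOrCreate()

df = spark.createDataFrame(
    [("alice", ["python", "sql"]), ("bob", ["scala"])], ["name", "skills"]
)

df.select("name", size("skills").alias("n_skills")).show()
df.filter(array_contains(col("skills"), "python")).show()
df.select("name", explode("skills").alias("skill")).show()  # one row per element
```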
Module 15: PySpark Storage Levels
Module 16: PySpark MLlib Library
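A minimal sketch of an MLlib (pyspark.ml) pipeline: assemble features into a vector and fit a logistic regression on invented training data:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.master("local[*]").appName("ml-demo").getOrCreate()

# Invented training data: two numeric features and a binary label.
train = spark.createDataFrame(
    [(1.0, 0.5, 1.0), (0.2, 0.1, 0.0), (0.9, 0.8, 1.0), (0.1, 0.3, 0.0)],
    ["f1", "f2", "label"],
)

# MLlib models expect a single vector column of features.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
features = assembler.transform(train)

model = LogisticRegression(featuresCol="features", labelCol="label").fit(features)
model.transform(features).select("label", "prediction").show()
```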
Module 17: PySpark Structured Streaming
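A minimal sketch of Structured Streaming using the built-in rate source, which generates rows locally so no external system is needed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("stream").getOrCreate()

# The rate source emits (timestamp, value) rows at a fixed speed.
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

evens = stream.filter(stream["value"] % 2 == 0)

# Print each micro-batch to the console; stop after a short run.
query = evens.writeStream.format("console").outputMode("append").start()
query.awaitTermination(10)  # run for ~10 seconds, then return
query.stop()
```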
Module 18: Conclusion
- Summarize all the points discussed.