Course Curriculum
Module 1: Introduction to Data Science
In this module, you will learn the following data science concepts:
- Introduction to data science: its importance, applications, lifecycle, and components.
- Big data and Hadoop, machine learning, and deep learning
- Introduction to R programming and RStudio.
Module 2: Data Exploration
- Introduction to data exploration
- Importing and exporting data to and from external sources
- Exploratory data analysis
- Data types, data frames, factors, operators, conditional and looping statements, and user-defined functions.
Module 3: Data Manipulation
- Introduction to data manipulation
- Introduction to dplyr package
- Overview of functions and combining different operations with the pipe operator
- Implementing SQL operations with sqldf.
Module 4: Data Visualization
- Introduction to data visualization
- Different graphing functions and their implementations
- Multivariate analysis with geom_boxplot
- Univariate analysis with a barplot
- Creation of barplots
- Visualization with Plotly
- Working with themes and coordinates to present graphs more clearly.
- Geographic visualization with ggmap() and building applications with Shiny.
Module 5: Introduction to Statistics
- Need for statistics
- Categories of statistics
- Correlation and covariance, standardization, normalization, normal distribution, chi-square testing, ANOVA, and binomial distribution.
Module 6: Logistic Regression
- Introduction to logistic regression
- Concepts of logistic regression
- Building a simple binomial model
- Finding out the right threshold using the ROC plot
- Real-world applications of logistic regression.
Module 7: Machine Learning
- Introduction to machine learning, linear regression, predictive modeling, formulas, assumptions, and building a simple linear model
- Introduction to logistic regression
- Comparison of different types of regressions.
- Confusion matrix and model accuracy, threshold evaluation with ROCR, and understanding qqnorm() and qqline()
- Building linear models with multiple independent variables.
Module 8: Decision Trees and Random Forest
- Classification and its techniques
- Introduction to decision trees, algorithm and building a decision tree in R
- Confusion matrix, regression trees vs. classification trees
- Introduction to bagging
- Random Forest and its implementation in R
- Naive Bayes and its computing possibilities
- Impurity functions, entropy, Gini index, and information gain for choosing the right node split
- Overfitting, pruning basics, finding out the correct number of trees, and evaluating performance metrics.
Module 9: Unsupervised Learning
- Clustering, types and use cases
- Introduction to unsupervised learning
- Feature extraction, clustering algorithms, and the k-means clustering algorithm
- Overview of k-means and its implementation
- Principal Component Analysis (PCA) in detail and implementing PCA in R
Module 10: Recommendation Engines and Association Rule Mining
- Introduction to association rule mining, its advantages and types, measuring association rules, and implementation.
- Introduction to recommendation engines
- How are recommendation engines implemented in R?
- Recommendation engine use cases and summary
Module 11: Time Series Analysis
- What is a time series?
- Techniques, applications, and components of time series
- ARIMA model
- Time series in R, sentiment analysis in R (Twitter sentiment analysis), and text analysis
Module 12: Introduction to AI
- AI and deep learning
- Fundamentals of artificial neural networks
- TensorFlow
- Computational frameworks for building AI models
- Fundamentals of TensorFlow and using it with R
Module 13: Naive Bayes
- What is Bayes' theorem? What is the Naive Bayes classifier?
- How the Naive Bayes classifier works and building a classifier in scikit-learn.
- Building a classification model using Naive Bayes and the zero-probability problem
Module 14: Text Mining
- Introduction to text mining, use cases, and understanding and manipulating text with ‘tm’ and ‘stringr’.
- Text mining algorithms and the quantification of text
- TF-IDF and after TF-IDF