Who should attend
Big Data, Business Intelligence and Business Analyst Professionals, Information Architects, Statisticians, Developers looking to master Machine Learning and Predictive Analytics and those looking to take up the roles of Data Scientist and Machine Learning Experts
What are the prerequisites for learning Data Science?
There are no particular prerequisites for this training course. A liking for mathematics is helpful when learning Data Science. You will also get a free self-paced MS Excel course with this course.
About the course
Intellipaat Data Science course training lets you master data analysis, R statistical computing, connecting R with Hadoop framework, Machine Learning algorithms, time-series analysis, K-Means Clustering, Naïve Bayes, business analytics and more. In this Data science online course and certification, you will gain hands-on experience in Data Science by engaging in several real-life projects in domains of banking, finance, entertainment, e-commerce, etc. So, get the best online Data Science courses training from top data scientists!
- Instructor Led Training : 42 Hrs
- Self-paced Videos : 28 Hrs
- Exercises & Project Work : 56 Hrs
- Certification and Job Assistance
- Flexible Schedule
- Lifetime free upgrade
- 24 x 7 Lifetime Support & Access
About Data Science Online Course
This is a complete Data Science boot camp specialization training course from Intellipaat that provides you with detailed learning in Data Science, Data Analytics, project life cycle, data acquisition, analysis, statistical methods and Machine Learning. You will gain expertise to deploy Recommenders using R programming, and you will also learn data analysis, data transformation, experimentation and evaluation.
What will you learn in this Data Science course online training?
- Data Science introduction and importance
- Data acquisition and Data Science life cycle
- Experimentation, evaluation and project deployment tools
- Different algorithms used in Machine Learning
- Predictive analytics and segmentation using clustering
- Big Data fundamentals and Hadoop integration with R
- Data Scientist roles and responsibilities
- Deploying recommender systems on real-world data sets
- Working on data mining, data structures and data manipulation
Why should you take up the Data Scientist certification course online?
- Data Scientist is the best job of the 21st century – Harvard Business Review
- Global Big Data market to reach $122 billion in revenue in six years – Frost & Sullivan
- The number of jobs for all the US Data Professionals will increase to 2.7 million per year – IBM
The demand for Data Scientists far outstrips the supply. This is a serious problem in today's data-driven world. Most organizations are ready to pay top-dollar salaries for professionals with the right Data Science skills. This online Data Science course will provide you with the skills needed to master Data Science along with Big Data, Data Analytics and R programming, so that you can fast-track your career and take on more lucrative and promising job roles.
What are the different paths to enter Data Science?
There are multiple paths to becoming a Data Scientist. Data Scientists make extensive use of programming languages such as R and Python, along with analytical tools like SAS. They should be well versed in data analytics and statistical packages, and familiar with Big Data technologies such as Hadoop and Spark. Because the data must be converted into business insights, a Data Scientist also needs a good knowledge of visualization and reporting tools, and should be able to produce compelling visualizations, charts, maps and reports that help anybody understand the data.
How is Intellipaat Data Science Certification awarded?
Intellipaat follows a rigorous certification process. To become a certified Data Scientist, you must fulfil the following criteria:
Online Instructor-led Course
- Successful completion of all projects, which will be evaluated by trainers
- Scoring a minimum of 60% in the Data Science quiz conducted by Intellipaat
- Completing all course videos in our LMS
- Scoring a minimum of 60% in the Data Science quiz conducted by Intellipaat
What does a Data Scientist do?
Understand the Problem
A Data Scientist should understand the problem on the ground and ask the right questions.
Collect Enough Data
As the name implies, a Data Scientist has to collect enough data in order to make sense of the problem at hand and get a better grip of the issue with respect to the time, money and resources needed.
Process the Raw Data
Data can rarely be used in its original form. It needs to be processed, and there exist various methods to convert it into a usable format.
Explore the Data
After the data has been processed and converted into a form that can be used in the later stages, the Data Scientist needs to explore it further to understand its characteristics and find the obvious trends, correlations and more.
Analyze the Data
This is where the magic happens. The Data Scientist draws on an arsenal of techniques such as Machine Learning, statistics and probability, linear and logistic regression, time-series analysis and more in order to make sense of the data.
Communicate the Results
At the end of the entire process, the findings need to be communicated to the right stakeholders so that the groundwork can be laid for addressing all recognized issues.
Data Science Course Content
Introduction to Data Science with R
What is Data Science, significance of Data Science in today’s digitally-driven world, applications of Data Science, lifecycle of Data Science, components of the Data Science lifecycle, introduction to big data and Hadoop, introduction to Machine Learning and Deep Learning, introduction to R programming and R Studio.
Hands-on Exercise – Installation of R Studio, implementing simple mathematical operations and logic using R operators, loops, if statements and switch cases.
Introduction to data exploration, what is exploratory data analysis, importing and exporting data to/from external sources, dataframes, working with dataframes, accessing individual elements, vectors and factors, operators, in-built functions, conditional and looping statements, user-defined functions, matrix, list and array.
Hands-on Exercise – Accessing individual elements of customer churn data, modifying and extracting the results from the dataset using user-defined functions in R.
Need for Data Manipulation, Introduction to dplyr package, Selecting one or more columns with select() function, Filtering out records on the basis of a condition with filter() function, Adding new columns with the mutate() function, Sampling & Counting with sample_n(), sample_frac() & count() functions, Getting summarized results with the summarise() function, Combining different functions with the pipe operator, Implementing SQL-like operations with sqldf.
Hands-on Exercise – Implementing dplyr to perform various operations for abstracting over how data is manipulated and stored.
Introduction to visualization, different types of graphs, introduction to the grammar of graphics and the ggplot2 package, understanding categorical distributions with bar plots using geom_bar(), understanding numerical distributions with histograms using geom_histogram() and density plots, building frequency polygons with geom_freqpoly(), making scatter plots with geom_point(), multivariate distributions with scatter plots and smooth lines, continuous vs categorical analysis with box plots using geom_boxplot(), subgrouping the plots, working with coordinates and adding themes with the theme() layer to make the graphs more presentable, introduction to the plotly package and its various plots, visualization with the ggvis package, geographic visualization with ggmap(), building web applications with Shiny.
Hands-on Exercise – Creating data visualizations with ggplot2 charts to understand the customer churn ratio, and using Plotly for importing and analyzing data into grids. You will visualize tenure, monthly charges, total charges and other individual columns by using scatter plots.
Introduction to Statistics
Why do we need Statistics?, Categories of Statistics, Statistical Terminologies, Types of Data, Measures of Central Tendency, Measures of Spread, Correlation & Covariance, Standardization & Normalization, Probability & Types of Probability, Hypothesis Testing, Chi-Square testing, ANOVA, normal distribution, binomial distribution.
Hands-on Exercise – Building a statistical analysis model that uses quantifications, representations, experimental data for gathering, reviewing, analyzing and drawing conclusions from data.
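As a rough illustration of two of the topics listed above, standardization and normalization, here is a minimal sketch. The course itself works in R; Python's standard library is used here purely for illustration, and the function names and sample values are hypothetical.

```python
import statistics

def standardize(values):
    """Z-scores: subtract the mean, divide by the population standard deviation.

    Assumes the values are not all identical (otherwise sigma is zero).
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Min-max scaling of the values onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

monthly_charges = [20.0, 40.0, 60.0]  # made-up sample values
print(standardize(monthly_charges))
print(normalize(monthly_charges))
```

Standardized values have mean 0 and unit spread, while min-max normalization only rescales the range, so the two are not interchangeable.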
Introduction to Machine Learning, introduction to Linear Regression, predictive modeling with Linear Regression, simple and multiple Linear Regression, concepts and detailed formulas, assumptions of Linear Regression and residual diagnostics with qqnorm() and qqline(), understanding the fit of the model, building a simple linear model, predicting results and finding the p-value, understanding the summary results with the null hypothesis, p-value and F-statistic, building linear models with multiple independent variables, introduction to Logistic Regression, comparing Linear Regression and Logistic Regression, bivariate and multivariate Logistic Regression, confusion matrix and accuracy of the model, threshold evaluation with ROCR.
Hands-on Exercise – Modeling the relationship within the data using linear predictor functions. Implementing Linear & Logistic Regression in R by building a model with ‘tenure’ as the dependent variable and multiple independent variables.
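To make the idea of a simple linear model concrete, the sketch below fits one by ordinary least squares using the closed-form formulas. It is only an illustration under stated assumptions: the course uses R for this, whereas this sketch uses plain Python, and the function names and data are invented for the example.

```python
def fit_simple_linear(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data lying exactly on the line y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear(x, y)
print(intercept, slope, r_squared(x, y, intercept, slope))
```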
Introduction to Logistic Regression, Logistic Regression concepts, Linear vs Logistic Regression, math behind Logistic Regression, detailed formulas, logit function and odds, bivariate Logistic Regression, Poisson Regression, building a simple “binomial” model and predicting results, evaluating the built model with the confusion matrix, accuracy, true positive rate and false positive rate, threshold evaluation with ROCR, finding the right threshold by building the ROC plot, cross validation and multivariate Logistic Regression, building logistic models with multiple independent variables, real-life applications of Logistic Regression.
Hands-on Exercise – Implementing predictive analytics by describing the data and explaining the relationship between one dependent binary variable and one or more independent variables. You will use glm() to build a model and use ‘Churn’ as the dependent variable.
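The logit function, odds and the confusion-matrix rates mentioned above can be sketched in a few lines. This is an illustrative sketch only, written in plain Python rather than the R used in the course; the function names, labels and probabilities are made up for the example.

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds; the inverse of the logistic (sigmoid) function."""
    return math.log(odds(p))

def sigmoid(z):
    """Map a linear predictor z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def confusion_counts(actual, predicted_prob, threshold=0.5):
    """Count (tp, fp, tn, fn) after cutting probabilities at the threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """Return (true positive rate, false positive rate, accuracy)."""
    return tp / (tp + fn), fp / (fp + tn), (tp + tn) / (tp + fp + tn + fn)

# Hypothetical churn labels and model probabilities
actual = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]
print(rates(*confusion_counts(actual, probs, threshold=0.5)))
```

Recomputing the rates over a range of thresholds gives the points of an ROC curve, which is the idea behind threshold evaluation.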
Decision Trees & Random Forest
What is classification and different classification techniques, introduction to Decision Tree, algorithm for decision tree induction, building a decision tree in R, creating a perfect Decision Tree, confusion matrix, regression trees vs classification trees, introduction to ensemble of trees and bagging, Random Forest concept, implementing Random Forest in R, what is Naive Bayes, computing probabilities, impurity functions (entropy, information gain and Gini index) and how each is used to choose the right split of a node, overfitting and pruning, pre-pruning, post-pruning, cost-complexity pruning, pruning a decision tree and predicting values, finding the right number of trees and evaluating performance metrics.
Hands-on Exercise – Implementing Random Forest for both regression and classification problems. You will build a tree, prune it by using ‘churn’ as the dependent variable and build a Random Forest with the right number of trees, using ROCR for performance metrics.
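The impurity measures used to choose a split can be illustrated with a short sketch. It is written in plain Python for illustration (the course builds trees in R), and the function names and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels at a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of the class labels at a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of its two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Hypothetical node with an even class mix, split into two pure children
parent = ["yes", "yes", "no", "no"]
print(entropy(parent), gini(parent))
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))
```

A split that leaves both children pure removes all uncertainty, so its information gain equals the parent's entropy.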
What is Clustering and its use cases, introduction to Unsupervised Learning, feature extraction and clustering algorithms, what is K-means Clustering, theoretical aspects of K-means and the K-means process flow, implementing K-means in R on the dataset and finding the right number of clusters using a scree plot, what is Canopy Clustering, what is Hierarchical Clustering, implementing Hierarchical Clustering in R and reading dendrograms, Principal Component Analysis explained in detail, implementing PCA in R.
Hands-on Exercise – Deploying unsupervised learning with R to achieve clustering and dimensionality reduction, K-means clustering for visualizing and interpreting results for the customer churn data.
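The K-means process flow (assign each point to its nearest centroid, recompute the centroids, repeat until nothing changes) can be sketched as follows. This is a minimal illustration in plain Python, not the R implementation used in the course; it assumes a deterministic initialization from the first k points, and all names and data are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, max_iterations=100):
    # Assumption: start from the first k points instead of a random draw
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Keep the old centroid if a cluster ends up empty
        updated = [centroid(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def within_cluster_ss(centroids, clusters):
    """Total within-cluster sum of squares, the quantity shown on a scree plot."""
    return sum(squared_distance(p, c)
               for c, cluster in zip(centroids, clusters) for p in cluster)

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids, within_cluster_ss(centroids, clusters))
```

Running `kmeans` for several values of k and plotting `within_cluster_ss` against k is one way to look for the "elbow" that suggests a reasonable number of clusters.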
Association Rule Mining & Recommendation Engine
Introduction to association rule Mining & Market Basket Analysis, measures of Association Rule Mining: Support, Confidence, Lift, Apriori algorithm & implementing it in R, Introduction to Recommendation Engine, user-based collaborative filtering & Item-Based Collaborative Filtering, implementing Recommendation Engine in R, user-Based and item-Based, Recommendation Use-cases.
Hands-on Exercise – Deploying association analysis as a rule-based machine learning method, identifying strong rules discovered in databases using measures of interestingness such as support, confidence and lift.
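The three measures of association rule mining named above (support, confidence and lift) can be computed directly from a list of transactions. The sketch below is an illustration in plain Python rather than the R workflow taught in the course; the function names and the tiny transaction set are invented for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support.

    Values above 1 suggest the items occur together more often than chance.
    """
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical purchase data
transactions = [
    {"pen", "paper"},
    {"pen", "ink"},
    {"pen", "paper", "ink"},
    {"paper"},
]
print(support(transactions, {"pen", "paper"}))
print(confidence(transactions, {"pen"}, {"paper"}))
print(lift(transactions, {"pen"}, {"paper"}))
```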
Introduction to Artificial Intelligence
Introducing Artificial Intelligence and Deep Learning, what is an Artificial Neural Network, TensorFlow – computational framework for building AI models, fundamentals of building ANN using TensorFlow, working with TensorFlow in R.
Time Series Analysis
What is Time Series, techniques and applications, components of Time Series, moving average, smoothing techniques, exponential smoothing, univariate time series models, multivariate time series analysis, ARIMA model, Time Series in R, sentiment analysis in R (Twitter sentiment analysis), text analysis.
Hands-on Exercise – Analyzing time series data, a sequence of measurements that follow a non-random order, to identify the nature of the phenomenon and to forecast future values in the series.
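Two of the smoothing techniques listed above, the moving average and simple exponential smoothing, are short enough to sketch by hand. The example is in plain Python for illustration only (the course covers time series in R), and the function names and series are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average; yields len(series) - window + 1 values."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, seeded with the first observation.

    Each smoothed value is alpha * observation + (1 - alpha) * previous smoothed value.
    """
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(moving_average([1, 2, 3, 4, 5], window=3))
print(exponential_smoothing([10, 20, 30], alpha=0.5))
```

A larger `alpha` makes the smoothed series react faster to recent observations, while a wider `window` makes the moving average smoother but slower to respond.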
Support Vector Machine - (SVM)
Introduction to Support Vector Machine (SVM), Data classification using SVM, SVM Algorithms using Separable and Inseparable cases, Linear SVM for identifying margin hyperplane.
What is Bayes theorem, What is Naïve Bayes Classifier, Classification Workflow, How Naive Bayes classifier works, Classifier building in Scikit-learn, building a probabilistic classification model using Naïve Bayes, Zero Probability Problem.
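The zero probability problem arises when a feature value never appears with a class in the training data, which makes the whole Naïve Bayes product zero. One common remedy is Laplace (add-one) smoothing, sketched below in plain Python. This is an illustrative assumption about how the problem is typically handled, not a description of the course's own solution; the function name and counts are hypothetical.

```python
def smoothed_likelihood(feature_count, class_count, n_feature_values, alpha=1.0):
    """Estimate P(feature value | class) with additive smoothing.

    feature_count: training rows of the class having this feature value
    class_count: training rows of the class
    n_feature_values: number of distinct values the feature can take
    alpha: smoothing strength (alpha=0 means no smoothing)
    """
    return (feature_count + alpha) / (class_count + alpha * n_feature_values)

# Hypothetical counts: 10 churned customers, none of them on a two-year contract,
# and three possible contract types in total
print(smoothed_likelihood(0, 10, 3, alpha=0.0))  # unsmoothed estimate is zero
print(smoothed_likelihood(0, 10, 3))             # smoothed estimate is small but non-zero
```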
Introduction to concepts of Text Mining, Text Mining use cases, understanding and manipulating text with ‘tm’ & ‘stringr’, Text Mining Algorithms, Quantification of Text, Term Frequency-Inverse Document Frequency (TF-IDF), After TF-IDF.
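TF-IDF has several variants; the sketch below assumes one common form, with term frequency as a within-document proportion and inverse document frequency as the natural log of (number of documents / documents containing the term). It is written in plain Python for illustration rather than with the R packages named above, and the function names and documents are made up.

```python
import math

def term_frequency(term, document):
    """Share of the document's tokens that equal the term."""
    return document.count(term) / len(document)

def inverse_document_frequency(term, documents):
    """log(N / df); returns 0.0 for a term that appears in no document."""
    df = sum(1 for d in documents if term in d)
    return math.log(len(documents) / df) if df else 0.0

def tf_idf(term, document, documents):
    return term_frequency(term, document) * inverse_document_frequency(term, documents)

# Hypothetical, already tokenized documents
documents = [
    ["data", "science", "data"],
    ["science", "course"],
    ["data", "mining"],
]
print(tf_idf("data", documents[0], documents))
print(tf_idf("course", documents[1], documents))
```

Terms that appear in every document get an IDF of zero under this variant, which is why very common words contribute nothing to the score.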
- The Market Basket Analysis (MBA) case study
This case study is associated with the modeling technique of Market Basket Analysis, where you will learn about loading data, various techniques for plotting the items and running the algorithms. It includes finding out which items go hand in hand and hence can be clubbed together. This is used in various real-world scenarios such as a supermarket shopping cart.
- Logistic Regression Case Study
In this case study you will get a detailed understanding of a company's advertising spend and how it helps to drive more sales. You will deploy logistic regression to forecast future trends, detect patterns and uncover insights, all through the power of R programming. As a result, future advertising spend can be decided and optimized for higher revenues.
- Multiple Regression Case Study
You will understand how to compare the miles per gallon (MPG) of a car based on the various parameters. You will deploy multiple regression and note down the MPG for car make, model, speed, load conditions, etc. It includes the model building, model diagnostic, checking the ROC curve, among other things.
- Receiver Operating Characteristic (ROC) case study
You will work with various data sets in R, deploy data exploration methodologies, build scalable models, predict the outcome with highest precision, diagnose the model that you have created with various real world data, check the ROC curve and more.
Data Science Projects
What projects will I be working on in this Data Science certification course?
Project 1 : Augmenting retail sales with Data Science
Industry : Retail
Problem Statement : How to deploy the various rules and algorithms of Data Science for analyzing stationery store purchase data.
Topics : In this project you will deploy various tools of Data Science such as association rules, the Apriori algorithm in R, and the support, lift and confidence of an association rule. You will analyze three days of purchase data from the stationery outlet and understand the customer buying patterns across products.
- Association rules for transaction data
- Association mining with Apriori algorithm
- Generating rules and identifying patterns.
Project 2 : Analyzing pre-paid model of stock broking
Industry : Finance
Problem Statement : Finding out the deciding factor for people to opt for the pre-paid model of stock broking.
Topics : In this Data Science project you will learn about the variables that are highly correlated in the pre-paid brokerage model, analyze various market opportunities and develop targeted promotion plans for products sold under various categories. You will also perform competitor analysis and weigh the advantages and disadvantages of the pre-paid model.
- Deploying the rules of statistical analysis
- Implementing data visualization
- Linear regression for predictive modeling.
Project 3 : Cold Start Problem in Data Science
Industry : Ecommerce
Problem Statement : How to build a recommender system when no historical data is available.
Topics : This project involves understanding of the cold start problem associated with the recommender systems. You will gain hands-on experience in information filtering, working on systems with zero historical data to refer to, as in the case of launching a new product. You will gain proficiency in working with personalized applications like movies, books, songs, news and such other recommendations. This project includes the various ways of working with algorithms and deploying other data science techniques.
- Algorithms for Recommender
- Ways of Recommendation
- Types of Recommendation: Collaborative Filtering Based Recommendation, Content-Based Recommendation
- Complete mastery in working with the Cold Start Problem.
Project 4 : Recommendation for Movie, Summary
Topics : This is a real-world project that gives you hands-on experience in working with a movie recommender system. Depending on what movies are liked by a particular user, you will be in a position to provide data-driven recommendations. This project involves understanding recommender systems, information filtering, predicting ‘rating’, learning about user ‘preference’ and so on. You will exclusively work on data related to user details, movie details and others. The main components of the project include the following:
- Recommendation for movie
- Two Types of Predictions – Rating Prediction, Item Prediction
- Important Approaches: Memory Based and Model-Based
- Knowing User Based Methods in K-Nearest Neighbor
- Understanding Item Based Method
- Matrix Factorization
- Singular Value Decomposition
- Data Science Project discussion
- Collaborative Filtering
- Business Variables Overview
Project 5 : Prediction on Pokemon dataset
Problem Statement : For the purpose of this case study, you are a Pokemon trainer who is on the way to catching all 800 Pokemon.
Topics : This real-world project will give you hands-on experience of the data science life cycle. You’ll understand the structure of the ‘Pokemon’ dataset and use machine learning algorithms to make some predictions. You will use the dplyr package to filter out specific Pokemon and use decision trees to find out whether a Pokemon is legendary or not.
- dplyr package to filter Pokemons
- Decision Tree algorithm
- Linear regression algorithm.
Project 6 : Book Recommender System
Problem Statement : Building a book recommender system for readers with similar interests
Topics : This real-world project will give you hands-on experience in working with a book recommender system. Depending on what books are read by a particular user, you will be in a position to provide data-driven recommendations. You will understand the structure of the data and visualize it to find interesting patterns.
- Data analysis & visualization
- Recommender Lab
- User Based Collaborative Filtering Model.
Project 7: Census Income
Problem Statement: In this project, you will process the data and then develop an understanding of its different features by performing exploratory analysis and creating visualizations. Once you know the attributes well enough, you will perform a predictive classification task to predict whether or not an individual makes over 50K a year, using different Machine Learning algorithms.
Topics: An end-to-end exhaustive project comprising topics in:
- Data Processing
- Data Manipulation
- Data Visualization
- Linear Regression
- Logistic Regression
- Decision Tree
- Random Forest
Project 8: Loan Prediction
Problem Statement: You are the Senior Data Scientist at a major private bank. Over the last 6 months, the number of customers who are unable to repay their loans has increased. Keeping this in mind, you have to look at your customer data and analyse which customers should be approved for a loan and which should be denied.
Topics: An exhaustive project on Customer_loan Dataset comprising topics in:
- Data Processing
- Model Building
Project 9: Capstone
Problem Statement: Predicting if the customer will churn or not.
Topics: An end-to-end capstone project comprising:
- Manipulating and visualizing the data for insights.
- Implementing the linear regression model to predict continuous values.
- Implementing classification models – decision tree, logistic regression, and random forest on “customer churn”.
An end-to-end capstone project covering all the modules. You’ll start off by manipulating and visualizing the data to get interesting insights. Then you’d have to implement the linear regression model to predict continuous values. Following which you’ll implement these classification models – logistic regression, decision tree & random forest on the “customer churn” data frame to find if the customer will churn or not.
An experienced Blockchain Professional who has been bringing integrated Blockchain, particularly Hyperledger and Ethereum, and Big Data solutions to the cloud, David Callaghan has previously worked on Hadoop, AWS Cloud, Big Data and Pentaho projects that have had major impact on revenues of marqu...
A Senior Software Architect at NextGen Healthcare who has previously worked with IBM Corporation, Suresh Paritala has worked on Big Data, Data Science, Advanced Analytics, Internet of Things and Azure, along with AI domains like Machine Learning and Deep Learning. He has successfully implemented ...
A renowned Data Scientist who has worked with Google and is currently working at ASCAP, Samanth Reddy has a proven ability to develop Data Science strategies that have a high impact on the revenues of various organizations. He comes with strong Data Science expertise and has created decisive Data...