List of Courses Included
Online Instructor-led Courses:
- Data Science with R
- Python for Data Science
- Machine Learning
- Artificial Intelligence and Deep Learning with TensorFlow
- Big Data Hadoop & Spark
- Tableau Desktop 10
- Data Science with SAS
- Advanced Excel
What will you learn in this MS in Data Science program?
In this Data Science graduate program, you will learn about
- MapReduce and HDFS
- Real-time analytics with Spark
- Data Scientist roles and responsibilities
- Prediction and analysis through clustering
- Deploying the recommender system
- SAS advanced analytics and R programming
- Linear and logistic regression
- Making sense of NoSQL data
- Deep Learning model in AI
Who should take up this top Data Science master’s program?
- Data Scientists, Machine Learning Professionals and Software Developers
- Business Intelligence Professionals, Information Architects and Project Managers
- Those looking to be a Data Science Architect
What are the prerequisites for taking up this master’s in Data Science training course?
There are no prerequisites for taking up this Data Science graduate program.
Why should you take up this best online MS in Data Science course?
- Data Scientist is the best job of the 21st century – Harvard Business Review
- Global Big Data market to reach $122 billion in revenue by 2025 – Frost & Sullivan
This best Data Science master’s program has been created with the needs of the industry in mind. Today’s Data Scientists need a diverse set of skills, which include working with huge volumes of data, parsing that data, and converting it into an easily understandable format from which business insights can be derived. This training program lets you play multiple roles in the Big Data and Data Science domains and get hired for top-notch salaries.
Data Science with R
Module 01 – Introduction to Data Science with R
1.1 What is Data Science?
1.2 Significance of Data Science in today’s data-driven world, its applications, lifecycle, and components
1.3 Introduction to R programming and RStudio
1. Installation of RStudio
2. Implementing simple mathematical operations and logic using R operators, loops, if statements, and switch cases
Module 02 – Data Exploration
2.1 Introduction to data exploration
2.2 Importing and exporting data to/from external sources
2.3 What are exploratory data analysis and data importing?
2.4 DataFrames, working with them, accessing individual elements, vectors, factors, operators, in-built functions, conditional and looping statements, user-defined functions, and data types
1. Accessing individual elements of customer churn data
2. Modifying and extracting results from the dataset using user-defined functions in R
Module 03 – Data Manipulation
3.1 Need for data manipulation
3.2 Introduction to the dplyr package
3.3 Selecting one or more columns with select(), filtering records on the basis of a condition with filter(), adding new columns with mutate(), sampling, and counting
3.4 Combining different functions with the pipe operator and implementing SQL-like operations with sqldf
1. Implementing dplyr
2. Performing various operations for manipulating data and storing it
Module 04 – Data Visualization
4.1 Introduction to visualization
4.2 Different types of graphs, the grammar of graphics, the ggplot2 package, categorical distribution with geom_bar(), numerical distribution with geom_histogram(), building frequency polygons with geom_freqpoly(), and making a scatterplot with geom_point()
4.3 Multivariate analysis with geom_boxplot
4.4 Univariate analysis with a barplot, a histogram and a density plot, and multivariate distribution
4.5 Creating barplots for categorical variables using geom_bar(), and adding themes with the theme() layer
4.6 Visualization with plotly, frequency plots with geom_freqpoly(), multivariate distribution with scatter plots and smooth lines, continuous distribution vs categorical distribution with box-plots, and sub grouping plots
4.7 Working with co-ordinates and themes to make graphs more presentable, understanding plotly and various plots, and visualization with ggvis
4.8 Geographic visualization with ggmap() and building web applications with Shiny
1. Creating data visualization to understand the customer churn ratio using ggplot2 charts
2. Using plotly for importing and analyzing data
3. Visualizing tenure, monthly charges, total charges, and other individual columns using a scatter plot
Module 05 – Introduction to Statistics
5.1 Why do we need statistics?
5.2 Categories of statistics, statistical terminology, types of data, measures of central tendency, and measures of spread
5.3 Correlation and covariance, standardization and normalization, probability and its types, hypothesis testing, chi-square testing, ANOVA, normal distribution, and binomial distribution
1. Building a statistical analysis model that uses quantification, representations, and experimental data
2. Reviewing, analyzing, and drawing conclusions from the data
Module 06 – Machine Learning
6.1 Introduction to Machine Learning
6.2 Introduction to linear regression, predictive modeling, simple linear regression vs multiple linear regression, concepts, formulas, assumptions, and residuals in Linear Regression, and building a simple linear model
6.3 Predicting results and finding the p-value and an introduction to logistic regression
6.4 Comparing linear regression with logistic regression and bivariate logistic regression with multivariate logistic regression
6.5 Confusion matrix and the accuracy of a model, understanding the fit of the model, threshold evaluation with ROCR, and using qqnorm() and qqline()
6.6 Understanding the summary results with null hypothesis, F-statistic, and building linear models with multiple independent variables
1. Modeling the relationship within data using linear predictor functions
2. Implementing linear and logistic regression in R by building a model with ‘tenure’ as the dependent variable
Module 07 – Logistic Regression
7.1 Introduction to logistic regression
7.2 Logistic regression concepts, linear vs logistic regression, and math behind logistic regression
7.3 Detailed formulas, logit function and odds, bivariate logistic regression, and Poisson regression
7.4 Building a simple binomial model and predicting the result, making a confusion matrix for evaluating the accuracy, true positive rate, false positive rate, and threshold evaluation with ROCR
7.5 Finding out the right threshold by building the ROC plot, cross validation, multivariate logistic regression, and building logistic models with multiple independent variables
7.6 Real-life applications of logistic regression
1. Implementing predictive analytics by describing data
2. Explaining the relationship between one dependent binary variable and one or more independent variables
3. Using glm() to build a model, with ‘Churn’ as the dependent variable
Module 08 – Decision Trees and Random Forest
8.1 What is classification? Different classification techniques
8.2 Introduction to decision trees
8.3 Algorithm for decision tree induction and building a decision tree in R
8.4 Confusion matrix and regression trees vs classification trees
8.5 Introduction to bagging
8.6 Random forest and implementing it in R
8.7 What is Naive Bayes? Computing probabilities
8.8 Understanding the concepts of Impurity function, Entropy, Gini index, and Information gain for the right split of node
8.9 Overfitting, pruning, pre-pruning, post-pruning, and cost-complexity pruning, pruning a decision tree and predicting values, finding out the right number of trees, and evaluating performance metrics
1. Implementing random forest for both regression and classification problems
2. Building a tree, pruning it using ‘churn’ as the dependent variable, and building a random forest with the right number of trees
3. Using ROCR for performance metrics
Module 09 – Unsupervised Learning
9.1 What is Clustering? Its use cases
9.2 What is k-means clustering? What is canopy clustering?
9.3 What is hierarchical clustering?
9.4 Introduction to unsupervised learning
9.5 Feature extraction, clustering algorithms, and the k-means clustering algorithm
9.6 Theoretical aspects of k-means, k-means process flow, k-means in R, implementing k-means, and finding out the right number of clusters using a scree plot
9.7 Dendrograms, understanding hierarchical clustering, and implementing it in R
9.8 Explanation of Principal Component Analysis (PCA) in detail and implementing PCA in R
1. Deploying unsupervised learning with R to achieve clustering and dimensionality reduction
2. K-means clustering for visualizing and interpreting results for the customer churn data
Module 10 – Association Rule Mining and Recommendation Engines
10.1 Introduction to association rule mining and market basket analysis (MBA)
10.2 Measures of association rule mining: Support, confidence, lift, and apriori algorithm, and implementing them in R
10.3 Introduction to recommendation engines
10.4 User-based collaborative filtering and item-based collaborative filtering, and implementing a recommendation engine in R
10.5 Recommendation engine use cases
1. Deploying association analysis as a rule-based Machine Learning method
2. Identifying strong rules discovered in databases with measures based on interesting discoveries
Self-paced Course Content
Module 11 – Introduction to Artificial Intelligence
11.1 Introducing Artificial Intelligence and Deep Learning
11.2 What is an artificial neural network? TensorFlow: The computational framework for building AI models
11.3 Fundamentals of building ANN using TensorFlow and working with TensorFlow in R
Module 12 – Time Series Analysis
12.1 What is a time series? The techniques, applications, and components of time series
12.2 Moving average, smoothing techniques, and exponential smoothing
12.3 Univariate time series models and multivariate time series analysis
12.4 ARIMA model
12.5 Time series in R, sentiment analysis in R (Twitter sentiment analysis), and text analysis
1. Analyzing time series data
2. Analyzing the sequence of measurements that follow a non-random order to identify the nature of phenomenon and forecast the future values in the series
Module 13 – Support Vector Machine (SVM)
13.1 Introduction to Support Vector Machine (SVM)
13.2 Data classification using SVM
13.3 SVM algorithms using separable and inseparable cases
13.4 Linear SVM for identifying margin hyperplane
Module 14 – Naïve Bayes
14.1 What is the Bayes theorem?
14.2 What is Naïve Bayes Classifier?
14.3 Classification Workflow
14.4 How Naive Bayes classifier works and classifier building in Scikit-Learn
14.5 Building a probabilistic classification model using Naïve Bayes and the zero probability problem
Module 15 – Text Mining
15.1 Introduction to the concepts of text mining
15.2 Text mining use cases and understanding and manipulating the text with ‘tm’ and ‘stringr’
15.3 Text mining algorithms and the quantification of the text
15.4 TF-IDF and after TF-IDF
Data Science Projects Covered
Market Basket Analysis
This is an inventory management project where you will find the trends in the data that will help the company to increase sales. In this project, you will be implementing association rule mining, data extraction, and data manipulation for the Market Basket Analysis.
Credit Card Fraud Detection
The project consists of data analysis for various parameters of a banking dataset. You will be using a V7 predictor, V4 predictor for analysis, and data visualization for finding the probability of occurrence of fraudulent activities.
Loan Approval Prediction
In this project, you will use the banking dataset for data analysis, data cleaning, data preprocessing, and data visualization. You will implement algorithms such as Principal Component Analysis and Naive Bayes after data analysis to predict the approval rate of a loan using various parameters.
Netflix Recommendation System
Implement exploratory data analysis, data manipulation, and visualization to understand and find the trends in the Netflix dataset. You will use various Machine Learning algorithms such as association rule mining, classification algorithms, and many more to create movie recommendation systems for viewers using Netflix dataset.
Case Study 1: Introduction to R Programming
In this project, you need to work with several operators involved in R programming including relational operators, arithmetic operators, and logical operators for various organizational needs.
Case Study 2: Solving Customer Churn Using Data Exploration
Use data exploration in order to understand what needs to be done to make reductions in customer churn. In this project, you will be required to extract individual columns, use loops to work on repetitive operations, and create and implement filters for data manipulation.
Case Study 3: Creating Data Structures in R
Implement numerous data structures for numerous possible scenarios. This project requires you to create and use vectors. Further, you need to build and use matrices, utilize arrays for storing those matrices, and have knowledge of lists.
Case Study 4: Implementing SVD in R
Utilize the MovieLens dataset to analyze and understand singular value decomposition (SVD) and its use in R programming. In this project, you must build custom recommended movie sets for all users, develop a user-based collaborative filtering model, and create a realRatingMatrix for movie recommendation.
Case Study 5: Time Series Analysis
This project requires you to perform time series analysis (TSA) and understand ARIMA and its concepts with respect to a given scenario. Here, you will use the R programming language, ARIMA model, time series analysis, and data visualization. So, you must understand how to build an ARIMA model and fit it, find optimal parameters by plotting PACF charts, and perform various analyses to predict values.
Python for Data Science
Module 01 – Introduction to Data Science using Python
1.1 What is Data Science, what does a data scientist do
1.2 Various examples of Data Science in the industries
1.3 How Python is deployed for Data Science applications
1.4 Various steps in Data Science process like data wrangling, data exploration and selecting the model.
1.5 Introduction to Python programming language
1.6 Important Python features, how is Python different from other programming languages
1.7 Python installation, Anaconda Python distribution for Windows, Linux and Mac
1.8 How to run a sample Python script, Python IDE working mechanism
1.9 Running some Python basic commands
1.10 Python variables, data types and keywords.
Hands-on Exercise – Installing Python Anaconda for Windows, Linux and Mac
Module 02 – Python basic constructs
2.1 Introduction to a basic construct in Python
2.2 Understanding indentation like tabs and spaces
2.3 Python built-in data types
2.4 Basic operators in Python
2.5 Loop and control statements like break, if, for, continue, else, range() and more.
Hands-on Exercise –
1. Write your first Python program
2. Write a Python function (with and without parameters)
3. Use Lambda expression
4. Write a class
5. Create a member function and a variable
6. Create an object and write a for loop to print all odd numbers
Module 03 – Maths for DS-Statistics & Probability
3.1 Central Tendency
3.3 Hypothesis Testing
3.7 Probability Definitions and Notation
3.8 Joint Probabilities
3.9 The Sum Rule, Conditional Probability, and the Product Rule
3.10 Bayes Theorem
Hands-on Exercise –
1. We will analyze both categorical data and quantitative data
2. Focusing on specific case studies to help solidify the week’s statistical concepts
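The sum rule, product rule, and Bayes theorem listed in this module fit together in a few lines. The following is a minimal sketch in plain Python; the function name `bayes_posterior` and the screening-test numbers are hypothetical and chosen only for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A)."""
    # Sum rule and product rule: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Bayes theorem: P(A | B) = P(B | A) P(A) / P(B)
    return likelihood * prior / evidence


# Hypothetical values: 1% prevalence, 95% sensitivity, 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))
```

Even with a sensitive test, the posterior stays low when the prior is small, which is the usual point of this exercise.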
Module 04 – OOPs in Python (Self paced)
4.1 Understanding the OOP paradigm like encapsulation, inheritance, polymorphism and abstraction
4.2 What are access modifiers, instances, class members
4.3 Classes and objects
4.4 Function parameter and return type functions
4.5 Lambda expressions.
Hands-on Exercise –
1. Writing a Python program and incorporating the OOP concepts
Module 05 – NumPy for mathematical computing
5.1 Introduction to mathematical computing in Python
5.2 What are arrays and matrices, array indexing, array math, Inspecting a NumPy array, NumPy array manipulation
Hands-on Exercise –
1. How to import NumPy module
2. Creating an array using ndarray
3. Calculating standard deviation on array of numbers and calculating correlation between two variables.
Module 06 – SciPy for scientific computing
6.1 Introduction to SciPy, building on top of NumPy
6.2 What are the characteristics of SciPy
6.3 Various sub-packages for SciPy like Signal, Integrate, Fftpack, Cluster, Optimize, Stats and more, Bayes Theorem with SciPy.
1. Importing of SciPy
2. Applying the Bayes theorem on the given dataset.
Module 07 – Data manipulation
7.1 What is data manipulation? Using the Pandas library
7.2 NumPy dependency of Pandas library
7.3 Series object in pandas
7.4 DataFrame in Pandas
7.5 Loading and handling data with Pandas
7.6 How to merge data objects
7.7 Concatenation and various types of joins on data objects, exploring dataset
Hands-on Exercise –
1. Doing data manipulation with Pandas by handling tabular datasets that include variable types like float, integer, double and others.
2. Cleaning dataset, Manipulating dataset, Visualizing dataset
Module 08 – Data visualization with Matplotlib
8.1 Introduction to Matplotlib
8.2 Using Matplotlib for plotting graphs and charts like Scatter, Bar, Pie, Line, Histogram and more
8.3 Matplotlib API
Hands-on Exercise –
1. Deploying Matplotlib for creating pie, scatter, line and histogram.
2. Subplots and Pandas built-in data visualization.
Module 09 – Machine Learning using Python
9.1 Revision of topics in Python (Pandas, Matplotlib, NumPy, Scikit-Learn)
9.2 Introduction to machine learning
9.3 Need of Machine learning
9.4 Types of machine learning and workflow of Machine Learning
9.5 Use cases in Machine Learning, its various algorithms
9.6 What is supervised learning
9.7 What is Unsupervised Learning
Hands-on Exercise –
1. Demo on ML algorithms
Module 10 – Supervised learning
10.1 What is linear regression
10.2 Step by step calculation of Linear Regression
10.3 Linear regression in Python
10.4 Logistic Regression
10.5 What is classification
10.6 Decision Tree, Confusion Matrix, Random Forest, Naïve Bayes classifier (self-paced), Support Vector Machine (self-paced), XGBoost (self-paced)
Hands-on Exercise – Using Python library Scikit-Learn for coming up with Random Forest algorithm to implement supervised learning.
Module 11 – Unsupervised Learning
11.1 Introduction to unsupervised learning
11.2 Use cases of unsupervised learning
11.3 What is clustering
11.4 Types of clustering (self-paced): Exclusive Clustering, Overlapping Clustering, Hierarchical Clustering (self-paced)
11.5 What is K-means clustering
11.6 Step by step calculation of k-means algorithm
11.7 Association Rule Mining (self-paced), Market Basket Analysis (self-paced), Measures in association rule mining (self-paced): support, confidence, lift
11.8 Apriori Algorithm
Hands-on Exercise –
1. Setting up the Jupyter notebook environment
2. Loading of a dataset in Jupyter
3. Algorithms in Scikit-Learn package for performing Machine Learning techniques and training a model to search a grid.
4. Practice on k-means using Scikit-Learn
5. Practice on Apriori
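The three association rule measures named in this module (support, confidence, lift) can be computed directly from a list of transactions. This is a small sketch in plain Python rather than a full Apriori implementation; the basket contents and function names are made up for illustration.

```python
# Hypothetical shopping baskets; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the baseline support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


print(round(support({"bread", "milk"}), 4))
print(round(confidence({"bread"}, {"milk"}), 4))
print(round(lift({"bread"}, {"milk"}), 4))
```

A lift below 1 suggests the two items appear together less often than independence would predict; a lift above 1 suggests the opposite.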
Module 12 – Python integration with Spark (Self paced)
12.1 Introduction to PySpark
12.2 Who uses PySpark, need for Spark with Python
12.3 PySpark installation
12.4 PySpark fundamentals
12.5 Advantage over MapReduce, PySpark
12.6 PySpark use cases and demo.
1. Demonstrating Loops and Conditional Statements
2. Tuple – related operations, properties, list, etc.
3. List – operations, related properties
4. Set – properties, associated operations, dictionary – operations, related properties.
Module 13 – Dimensionality Reduction
13.1 Introduction to Dimensionality
13.2 Why Dimensionality Reduction
13.4 Factor Analysis
Hands-on Exercise –
Practice dimensionality reduction techniques: PCA, Factor Analysis, t-SNE, Random Forest, and forward and backward feature selection
Module 14 – Time Series Forecasting
14.1 White Noise
14.2 AR model
14.3 MA model
14.4 ARMA model
14.5 ARIMA model
14.7 ACF & PACF
Hands-on Exercise –
1. Create AR model
2. Create MA model
3. Create ARMA model
Data Science with Python Projects
Analyzing the Trends of COVID-19 With Python
In this project, you will use Pandas to accumulate data from multiple data files, Plotly (visualization library) to create interactive visualizations, and Facebook’s Prophet library to make time series models. You will also be visualizing the prediction by combining these technologies.
Analyzing the Naming Trends Using Python
In this project, you will use Python Programming and Algorithms to understand the applications of data manipulation, extract files with useful data only, and concepts of data visualization. You will be required to analyze baby names by sorting out the top 100 birth counts.
Performing Analysis on Customer Churn Dataset
Through this project, you will be analyzing employment reliability in the telecom industry. The project will require you to work on real-time analysis of data with multiple labels, data visualization for reliability factor, visual analysis of various columns to verify, and plotting charts to substantiate the findings in total.
Analysis of movies dataset and recommendation of movies with respect to ratings. You will be working with the combined data of movies and their ratings, performing data analysis on various labels in the data, finding the distribution of different ratings in the dataset, and training the SVD for the prediction of the model.
Python Web Scraping for Data Science
In this project, you will learn web scraping using Python. You will work on Beautiful Soup, web scraping libraries, common data and page format on the web, the important kinds of objects, Navigable String, the searching tree deployment, navigation options, parser, search tree, searching by CSS class, list, function, and keyword argument.
OOPS in Python
Creating multiple methods using OOPS. You will work on methods like “check_balance” to check the remaining balance in an account and “withdraw” to withdraw from the bank, and override “withdraw” to ensure that the minimum balance is maintained. You will also work with Parameterization and Classes.
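One way the methods described in this project could look is sketched below. Only the method names check_balance and withdraw come from the project description; the class names, the minimum_balance parameter, and the error messages are assumptions made for illustration.

```python
class BankAccount:
    """Basic account with a balance check and a withdrawal method."""

    def __init__(self, owner, balance=0.0):
        self.owner = owner
        self.balance = balance

    def check_balance(self):
        return self.balance

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


class MinimumBalanceAccount(BankAccount):
    """Overrides withdraw so the balance never drops below a minimum."""

    def __init__(self, owner, balance=0.0, minimum_balance=100.0):
        super().__init__(owner, balance)
        self.minimum_balance = minimum_balance

    def withdraw(self, amount):
        if self.balance - amount < self.minimum_balance:
            raise ValueError("withdrawal would breach the minimum balance")
        # Reuse the parent checks and bookkeeping once the extra rule passes.
        return super().withdraw(amount)


account = MinimumBalanceAccount("Asha", balance=500.0, minimum_balance=100.0)
account.withdraw(300.0)
print(account.check_balance())
```

The subclass adds its own rule and then delegates to the parent method, which keeps the withdrawal logic in one place.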
Working With NumPy
In this case study, you will be working with the NumPy library to solve various problems in Python. You will create 2D arrays, initialize a NumPy array of 5*5 dimensions, and perform simple arithmetic operations on the two arrays. To carry out this case study successfully, you will have to be familiar with NumPy.
Visualizing and Analyzing the Customer Churn dataset using Python
This case study will require you to analyze data by building aesthetic graphs to make better sense of the data. You will be working with the ggplot2 package, bar plots and its applications, histogram graphs for data analysis, and box-plots and outliers in them.
Building Models With the Help of Machine Learning Algorithms
You will be designing tree-based models on the ‘Heart’ dataset, performing real-time data manipulation on the heart dataset, data-visualization for multiple columnar data, building a tree-based model on top of the database, and designing a probabilistic classification model on the database. You will have to be familiar with ML Algorithms.
Machine Learning
Module 01 – Introduction to Machine Learning
1.1 Need of Machine Learning
1.2 Introduction to Machine Learning
1.3 Types of Machine Learning, such as supervised, unsupervised, and reinforcement learning, Machine Learning with Python, and the applications of Machine Learning
Module 02 – Supervised Learning and Linear Regression
2.1 Introduction to supervised learning and the types of supervised learning, such as regression and classification
2.2 Introduction to regression
2.3 Simple linear regression
2.4 Multiple linear regression and assumptions in linear regression
2.5 Math behind linear regression
1. Implementing linear regression from scratch with Python
2. Using Python library Scikit-Learn to perform simple linear regression and multiple linear regression
3. Implementing train–test split and predicting the values on the test set
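As a rough illustration of the first and third exercises, simple linear regression can be fitted from scratch with the closed-form least-squares formulas and then checked on held-out points. The toy data (y = 2x + 1) and the function names are assumptions made for this sketch.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(xs, slope, intercept):
    return [intercept + slope * x for x in xs]


# Toy data generated from y = 2x + 1, split into train and test parts.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 7, 9, 11, 13]
train_x, test_x = xs[:4], xs[4:]
train_y, test_y = ys[:4], ys[4:]

slope, intercept = fit_simple_linear_regression(train_x, train_y)
predictions = predict(test_x, slope, intercept)
print(slope, intercept, predictions)
```

On real data the split would normally be randomised; a fixed slice is used here only to keep the example deterministic.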
Module 03 – Classification and Logistic Regression
3.1 Introduction to classification
3.2 Linear regression vs logistic regression
3.3 Math behind logistic regression, detailed formulas, the logit function and odds, confusion matrix and accuracy, true positive rate, false positive rate, and threshold evaluation with ROCR
1. Implementing logistic regression from scratch with Python
2. Using Python library Scikit-Learn to perform simple logistic regression and multiple logistic regression
3. Building a confusion matrix to find out accuracy, true positive rate, and false positive rate
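The logit function, odds, and the confusion-matrix rates mentioned in this module can be sketched in plain Python as follows. The labels and predicted probabilities are invented for illustration, and the 0.5 threshold is an assumption.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability; the inverse of sigmoid."""
    return math.log(p / (1.0 - p))


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding the probabilities."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical labels and model outputs.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

tp, fp, tn, fn = confusion_counts(y_true, y_prob)
accuracy = (tp + tn) / (tp + fp + tn + fn)
true_positive_rate = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)
print(tp, fp, tn, fn, accuracy, true_positive_rate, false_positive_rate)
```

Re-running confusion_counts with different threshold values gives the points from which an ROC curve is drawn.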
Module 04 – Decision Tree and Random Forest
4.1 Introduction to tree-based classification
4.2 Understanding a decision tree, impurity function, entropy, and understanding the concept of information gain for the right split of node
4.3 Understanding the concepts of information gain, impurity function, Gini index, overfitting, pruning, pre-pruning, post-pruning, and cost-complexity pruning
4.4 Introduction to ensemble techniques, bagging, and random forests and finding out the right number of trees required in a random forest
1. Implementing a decision tree from scratch in Python
2. Using Python library Scikit-Learn to build a decision tree and a random forest
3. Visualizing the tree and changing the hyper-parameters in the random forest
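Entropy, the Gini index, and information gain are the quantities a decision tree uses to choose a split. The sketch below computes them for one candidate split; the labels and the split itself are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" samples, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

gain = information_gain(parent, [left, right])
print(entropy(parent), gini(parent), round(gain, 4))
```

A tree-building algorithm would evaluate this gain for every candidate split and keep the one with the highest value.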
Module 05 – Naïve Bayes and Support Vector Machine (self-paced)
5.1 Introduction to probabilistic classifiers
5.2 Understanding Naïve Bayes and math behind the Bayes theorem
5.3 Understanding a support vector machine (SVM)
5.4 Kernel functions in SVM and math behind SVM
1. Using Python library Scikit-Learn to build a Naïve Bayes classifier and a support vector classifier
Module 06 – Unsupervised Learning
6.1 Types of unsupervised learning, such as clustering and dimensionality reduction, and the types of clustering
6.2 Introduction to k-means clustering
6.3 Math behind k-means
6.4 Dimensionality reduction with PCA
1. Using Python library Scikit-Learn to implement k-means clustering
2. Implementing PCA (principal component analysis) on top of a dataset
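The math behind k-means reduces to two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The following from-scratch sketch uses fixed starting centroids so the result is deterministic; the points, starting centroids, and function name are assumptions, and in practice a library such as Scikit-Learn would be used as the exercise suggests.

```python
def kmeans(points, centroids, max_iterations=10):
    """Return (centroids, clusters) after running k-means from a given start."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (9.0, 9.5)])
print(centroids)
print(clusters)
```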
Module 07 – Natural Language Processing and Text Mining (self-paced)
7.1 Introduction to Natural Language Processing (NLP)
7.2 Introduction to text mining
7.3 Importance and applications of text mining
7.4 How NLP works with text mining
7.5 Reading from and writing to Word files
7.6 Natural Language Toolkit (NLTK) environment
7.7 Text mining: Its cleaning, pre-processing, and text classification
1. Learning Natural Language Toolkit and NLTK Corpora
2. Reading and writing .txt files from/to a local drive
3. Reading and writing .docx files from/to a local drive
Module 08 – Introduction to Deep Learning
8.1 Introduction to Deep Learning with neural networks
8.2 Biological neural networks vs artificial neural networks
8.3 Understanding perceptron learning algorithm, introduction to Deep Learning frameworks, and TensorFlow constants, variables, and place-holders
Module 09 – Time Series Analysis (self-paced)
9.1 What is time series? Its techniques and applications
9.2 Time series components
9.3 Moving average, smoothing techniques, and exponential smoothing
9.4 Univariate time series models
9.5 Multivariate time series analysis
9.6 ARIMA model and time series in Python
9.7 Sentiment analysis in Python (Twitter sentiment analysis) and text analysis
1. Analyzing time series data
2. The sequence of measurements that follow a non-random order to recognize the nature of the phenomenon
3. Forecasting the future values in the series
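The moving average and exponential smoothing techniques from this module can be written in a few lines of plain Python. The series values, window size, and smoothing factor alpha below are illustrative assumptions.

```python
def moving_average(series, window):
    """Simple moving average over each full window of the series."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical measurements with a steady upward trend.
series = [10, 12, 14, 16, 18, 20]
ma = moving_average(series, window=3)
es = exponential_smoothing(series, alpha=0.5)
print(ma)
print(es)
```

A larger alpha makes the smoothed series react faster to recent values, while a smaller alpha gives more weight to history.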
Machine Learning Projects
Analyzing the Trends of COVID-19 with Python
In this project, you will be using Pandas to accumulate data from multiple data files, Plotly to create interactive visualizations, Facebook’s Prophet library to make time series models, and visualizing the prediction by combining these technologies.
Customer Churn Classification
This project will help you get more familiar with Machine Learning algorithms. You will be manipulating data to gain meaningful insights, visualizing data to figure out trends and patterns among different factors, and implementing algorithms like linear regression, decision tree, and Naïve Bayes.
Creating a Recommendation System for Movies
You will be creating a recommendation system for movies by working with rating prediction, item prediction, user-based methods in k-nearest neighbor, matrix factorization, singular value decomposition, collaborative filtering, business variables overview, etc. Two approaches you will use are memory-based and model-based.
Case Study 1 – Decision Tree
Conducting this case study will help you understand the structure of a dataset (PIMA Indians Diabetes database) and create a decision tree model based on it by making use of Scikit-Learn.
Case Study 2 – Insurance Cost Prediction (Linear Regression)
In this case study, you will understand the structure of a medical insurance dataset, implement both simple and multiple linear regressions, and predict values for the insurance cost.
Case Study 3 – Diabetes Classification (Logistic Regression)
Through this case study, you will come to understand the structure of a dataset (PIMA Indians Diabetes dataset), implement multiple logistic regressions and classify, fit your model on the test and train data for prediction, evaluate your model using confusion matrix, and then visualize it.
Case Study 4 – Random Forest
You will be creating a model that would help in classifications of patients in the following ways: ‘is normal,’ ‘is suspected to have a disease,’ or in actuality ‘has the disease’ with the help of the ‘Cardiotocography’ dataset.
Case Study 5 – Principal Component Analysis (PCA)
As part of the case study, you will read the sample Iris dataset. You will use PCA to figure out the number of most important principal features and reduce the number of features using PCA. You will have to train and test the random forest classifier algorithm to check the model performance. Find the optimal number of dimensions that will give good quality results and predict accurately.
Case Study 6 – K-means Clustering
This case study involves data analysis, column extraction from the dataset, data visualization, using the elbow method to find out the appropriate number of groups or clusters for the data to be segmented, using k-means clustering, segmenting the data into k groups, visualizing a scatter plot of clusters, and many more.
AI & Deep Learning
Module 01 – Introduction to Deep Learning and Neural Networks
1.1 Field of machine learning, its impact on the field of artificial intelligence
1.2 The benefits of machine learning w.r.t. traditional methodologies
1.3 Deep learning introduction and how it is different from all other machine learning methods
1.4 Classification and regression in supervised learning
1.5 Clustering and association in unsupervised learning, algorithms that are used in these categories
1.6 Introduction to AI and neural networks
1.7 Machine learning concepts
1.8 Supervised learning with neural networks
1.9 Fundamentals of statistics, hypothesis testing, probability distributions
Module 02 – Multi-layered Neural Networks
2.1 Multi-layer network introduction, regularization, deep neural networks
2.2 Multi-layer perceptron
2.3 Overfitting and capacity
2.4 Neural network hyperparameters, logic gates
2.5 Different activation functions used in neural networks, including ReLU, softmax, sigmoid, and hyperbolic tangent (tanh) functions
2.6 Back propagation, forward propagation, convergence, hyperparameters, and overfitting.
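The activation functions named in this module are short enough to write out by hand. This is a sketch using only the standard library; deep learning frameworks ship their own vectorised versions.

```python
import math


def relu(x):
    """Rectified linear unit: passes positives, clips negatives to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent: squashes any real number into (-1, 1)."""
    return math.tanh(x)


def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
print(relu(-2.0), relu(3.0), sigmoid(0.0), tanh(0.0))
print([round(p, 4) for p in probabilities])
```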
Module 03 – Artificial Neural Networks and Various Methods
3.1 Various methods that are used to train artificial neural networks
3.2 Perceptron learning rule, gradient descent rule, tuning the learning rate, regularization techniques, optimization techniques
3.3 Stochastic process, vanishing gradients, transfer learning, regression techniques
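The perceptron learning rule updates each weight by learning_rate * (target - prediction) * input. The sketch below applies that rule to the AND logic gate. It is a minimal illustration under stated assumptions: the function names are hypothetical, the weights start at zero, and a learning rate of 1 is used so that all arithmetic stays in integers.

```python
def step(z):
    """Step activation: fire (1) only when the weighted sum is positive."""
    return 1 if z > 0 else 0


def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, learning_rate=1, epochs=20):
    """Train a two-input perceptron with the perceptron learning rule."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Rule: w <- w + learning_rate * error * x, b <- b + learning_rate * error
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Truth table of the AND gate: output is 1 only when both inputs are 1.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_gate]
print(weights, bias, predictions)
```

AND is linearly separable, so the rule converges; the same loop would never settle on XOR, which is one motivation for the multi-layer networks covered earlier.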
Module 04 – Deep Learning Libraries
4.1 Understanding how deep learning works
4.2 Activation functions, illustrating perceptron, perceptron training
4.3 Multi-layer perceptron, key parameters of perceptron
4.4 Introduction to TensorFlow, the open-source software library used to design, create, and train deep learning models
4.5 Google’s Tensor Processing Unit (TPU), a programmable AI accelerator
4.6 Python libraries in TensorFlow, code basics, variables, constants, placeholders
4.7 Graph visualization, use-case implementation, Keras, and more
Module 05 – Keras API
5.1 Keras, a high-level neural network API that works on top of TensorFlow
5.2 Defining complex multi-output models
5.3 Composing models using Keras
5.4 Sequential and functional composition, batch normalization
5.5 Deploying Keras with TensorBoard, and neural network training process customization
Module 06 – TFLearn API for TensorFlow
6.1 Using the TFLearn API to implement neural networks
6.2 Defining and composing models, and deploying TensorBoard
Module 07 – Deep Neural Networks (DNNs)
7.1 Mapping the human mind with deep neural networks (DNNs)
7.2 Several building blocks of artificial neural networks (ANNs)
7.3 The architecture of DNN and its building blocks
7.4 Reinforcement learning in DNN concepts, various parameters, layers, and optimization algorithms in DNN, and activation functions
Module 08 – Convolutional Neural Networks (CNNs)
8.1 What is a convolutional neural network?
8.2 Understanding the architecture and use cases of CNN
8.3 What is a pooling layer? How to visualize using CNN
8.4 How to fine-tune a convolutional neural network
8.5 What is transfer learning?
8.6 Understanding recurrent neural networks, kernel filter, feature maps, and pooling, and deploying convolutional neural networks in TensorFlow
Module 09 – Recurrent Neural Networks (RNNs)
9.1 Introduction to the RNN model
9.2 Use cases of RNN, modeling sequences
9.3 RNNs with back propagation
9.4 Long short-term memory (LSTM)
9.5 Recursive neural tensor network theory, the basic RNN cell, unfolded RNN, dynamic RNN
9.6 Time-series predictions
Module 10 – GPU in Deep Learning
10.1 Introduction to GPUs, how they differ from CPUs, the significance of GPUs
10.2 Deep learning networks, forward pass and backward pass training techniques
10.3 GPU constituent with simpler core and concurrent hardware
Module 11 – Autoencoders and Restricted Boltzmann Machine (RBM)
11.1 Introduction to RBM and autoencoders
11.2 Deploying RBM for deep neural networks, using RBM for collaborative filtering
11.3 Features and applications of autoencoders
Module 12 – Deep Learning Applications
12.1 Image processing
12.2 Natural language processing (NLP), speech recognition, and video analytics
Module 13 – Chatbots
13.1 Automated conversation bots leveraging any of the following descriptive techniques: IBM Watson, Microsoft’s LUIS, open-domain and closed-domain bots
13.2 Generative model, and the sequence-to-sequence model (LSTM)
Artificial Intelligence Assignments and Projects
As part of this assignment, you have to implement an LSTM encoder. Create an input sequence of numbers. Build an LSTM RNN model on top of this data. Compile the model with ‘adam’ as the optimizer and ‘mse’ as the loss. Fit the model on the data and set the number of epochs to 300. Predict the values and verify them against the input data.
In this assignment, you have to build your own convolutional neural network using the MNIST dataset. For this, you will have to download the MNIST dataset through Keras. You will be asked to fit the dataset to a model and evaluate the loss and accuracy of the model. You will be working with pooling layers, dense layers, dropout layers, flatten layers, and NumPy.
Binary Classification on ‘Customer_Churn’ Using Keras
In this project, you will have to analyze the data of a telecom company to find insights and stop customers from churning out to other telecom companies. You will be working on data manipulation and visualization, and create 3 different models with the help of Keras.
Face Detection Project
For the project, you will be using Python 3.5 (64-bit) with OpenCV for face detection. The system will have to be able to detect multiple faces in a single image. You will be working with essential libraries like cv2 and glob (glob helps in finding all the pathnames matching a specified pattern).
Build a sequential model using Keras on top of the Diabetes dataset to find out if a patient has diabetes or not. You will use stochastic gradient descent as the optimization algorithm. You will be required to build another sequential model where ‘Outcome’ is the dependent variable and all other columns are predictors.
You will be detecting wine fraud using neural networks as a part of this assignment. You will use a recent version of scikit-learn (>0.18). Use the wine dataset from the UCI Machine Learning Repository. Import the dataset, split the data, and use the predict() method to get predictions. You will have to train your model using scikit-learn’s estimator objects.
AI and Deep Learning Intro Assignment
For this assignment, you will need to install Anaconda on your system with Python version 3.6 or above. Create a TensorFlow environment, download TensorFlow, and download pandas, NumPy, scikit-learn, SciPy, and Matplotlib in both the Anaconda and TensorFlow environments. You will also need to install Keras and TFLearn in the TensorFlow environment.
As part of the assignment, you will be using an airline-passenger dataset to predict the number of passengers for a particular month. Write a simple function to convert a single column of data into a two-column dataset. You will divide the data into train and test sets.
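One way to read "convert a single column of data into a two-column dataset" is to pair each month's value with the next month's value, so the first column is the input and the second is the value to predict. The sketch below shows that reading together with a chronological train/test split. The function names, the 67% split fraction, and the passenger counts are assumptions made for illustration and are not taken from the assignment.

```python
def to_two_columns(series):
    """Pair each value with the one that follows it: (value at t, value at t+1)."""
    return [(series[i], series[i + 1]) for i in range(len(series) - 1)]


def chronological_split(rows, train_fraction=0.67):
    """Split without shuffling, so the test set is strictly later in time."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Made-up monthly passenger counts, not the real dataset.
passengers = [100, 110, 125, 120, 130, 145, 160]

rows = to_two_columns(passengers)
train, test = chronological_split(rows)

print(rows)
print(len(train), len(test))
```

Shuffling before the split would leak future months into the training set, which is why the split keeps the original order.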
Through this assignment, you will learn to create a session in TensorFlow. You will define constants and perform computations using the session, print ‘Hello World’ using the same, and create a simple linear equation, y=mx+c, in TensorFlow, where m and c are variables and x is a placeholder.
In this assignment, you will be required to find out the factors that lead up to a patient having cancer. You will need to load the dataset and print the number of samples and features in the data. Then, you will divide the data into train and test sets and create a network.
Big Data Hadoop and Spark
Module 01 – Hadoop Installation and Setup
1.1 The architecture of Hadoop cluster
1.2 What is High Availability and Federation?
1.3 How to set up a production cluster?
1.4 Various shell commands in Hadoop
1.5 Understanding configuration files in Hadoop
1.6 Installing a single node cluster with Cloudera Manager
1.7 Understanding Spark, Scala, Sqoop, Pig, and Flume
Module 02 – Introduction to Big Data Hadoop and Understanding HDFS and MapReduce
2.1 Introducing Big Data and Hadoop
2.2 What is Big Data and where does Hadoop fit in?
2.3 Two important Hadoop ecosystem components, namely, MapReduce and HDFS
2.4 In-depth Hadoop Distributed File System – Replications, Block Size, Secondary Name node, High Availability and in-depth YARN – resource manager and node manager
1. HDFS working mechanism
2. Data replication process
3. How to determine the size of the block?
4. Understanding a data node and name node
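The relationship between file size, block size, and replication reduces to a little arithmetic. The sketch below assumes a 128 MB block size and a replication factor of 3, which are commonly cited HDFS defaults; a real cluster may be configured differently, and the function name is hypothetical.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    # The file is cut into fixed-size blocks; the last block may be partly filled.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across data nodes.
    block_replicas = blocks * replication
    # A partly filled last block only occupies its actual size on disk.
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb


print(hdfs_footprint(1000))       # (8, 24, 3000)
print(hdfs_footprint(1000, 256))  # (4, 12, 3000)
```

A larger block size leaves the raw storage unchanged but lowers the number of blocks the name node has to track, which is one of the trade-offs behind choosing a block size.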
Module 03 – Deep Dive in MapReduce
3.1 Learning the working mechanism of MapReduce
3.2 Understanding the mapping and reducing stages in MR
3.3 Various terminologies in MR like Input Format, Output Format, Partitioners, Combiners, Shuffle, and Sort
1. How to write a WordCount program in MapReduce?
2. How to write a Custom Partitioner?
3. What is a MapReduce Combiner?
4. How to run a job in a local job runner
5. Deploying a unit test
6. What is a map side join and reduce side join?
7. What is a tool runner?
8. How to use counters, dataset joining with map side, and reduce side joins?
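The WordCount flow (map, combine, partition, shuffle and sort, reduce) can be simulated in plain Python to see what each stage contributes. This is only a teaching sketch: real Hadoop jobs are written against Hadoop's own API, every function name here is hypothetical, and the partitioning rule (first letter of the word) is an arbitrary choice made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Custom partitioner (illustrative rule): route by the word's first letter."""
    return ord(key[0]) % num_reducers


def shuffle_and_sort(pairs, num_reducers):
    """Group values by key inside each reducer's bucket, sorted by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]


def reduce_phase(key, values):
    """Reducer: sum the partial counts for one word."""
    return key, sum(values)


def word_count(lines, num_reducers=2):
    mapped = []
    for line in lines:  # each line plays the role of one input split
        mapped.extend(combine(map_phase(line)))
    counts = {}
    for bucket in shuffle_and_sort(mapped, num_reducers):
        for key, values in bucket:
            word, total = reduce_phase(key, values)
            counts[word] = total
    return counts


print(word_count(["big data big", "data hadoop"]))
```

The combiner does not change the final counts; it only reduces how many pairs cross the shuffle, which is why it can reuse the reducer's summing logic here.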
Module 04 – Introduction to Hive
4.1 Introducing Hadoop Hive
4.2 Detailed architecture of Hive
4.3 Comparing Hive with Pig and RDBMS
4.4 Working with Hive Query Language
4.5 Creation of a database, table, group by and other clauses
4.6 Various types of Hive tables, HCatalog
4.7 Storing the Hive Results, Hive partitioning, and Buckets
1. Database creation in Hive
2. Dropping a database
3. Hive table creation
4. How to change the database?
5. Data loading
6. Dropping and altering table
7. Pulling data by writing Hive queries with filter conditions
8. Table partitioning in Hive
9. What is a group by clause?
Module 05 – Advanced Hive and Impala
5.1 Indexing in Hive
5.2 The Map Side Join in Hive
5.3 Working with complex data types
5.4 The Hive user-defined functions
5.5 Introduction to Impala
5.6 Comparing Hive with Impala
5.7 The detailed architecture of Impala
1. How to work with Hive queries?
2. The process of joining the table and writing indexes
3. External table and sequence table deployment
4. Data storage in a different table
Module 06 – Introduction to Pig
6.1 Apache Pig introduction and its various features
6.2 Various data types and schema in Pig
6.3 The available functions in Pig, Pig Bags, Tuples, and Fields
1. Working with Pig in MapReduce and local mode
2. Loading of data
3. Limiting data to 4 rows
4. Storing the data into files and working with Group By, Filter By, Distinct, Cross, Split in Pig
Module 07 – Flume, Sqoop and HBase
7.1 Apache Sqoop introduction
7.2 Importing and exporting data
7.3 Performance improvement with Sqoop
7.4 Sqoop limitations
7.5 Introduction to Flume and understanding the architecture of Flume
7.6 What is HBase and the CAP theorem?
1. Working with Flume to generate Sequence Number and consume it
2. Using the Flume Agent to consume the Twitter data
3. Using AVRO to create Hive Table
4. AVRO with Pig
5. Creating Table in HBase
6. Deploying Disable, Scan, and Enable Table
Module 08 – Writing Spark Applications Using Scala
8.1 Using Scala for writing Apache Spark applications
8.2 Detailed study of Scala
8.3 The need for Scala
8.4 The concept of object-oriented programming
8.5 Executing the Scala code
8.6 Various classes in Scala like getters, setters, constructors, abstract, extending objects, overriding methods
8.7 The Java and Scala interoperability
8.8 The concept of functional programming and anonymous functions
8.9 Bobsrockets package and comparing the mutable and immutable collections
8.10 Scala REPL, Lazy Values, Control Structures in Scala, Directed Acyclic Graph (DAG), first Spark application using SBT/Eclipse, Spark Web UI, Spark in Hadoop ecosystem.
1. Writing Spark application using Scala
2. Understanding the robustness of Scala for Spark real-time analytics operation
Module 09 – Use Case Bobsrockets Package
9.1 Introduction to Scala packages and imports
9.2 The selective imports
9.3 The Scala test classes
9.4 Introduction to JUnit test class
9.5 JUnit interface via JUnit 3 suite for Scala test
9.6 Packaging of Scala applications in the directory structure
9.7 Examples of Spark Split and Spark Scala
Module 10 – Introduction to Spark
10.1 Introduction to Spark
10.2 Spark overcomes the drawbacks of working on MapReduce
10.3 Understanding in-memory MapReduce
10.4 Interactive operations on MapReduce
10.5 Spark stack, fine vs. coarse-grained update, Spark Hadoop YARN, HDFS Revision, and YARN Revision
10.6 The overview of Spark and how it is better than Hadoop
10.7 Deploying Spark without Hadoop
10.8 Spark history server and Cloudera distribution
Module 11 – Spark Basics
11.1 Spark installation guide
11.2 Spark configuration
11.3 Memory management
11.4 Executor memory vs. driver memory
11.5 Working with Spark Shell
11.6 The concept of resilient distributed datasets (RDD)
11.7 Learning to do functional programming in Spark
11.8 The architecture of Spark
Module 12 – Working with RDDs in Spark
12.1 Spark RDD
12.2 Creating RDDs
12.3 RDD partitioning
12.4 Operations and transformation in RDD
12.5 Deep dive into Spark RDDs
12.6 The RDD general operations
12.7 Read-only partitioned collection of records
12.8 Using the concept of RDD for faster and efficient data processing
12.9 RDD actions such as collect, count, collectAsMap, and saveAsTextFile, and pair RDD functions
Module 13 – Aggregating Data with Pair RDDs
13.1 Understanding the concept of key-value pair in RDDs
13.2 Learning how Spark makes MapReduce operations faster
13.3 Various operations of RDD
13.4 MapReduce interactive operations
13.5 Fine and coarse-grained update
13.6 Spark stack
Module 14 – Writing and Deploying Spark Applications
14.1 Comparing the Spark applications with Spark Shell
14.2 Creating a Spark application using Scala or Java
14.3 Deploying a Spark application
14.4 Scala built application
14.5 Creation of the mutable list, set and set operations, list, tuple, and concatenating list
14.6 Creating an application using SBT
14.7 Deploying an application using Maven
14.8 The web user interface of Spark application
14.9 A real-world example of Spark
14.10 Configuring Spark
Module 15 – Project Solution Discussion and Cloudera Certification Tips and Tricks
15.1 Working towards the solution of the Hadoop project
15.2 Its problem statements and the possible solution outcomes
15.3 Preparing for the Cloudera certifications
15.4 Points to focus on for scoring the highest marks
15.5 Tips for cracking Hadoop interview questions
1. The project of a real-world high value Big Data Hadoop application
2. Getting the right solution based on the criteria set by the Intellipaat team
Module 16 – Parallel Processing
16.1 Learning about Spark parallel processing
16.2 Deploying on a cluster
16.3 Introduction to Spark partitions
16.4 File-based partitioning of RDDs
16.5 Understanding of HDFS and data locality
16.6 Mastering the technique of parallel operations
16.7 Comparing repartition and coalesce
16.8 RDD actions
Module 17 – Spark RDD Persistence
17.1 The execution flow in Spark
17.2 Understanding the RDD persistence overview
17.3 Spark execution flow, and Spark terminology
17.4 Distributed shared memory vs. RDD
17.5 RDD limitations
17.6 Spark shell arguments
17.7 Distributed persistence
17.8 RDD lineage
17.9 Key-value pair operations and sorting with implicit conversions, such as countByKey, reduceByKey, sortByKey, and aggregateByKey
Module 18 – Spark MLlib
18.1 Introduction to Machine Learning
18.2 Types of Machine Learning
18.3 Introduction to MLlib
18.4 Various ML algorithms supported by MLlib
18.5 Linear regression, logistic regression, decision tree, random forest, and K-means clustering techniques
1. Building a Recommendation Engine
Module 19 – Integrating Apache Flume and Apache Kafka
19.1 Why Kafka and what is Kafka?
19.2 Kafka architecture
19.3 Kafka workflow
19.4 Configuring Kafka cluster
19.6 Kafka monitoring tools
19.7 Integrating Apache Flume and Apache Kafka
1. Configuring Single Node Single Broker Cluster
2. Configuring Single Node Multi Broker Cluster
3. Producing and consuming messages
4. Integrating Apache Flume and Apache Kafka
Module 20 – Spark Streaming
20.1 Introduction to Spark Streaming
20.2 Features of Spark Streaming
20.3 Spark Streaming workflow
20.4 Initializing StreamingContext, discretized Streams (DStreams), input DStreams and Receivers
20.5 Transformations on DStreams, output operations on DStreams, windowed operators and why they are useful
20.6 Important windowed operators and stateful operators
1. Twitter Sentiment analysis
2. Streaming using Netcat server
3. Kafka–Spark streaming
4. Spark–Flume streaming
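A windowed operator combines the most recent few micro-batches each time the window slides. The sketch below is a plain-Python analogue of that idea and does not use the Spark Streaming API; the function name, the parameter names, and the batch contents are assumptions made for illustration.

```python
from collections import deque


def windowed_sums(batches, window_length=3, slide_interval=1):
    """Sum the values of the last `window_length` batches every `slide_interval` batches."""
    window = deque(maxlen=window_length)  # old batches fall out automatically
    results = []
    for index, batch in enumerate(batches, start=1):
        window.append(sum(batch))
        if index % slide_interval == 0:
            results.append(sum(window))
    return results


# Each inner list stands for the records that arrived in one batch interval.
batches = [[1, 2], [3], [], [4, 4], [5]]

print(windowed_sums(batches))                    # [3, 6, 6, 11, 13]
print(windowed_sums(batches, slide_interval=2))  # [6, 11]
```

The window length controls how much history each result covers, and the slide interval controls how often a result is produced, so the two can be tuned independently.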
Module 21 – Improving Spark Performance
21.1 Introduction to various variables in Spark like shared variables and broadcast variables
21.2 Learning about accumulators
21.3 The common performance issues
21.4 Troubleshooting the performance problems
Module 22 – Spark SQL and Data Frames
22.1 Learning about Spark SQL
22.2 The context of SQL in Spark for providing structured data processing
22.3 JSON support in Spark SQL
22.4 Working with XML data
22.5 Parquet files
22.6 Creating Hive context
22.7 Writing data frame to Hive
22.8 Reading JDBC files
22.9 Understanding the data frames in Spark
22.10 Creating Data Frames
22.11 Manual inferring of schema
22.12 Working with CSV files
22.13 Reading JDBC tables
22.14 Data frame to JDBC
22.15 User-defined functions in Spark SQL
22.16 Shared variables and accumulators
22.17 Learning to query and transform data in data frames
22.18 Data frame provides the benefit of both Spark RDD and Spark SQL
22.19 Deploying Hive on Spark as the execution engine
Module 23 – Scheduling/Partitioning
23.1 Learning about the scheduling and partitioning in Spark
23.2 Hash partition
23.3 Range partition
23.4 Scheduling within and around applications
23.5 Static partitioning, dynamic sharing, and fair scheduling
23.6 Map partition with index, the Zip, and GroupByKey
23.7 Spark master high availability, standby masters with ZooKeeper, single-node recovery with the local file system, and higher-order functions
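The difference between hash partitioning and range partitioning is easiest to see on a handful of keys. The sketch below is a plain-Python illustration rather than Spark code: the function names are hypothetical, the keys are integers so that the hash is simply the key itself, and the range boundaries are supplied by hand.

```python
from bisect import bisect_left


def hash_partition(key, num_partitions):
    """Hash partitioning: the partition is the key's hash modulo the partition count."""
    return key % num_partitions


def range_partition(key, upper_bounds):
    """Range partitioning: the partition is the first range whose upper bound covers the key."""
    return bisect_left(upper_bounds, key)


def assign(keys, partitioner):
    """Group keys by the partition index the partitioner returns."""
    partitions = {}
    for key in keys:
        partitions.setdefault(partitioner(key), []).append(key)
    return partitions


keys = [3, 10, 11, 17, 20, 25, 42]

by_hash = assign(keys, lambda k: hash_partition(k, 3))
by_range = assign(keys, lambda k: range_partition(k, [10, 20]))

print(by_hash)   # {0: [3, 42], 1: [10, 25], 2: [11, 17, 20]}
print(by_range)  # {0: [3, 10], 1: [11, 17, 20], 2: [25, 42]}
```

Hash partitioning scatters neighbouring keys across partitions, while range partitioning keeps each partition's keys contiguous, which is what makes it useful for sorted output.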
The following topics will be available only in self-paced mode:
Module 24 – Hadoop Administration – Multi-node Cluster Setup Using Amazon EC2
24.1 Create a 4-node Hadoop cluster setup
24.2 Running the MapReduce Jobs on the Hadoop cluster
24.3 Successfully running the MapReduce code
24.4 Working with the Cloudera Manager setup
1. The method to build a multi-node Hadoop cluster using an Amazon EC2 instance
2. Working with the Cloudera Manager
Module 25 – Hadoop Administration – Cluster Configuration
25.1 Overview of Hadoop configuration
25.2 The importance of Hadoop configuration file
25.3 The various parameters and values of configuration
25.4 The HDFS parameters and MapReduce parameters
25.5 Setting up the Hadoop environment
25.6 The Include and Exclude configuration files
25.7 The administration and maintenance of name node, data node directory structures, and files
25.8 What is a File system image?
25.9 Understanding Edit log
1. The process of performance tuning in MapReduce
Module 26 – Hadoop Administration – Maintenance, Monitoring and Troubleshooting
26.1 Introduction to the checkpoint procedure, name node failure
26.2 How to ensure the recovery procedure, Safe Mode, Metadata and Data backup, various potential problems and solutions, what to look for and how to add and remove nodes
1. How to go about ensuring the MapReduce File System Recovery for different scenarios
2. JMX monitoring of the Hadoop cluster
3. How to use the logs and stack traces for monitoring and troubleshooting
4. Using the Job Scheduler for scheduling jobs in the same cluster
5. Getting the MapReduce job submission flow
6. FIFO schedule
7. Getting to know the Fair Scheduler and its configuration
Module 27 – ETL Connectivity with Hadoop Ecosystem (Self-Paced)
27.1 How do ETL tools work in the Big Data industry?
27.2 Introduction to ETL and data warehousing
27.3 Working with prominent use cases of Big Data in the ETL industry
27.4 End-to-end ETL PoC showing Big Data integration with ETL tool
1. Connecting to HDFS from ETL tool
2. Moving data from Local system to HDFS
3. Moving data from DBMS to HDFS
4. Working with Hive with ETL Tool
5. Creating MapReduce job in ETL tool
Module 28 – Hadoop Application Testing
28.1 Importance of testing
28.2 Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end-to-end tests, Functional testing, Release certification testing, Security testing, Scalability testing, Commissioning and Decommissioning of data nodes testing, Reliability testing, and Release testing
Module 29 – Roles and Responsibilities of Hadoop Testing Professional
29.1 Understanding the Requirement
29.2 Preparation of the Testing Estimation
29.3 Test Cases, Test Data, Test Bed Creation, Test Execution, Defect Reporting, Defect Retest, Daily Status report delivery, Test completion, ETL testing at every stage (HDFS, Hive, and HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, which includes but is not limited to data verification, Reconciliation, User Authorization and Authentication testing (Groups, Users, Privileges, etc.), reporting defects to the development team or manager and driving them to closure
29.4 Consolidating all the defects and creating defect reports
29.5 Validating new features and issues in Core Hadoop
Module 30 – Framework Called MRUnit for Testing of MapReduce Programs
30.1 Reporting defects to the development team or manager and driving them to closure
30.2 Consolidating all the defects and creating defect reports
30.3 Creating a testing framework called MRUnit for testing of MapReduce programs
Module 31 – Unit Testing
31.1 Automation testing using Oozie
31.2 Data validation using the QuerySurge tool
Module 32 – Test Execution
32.1 Test plan for HDFS upgrade
32.2 Test automation and result
Module 33 – Test Plan Strategy and Writing Test Cases for Testing Hadoop Application
33.1 Test, install and configure
Big Data Hadoop Course Projects
Working with MapReduce, Hive, and Sqoop
In this project, you will import data into HDFS using Sqoop for data analysis. Sqoop will transfer the data from an RDBMS to Hadoop. You will code in Hive query language and carry out data querying and analysis. You will acquire an understanding of Hive and Sqoop after completion of this project.
Work on MovieLens Data For Finding the Top Movies
Create the top-ten-movies list using the MovieLens data. For this project, you will use a MapReduce program for working on the data file, Apache Pig for analyzing data, and Apache Hive for data warehousing and querying. You will be working with distributed datasets.
Hadoop YARN Project: End-to-End PoC
Bring the daily incremental data into the Hadoop Distributed File System. As part of the project, you will be using Sqoop commands to bring the data into HDFS, working with the end-to-end flow of transaction data, and the data from HDFS. You will work on a live Hadoop YARN cluster. You will work on the YARN central resource manager.
Table Partitioning in Hive
In this project, you will learn how to improve the query speed using Hive data partitioning. You will get hands-on experience in partitioning of Hive tables manually, deploying single SQL execution in dynamic partitioning, and bucketing of data to break it into manageable chunks.
Connecting Pentaho with Hadoop Ecosystem
Deploy ETL for data analysis activities. In this project, you will challenge your working knowledge of ETL and Business Intelligence. You will configure Pentaho to work with Hadoop distribution as well as load, transform, and extract data into the Hadoop cluster.
Multi-node Cluster Setup
Set up a Hadoop real-time cluster on Amazon EC2. The project will involve installing and configuring Hadoop. You will need to run a Hadoop multi-node using a 4-node cluster on Amazon EC2 and deploy a MapReduce job on the Hadoop cluster. Java will need to be installed as a prerequisite for running Hadoop.
Hadoop Testing Using MRUnit
In this project, you will be required to test MapReduce applications. You will write JUnit tests using MRUnit for MapReduce applications. You will also be mocking static methods using PowerMock and Mockito and implementing MapReduce Driver for testing the map and reduce pair.
Hadoop Web Log Analytics
Derive insights from web log data. The project involves the aggregation of log data, implementation of Apache Flume for data transportation, and processing of data and generating analytics. You will learn to use workflow and data cleansing using MapReduce, Pig, or Spark.
Through this project, you will learn how to administer a Hadoop cluster for maintaining and managing it. You will be working with the name node directory structure, audit logging, data node block scanner, balancer, Failover, fencing, DISTCP, and Hadoop file formats.
Twitter Sentiment Analysis
Find out how people reacted to India’s demonetization move by analyzing their tweets. You will have to download the tweets, load them into Pig storage, divide the tweets into words to calculate sentiment, rate the words from +5 to −5 using the AFINN dictionary, filter them, and analyze the sentiment.
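The scoring step, splitting a tweet into words and adding up each word's rating, can be sketched in a few lines of Python. The project itself uses Pig; this sketch only shows the logic. The mini-lexicon below is a hypothetical stand-in with illustrative scores in the +5 to −5 range, not the real dictionary file, and the function names and sample tweets are invented.

```python
import re

# Hypothetical mini-lexicon; a real run would load the full dictionary instead.
LEXICON = {"good": 3, "great": 3, "support": 2, "chaos": -2, "bad": -3, "terrible": -3}


def tokenize(tweet):
    """Lower-case the tweet and split it into words."""
    return re.findall(r"[a-z']+", tweet.lower())


def sentiment_score(tweet):
    """Sum the ratings of all words; words missing from the lexicon count as 0."""
    return sum(LEXICON.get(word, 0) for word in tokenize(tweet))


def label(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "Great move, good for the economy",
    "Terrible chaos at the banks",
    "Standing in a queue",
]

for tweet in tweets:
    score = sentiment_score(tweet)
    print(score, label(score), tweet)
```

Summing word ratings ignores negation and sarcasm, so the labels are a rough signal that is usually aggregated over many tweets rather than trusted one tweet at a time.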
Analyzing IPL T20 Cricket
This project will require you to analyze an entire cricket match and get any details of the match. You will need to load the IPL dataset into HDFS. You will then analyze that data using Apache Pig or Hive. Based on the user queries, the system will have to give the right output.
Recommend the most appropriate movie to a user based on their taste. This is a hands-on Apache Spark project, which will include the creation of collaborative filtering, regression, clustering, and dimensionality reduction. You will need to make use of the Apache Spark MLlib component and statistical analysis.
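Collaborative filtering predicts a user's rating for an unseen movie from the ratings of similar users. The sketch below shows a small user-based variant in plain Python. It does not use Spark MLlib, the ratings are made up, the names are hypothetical, and cosine similarity with a similarity-weighted average is just one of several reasonable choices.

```python
import math

# Made-up ratings on a 1-5 scale.
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob": {"Movie A": 5, "Movie B": 5, "Movie D": 4, "Movie E": 2},
    "carol": {"Movie C": 5, "Movie D": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[movie] * b[movie] for movie in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank the movies the user has not rated by predicted rating, best first."""
    weighted, total_similarity = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0:
            continue
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                weighted[movie] = weighted.get(movie, 0.0) + similarity * rating
                total_similarity[movie] = total_similarity.get(movie, 0.0) + similarity
    predicted = {movie: weighted[movie] / total_similarity[movie] for movie in weighted}
    return sorted(predicted, key=predicted.get, reverse=True)


print(recommend("alice", ratings))
```

Here the other user who agrees with alice on the movies they share carries most of the weight, so that user's favourite unseen movie is ranked first.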
Twitter API Integration for Tweet Analysis
Analyze the user sentiment based on a tweet. In this Twitter analysis project, you will integrate the Twitter API and use Python or PHP for developing the essential server-side code. You will carry out filtering, parsing, and aggregation depending on the tweet analysis requirement.
Data Exploration Using Spark SQL – Wikipedia Data Set
In this project, you will be making use of the Spark SQL tool for analyzing Wikipedia data. You will be integrating Spark SQL for batch analysis, Machine Learning, visualizing, and processing of data and ETL processes, along with real-time analysis of data.
Tableau Desktop 10
Module 1 – Introduction to Data Visualization and Power of Tableau
1.1 What is data visualization?
1.2 Comparison and benefits against reading raw numbers
1.3 Real use cases from various business domains
1.4 Some quick and powerful examples using Tableau without going into the technical details of Tableau
1.5 Installing Tableau
1.6 Tableau interface
1.7 Connecting to a data source
1.8 Tableau data types
1.9 Data preparation
Module 2 – Architecture of Tableau
2.1 Installation of Tableau Desktop
2.2 Architecture of Tableau
2.3 Interface of Tableau (Layout, Toolbars, Data Pane, Analytics Pane, etc.)
2.4 How to start with Tableau
2.5 The ways to share and export the work done in Tableau
1. Play with Tableau desktop
2. Learn about the interface
3. Share and export existing works
Module 3 – Working with Metadata and Data Blending
3.1 Connection to Excel
3.2 Cubes and PDFs
3.3 Management of metadata and extracts
3.4 Data preparation
3.5 Joins (Left, Right, Inner, and Outer) and Union
3.6 Dealing with NULL values, cross-database joining, data extraction, data blending, refresh extraction, incremental extraction, how to build extract, etc.
1. Connect to Excel sheet to import data
2. Use metadata and extracts
3. Manage NULL values
4. Clean up data before using
5. Perform the join techniques
6. Execute data blending from multiple sources
Module 4 – Creation of Sets
4.1 Mark, highlight, sort, group, and use sets (creating and editing sets, IN/OUT, sets in hierarchies)
4.2 Constant sets
4.3 Computed sets, bins, etc.
1. Use marks to create and edit sets
2. Highlight the desired items
3. Make groups
4. Apply sorting on results
5. Make hierarchies among the created sets
Module 5 – Working with Filters
5.1 Filters (addition and removal)
5.2 Filtering continuous dates, dimensions, and measures
5.3 Interactive filters, marks card, and hierarchies
5.4 How to create folders in Tableau
5.5 Sorting in Tableau
5.6 Types of sorting
5.7 Filtering in Tableau
5.8 Types of filters
5.9 Filtering the order of operations
1. Use the data set by date/dimensions/measures to add a filter
2. Use interactive filter to view the data
3. Customize/remove filters to view the result
Module 6 – Organizing Data and Visual Analytics
6.1 Using Formatting Pane to work with menu, fonts, alignments, settings, and copy-paste
6.2 Formatting data using labels and tooltips
6.3 Edit axes and annotations
6.4 K-means cluster analysis
6.5 Trend and reference lines
6.6 Visual analytics in Tableau
6.7 Forecasting, confidence interval, reference lines, and bands
1. Apply labels and tooltips to graphs, annotations, edit axes’ attributes
2. Set the reference line
3. Perform k-means cluster analysis on the given dataset
Module 7 – Working with Mapping
7.1 Working on coordinate points
7.2 Plotting longitude and latitude
7.3 Editing unrecognized locations
7.4 Customizing geocoding, polygon maps, WMS: web mapping services
7.5 Working on the background image, including add image
7.6 Plotting points on images and generating coordinates from them
7.7 Map visualization, custom territories, map box, WMS map
7.8 How to create map projects in Tableau
7.9 Creating dual axes maps, and editing locations
1. Plot longitude and latitude on a geo map
2. Edit locations on the geo map
3. Custom geocoding
4. Use images of the map and plot points
5. Find coordinates
6. Create a polygon map
7. Use WMS
Module 8 – Working with Calculations and Expressions
8.1 Calculation syntax and functions in Tableau
8.2 Various types of calculations, including Table, String, Date, Aggregate, Logic, and Number
8.3 LOD expressions, including concept and syntax
8.4 Aggregation and replication with LOD expressions
8.5 Nested LOD expressions
8.6 Levels of details: fixed level, lower level, and higher level
8.7 Quick table calculations
8.8 The creation of calculated fields
8.9 Predefined calculations
8.10 How to validate
Module 9 – Working with Parameters
9.1 Creating parameters
9.2 Parameters in calculations
9.3 Using parameters with filters
9.4 Column selection parameters
9.5 Chart selection parameters
9.6 How to use parameters in the filter session
9.7 How to use parameters in calculated fields
9.8 How to use parameters in the reference line
1. Creating new parameters to apply on a filter
2. Passing parameters to filters to select columns
3. Passing parameters to filters to select charts
Module 10 – Charts and Graphs
10.1 Dual axes graphs
10.3 Single and dual axes
10.4 Box plot
10.5 Charts: motion, Pareto, funnel, pie, bar, line, bubble, bullet, scatter, and waterfall charts
10.6 Maps: tree and heat maps
10.7 Market basket analysis (MBA)
10.8 Using Show Me
10.9 Text table and highlighted table
1. Plot a histogram, tree map, heat map, funnel chart, and more using the given dataset
2. Perform market basket analysis (MBA) on the same dataset
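Market basket analysis rests on three measures: support, confidence, and lift. The module performs the analysis in Tableau; the plain-Python sketch below only illustrates the arithmetic behind those measures. The transactions are made up and the function names are hypothetical.

```python
# Made-up shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]


def support(itemset, transactions):
    """Share of baskets that contain every item in the itemset."""
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the baskets with the antecedent, the share that also hold the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """How much more often the items occur together than if they were independent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


bread, butter = {"bread"}, {"butter"}
print(support(bread | butter, transactions))
print(confidence(bread, butter, transactions))
print(lift(bread, butter, transactions))
```

A lift above 1 suggests the two items are bought together more often than chance would predict, while a lift near 1 suggests no association.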
Module 11 – Dashboards and Stories
11.1 Building and formatting a dashboard using size, objects, views, filters, and legends
11.2 Best practices for making creative as well as interactive dashboards using the actions
11.3 Creating stories, including the intro of story points
11.4 Creating as well as updating the story points
11.5 Adding catchy visuals in stories
11.6 Adding annotations with descriptions; dashboards and stories
11.7 What is a dashboard?
11.8 Highlight actions, URL actions, and filter actions
11.9 Selecting and clearing values
11.10 Best practices to create dashboards
11.11 Dashboard examples; using Tableau workspace and Tableau interface
11.12 Learning about Tableau joins
11.13 Types of joins
11.14 Tableau field types
11.15 Saving as well as publishing data source
11.16 Live vs extract connection
11.17 Various file types
1. Create a Tableau dashboard view, include legends, objects, and filters
2. Make the dashboard interactive
3. Use visual effects, annotations, and descriptions to create and edit a story
Module 12 – Tableau Prep
12.1 Introduction to Tableau Prep
12.2 How Tableau Prep helps quickly combine, join, shape, and clean data for analysis
12.3 Creation of smart examples with Tableau Prep
12.4 Getting deeper insights into the data with great visual experience
12.5 Making data preparation simpler and accessible
12.6 Integrating Tableau Prep with Tableau analytical workflow
12.7 Understanding the seamless process from data preparation to analysis with Tableau Prep
Module 13 – Integration of Tableau with R
13.1 Introduction to R language
13.2 Applications and use cases of R
13.3 Deploying R on the Tableau platform
13.4 Learning R functions in Tableau
1. Deploy R on Tableau
2. Create a line graph using R interface
Tableau Projects Covered
Understanding the global COVID-19 mortality rates
Analyze and develop a dashboard to understand global COVID-19 cases. Compare global confirmed cases vs. death cases on a world map. Compare country-wise cases using logarithmic axes. The dashboard should display both a log-axis chart and a default-axis chart, with an interactive way to alternate between them. Create a parameter to dynamically view the Top N WHO regions based on the cumulative new cases and death cases ratio. The dashboard should have a drop-down menu to view the WHO region-wise data as a bar chart, line chart, or map, as per the user's requirement.
Understand the UK bank customer data
Analyze and develop a dashboard to understand the customer data of a UK bank. Create an asymmetric drop-down of regions with their respective customer names and balances, color-coded by gender. Add a region-wise bar chart that displays the count of customers with high and low balances. Create a parameter that lets users dynamically decide the balance limit that separates high from low. Include interactive filters for job classifications and highlighters for region in the final dashboard.
Understand Financial Data
Create an interactive map to analyze worldwide sales and profit. Include map layers and map styles to enhance the visualization. Add an interactive analysis that displays the average gross sales of a product under each segment, allowing only one segment's data to be displayed at a time. Create a motion chart to compare sales and profit through the years. Annotate the day-wise profit line chart to indicate the peaks, and enable drop lines. Add go-to-URL actions in the final dashboard that direct the user to the respective country's Wikipedia page.
Understand Agriculture Data
Create an interactive tree map to display district-wise data. The tree map should have state labels. On hovering over a particular state, the corresponding districts' data should be displayed. Add URL actions that direct users to a Google search page for the selected crop. The web page is to be displayed on the final dashboard. Create a hierarchy of seasons, crop categories, and the list of crops under each. Add highlighters for season. One major sheet in the final dashboard should be unaffected by any action applied. Use the view in this major sheet to filter data in the others. Using parameters, color-code the seasons with high yield and low yield based on their crop categories. Rank the crops based on their yield.
Data Science with SAS
Introduction to SAS
Installation and introduction to SAS, how to get started with SAS, understanding the different SAS windows (output, search, editor, log, and explorer), how to work with data sets, and understanding SAS functions, the various library types, and programming files
SAS Enterprise Guide
How to import and export raw data files, how to read and subset the data sets, different statements like SET, MERGE and WHERE
Hands-on Exercise: How to import the Excel file in the workspace and how to read data and export the workspace to save data
SAS Operators and Functions
Different SAS operators like logical, comparison and arithmetic, deploying different SAS functions like Character, Numeric, Is Null, Contains, Like and Input/Output, along with the conditional statements like If/Else, Do While, Do Until and so on
Hands-on Exercise: Performing operations using the SAS functions and logical and arithmetic operations
Compilation and Execution
Understanding the input buffer and the PDV (backend), and learning what MISSOVER is
Defining and using KEEP and DROP statements, and applying these statements along with formats and labels in SAS
Hands-on Exercise: Use KEEP and DROP statements
Creation and Compilation of SAS Data Sets
Understanding the delimiter, dataline rules, DLM, delimiter DSD, raw data files and execution and list input for standard data
Hands-on Exercise: Use delimiter rules on raw data files
Built-in SAS procedures for common tasks: PROC SORT, PROC FREQ, PROC SUMMARY, PROC RANK, PROC EXPORT, PROC DATASETS, PROC TRANSPOSE, PROC CORR, etc.
Hands-on Exercise: Use SORT, FREQ, SUMMARY, EXPORT and other procedures
Input Statement and Formatted Input
Reading standard and non-standard numeric inputs with formatted inputs, column pointer controls, controlling while a record loads, line pointer control/absolute line pointer control, single trailing, multiple IN and OUT statements, dataline statement and rules, list input method and comparing single trailing and double trailing
Hands-on Exercise: Read standard and non-standard numeric inputs with formatted inputs, control while a record loads, control a line pointer and write multiple IN and OUT statements
SAS Format statements: standard and user-written, associating a format with a variable, working with SAS Format, deploying it on PROC data sets and comparing ATTRIB and Format statements
Hands-on Exercise: Format a variable, deploy format rule on PROC data set and use ATTRIB statement
Understanding PROC GCHART, various graphs, bar charts: pie, bar and 3D and plotting variables with PROC GPLOT
Hands-on Exercise: Plot graphs using PROC GPLOT and display charts using PROC GCHART
Interactive Data Processing
SAS advanced data discovery and visualization, point-and-click analytics capabilities and powerful reporting tools
Data Transformation Function
Character functions, numeric functions and converting variable type
Hands-on Exercise: Use functions in data transformation
Output Delivery System (ODS)
Introduction to ODS, data optimization and how to generate files (rtf, pdf, html and doc) using SAS
Hands-on Exercise: Optimize data and generate rtf, pdf, html and doc files
Macro Syntax, macro variables, positional parameters in a macro and macro step
Hands-on Exercise: Write a macro and use positional parameters
SQL statements in SAS, SELECT, CASE, JOIN and UNION and sorting data
Hands-on Exercise: Create SQL query to select and add a condition and use a CASE in select query
Advanced Base SAS
Base SAS web-based interface and ready-to-use programs, advanced data manipulation, storage and retrieval and descriptive statistics
Hands-on Exercise: Use web UI to do statistical operations
Report enhancement, global statements, user-defined formats, PROC SORT, ODS destinations, ODS listing, PROC FREQ, PROC MEANS, PROC UNIVARIATE, PROC REPORT and PROC PRINT
Hands-on Exercise: Use PROC SORT to sort the results, list ODS, find the mean using PROC MEANS and print using PROC PRINT
Categorization of Patients Based on the Count of Drugs for Their Therapy
This project aims to find descriptive statistics and subsets for specific clinical data problems. It will give you a brief insight into Base SAS procedures and data steps.
Build Revenue Projections Reports
You will be working with the SAS data analytics and business intelligence tool. You will get to work on the data entered in a business enterprise setup and will aggregate, retrieve, and manage that data. Create insightful reports and graphs and come up with statistical and mathematical analysis to predict revenue projection.
Impact of Pre-paid Plans on the Preferences of Investors
This project aims to find the most impacting factors in the preferences of the pre-paid model. The project also identifies which variables are highly correlated with impacting factors. In addition to this, the project also looks to identify various insights that would help a newly established brand to foray deeper into the market on a large scale.
K-means Cluster Analysis on the Iris Dataset
In this project, you will be required to do k-means cluster analysis on the Iris dataset to predict the class of a flower using the dimensions of its petals.
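The project itself uses SAS, but the k-means idea can be sketched in plain Python. The sample points below are hypothetical petal length and width values in centimeters, not the actual Iris dataset. The deterministic starting centroids are a simplification chosen for illustration, and `kmeans` and `predict` are illustrative names.

```python
from math import dist

# Hypothetical (petal_length, petal_width) samples in cm, grouped only for readability.
points = [
    (1.4, 0.2), (1.3, 0.2), (1.5, 0.3),
    (4.5, 1.5), (4.1, 1.3), (4.7, 1.4),
    (6.0, 2.5), (5.8, 2.2), (6.1, 2.3),
]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Simplified deterministic start: k evenly spaced input points (needs k >= 2).
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def predict(centroids, point):
    """Assign a new flower to the cluster with the nearest centroid."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


centroids, labels = kmeans(points, k=3)
print(labels)
print(predict(centroids, (1.6, 0.4)))
```

Note that k-means produces cluster numbers, not species names; mapping a cluster to a flower class requires comparing the clusters with known labels afterwards.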
Introduction to Excel spreadsheet, learning to enter data, filling of series and custom fill list, editing and deleting fields.
Referencing in Formulas
Learning about relative and absolute referencing, the concept of relative formulae, the issues in relative formulae, creating of absolute and mixed references and various other formulae.
Creating names range, using names in new formulae, working with the name box, selecting range, names from a selection, pasting names in formulae, selecting names and working with Name Manager.
Understanding Logical Functions
The various logical functions in Excel, the IF function for calculating values and displaying text, nested IF functions, VLOOKUP and IFERROR functions.
Getting started with Conditional Formatting
Learning about conditional formatting, the options for formatting cells, various operations with icon sets, data bars and color scales, creating and modifying sparklines.
Multi-level drop-down validation, restricting values to a list only, learning about error messages and cell drop-downs.
Important Formulas in Excel
Introduction to the various formulae in Excel like Sum, SumIF & SumIFs, Count, CountA, CountIF and CountBlank, Networkdays, Networkdays International, Today & Now function, Trim (Eliminating undesirable spaces), Concatenate (Consolidating columns)
Working with Dynamic table
Introduction to dynamic table in Excel, data conversion, table conversion, tables for charts and VLOOKUP.
Sorting in Excel, various types of sorting, including alphabetical, numerical, row, and multiple-column sorting, working with paste special, hyperlinking and using subtotal.
The concept of data filtering, understanding compound filter and its creation, removing of filter, using custom filter and multiple value filters, working with wildcards.
Creation of Charts in Excel, performing operations in embedded chart, modifying, resizing, and dragging of chart.
Various Techniques of Charting
Introduction to the various types of charting techniques, creating titles for charts, axes, learning about data labels, displaying data tables, modifying axes, displaying gridlines and inserting trendlines, textbox insertion in a chart, creating a 2-axis chart, creating combination chart.
Pivot Tables in Excel
The concept of Pivot tables in Excel, report filtering, shell creation, working with Pivot for calculations, formatting of reports, dynamic range assigning, the slicers and creating of slicers.
Ensuring Data and File Security
Data and file security in Excel, protecting row, column, and cell, the different safeguarding techniques.
Getting started with VBA Macros
Learning about VBA macros in Excel, executing macros in Excel, the macro shortcuts, applications, the concept of relative reference in macros, in-depth understanding of Visual Basic for Applications, the VBA Editor, module insertion and deletion, performing an action with Sub and ending the Sub if a condition is not met.
Ranges and Worksheet in VBA
Learning about the concepts of workbooks and worksheets in Excel, protection of macro codes, range coding, declaring a variable, the concept of Pivot Table in VBA, introduction to arrays, user forms, getting to know how to work with databases within Excel.
Learning how the If condition works and knowing how to apply it in various scenarios, working with multiple Ifs in a macro, the concept of the message box in VBA, learning to create the message box, various types of message boxes, the If condition as related to message boxes.
Loops in VBA
Understanding the concept of looping, deploying looping in VBA Macros.
Debugging in VBA
Studying about debugging in VBA, the various steps of debugging like running, breaking, resetting, understanding breakpoints and way to mark it, the code for debugging and code commenting.
Introduction to powerful data visualization with Excel dashboards, important points to consider while designing dashboards, such as loading the data, managing data, and linking the data to tables and charts, creating reports using dashboard features, learning to create dashboards, the various rules to follow while creating dashboards, creation of dynamic dashboards, knowing what data layout is, introduction to the thermometer chart and its creation, how to use alerts in the dashboard setup, understanding data quality issues in Excel, linking of data, consolidating and merging data, working with dashboards for Excel Pivot Tables.
Principles of Charting
Learning to create charts in Excel, the various charts available, the steps to successfully build a chart, personalization of charts, formatting and updating features, various special charts for Excel dashboards, understanding how to choose the right chart for the right data, how to insert a scroll bar into a data window, the concept of option buttons in a chart, use of the combo box drop-down, list box control usage, and how to use the checkbox control.
Getting started with Pivot Tables
Creation of Pivot Tables in Excel, learning to change the Pivot Table layout, generating Reports, the methodology of grouping and ungrouping of data.
Statistics with Excel
One-tailed and two-tailed t-tests, linear regression, performing statistical analysis using Excel, implementing linear regression with Excel
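As a rough cross-check on a spreadsheet regression, the least-squares slope and intercept (what Excel's SLOPE and INTERCEPT functions return) can be computed by hand. The sketch below uses Python with hypothetical sample values; `fit_line` is an illustrative name, not part of the course material.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.1, 6.2, 7.9, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```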
What projects will I be working on in this Excel certification training?
Project – if Function
Data – Employee
Problem Statement – This project describes the IF function and how to implement it. It includes the following actions:
Calculate the bonus for all employees at 10% of their salary using the IF function; rate the salesmen based on their sales and the rating scale; find the number of times “3” is repeated in the table and the number of values greater than 5 using the Count function; use operators and the nested IF function
Introduction to NoSQL and MongoDB
RDBMS, types of relational databases, challenges of RDBMS, NoSQL database, its significance, how NoSQL suits Big Data needs, introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types and examples
Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) installation and MongoDB data types
Hands-on Exercise: Install MongoDB and install MongoChef (MongoGUI)
Importance of NoSQL
The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, CAP Theorem, Base property, learning about JSON/BSON, database collection and documentation, MongoDB uses, MongoDB write concern—acknowledged, replica acknowledged, unacknowledged, journaled—and Fsync
Hands-on Exercise: Write a JSON document
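A JSON document of the kind this exercise asks for can be sketched and checked with Python's standard `json` module. The field names and values below are hypothetical. The nested object and the array show the two structures that set documents apart from flat relational rows.

```python
import json

# Hypothetical document; field names and values are illustrative only.
student = {
    "name": "Asha Rao",
    "enrolled": True,
    "courses": ["MongoDB", "SQL"],                    # array field
    "address": {"city": "Pune", "country": "India"},  # embedded document
    "score": 87.5,
}

# Serialize to JSON text, then parse it back to confirm it is valid JSON.
text = json.dumps(student, indent=2)
restored = json.loads(text)

print(text)
```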
Understanding CRUD and its functionality, CRUD concepts, MongoDB query and syntax and read and write queries and query optimization
Hands-on Exercise: Use insert query to create a data entry, use find query to read data, use update and replace queries to update and use delete query operations on a DB file
Data Modeling and Schema Design
Concepts of data modelling, difference between MongoDB and RDBMS modelling, model tree structure, operational strategies, monitoring and backup
Hands-on Exercise: Write a data model tree structure for a family hierarchy
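One common way to model a tree in a document database is the parent-reference pattern, where each document stores the `_id` of its parent. The sketch below uses plain Python dictionaries in place of a live MongoDB collection. The family members and the helper names `ancestors` and `children` are hypothetical.

```python
# Each dict stands in for one document; "parent" holds the parent's _id.
family = [
    {"_id": "Grandma", "parent": None},
    {"_id": "Mother", "parent": "Grandma"},
    {"_id": "Uncle", "parent": "Grandma"},
    {"_id": "Me", "parent": "Mother"},
    {"_id": "Sister", "parent": "Mother"},
]

by_id = {doc["_id"]: doc for doc in family}


def ancestors(name):
    """Walk parent references upward until the root is reached."""
    chain = []
    parent = by_id[name]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = by_id[parent]["parent"]
    return chain


def children(name):
    """Find every document whose parent reference points at `name`."""
    return [doc["_id"] for doc in family if doc["parent"] == name]


print(ancestors("Me"))
print(children("Grandma"))
```

Parent references keep each document small and make "who is the parent?" a single lookup, at the cost of one lookup per level when walking up the tree.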
Data Management and Administration
In this module, you will learn MongoDB® Administration activities such as health check, backup, recovery, database sharding and profiling, data import/export, performance tuning, etc.
Hands-on Exercise: Use shard key and hashed shard keys, perform backup and recovery of a dummy dataset, import data from a CSV file and export data to a CSV file
Data Indexing and Aggregation
Concepts of data aggregation and types and data indexing concepts, properties and variations
Hands-on Exercise: Do aggregation using pipeline, sort, skip and limit and create index on data using single key and using multi-key
Understanding database security risks, MongoDB security concept and security approach and MongoDB integration with Java and Robomongo
Hands-on Exercise: MongoDB integration with Java and Robomongo
Working with Unstructured Data
Implementing techniques to work with variety of unstructured data like images, videos, log data and others and understanding GridFS MongoDB file system for storing data
Hands-on Exercise: Work with variety of unstructured data like images, videos, log data and others
What projects will I be working on in this MongoDB training?
Project: Working with the MongoDB Java Driver
Problem Statement: How to create table for video insertion using Java
Topics: In this project, you will work with MongoDB Java Driver and become proficient in creating a table for inserting video using Java programming. You will work with collections and documents and understand the read and write basics of MongoDB database and the Java virtual machine libraries.
- Setting up the MongoDB Java Driver
- Connecting to the database
- Java virtual machine libraries
Module 1 – Introduction to SQL
1.1 Various types of databases
1.2 Introduction to Structured Query Language
1.3 Distinction between client server and file server databases
1.4 Understanding SQL Server Management Studio
1.5 SQL Table basics
1.6 Data types and functions
1.8 Authentication for Windows
1.9 Data control language
1.10 The identification of the keywords in T-SQL, such as Drop Table
Module 2 – Database Normalization and Entity Relationship Model
2.1 Entity-Relationship Model
2.2 Entity and Entity Set
2.3 Attributes and types of Attributes
2.4 Entity Sets
2.5 Relationship Sets
2.6 Degree of Relationship
2.7 Mapping Cardinalities, One-to-One, One-to-Many, Many-to-one, Many-to-many
2.8 Symbols used in E-R Notation
Module 3 – SQL Operators
3.1 Introduction to relational databases
3.2 Fundamental concepts of relational rows, tables, and columns
3.3 Several operators (such as logical and relational), constraints, domains, indexes, stored procedures, primary and foreign keys
3.4 Understanding group functions
3.5 The unique key
Module 4 – Working with SQL: Join, Tables, and Variables
4.1 Advanced concepts of SQL tables
4.2 SQL functions
4.3 Operators & queries
4.4 Table creation
4.5 Data retrieval from tables
4.6 Combining rows from tables using inner, outer, cross, and self joins
4.7 Deploying operators such as ‘intersect,’ ‘except,’ and ‘union’
4.8 Temporary table creation
4.9 Set operator rules
4.10 Table variables
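The joins and set operators in this module can be tried without a database server. The sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in for SQL Server; the `customers` and `orders` tables are hypothetical. The three statements shown use standard SQL that T-SQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Inner join: only customers that have at least one matching order.
inner = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# Left outer join: every customer, with an order count of 0 when none match.
left = conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Set operator: customer ids that never appear in the orders table.
no_orders = conn.execute("""
    SELECT id FROM customers
    EXCEPT
    SELECT customer_id FROM orders
""").fetchall()

print(inner, left, no_orders)
```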
Module 5 – Deep Dive into SQL Functions
5.1 Understanding SQL functions – what do they do?
5.2 Scalar functions
5.3 Aggregate functions
5.4 Functions that can be used on different datasets, such as numbers, characters, strings, and dates
5.5 Inline SQL functions
5.6 General functions
5.7 Duplicate functions
Module 6 – Working with Subqueries
6.1 Understanding SQL subqueries, their rules
6.2 Statements and operators with which subqueries can be used
6.3 Using the set clause to modify subqueries
6.4 Understanding different types of subqueries, such as where, select, insert, update, delete, etc.
6.5 Methods to create and view subqueries
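To make the difference between a plain subquery and a correlated one concrete, here is a small sketch using SQLite through Python's `sqlite3` module rather than SQL Server. The `employees` table and its rows are hypothetical. The first query's inner SELECT runs once; the second refers to the outer row (`e.dept`), so it is evaluated per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
INSERT INTO employees VALUES
    (1, 'Ana', 'Sales', 50000),
    (2, 'Ben', 'Sales', 70000),
    (3, 'Cy',  'IT',    90000),
    (4, 'Di',  'IT',    60000),
    (5, 'Eli', 'HR',    40000);
""")

# Scalar subquery in WHERE: employees paid above the company-wide average.
above_avg = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# Correlated subquery with NOT EXISTS: nobody in the same department earns more.
top_per_dept = conn.execute("""
    SELECT name FROM employees e
    WHERE NOT EXISTS (
        SELECT 1 FROM employees other
        WHERE other.dept = e.dept AND other.salary > e.salary
    )
    ORDER BY name
""").fetchall()

print(above_avg, top_per_dept)
```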
Module 7 – SQL Views, Functions, and Stored Procedures
7.1 Learning SQL views
7.2 Methods of creating, using, altering, renaming, dropping, and modifying views
7.3 Understanding stored procedures and their key benefits
7.4 Working with stored procedures
7.5 Studying user-defined functions
7.6 Error handling
Module 8 – Deep Dive into User-defined Functions
8.1 User-defined functions
8.2 Types of UDFs, such as scalar
8.3 Inline table value
8.4 Multi-statement table
8.5 Stored procedures and when to deploy them
8.6 What is the rank function?
8.7 Triggers and when to execute them
Module 9 – SQL Optimization and Performance
9.1 SQL Server Management Studio
9.2 Using pivot in MS Excel and MS SQL Server
9.3 Differentiating between Char, Varchar, and NVarchar
9.4 XL path, indexes and their creation
9.5 Records grouping, advantages, searching, sorting, modifying data
9.6 Clustered indexes creation
9.7 Use of indexes to cover queries
9.8 Common table expressions
9.9 Index guidelines
Module 10 – Advanced Topics
10.1 Correlated Subquery, Grouping Sets, Rollup, Cube
- Implementing Correlated Subqueries
- Using EXISTS with a Correlated subquery
- Using Union Query
- Using Grouping Set Query
- Using Rollup
- Using CUBE to generate four grouping sets
- Perform a partial CUBE
Module 11 – Managing Database Concurrency
11.1 Applying transactions
11.2 Using the transaction behavior to identify DML statements
11.3 Learning about implicit and explicit transactions
11.4 Isolation levels management
11.5 Understanding concurrency and locking behavior
11.6 Using memory-optimized tables
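The all-or-nothing behavior of a transaction can be sketched with SQLite through Python's `sqlite3` module. This is a stand-in only: T-SQL expresses the same idea with BEGIN TRANSACTION, COMMIT, and ROLLBACK, and its locking and isolation behavior differs from SQLite's. The `accounts` table and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, rejected, balances)
```

The second transfer credits bob before the debit fails, so the unchanged final balance for bob shows that the rollback undid the partial work.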
Module 12 – Programming Databases Using Transact-SQL
12.1 Creating Transact-SQL queries
12.2 Querying multiple tables using joins
12.3 Implementing functions and aggregating data
12.4 Modifying data
12.5 Determining the results of DDL statements on supplied tables and data
12.6 Constructing DML statements using the output statement
12.7 Querying data using subqueries and APPLY
12.8 Querying data using table expressions
12.9 Grouping and pivoting data using queries
12.10 Querying temporal data and non-relational data
12.11 Constructing recursive table expressions to meet business requirements
12.12 Using windowing functions to group
12.13 Rank the results of a query
12.14 Creating database programmability objects by using T-SQL
12.15 Implementing error handling and transactions
12.16 Implementing transaction control in conjunction with error handling in stored procedures
12.17 Implementing data types and NULL
12.18 Designing and implementing relational database schema
12.19 Designing and implementing indexes
12.20 Learning to compare between indexed and included columns
12.21 Implementing clustered index
12.22 Designing and deploying views
12.23 Column store views
12.24 Explaining foreign key constraints
12.25 Using T-SQL statements
12.26 Usage of Data Manipulation Language (DML)
12.27 Designing the components of stored procedures
12.28 Implementing input and output parameters
12.29 Applying error handling
12.30 Executing control logic in stored procedures
12.31 Designing trigger logic, DDL triggers, etc.
12.32 Accuracy of statistics
12.33 Formulating statistics maintenance tasks
12.34 Dynamic management objects management
12.35 Identifying missing indexes
12.36 Examining and troubleshooting query plans
12.37 Consolidating the overlapping indexes
12.38 The performance management of database instances
12.39 SQL server performance monitoring
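Recursive table expressions (item 12.11) are easiest to grasp on a small hierarchy. The sketch below walks a hypothetical `staff` reporting chain using SQLite through Python's `sqlite3` module. One syntax difference to keep in mind: SQLite writes `WITH RECURSIVE`, whereas T-SQL uses plain `WITH` for recursive common table expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO staff VALUES
    (1, 'Ana', NULL),
    (2, 'Ben', 1),
    (3, 'Cy',  1),
    (4, 'Di',  2);
""")

rows = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        -- Anchor member: the person with no manager.
        SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: anyone reporting to a row already in the chain.
        SELECT s.id, s.name, chain.depth + 1
        FROM staff s
        JOIN chain ON s.manager_id = chain.id
    )
    SELECT name, depth FROM chain ORDER BY depth, id
""").fetchall()

for name, depth in rows:
    print("  " * depth + name)
```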
Module 13 – Microsoft Courses: Study Material
13.1 Performance Tuning and Optimizing SQL Databases
13.2 Querying Data with Transact-SQL
Writing Complex Subqueries
In this project, you will be working with SQL subqueries and utilizing them in various scenarios. You will learn to use IN or NOT IN, ANY or ALL, EXISTS or NOT EXISTS, and other major queries. You will be required to access and manipulate datasets, operate and control statements, and execute SQL queries against databases.
Querying a Large Relational Database
This project is about how to get details about customers by querying the database. You will be working with table basics and data types, various SQL operators, and SQL functions. The project will require you to download a database and restore it on the server, then query the database for customer details and sales information.
Relational Database Design
In this project, you will learn to convert a relational design that lists various users, user roles, user accounts, and their statuses into tables in SQL Server. You will have to define relations/attributes and primary keys, and create the respective foreign keys, with at least two rows in each of the tables.