Big Data Master’s Course

The “Edutech Skills” Big Data Architect master’s course will provide you with in-depth knowledge of Big Data platforms such as Hadoop, Spark, and NoSQL databases, along with detailed exposure to analytics and ETL through hands-on work with industry tools. This program is designed by industry experts, and you will get 13 courses with 33 industry-based projects.

List of Courses Included:

Online Instructor-led Courses:

  • Big Data Hadoop and Spark
  • Apache Spark and Scala
  • Splunk Developer and Admin
  • Python for Data Science
  • Pyspark Training
  • MongoDB
  • AWS Big Data

Self-paced Courses:

  • Hadoop Testing
  • Apache Storm
  • Apache Kafka
  • Apache Cassandra
  • Java
  • Linux

What will you learn in this master’s course?

  • Introduction to Hadoop ecosystem
  • Working with HDFS and MapReduce
  • Real-time analytics with Apache Spark
  • ETL in Business Intelligence domain
  • Working on large amounts of data with NoSQL databases
  • Real-time message brokering system
  • Hadoop analysis and testing

Who should take up this training?

  • Data Science and Big Data Professionals and Software Developers
  • Business Intelligence Professionals, Information Architects and Project Managers
  • Those who aspire to be a Big Data Architect

What are the prerequisites for taking up this training course?

There are no prerequisites for taking up this training program.

Why should you take up this training program?

  • The global Hadoop market is expected to reach $84.6 billion within two years – Allied Market Research
  • The number of jobs for all US-based data professionals will increase to 2.7 million per year – IBM
  • A Hadoop Administrator in the US can earn a salary of $123,000 – Indeed

Big Data is one of the fastest-growing and most promising technology domains, and profiles such as Big Data Engineer and Big Data Solutions Architect are in huge demand. This Big Data Architect master’s course will help you grab the best jobs in this domain.

This Edutech Skills training program has been specifically created to help you master the Hadoop architecture and gain proficiency in the Business Intelligence domain. Upon completing the training, you will be well-versed in extracting valuable business insights from raw data, which will prepare you to apply for top jobs in the Big Data ecosystem.

Big Data Hadoop & Spark (Live Course)

Module 01 – Hadoop Installation and Setup

1.1 The architecture of Hadoop cluster
1.2 What is High Availability and Federation?
1.3 How to set up a production cluster?
1.4 Various shell commands in Hadoop
1.5 Understanding configuration files in Hadoop
1.6 Installing a single node cluster with Cloudera Manager
1.7 Understanding Spark, Scala, Sqoop, Pig, and Flume

Module 02 – Introduction to Big Data Hadoop and Understanding HDFS and MapReduce

2.1 Introducing Big Data and Hadoop
2.2 What is Big Data and where does Hadoop fit in?
2.3 Two important Hadoop ecosystem components, namely, MapReduce and HDFS
2.4 In-depth Hadoop Distributed File System – Replications, Block Size, Secondary Name node, High Availability and in-depth YARN – resource manager and node manager

Hands-on Exercise:

1. HDFS working mechanism
2. Data replication process
3. How to determine the size of the block?
4. Understanding a data node and name node

Module 03 – Deep Dive in MapReduce

3.1 Learning the working mechanism of MapReduce
3.2 Understanding the mapping and reducing stages in MR
3.3 Various terminologies in MR like Input Format, Output Format, Partitioners, Combiners, Shuffle, and Sort

Hands-on Exercise:

1. How to write a WordCount program in MapReduce?
2. How to write a Custom Partitioner?
3. What is a MapReduce Combiner?
4. How to run a job in a local job runner
5. Deploying a unit test
6. What is a map side join and reduce side join?
7. What is a tool runner?
8. How to use counters, dataset joining with map side, and reduce side joins?
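
To give a flavour of the WordCount exercise, here is a minimal sketch written in the Hadoop Streaming style using Python (the classroom solution uses the Java MapReduce API; the script name and the sample pipeline are assumptions):

```python
#!/usr/bin/env python3
"""Minimal WordCount sketch in the Hadoop Streaming style (illustrative only).

Simulate the MapReduce flow locally with:
    cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce
With Hadoop Streaming, the same script would be passed as -mapper and -reducer.
"""
import sys


def mapper():
    # Map phase: emit "<word>\t1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")


def reducer():
    # Reduce phase: input is sorted by key, so counts for a word arrive together.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```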

Module 04 – Introduction to Hive

4.1 Introducing Hadoop Hive
4.2 Detailed architecture of Hive
4.3 Comparing Hive with Pig and RDBMS
4.4 Working with Hive Query Language
4.5 Creation of a database, table, group by and other clauses
4.6 Various types of Hive tables, HCatalog
4.7 Storing the Hive Results, Hive partitioning, and Buckets

Hands-on Exercise:

1. Database creation in Hive
2. Dropping a database
3. Hive table creation
4. How to change the database?
5. Data loading
6. Dropping and altering table
7. Pulling data by writing Hive queries with filter conditions
8. Table partitioning in Hive
9. What is a group by clause?
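
As a companion to these Hive exercises, here is a small sketch that drives the same operations through PySpark's Hive support rather than the Hive CLI; the database, table, and column names are illustrative assumptions:

```python
from pyspark.sql import SparkSession

# Hive-enabled Spark session (assumes a reachable Hive metastore/warehouse).
spark = (SparkSession.builder
         .appName("hive-exercises")
         .enableHiveSupport()
         .getOrCreate())

# Database and partitioned table creation (names are illustrative).
spark.sql("CREATE DATABASE IF NOT EXISTS retail")
spark.sql("USE retail")
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (order_id INT, amount DOUBLE)
    PARTITIONED BY (order_date STRING)
""")

# Data loading into a specific partition.
spark.sql("""
    INSERT INTO sales PARTITION (order_date='2024-01-01')
    VALUES (1, 150.0), (2, 75.5)
""")

# Pulling data with a filter condition and a GROUP BY clause.
spark.sql("""
    SELECT order_date, SUM(amount) AS total
    FROM sales
    WHERE amount > 100
    GROUP BY order_date
""").show()

# Altering and dropping objects, as in the exercise list.
spark.sql("ALTER TABLE sales ADD COLUMNS (channel STRING)")
spark.sql("DROP DATABASE IF EXISTS staging CASCADE")
```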

Module 05 – Advanced Hive and Impala

5.1 Indexing in Hive
5.2 The Map Side Join in Hive
5.3 Working with complex data types
5.4 The Hive user-defined functions
5.5 Introduction to Impala
5.6 Comparing Hive with Impala
5.7 The detailed architecture of Impala

Hands-on Exercise: 

1. How to work with Hive queries?
2. The process of joining the table and writing indexes
3. External table and sequence table deployment
4. Data storage in a different table

Module 06 – Introduction to Pig

6.1 Apache Pig introduction and its various features
6.2 Various data types and schema in Pig
6.3 The available functions in Pig, and Pig Bags, Tuples, and Fields

Hands-on Exercise: 

1. Working with Pig in MapReduce and local mode
2. Loading of data
3. Limiting data to 4 rows
4. Storing the data into files and working with Group By, Filter By, Distinct, Cross, and Split in Pig

Module 07 – Flume, Sqoop and HBase

7.1 Apache Sqoop introduction
7.2 Importing and exporting data
7.3 Performance improvement with Sqoop
7.4 Sqoop limitations
7.5 Introduction to Flume and understanding the architecture of Flume
7.6 What is HBase and the CAP theorem?

Hands-on Exercise: 

1. Working with Flume to generate Sequence Number and consume it
2. Using the Flume Agent to consume the Twitter data
3. Using AVRO to create Hive Table
4. AVRO with Pig
5. Creating Table in HBase
6. Deploying Disable, Scan, and Enable Table

Module 08 – Writing Spark Applications Using Scala

8.1 Using Scala for writing Apache Spark applications
8.2 Detailed study of Scala
8.3 The need for Scala
8.4 The concept of object-oriented programming
8.5 Executing the Scala code
8.6 Various classes in Scala like getters, setters, constructors, abstract, extending objects, overriding methods
8.7 The Java and Scala interoperability
8.8 The concept of functional programming and anonymous functions
8.9 Bobsrockets package and comparing the mutable and immutable collections
8.10 Scala REPL, Lazy Values, Control Structures in Scala, Directed Acyclic Graph (DAG), first Spark application using SBT/Eclipse, Spark Web UI, Spark in Hadoop ecosystem.

Hands-on Exercise:

1. Writing Spark application using Scala
2. Understanding the robustness of Scala for Spark real-time analytics operation

Module 09 – Use Case Bobsrockets Package

9.1 Introduction to Scala packages and imports
9.2 The selective imports
9.3 The Scala test classes
9.4 Introduction to JUnit test class
9.5 JUnit interface via JUnit 3 suite for Scala test
9.6 Packaging of Scala applications in the directory structure
9.7 Examples of Spark Split and Spark Scala

Module 10 – Introduction to Spark

10.1 Introduction to Spark
10.2 Spark overcomes the drawbacks of working on MapReduce
10.3 Understanding in-memory MapReduce
10.4 Interactive operations on MapReduce
10.5 Spark stack, fine vs. coarse-grained update, Spark Hadoop YARN, HDFS Revision, and YARN Revision
10.6 The overview of Spark and how it is better than Hadoop
10.7 Deploying Spark without Hadoop
10.8 Spark history server and Cloudera distribution

Module 11 – Spark Basics

11.1 Spark installation guide
11.2 Spark configuration
11.3 Memory management
11.4 Executor memory vs. driver memory
11.5 Working with Spark Shell
11.6 The concept of resilient distributed datasets (RDD)
11.7 Learning to do functional programming in Spark
11.8 The architecture of Spark

Module 12 – Working with RDDs in Spark

12.1 Spark RDD
12.2 Creating RDDs
12.3 RDD partitioning
12.4 Operations and transformation in RDD
12.5 Deep dive into Spark RDDs
12.6 The RDD general operations
12.7 Read-only partitioned collection of records
12.8 Using the concept of RDD for faster and efficient data processing
12.9 RDD actions such as collect, count, collectAsMap, saveAsTextFile, and pair RDD functions

Module 13 – Aggregating Data with Pair RDDs

13.1 Understanding the concept of key-value pair in RDDs
13.2 Learning how Spark makes MapReduce operations faster
13.3 Various operations of RDD
13.4 MapReduce interactive operations
13.5 Fine and coarse-grained update
13.6 Spark stack

Module 14 – Writing and Deploying Spark Applications

14.1 Comparing the Spark applications with Spark Shell
14.2 Creating a Spark application using Scala or Java
14.3 Deploying a Spark application
14.4 Scala built application
14.5 Creating mutable lists, sets and set operations, lists, tuples, and list concatenation
14.6 Creating an application using SBT
14.7 Deploying an application using Maven
14.8 The web user interface of Spark application
14.9 A real-world example of Spark
14.10 Configuring of Spark

Module 15 – Project Solution Discussion and Cloudera Certification Tips and Tricks

15.1 Working towards the solution of the Hadoop project
15.2 Its problem statements and the possible solution outcomes
15.3 Preparing for the Cloudera certifications
15.4 Points to focus on for scoring the highest marks
15.5 Tips for cracking Hadoop interview questions

Hands-on Exercise:

1. The project of a real-world high value Big Data Hadoop application
2. Getting the right solution based on the criteria set by the Edutech Skills team

Module 16 – Parallel Processing

16.1 Learning about Spark parallel processing
16.2 Deploying on a cluster
16.3 Introduction to Spark partitions
16.4 File-based partitioning of RDDs
16.5 Understanding of HDFS and data locality
16.6 Mastering the technique of parallel operations
16.7 Comparing repartition and coalesce
16.8 RDD actions

Module 17 – Spark RDD Persistence

17.1 The execution flow in Spark
17.2 Understanding the RDD persistence overview
17.3 Spark execution flow, and Spark terminology
17.4 Distributed shared memory vs. RDD
17.5 RDD limitations
17.6 Spark shell arguments
17.7 Distributed persistence
17.8 RDD lineage
17.9 Key-value pair sorting and implicit conversions like countByKey, reduceByKey, sortByKey, and aggregateByKey

Module 18 – Spark MLlib

18.1 Introduction to Machine Learning
18.2 Types of Machine Learning
18.3 Introduction to MLlib
18.4 Various ML algorithms supported by MLlib
18.5 Linear regression, logistic regression, decision tree, random forest, and K-means clustering techniques

Hands-on Exercise: 

1. Building a Recommendation Engine

Module 19 – Integrating Apache Flume and Apache Kafka

19.1 Why Kafka and what is Kafka?
19.2 Kafka architecture
19.3 Kafka workflow
19.4 Configuring Kafka cluster
19.5 Operations
19.6 Kafka monitoring tools
19.7 Integrating Apache Flume and Apache Kafka

Hands-on Exercise: 

1. Configuring Single Node Single Broker Cluster
2. Configuring Single Node Multi Broker Cluster
3. Producing and consuming messages
4. Integrating Apache Flume and Apache Kafka
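
As a rough companion to these exercises, the snippet below produces and consumes a few messages using the kafka-python client (the lab itself uses the Kafka command-line tools; the broker address and topic name are assumptions):

```python
# Producing and consuming messages with the kafka-python client (illustrative sketch).
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(5):
    producer.send("demo-topic", value=f"message {i}".encode("utf-8"))
producer.flush()

consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,   # stop iterating when no new messages arrive
)
for record in consumer:
    print(record.topic, record.partition, record.offset, record.value.decode("utf-8"))
```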

Module 20 – Spark Streaming

20.1 Introduction to Spark Streaming
20.2 Features of Spark Streaming
20.3 Spark Streaming workflow
20.4 Initializing StreamingContext, discretized Streams (DStreams), input DStreams and Receivers
20.5 Transformations on DStreams, output operations on DStreams, windowed operators and why they are useful
20.6 Important windowed operators and stateful operators

Hands-on Exercise: 

1. Twitter Sentiment analysis
2. Streaming using Netcat server
3. Kafka–Spark streaming
4. Spark–Flume streaming
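
For the Netcat streaming exercise, a minimal DStream word-count sketch in PySpark might look like the following (start a server with `nc -lk 9999` first; the host, port, and batch interval are assumptions):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="netcat-wordcount")
ssc = StreamingContext(sc, 5)   # 5-second micro-batches

# Read lines from the Netcat server and count words per batch.
lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```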

Module 21 – Improving Spark Performance

21.1 Introduction to various variables in Spark like shared variables and broadcast variables
21.2 Learning about accumulators
21.3 The common performance issues
21.4 Troubleshooting the performance problems

Module 22 – Spark SQL and Data Frames

22.1 Learning about Spark SQL
22.2 The context of SQL in Spark for providing structured data processing
22.3 JSON support in Spark SQL
22.4 Working with XML data
22.5 Parquet files
22.6 Creating Hive context
22.7 Writing data frame to Hive
22.8 Reading JDBC files
22.9 Understanding the data frames in Spark
22.10 Creating Data Frames
22.11 Manual inferring of schema
22.12 Working with CSV files
22.13 Reading JDBC tables
22.14 Data frame to JDBC
22.15 User-defined functions in Spark SQL
22.16 Shared variables and accumulators
22.17 Learning to query and transform data in data frames
22.18 Data frame provides the benefit of both Spark RDD and Spark SQL
22.19 Deploying Hive on Spark as the execution engine

Module 23 – Scheduling/Partitioning

23.1 Learning about the scheduling and partitioning in Spark
23.2 Hash partition
23.3 Range partition
23.4 Scheduling within and around applications
23.5 Static partitioning, dynamic sharing, and fair scheduling
23.6 Map partition with index, the Zip, and GroupByKey
23.7 Spark master high availability, standby masters with ZooKeeper, single-node recovery with the local file system, and higher-order functions

The following modules are available only in self-paced mode:

Module 24 – Hadoop Administration – Multi-node Cluster Setup Using Amazon EC2

24.1 Create a 4-node Hadoop cluster setup
24.2 Running the MapReduce Jobs on the Hadoop cluster
24.3 Successfully running the MapReduce code
24.4 Working with the Cloudera Manager setup

Hands-on Exercise:

1. The method to build a multi-node Hadoop cluster using an Amazon EC2 instance
2. Working with the Cloudera Manager

Module 25 – Hadoop Administration – Cluster Configuration

25.1 Overview of Hadoop configuration
25.2 The importance of Hadoop configuration file
25.3 The various parameters and values of configuration
25.4 The HDFS parameters and MapReduce parameters
25.5 Setting up the Hadoop environment
25.6 The Include and Exclude configuration files
25.7 The administration and maintenance of name node, data node directory structures, and files
25.8 What is a File system image?
25.9 Understanding Edit log

Hands-on Exercise:

1. The process of performance tuning in MapReduce

Module 26 – Hadoop Administration – Maintenance, Monitoring and Troubleshooting

26.1 Introduction to the checkpoint procedure, name node failure
26.2 How to ensure recovery, Safe Mode, metadata and data backup, various potential problems and solutions, what to look for, and how to add and remove nodes

Hands-on Exercise:

1. How to go about ensuring the MapReduce File System Recovery for different scenarios
2. JMX monitoring of the Hadoop cluster
3. How to use the logs and stack traces for monitoring and troubleshooting
4. Using the Job Scheduler for scheduling jobs in the same cluster
5. Getting the MapReduce job submission flow
6. FIFO schedule
7. Getting to know the Fair Scheduler and its configuration

Module 27 – ETL Connectivity with Hadoop Ecosystem (Self-Paced)

27.1 How do ETL tools work in the Big Data industry?
27.2 Introduction to ETL and data warehousing
27.3 Working with prominent use cases of Big Data in the ETL industry
27.4 End-to-end ETL PoC showing Big Data integration with an ETL tool

Hands-on Exercise:

1. Connecting to HDFS from an ETL tool
2. Moving data from the local system to HDFS
3. Moving data from a DBMS to HDFS
4. Working with Hive with an ETL tool
5. Creating a MapReduce job in an ETL tool

Module 28 – Hadoop Application Testing

28.1 Importance of testing
28.2 Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end-to-end tests, Functional testing, Release certification testing, Security testing, Scalability testing, Commissioning and Decommissioning of data nodes testing, Reliability testing, and Release testing

Module 29 – Roles and Responsibilities of Hadoop Testing Professional

29.1 Understanding the Requirement
29.2 Preparation of the Testing Estimation
29.3 Test Cases, Test Data, Test Bed Creation, Test Execution, Defect Reporting, Defect Retest, Daily Status report delivery, Test completion, ETL testing at every stage (HDFS, Hive and HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, which includes but is not limited to data verification, Reconciliation, User Authorization and Authentication testing (Groups, Users, Privileges, etc.), reporting defects to the development team or manager and driving them to closure
29.4 Consolidating all the defects and creating defect reports
29.5 Validating new features and issues in Core Hadoop

Module 30 – Framework Called MRUnit for Testing of MapReduce Programs

30.1 Reporting defects to the development team or manager and driving them to closure
30.2 Consolidating all the defects and creating defect reports
30.3 Working with the MRUnit testing framework for testing MapReduce programs

Module 31 – Unit Testing

31.1 Automation testing using Oozie
31.2 Data validation using the QuerySurge tool

Module 32 – Test Execution

32.1 Test plan for HDFS upgrade
32.2 Test automation and result

Module 33 – Test Plan Strategy and Writing Test Cases for Testing Hadoop Application

33.1 Test, install and configure

Big Data Hadoop Course Projects

Working with MapReduce, Hive, and Sqoop

In this project, you will import data into HDFS using Sqoop for data analysis; the data transfer will be from an RDBMS to Hadoop. You will write Hive Query Language code to query and analyze the data, and you will come away with a solid understanding of Hive and Sqoop after completing this project.

Work on MovieLens Data For Finding the Top Movies

Create the top-ten-movies list using the MovieLens data. For this project, you will use a MapReduce program to work on the data file, Apache Pig to analyze the data, and Apache Hive for data warehousing and querying. You will be working with distributed datasets.
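
One possible way to express the top-ten-movies computation, sketched here in PySpark for reference (the course solution uses MapReduce, Pig, and Hive; the file path and the CSV column layout are assumptions):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("movielens-top10").getOrCreate()

# Assumed CSV layout: userId,movieId,rating,timestamp with a header row.
ratings = spark.read.csv("hdfs:///data/movielens/ratings.csv",
                         header=True, inferSchema=True)

# Average rating per movie, keeping only movies with a reasonable number of votes.
top10 = (ratings.groupBy("movieId")
                .agg(F.avg("rating").alias("avg_rating"),
                     F.count("*").alias("num_ratings"))
                .filter(F.col("num_ratings") >= 100)
                .orderBy(F.desc("avg_rating"))
                .limit(10))

top10.show()
```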

Hadoop YARN Project: End-to-End PoC

Bring the daily incremental data into the Hadoop Distributed File System. As part of the project, you will use Sqoop commands to bring the data into HDFS, work with the end-to-end flow of transaction data, and read the data back from HDFS. You will work on a live Hadoop YARN cluster and on YARN's central ResourceManager.

Table Partitioning in Hive

In this project, you will learn how to improve query speed using Hive data partitioning. You will get hands-on experience in partitioning Hive tables manually, deploying single SQL execution for dynamic partitioning, and bucketing data to break it into manageable chunks.
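
A minimal sketch of dynamic partitioning, shown through Spark's Hive support for convenience (the project itself is done in Hive; the table names, columns, and inline sample rows are assumptions):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("hive-partitioning")
         .enableHiveSupport().getOrCreate())

# Allow fully dynamic partition inserts.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Partitioned target table (illustrative schema).
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_part (order_id INT, amount DOUBLE)
    PARTITIONED BY (country STRING)
""")

# A tiny stand-in for the raw source table.
spark.createDataFrame(
    [(1, 120.0, "IN"), (2, 80.5, "US"), (3, 99.9, "IN")],
    ["order_id", "amount", "country"],
).createOrReplaceTempView("sales_raw")

# A single INSERT ... SELECT routes each row to its partition automatically.
spark.sql("""
    INSERT OVERWRITE TABLE sales_part PARTITION (country)
    SELECT order_id, amount, country FROM sales_raw
""")
spark.sql("SHOW PARTITIONS sales_part").show()
```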

Connecting Pentaho with Hadoop Ecosystem

Deploy ETL for data analysis activities. In this project, you will put your working knowledge of ETL and Business Intelligence to the test. You will configure Pentaho to work with a Hadoop distribution and to extract, transform, and load data into the Hadoop cluster.

Multi-node Cluster Setup

Set up a real-time Hadoop cluster on Amazon EC2. The project involves installing and configuring Hadoop. You will run a multi-node Hadoop setup using a 4-node cluster on Amazon EC2 and deploy a MapReduce job on it. Java will need to be installed as a prerequisite for running Hadoop.

Hadoop Testing Using MRUnit

In this project, you will be required to test MapReduce applications. You will write JUnit tests using MRUnit for MapReduce applications. You will also mock static methods using PowerMock and Mockito and use MRUnit's MapReduceDriver for testing the map and reduce pair.

Hadoop Web Log Analytics

Derive insights from web log data. The project involves aggregating log data, implementing Apache Flume for data transportation, and processing the data to generate analytics. You will learn to build workflows and perform data cleansing using MapReduce, Pig, or Spark.

Hadoop Maintenance

Through this project, you will learn how to administer a Hadoop cluster for maintaining and managing it. You will be working with the name node directory structure, audit logging, data node block scanner, balancer, Failover, fencing, DISTCP, and Hadoop file formats.

Twitter Sentiment Analysis

Find out how people reacted to India's demonetization move by analyzing their tweets. You will have to download the tweets, load them into Pig storage, split the tweets into words to calculate sentiment, rate the words from +5 to −5 using the AFINN dictionary, and then filter them and analyze the sentiment.

Analyzing IPL T20 Cricket

This project will require you to analyze an entire cricket match and get any details of the match. You will need to load the IPL dataset into HDFS. You will then analyze that data using Apache Pig or Hive. Based on the user queries, the system will have to give the right output.

Movie Recommendation

Recommend the most appropriate movie to a user based on their taste. This is a hands-on Apache Spark project involving collaborative filtering, regression, clustering, and dimensionality reduction. You will make use of the Apache Spark MLlib component and statistical analysis.
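
A bare-bones collaborative-filtering sketch with ALS from Spark's machine learning library, for orientation (the inline ratings are dummy values; the real project works on the full movie dataset):

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("movie-recs").getOrCreate()

# A tiny in-line ratings set stands in for the real dataset (userId, movieId, rating).
ratings = spark.createDataFrame(
    [(0, 1, 4.0), (0, 2, 1.0), (1, 1, 5.0), (1, 3, 4.5), (2, 2, 3.0), (2, 3, 4.0)],
    ["userId", "movieId", "rating"],
)

# Collaborative filtering with ALS.
als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          rank=8, maxIter=10, regParam=0.1, coldStartStrategy="drop")
model = als.fit(ratings)

# Top-3 movie recommendations for every user.
model.recommendForAllUsers(3).show(truncate=False)
```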

Twitter API Integration for Tweet Analysis

Analyze user sentiment based on tweets. In this Twitter analysis project, you will integrate the Twitter API and use Python or PHP to develop the essential server-side code. You will carry out filtering, parsing, and aggregation depending on the tweet analysis requirements.

Data Exploration Using Spark SQL – Wikipedia Data Set

In this project, you will make use of the Spark SQL tool for analyzing Wikipedia data. You will integrate Spark SQL with batch analysis, Machine Learning, data visualization and processing, and ETL processes, along with real-time analysis of data.
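
A small Spark SQL exploration sketch along these lines (the input path and the JSON fields "title", "text", and "language" are assumptions about the dataset layout):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("wikipedia-explore").getOrCreate()

# Assumed input: a JSON dump with one article per line.
wiki = spark.read.json("hdfs:///data/wikipedia/articles.json")
wiki.createOrReplaceTempView("wikipedia")

# Batch exploration with plain SQL: article counts per language.
spark.sql("""
    SELECT language, COUNT(*) AS articles
    FROM wikipedia
    GROUP BY language
    ORDER BY articles DESC
""").show()

# The same data through the DataFrame API: longest articles by word count.
(wiki.withColumn("words", F.size(F.split(F.col("text"), r"\s+")))
     .select("title", "words")
     .orderBy(F.desc("words"))
     .show(10, truncate=False))
```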

Apache Spark & Scala (Live Course)

Scala Course Content

Module 01 – Introduction to Scala

1.1 Introducing Scala
1.2 Deployment of Scala for Big Data applications and Apache Spark analytics
1.3 Scala REPL, lazy values, and control structures in Scala
1.4 Directed Acyclic Graph (DAG)
1.5 First Spark application using SBT/Eclipse
1.6 Spark Web UI
1.7 Spark in the Hadoop ecosystem.

Module 02 – Pattern Matching

2.1 The importance of Scala
2.2 The concept of REPL (Read Evaluate Print Loop)
2.3 Deep dive into Scala pattern matching
2.4 Type inference, higher-order functions, currying, traits, application space and Scala for data analysis

Module 03 – Executing the Scala Code

3.1 Learning about the Scala Interpreter
3.2 Static object timer in Scala and testing string equality in Scala
3.3 Implicit classes in Scala
3.4 The concept of currying in Scala
3.5 Various classes in Scala

Module 04 – Classes Concept in Scala

4.1 Learning about the Classes concept
4.2 Understanding the constructor overloading
4.3 Various abstract classes
4.4 The hierarchy types in Scala
4.5 The concept of object equality
4.6 The val and var methods in Scala

Module 05 – Case Classes and Pattern Matching

5.1 Understanding sealed traits, wild, constructor, tuple, variable pattern, and constant pattern

Module 06 – Concepts of Traits with Example

6.1 Understanding traits in Scala
6.2 The advantages of traits
6.3 Linearization of traits
6.4 The Java equivalent
6.5 Avoiding boilerplate code

Module 07 – Scala–Java Interoperability

7.1 Implementation of traits in Scala and Java
7.2 Handling of multiple traits extending

Module 08 – Scala Collections

8.1 Introduction to Scala collections
8.2 Classification of collections
8.3 The difference between iterator and iterable in Scala
8.4 Example of list sequence in Scala

Module 09 – Mutable Collections Vs. Immutable Collections

9.1 The two types of collections in Scala
9.2 Mutable and immutable collections
9.3 Understanding lists and arrays in Scala
9.4 The list buffer and array buffer
9.6 Queue in Scala
9.7 Double-ended queue Deque, Stacks, Sets, Maps, and Tuples in Scala

Module 10 – Use Case Bobsrockets Package

10.1 Introduction to Scala packages and imports
10.2 The selective imports
10.3 The Scala test classes
10.4 Introduction to JUnit test class
10.5 JUnit interface via JUnit 3 suite for Scala test
10.6 Packaging of Scala applications in the directory structure
10.7 Examples of Spark Split and Spark Scala

Spark Course Content

Module 11 – Introduction to Spark

11.1 Introduction to Spark
11.2 Spark overcomes the drawbacks of working on MapReduce
11.3 Understanding in-memory MapReduce
11.4 Interactive operations on MapReduce
11.5 Spark stack, fine vs. coarse-grained update, Spark Hadoop YARN, HDFS Revision, and YARN Revision
11.6 The overview of Spark and how it is better than Hadoop
11.7 Deploying Spark without Hadoop
11.8 Spark history server and Cloudera distribution

Module 12 – Spark Basics

12.1 Spark installation guide
12.2 Spark configuration
12.3 Memory management
12.4 Executor memory vs. driver memory
12.5 Working with Spark Shell
12.6 The concept of resilient distributed datasets (RDD)
12.7 Learning to do functional programming in Spark
12.8 The architecture of Spark

Module 13 – Working with RDDs in Spark

13.1 Spark RDD
13.2 Creating RDDs
13.3 RDD partitioning
13.4 Operations and transformation in RDD
13.5 Deep dive into Spark RDDs
13.6 The RDD general operations
13.7 Read-only partitioned collection of records
13.8 Using the concept of RDD for faster and efficient data processing
13.9 RDD actions such as collect, count, collectAsMap, saveAsTextFile, and pair RDD functions

Module 14 – Aggregating Data with Pair RDDs

14.1 Understanding the concept of key-value pair in RDDs
14.2 Learning how Spark makes MapReduce operations faster
14.3 Various operations of RDD
14.4 MapReduce interactive operations
14.5 Fine and coarse-grained update
14.6 Spark stack

Module 15 – Writing and Deploying Spark Applications

15.1 Comparing the Spark applications with Spark Shell
15.2 Creating a Spark application using Scala or Java
15.3 Deploying a Spark application
15.4 Scala built application
15.5 Creating mutable lists, sets and set operations, lists, tuples, and list concatenation
15.6 Creating an application using SBT
15.7 Deploying an application using Maven
15.8 The web user interface of Spark application
15.9 A real-world example of Spark
15.10 Configuring of Spark

Module 16 – Parallel Processing

16.1 Learning about Spark parallel processing
16.2 Deploying on a cluster
16.3 Introduction to Spark partitions
16.4 File-based partitioning of RDDs
16.5 Understanding of HDFS and data locality
16.6 Mastering the technique of parallel operations
16.7 Comparing repartition and coalesce
16.8 RDD actions

Module 17 – Spark RDD Persistence

17.1 The execution flow in Spark
17.2 Understanding the RDD persistence overview
17.3 Spark execution flow, and Spark terminology
17.4 Distributed shared memory vs. RDD
17.5 RDD limitations
17.6 Spark shell arguments
17.7 Distributed persistence
17.8 RDD lineage
17.9 Key-value pair sorting and implicit conversions like countByKey, reduceByKey, sortByKey, and aggregateByKey

Module 18 – Spark MLlib

18.1 Introduction to Machine Learning
18.2 Types of Machine Learning
18.3 Introduction to MLlib
18.4 Various ML algorithms supported by MLlib
18.5 Linear regression, logistic regression, decision tree, random forest, and K-means clustering techniques

Hands-on Exercise: 
1. Building a Recommendation Engine

Module 19 – Integrating Apache Flume and Apache Kafka

19.1 Why Kafka and what is Kafka?
19.2 Kafka architecture
19.3 Kafka workflow
19.4 Configuring Kafka cluster
19.5 Operations
19.6 Kafka monitoring tools
19.7 Integrating Apache Flume and Apache Kafka

Hands-on Exercise: 
1. Configuring Single Node Single Broker Cluster
2. Configuring Single Node Multi Broker Cluster
3. Producing and consuming messages
4. Integrating Apache Flume and Apache Kafka

Module 20 – Spark Streaming

20.1 Introduction to Spark Streaming
20.2 Features of Spark Streaming
20.3 Spark Streaming workflow
20.4 Initializing StreamingContext, discretized Streams (DStreams), input DStreams and Receivers
20.5 Transformations on DStreams, output operations on DStreams, windowed operators and why they are useful
20.6 Important windowed operators and stateful operators

Hands-on Exercise: 
1. Twitter Sentiment analysis
2. Streaming using Netcat server
3. Kafka–Spark streaming
4. Spark–Flume streaming

Module 21 – Improving Spark Performance

21.1 Introduction to various variables in Spark like shared variables and broadcast variables
21.2 Learning about accumulators
21.3 The common performance issues
21.4 Troubleshooting the performance problems

Module 22 – Spark SQL and Data Frames

22.1 Learning about Spark SQL
22.2 The context of SQL in Spark for providing structured data processing
22.3 JSON support in Spark SQL
22.4 Working with XML data
22.5 Parquet files
22.6 Creating Hive context
22.7 Writing data frame to Hive
22.8 Reading JDBC files
22.9 Understanding the data frames in Spark
22.10 Creating Data Frames
22.11 Manual inferring of schema
22.12 Working with CSV files
22.13 Reading JDBC tables
22.14 Data frame to JDBC
22.15 User-defined functions in Spark SQL
22.16 Shared variables and accumulators
22.17 Learning to query and transform data in data frames
22.18 Data frame provides the benefit of both Spark RDD and Spark SQL
22.19 Deploying Hive on Spark as the execution engine

Module 23 – Scheduling/Partitioning

23.1 Learning about the scheduling and partitioning in Spark
23.2 Hash partition
23.3 Range partition
23.4 Scheduling within and around applications
23.5 Static partitioning, dynamic sharing, and fair scheduling
23.6 Map partition with index, the Zip, and GroupByKey
23.7 Spark master high availability, standby masters with ZooKeeper, single-node recovery with the local file system, and higher-order functions

Spark and Scala Projects

Movie Recommendation

Deploy Apache Spark for a movie recommendation system. Through this project, you will be working with Spark MLlib, collaborative filtering, clustering, regression, and dimensionality reduction. By the completion of this project, you will be proficient in working with streaming data, sampling, testing, and statistics.

Twitter API Integration for Tweet Analysis

Integrate the Twitter API for analyzing tweets. You can use any scripting language, like PHP, Ruby, or Python, for requesting the Twitter API and getting the results in JSON format. You will have to perform aggregation, filtering, and parsing as per the requirements of the tweet analysis.

Data Exploration Using Spark SQL – Wikipedia Data

This project will allow you to work with Spark SQL and combine it with ETL applications, real-time analysis of data, batch analysis, Machine Learning, visualizations, and graph processing.

Splunk Developer & Admin (Live Course)

Module 1 – Splunk Development Concepts

1.1 Introduction to Splunk and Splunk developer roles and responsibilities

Module 2 – Basic Searching

2.1 Writing Splunk query for search
2.2 Auto-complete to build a search
2.3 Time range
2.4 Refine search
2.5 Working with events
2.6 Identifying the contents of search
2.7 Controlling a search job

Hands-on Exercise –
Write a basic search query

Module 3 – Using Fields in Searches

3.1 What is a Field
3.2 How to use Fields in search
3.3 Deploying Fields Sidebar and Field Extractor for REGEX field extraction
3.4 Delimiting Field Extraction using FX

Hands-on Exercise –

  1. Use Fields in Search
  2. Use Fields Sidebar
  3. Use Field Extractor (FX)
  4. Delimit field Extraction using FX

Module 4 – Saving and Scheduling Searches

4.1 Writing Splunk query for search, sharing, saving, scheduling and exporting search results

Hands-on Exercise –

  1. Schedule a search
  2. Save a search result
  3. Share and export a search result

Module 5: Creating Alerts

5.1 How to create alerts
5.2 Understanding alerts
5.3 Viewing fired alerts

Hands-on Exercise –

  1. Create an alert in Splunk
  2. View the fired alerts

Module 6 – Scheduled Reports

6.1 Describe and configure scheduled reports

Module 7 – Tags and Event Types

7.1 Introduction to Tags in Splunk
7.2 Deploying Tags for Splunk search
7.3 Understanding event types and utility
7.4 Generating and implementing event types in search

Hands-on Exercise –

  1. Deploy tags for Splunk search
  2. Generate and implement event types in search

Module 8 – Creating and Using Macros

8.1 What is a Macro
8.2 What are variables and arguments in Macros

Hands-on Exercise –

  1. First, you define a Macro with arguments and then use variables within it

Module 9 – Workflow

9.1 Creating get, post and search workflow actions

Hands-on Exercise –

  1. Create get, post and search workflow actions

Module 10 – Splunk Search Commands

10.1 Studying the search command
10.2 The general search practices
10.3 What is a search pipeline
10.4 How to specify indexes in search
10.5 Highlighting the syntax
10.6 Deploying the various search commands like fields, tables, sort, rename, rex and erex

Hands-on Exercise –

  1. Steps to create a search pipeline
  2. Search index specification
  3. How to highlight syntax
  4. Using the auto complete feature
  5. Deploying the various search commands like sort, fields, tables, rename, rex and erex

Module 11 – Transforming Commands

11.1 Using top, rare and stats commands

Hands-on Exercise –

  1. Use top, rare and stats commands

Module 12 – Reporting Commands

12.1 Using following commands and their functions: addcoltotals, addtotals, top, rare and stats

Hands-on Exercise –

  1. Create reports using following commands and their functions: addcoltotals and addtotals

Module 13 – Mapping and Single Value Commands

13.1 iplocation, geostats, geom and addtotals commands

Hands-on Exercise –

  1. Track IP using iplocation and get geo data using geostats

Module 14 – Splunk Reports and Visualizations

14.1 Explore the available visualizations
14.2 Create charts and time charts
14.3 Omit null values and format results

Hands-on Exercise –

  1. Create time charts
  2. Omit null values
  3. Format results

Module 15 – Analyzing, Calculating and Formatting Results

15.1 Calculating and analyzing results
15.2 Value conversion
15.3 Roundoff and format values
15.4 Using the eval command
15.5 Conditional statements
15.6 Filtering calculated search results

Hands-on Exercise –

  1. Calculate and analyze results
  2. Perform conversion on a data value
  3. Roundoff numbers
  4. Use the eval command
  5. Write conditional statements
  6. Apply filters on calculated search results

Module 16 – Correlating Events

16.1 How to search the transactions
16.2 Creating report on transactions
16.3 Grouping events using time and fields
16.4 Comparing transactions with stats

Hands-on Exercise –

  1. Generate report on transactions
  2. Group events using fields and time

Module 17 – Enriching Data with Lookups

17.1 Learning data lookups
17.2 Examples and lookup tables
17.3 Defining and configuring automatic lookups
17.4 Deploying lookups in reports and searches

Hands-on Exercise –

  1. Define and configure automatic lookups
  2. Deploy lookups in reports and searches

Module 18 – Creating Reports and Dashboards

18.1 Creating search charts, reports and dashboards
18.2 Editing reports and dashboards
18.3 Adding reports to dashboards

Hands-on Exercise –

  1. Create search charts, reports and dashboards
  2. Edit reports and dashboards
  3. Add reports to dashboards

Module 19 – Getting Started with Parsing

19.1 Working with raw data for data extraction, transformation, parsing and preview

Hands-on Exercise –

  1. Extract useful data from raw data
  2. Perform transformation
  3. Parse different values and preview

Module 20 – Using Pivot

20.1 Describe pivot
20.2 Relationship between data model and pivot
20.3 Select a data model object
20.4 Create a pivot report
20.5 Create instant pivot from a search
20.6 Add a pivot report to dashboard

Hands-on Exercise –

  1. Select a data model object
  2. Create a pivot report
  3. Create instant pivot from a search
  4. Add a pivot report to dashboard

Module 21 – Common Information Model (CIM) Add-On

21.1 What is a Splunk CIM
21.2 Using the CIM Add-On to normalize data

Hands-on Exercise –

  1. Use the CIM Add-On to normalize data

Splunk Administration Topics

Module 22 – Overview of Splunk

22.1 Introduction to the architecture of Splunk
22.2 Various server settings
22.3 How to set up alerts
22.4 Various types of licenses
22.5 Important features of the Splunk tool
22.6 The hardware requirements and conditions needed for the installation of Splunk

Module 23 – Splunk Installation

23.1 How to install and configure Splunk
23.2 The creation of an index
23.3 Standalone server’s input configuration
23.4 The preferences for search
23.5 Linux environment Splunk installation
23.6 The administering and architecting of Splunk

Module 24 – Splunk Installation in Linux

24.1 How to install Splunk in the Linux environment
24.2 The conditions needed for Splunk
24.3 Configuring Splunk in the Linux environment

Module 25 – Distributed Management Console

25.1 Introducing Splunk distributed management console
25.2 Indexing of clusters
25.3 How to deploy distributed search in Splunk environment
25.4 Forwarder management
25.5 User authentication and access control

Module 26 – Introduction to Splunk App

26.1 Introduction to the Splunk app
26.2 How to develop Splunk apps
26.3 Splunk app management
26.4 Splunk app add-ons
26.5 Using Splunk-base for installation and deletion of apps
26.6 Different app permissions and implementation
26.7 How to use the Splunk app
26.8 Apps on forwarder

Module 27 – Splunk Indexes and Users

27.1 Details of the index time configuration file
27.2 The search time configuration file

Module 28 – Splunk Configuration Files

28.1 Understanding of index time and search time configuration files in Splunk
28.2 Forwarder installation
28.3 Input and output configuration
28.4 Universal Forwarder management
28.5 Splunk Universal Forwarder highlights

Module 29 – Splunk Deployment Management

29.1 Implementing the Splunk tool
29.2 Deploying it on the server
29.3 Splunk environment setup
29.4 Splunk client group deployment

Module 30 – Splunk Indexes

30.1 Understanding the Splunk Indexes
30.2 The default Splunk Indexes
30.3 Segregating the Splunk Indexes
30.4 Learning Splunk Buckets and Bucket Classification
30.5 Estimating Index storage
30.6 Creating new Index

Module 31 – User Roles and Authentication

31.1 Understanding the concept of role inheritance
31.2 Splunk authentications
31.3 Native authentications
31.4 LDAP authentications

Module 32 – Splunk Administration Environment

32.1 Splunk installation, configuration
32.2 Data inputs
32.3 App management
32.4 Splunk important concepts
32.5 Parsing machine-generated data
32.6 Search indexer and forwarder

Module 33 – Basic Production Environment

33.1 Introduction to Splunk Configuration Files
33.2 Universal Forwarder
33.3 Forwarder Management
33.4 Data management, troubleshooting and monitoring

Module 34 – Splunk Search Engine

34.1 Converting machine-generated data into operational intelligence
34.2 Setting up the dashboard, reports and charts
34.3 Integrating Search Head Clustering and Indexer Clustering

Module 35 – Various Splunk Input Methods

35.1 Understanding the input methods
35.2 Deploying scripted, Windows and network
35.3 Agentless input types and fine-tuning them all

Module 36 – Splunk User and Index Management

36.1 Splunk user authentication and job role assignment
36.2 Learning to manage, monitor and optimize Splunk Indexes

Module 37 – Machine Data Parsing

37.1 Understanding parsing of machine-generated data
37.2 Manipulation of raw data
37.3 Previewing and parsing
37.4 Data field extraction
37.5 Comparing single-line and multi-line events

Module 38 – Search Scaling and Monitoring

38.1 Distributed search concepts
38.2 Improving search performance
38.3 Large-scale deployment and overcoming execution hurdles
38.4 Working with Splunk Distributed Management Console for monitoring the entire operation

Module 39 – Splunk Cluster Implementation

39.1 Cluster indexing
39.2 Configuring individual nodes
39.3 Configuring the cluster behavior, index and search behavior
39.4 Setting node type to handle different aspects of cluster like master node, peer node and search head

What projects will I be working on in this Splunk Developer and Admin training?

Project 1 : Creating an Employee Database of a Company

Industry : General

Problem Statement : How to build a Splunk dashboard where employee details are readily available

Topics : In this project, you will create a text file of employee data with details like full name, salary, designation, ID and so on. You will index the data based on various parameters and use various Splunk commands to evaluate and extract the information. Finally, you will create a dashboard and add various reports to it.

Highlights :

  • Splunk search and index commands
  • Extracting field in search and saving results
  • Editing event types and adding tags

Project 2 : Building an Organizational Dashboard with Splunk

Industry :  E-commerce

Problem Statement : How to analyze website traffic and gather insights

Topics : In this project, you will build an analytics dashboard for a website and create alerts for various conditions. You will capture the access logs of the web server, and the sample logs will then be uploaded. You will analyze the top ten users, the average time spent, the peak response time of the website, and the top ten errors with their error code descriptions. You will also create a Splunk dashboard for reporting and analysis.

Highlights :

  • Creating bar and line charts
  • Sending alerts for various conditions
  • Providing admin rights for the dashboard

Project 3 : Field Extraction in Splunk

Industry : General

Problem Statement : How to extract the fields from event data in Splunk

Topics : In this project, you will learn to extract fields from events using the Splunk field extraction technique. You will gain knowledge of the basics of field extraction, understand the use of the field extractor, the field extraction page in Splunk Web, and field extraction configuration in files. You will learn the regular expression and delimiter methods of field extraction. Upon completion of the project, you will gain expertise in building Splunk dashboards and using the extracted field data in them to create rich visualizations in an enterprise setup.

Highlight :

  • Field extraction using delimiter method
  • Delimit field extracts using FX
  • Extracting fields with the search command

Python for Data Science (Live Course)

Module 01 – Introduction to Data Science using Python

1.1 What is Data Science, what does a data scientist do
1.2 Various examples of Data Science in the industries
1.3 How Python is deployed for Data Science applications
1.4 Various steps in the Data Science process, like data wrangling, data exploration, and model selection
1.5 Introduction to Python programming language
1.6 Important Python features and how Python is different from other programming languages
1.7 Python installation, Anaconda Python distribution for Windows, Linux and Mac
1.8 How to run a sample Python script, Python IDE working mechanism
1.9 Running some Python basic commands
1.10 Python variables, data types and keywords.

Hands-on Exercise – Installing Anaconda Python for Windows, Linux and Mac

Module 02 – Python basic constructs

2.1 Introduction to a basic construct in Python
2.2 Understanding indentation like tabs and spaces
2.3 Python built-in data types
2.4 Basic operators in Python
2.5 Loop and control statements like break, if, for, continue, else, range() and more.

Hands-on Exercise –
1. Write your first Python program
2. Write a Python function (with and without parameters)
3. Use Lambda expression
4. Write a class
5. Create a member function and a variable
6. Create an object and write a for loop to print all odd numbers

Module 03 – Maths for DS-Statistics & Probability

3.1 Central Tendency
3.2 Variability
3.3 Hypothesis Testing
3.4 ANOVA
3.5 Correlation
3.6 Regression
3.7 Probability Definitions and Notation
3.8 Joint Probabilities
3.9 The Sum Rule, Conditional Probability, and the Product Rule
3.10 Bayes Theorem

Hands-on Exercise –

1. We will analyze both categorical data and quantitative data
2. Focusing on specific case studies to help solidify the module's statistical concepts

Module 04 – OOPs in Python (Self paced)

4.1 Understanding the OOP paradigm like encapsulation, inheritance, polymorphism and abstraction
4.2 What are access modifiers, instances, class members
4.3 Classes and objects
4.4 Function parameter and return type functions
4.5 Lambda expressions.

Hands-on Exercise –
1. Writing a Python program and incorporating the OOP concepts

Module 05 – NumPy for mathematical computing

5.1 Introduction to mathematical computing in Python
5.2 What are arrays and matrices, array indexing, array math, Inspecting a NumPy array, NumPy array manipulation

Hands-on Exercise –

1. How to import NumPy module
2. Creating array using ND-array
3. Calculating standard deviation on array of numbers and calculating correlation between two variables.
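
A minimal sketch of this NumPy exercise (the sample values are made up):

```python
import numpy as np

# Creating arrays from plain Python lists.
heights = np.array([160.0, 172.5, 168.2, 181.3, 175.0])
weights = np.array([55.0, 70.2, 65.5, 82.1, 74.3])

# Standard deviation of an array of numbers.
print("std of heights:", heights.std())

# Correlation between two variables (off-diagonal of the correlation matrix).
print("corr(height, weight):", np.corrcoef(heights, weights)[0, 1])
```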

Module 06 – SciPy for scientific computing

6.1 Introduction to SciPy, building on top of NumPy
6.2 What are the characteristics of SciPy
6.3 Various sub-packages for SciPy like Signal, Integrate, Fftpack, Cluster, Optimize, Stats and more, Bayes Theorem with SciPy.

Hands-on Exercise:

1. Importing of SciPy
2. Applying the Bayes theorem on the given dataset.

Module 07 – Data manipulation

7.1 What is data manipulation? Using the Pandas library
7.2 NumPy dependency of Pandas library
7.3 Series object in pandas
7.4 DataFrame in Pandas
7.5 Loading and handling data with Pandas
7.6 How to merge data objects
7.7 Concatenation and various types of joins on data objects, exploring dataset

Hands-on Exercise –

1. Doing data manipulation with Pandas by handling tabular datasets that include variable types like float, integer, double and others
2. Cleaning, manipulating, and visualizing the dataset
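
A short Pandas sketch covering the same ground (the tabular data is made up):

```python
import pandas as pd

# A small tabular dataset with mixed column types.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dana"],
    "age": [29, 34, None, 41],
    "salary": [52000.0, 61000.5, 58000.0, 73000.2],
    "dept": ["IT", "HR", "IT", "Finance"],
})

# Cleaning: fill the missing age with the column mean.
df["age"] = df["age"].fillna(df["age"].mean())

# Manipulation: add a derived column and aggregate per department.
df["salary_k"] = df["salary"] / 1000
print(df.groupby("dept")["salary"].mean())

# Merging/joining with a second data object.
depts = pd.DataFrame({"dept": ["IT", "HR", "Finance"], "floor": [3, 1, 2]})
print(df.merge(depts, on="dept", how="left"))
```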

Module 08 – Data visualization with Matplotlib

8.1 Introduction to Matplotlib
8.2 Using Matplotlib for plotting graphs and charts like Scatter, Bar, Pie, Line, Histogram and more
8.3 Matplotlib API

Hands-on Exercise –
1. Deploying Matplotlib for creating pie, scatter, line, and histogram charts
2. Subplots and Pandas' built-in data visualization
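
A compact Matplotlib sketch of the four chart types named above, using made-up data:

```python
import matplotlib.pyplot as plt

# Made-up data for the four chart types named in the exercise.
categories = ["A", "B", "C", "D"]
values = [23, 45, 12, 30]
x = list(range(10))
y = [v * v for v in x]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].pie(values, labels=categories, autopct="%1.0f%%")
axes[0, 0].set_title("Pie")
axes[0, 1].scatter(x, y)
axes[0, 1].set_title("Scatter")
axes[1, 0].plot(x, y)
axes[1, 0].set_title("Line")
axes[1, 1].hist(values, bins=4)
axes[1, 1].set_title("Histogram")

plt.tight_layout()
plt.show()
```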

Module 09 – Machine Learning using Python

9.1 Revision of topics in Python (Pandas, Matplotlib, NumPy, scikit-Learn)
9.2 Introduction to machine learning
9.3 Need of Machine learning
9.4 Types of machine learning and workflow of Machine Learning
9.5 Use cases in Machine Learning and its various algorithms
9.6 What is supervised learning
9.7 What is Unsupervised Learning

Hands-on Exercise –

1. Demo on ML algorithms

Module 10 – Supervised learning

10.1 What is linear regression
10.2 Step by step calculation of Linear Regression
10.3 Linear regression in Python
10.4 Logistic Regression
10.5 What is classification
10.6 Decision Tree, Confusion Matrix, Random Forest, Naïve Bayes classifier (self-paced), Support Vector Machine (self-paced), XGBoost (self-paced)

Hands-on Exercise – Using the Python library scikit-learn to implement supervised learning with the Random Forest algorithm.
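
A minimal scikit-learn sketch of this exercise; the built-in Iris dataset stands in for the course data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a small built-in dataset as a stand-in for the course data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Supervised learning with a Random Forest classifier.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```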

Module 11 – Unsupervised Learning

11.1 Introduction to unsupervised learning
11.2 Use cases of unsupervised learning
11.3 What is clustering
11.4 Types of clustering (self-paced): exclusive clustering, overlapping clustering, and hierarchical clustering
11.5 What is K-means clustering
11.6 Step by step calculation of k-means algorithm
11.7 Association Rule Mining (self-paced), Market Basket Analysis (self-paced), and measures in association rule mining (self-paced): support, confidence, and lift
11.8 Apriori Algorithm

Hands-on Exercise –
1. Setting up the Jupyter notebook environment
2. Loading of a dataset in Jupyter
3. Algorithms in Scikit-Learn package for performing Machine Learning techniques and training a model to search a grid.
4. Practice on k-means using Scikit
5. Practice on Apriori
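
A small scikit-learn sketch for the k-means practice item (synthetic blob data stands in for the dataset loaded in Jupyter):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Synthetic 2-D data with three natural clusters.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Unsupervised learning: fit k-means with k=3 and inspect the result.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)

print("cluster centers:\n", km.cluster_centers_)
print("first ten labels:", labels[:10])
```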

Module 12 – Python integration with Spark (Self paced)

12.1 Introduction to PySpark
12.2 Who uses PySpark and the need for Spark with Python
12.3 PySpark installation
12.4 PySpark fundamentals
12.5 Advantages of PySpark over MapReduce
12.6 PySpark use cases and demo

Hands-on Exercise:
1. Demonstrating Loops and Conditional Statements
2. Tuple – related operations, properties, list, etc.
3. List – operations, related properties
4. Set – properties, associated operations, dictionary – operations, related properties.

Module 13 – Dimensionality Reduction

13.1 Introduction to Dimensionality
13.2 Why Dimensionality Reduction
13.3 PCA
13.4 Factor Analysis
13.5 LDA

Hands-on Exercise –
Practice dimensionality reduction techniques: PCA, Factor Analysis, t-SNE, Random Forest, and forward and backward feature selection

Module 14 – Time Series Forecasting

14.1 White Noise
14.2 AR model
14.3 MA model
14.4 ARMA model
14.5 ARIMA model
14.6 Stationarity
14.7 ACF & PACF

Hands-on Exercise –
1. Create AR model
2. Create MA model
3. Create ARMA model
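
One way to fit the AR, MA, and ARMA models from this module, sketched with statsmodels (the simulated series is an assumption; any univariate time series would do):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate a simple AR(1)-like series to have something to fit (made-up data).
rng = np.random.default_rng(42)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# AR, MA, and ARMA are all special cases of ARIMA(p, 0, q) in statsmodels.
ar_fit = ARIMA(y, order=(1, 0, 0)).fit()     # AR(1)
ma_fit = ARIMA(y, order=(0, 0, 1)).fit()     # MA(1)
arma_fit = ARIMA(y, order=(1, 0, 1)).fit()   # ARMA(1, 1)

print(arma_fit.summary())
print("5-step forecast:", arma_fit.forecast(steps=5))
```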

Data Science with Python Projects

Analyzing the Trends of COVID-19 With Python

In this project, you will use Pandas to accumulate data from multiple data files, Plotly (visualization library) to create interactive visualizations, and Facebook’s Prophet library to make time series models. You will also be visualizing the prediction by combining these technologies.
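
A minimal Prophet forecasting sketch along the lines of this project (the synthetic daily counts stand in for the real COVID-19 data, and the import assumes the `prophet` package; older installs expose it as `fbprophet`):

```python
import pandas as pd
from prophet import Prophet  # older installs: from fbprophet import Prophet

# Prophet expects a frame with columns "ds" (date) and "y" (value);
# a synthetic daily case count stands in for the real data here.
dates = pd.date_range("2020-03-01", periods=120, freq="D")
cases = pd.Series(range(120)) * 15 + 100
df = pd.DataFrame({"ds": dates, "y": cases})

model = Prophet()
model.fit(df)

# Forecast 30 days beyond the observed window.
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```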

Analyzing the Naming Trends Using Python

In this project, you will use Python programming and algorithms to understand the applications of data manipulation, extract only the files with useful data, and apply concepts of data visualization. You will be required to analyze baby names by sorting out the top 100 birth counts.

Performing Analysis on Customer Churn Dataset

Through this project, you will be analyzing employment reliability in the telecom industry. The project will require you to work on real-time analysis of data with multiple labels, data visualization for reliability factor, visual analysis of various columns to verify, and plotting charts to substantiate the findings in total.

Netflix-Recommendation System

Analyze a movies dataset and recommend movies based on their ratings. You will work with the combined data of movies and their ratings, perform data analysis on various labels in the data, find the distribution of different ratings in the dataset, and train an SVD model to make predictions.

Python Web Scraping for Data Science

In this project, you will learn web scraping using Python. You will work on Beautiful Soup, web scraping libraries, common data and page format on the web, the important kinds of objects, Navigable String, the searching tree deployment, navigation options, parser, search tree, searching by CSS class, list, function, and keyword argument.

OOPS in Python

Creating multiple methods using OOPS. You will work on methods like “check_balance” to check the remaining balance in an account and “withdraw” to withdraw money from the bank, and you will override “withdraw” to ensure that the minimum balance is maintained. You will also work with parameterization and classes.

Working With NumPy

In this case study, you will be working with the NumPy library to solve various problems in Python. You will create 2D arrays, initialize a NumPy array of 5*5 dimensions, and perform simple arithmetic operations on the two arrays. To carry out this case study successfully, you will have to be familiar with NumPy.

Visualizing and Analyzing the Customer Churn dataset using Python

This case study will require you to analyze data by building aesthetic graphs to make better sense of the data. You will be working with the ggplot2 package, bar plots and its applications, histogram graphs for data analysis, and box-plots and outliers in them.

Building Models With the Help of Machine Learning Algorithms

You will be designing tree-based models on the ‘Heart’ dataset, performing real-time data manipulation on the heart dataset, data-visualization for multiple columnar data, building a tree-based model on top of the database, and designing a probabilistic classification model on the database. You will have to be familiar with ML Algorithms.

Pyspark (Live Course)

Introduction to the Basics of Python

  • Explaining Python and Highlighting Its Importance
  • Setting up Python Environment and Discussing Flow Control
  • Running Python Scripts and Exploring Python Editors and IDEs

Sequence and File Operations

  • Defining Reserve Keywords and Command Line Arguments
  • Describing Flow Control and Sequencing
  • Indexing and Slicing
  • Learning the xrange() Function
  • Working Around Dictionaries and Sets
  • Working with Files

Functions, Sorting, Errors and Exception, Regular Expressions, and Packages

  • Explaining Functions and Various Forms of Function Arguments
  • Learning Variable Scope, Function Parameters, and Lambda Functions
  • Sorting Using Python
  • Exception Handling
  • Package Installation
  • Regular Expressions

Python: An OOP Implementation

  • Using Class, Objects, and Attributes
  • Developing Applications Based on OOP
  • Learning About Classes, Objects and How They Function Together
  • Explaining OOPs Concepts Including Inheritance, Encapsulation, and Polymorphism, Among Others

Debugging and Databases

  • Debugging Python Scripts Using pdb and IDE
  • Classifying Errors and Developing Test Units
  • Implementing Databases Using SQLite
  • Performing CRUD Operations

Introduction to Big Data and Apache Spark

  • What is Big Data?
  • 5 V’s of Big Data
  • Problems related to Big Data: Use Case
  • What tools are available for handling Big Data?
  • What is Hadoop?
  • Why do we need Hadoop?
  • Key Characteristics of Hadoop
  • Important Hadoop ecosystem concepts
  • MapReduce and HDFS
  • Introduction to Apache Spark
  • What is Apache Spark?
  • Why do we need Apache Spark?
  • Who uses Spark in the industry?
  • Apache Spark architecture
  • Spark Vs. Hadoop
  • Various Big data applications using Apache Spark

Python for Spark

  • Introduction to PySpark
  • Who uses PySpark?
  • Why Python for Spark?
  • Values, Types, Variables
  • Operands and Expressions
  • Conditional Statements
  • Loops
  • Numbers
  • Python files I/O Functions
  • Strings and associated operations
  • Sets and associated operations
  • Lists and associated operations
  • Tuples and associated operations
  • Dictionaries and associated operations

Hands-On:

  • Demonstrating Loops and Conditional Statements
  • Tuple – related operations, properties, list, etc.
  • List – operations, related properties
  • Set – properties, associated operations
  • Dictionary – operations, related properties

Python for Spark: Functional and Object-Oriented Model

  • Functions
  • Lambda Functions
  • Global Variables, its Scope, and Returning Values
  • Standard Libraries
  • Object-Oriented Concepts
  • Modules Used in Python
  • The Import Statements
  • Module Search Path
  • Package Installation Ways

Hands-On:

  • Lambda – Features, Options, Syntax, Compared with the Functions
  • Functions – Syntax, Return Values, Arguments, and Keyword Arguments
  • Errors and Exceptions – Issue Types, Remediation
  • Packages and Modules – Import Options, Modules, sys Path

Apache Spark Framework and RDDs

  • Spark Components & its Architecture
  • Spark Deployment Modes
  • Spark Web UI
  • Introduction to PySpark Shell
  • Submitting PySpark Job
  • Writing your first PySpark Job Using Jupyter Notebook
  • What are Spark RDDs?
  • Stopgaps in existing computing methodologies
  • How do RDDs solve the problem?
  • What are the ways to create RDDs in PySpark?
  • RDD persistence and caching
  • General operations: Transformation, Actions, and Functions
  • Concept of Key-Value pair in RDDs
  • Other pair, two pair RDDs
  • RDD Lineage
  • RDD Persistence
  • WordCount Program Using RDD Concepts
  • RDD Partitioning & How it Helps Achieve Parallelization
  • Passing Functions to Spark

Hands-On:

  • Building and Running Spark Application
  • Spark Application Web UI
  • Loading data in RDDs
  • Saving data through RDDs
  • RDD Transformations
  • RDD Actions and Functions
  • RDD Partitions
  • WordCount program using RDD’s in Python
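
For reference, a minimal version of the WordCount program from this hands-on, written against the RDD API (the input and output paths are assumptions):

```python
from pyspark import SparkContext

sc = SparkContext(appName="rdd-wordcount")

# The input path is an assumption; any text file in HDFS or locally will do.
lines = sc.textFile("hdfs:///data/input.txt")

counts = (lines.flatMap(lambda line: line.split())        # transformation
               .map(lambda word: (word, 1))               # pair RDD
               .reduceByKey(lambda a, b: a + b))          # aggregation

# Actions: bring a sample back to the driver and persist the full result.
for word, count in counts.take(10):
    print(word, count)
counts.saveAsTextFile("hdfs:///data/wordcount_output")
```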

PySpark SQL and Data Frames

  • Need for Spark SQL
  • What is Spark SQL
  • Spark SQL Architecture
  • SQL Context in Spark SQL
  • User-Defined Functions
  • Data Frames
  • Interoperating with RDDs
  • Loading Data through Different Sources
  • Performance Tuning
  • Spark-Hive Integration

Hands-On:

  • Spark SQL – Creating data frames
  • Loading and transforming data through different sources
  • Spark-Hive Integration
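
The following sketch pulls together the DataFrame, UDF and SQL pieces from this module, assuming a plain local SparkSession (Hive support would be enabled separately with enableHiveSupport()); column and view names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("SparkSQLDemo").getOrCreate()

# Create a DataFrame from an in-memory list (it could equally come from JSON, CSV, Hive, etc.).
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# A simple user-defined function applied to a DataFrame column.
upper_udf = udf(lambda s: s.upper(), StringType())
df.select("id", upper_udf("name").alias("name_upper")).show()

# Run plain SQL against the DataFrame via a temporary view.
df.createOrReplaceTempView("people")
spark.sql("SELECT COUNT(*) AS n FROM people").show()

spark.stop()
```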

Apache Kafka and Flume

  • Why Kafka?
  • What is Kafka?
  • Kafka Workflow
  • Kafka Architecture
  • Configuring a Kafka Cluster
  • Kafka Monitoring tools
  • Basic operations
  • What is Apache Flume?
  • Integrating Apache Flume and Apache Kafka

Hands-On:

  • Single Broker Kafka Cluster
  • Multi-Broker Kafka Cluster
  • Topic Operations
  • Integrating Apache Flume and Apache Kafka

PySpark Streaming

  • Introduction to Spark Streaming
  • Features of Spark Streaming
  • Spark Streaming Workflow
  • StreamingContext Initializing
  • Discretized Streams (DStreams)
  • Input DStreams, Receivers
  • Transformations on DStreams
  • DStreams Output Operations
  • Windowed Operators and Why They Are Useful
  • Stateful Operators
  • Vital Windowed Operators
  • Twitter Sentiment Analysis
  • Streaming using Netcat server
  • WordCount program using Kafka-Spark Streaming

Hands-On:

  • Twitter Sentiment Analysis
  • Streaming using Netcat server
  • WordCount program using Kafka-Spark Streaming
  • Spark-Flume Integration
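
As a taste of the streaming hands-on, here is a minimal DStream WordCount that reads from a local Netcat server; it assumes `nc -lk 9999` is running on localhost, and the batch interval is a placeholder.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# 5-second micro-batches; pair this with a Netcat server: nc -lk 9999
sc = SparkContext("local[2]", "NetcatWordCount")
ssc = StreamingContext(sc, 5)

lines = ssc.socketTextStream("localhost", 9999)     # input DStream from the Netcat server
counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)
counts.pprint()                                      # DStream output operation

ssc.start()
ssc.awaitTermination()
```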

Introduction to PySpark Machine Learning

  • Introduction to Machine Learning – What, Why and Where?
  • Use Case
  • Types of Machine Learning Techniques
  • Why use Machine Learning with Spark?
  • Applications of Machine Learning (general)
  • Applications of Machine Learning with Spark
  • Introduction to MLlib
  • Features of MLlib and MLlib Tools
  • Various ML algorithms supported by MLlib
  • Supervised Learning Algorithms
  • Unsupervised Learning Algorithms
  • ML workflow utilities

Hands-On:

  • K-Means Clustering
  • Linear Regression
  • Logistic Regression
  • Decision Tree
  • Random Forest

MongoDB (Live Course)

Introduction to NoSQL and MongoDB

RDBMS, types of relational databases, challenges of RDBMS, NoSQL database, its significance, how NoSQL suits Big Data needs, introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types and examples

MongoDB Installation

Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) installation and MongoDB data types

Hands-on Exercise: Install MongoDB and install MongoChef (MongoGUI)

Importance of NoSQL

The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, CAP Theorem, BASE property, learning about JSON/BSON, database collection and documentation, MongoDB uses, MongoDB write concern—acknowledged, replica acknowledged, unacknowledged, journaled—and Fsync

Hands-on Exercise: Write a JSON document

CRUD Operations

Understanding CRUD and its functionality, CRUD concepts, MongoDB query syntax, read and write queries and query optimization

Hands-on Exercise: Use the insert query to create a data entry, the find query to read data, the update and replace queries to modify data, and the delete query operations on a DB file (see the sketch below)
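
The same CRUD cycle can be driven from Python with the PyMongo driver, as in this minimal sketch; the connection string, database and collection names are placeholders.

```python
from pymongo import MongoClient

# Assumes MongoDB is running locally on the default port; names are illustrative.
client = MongoClient("mongodb://localhost:27017/")
books = client["library"]["books"]

# Create
books.insert_one({"title": "Dracula", "author": "Bram Stoker", "copies": 3})

# Read
print(books.find_one({"title": "Dracula"}))

# Update
books.update_one({"title": "Dracula"}, {"$set": {"copies": 5}})

# Delete
books.delete_one({"title": "Dracula"})
```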

Data Modeling and Schema Design

Concepts of data modelling, difference between MongoDB and RDBMS modelling, model tree structure, operational strategies, monitoring and backup

Hands-on Exercise: Write a data model tree structure for a family hierarchy

Data Management and Administration

In this module, you will learn MongoDB® Administration activities such as health check, backup, recovery, database sharding and profiling, data import/export, performance tuning, etc.

Hands-on Exercise: Use shard key and hashed shard keys, perform backup and recovery of a dummy dataset, import data from a CSV file and export data to a CSV file

Data Indexing and Aggregation

Concepts and types of data aggregation, and data indexing concepts, properties and variations

Hands-on Exercise: Perform aggregation using pipeline, sort, skip and limit, and create indexes on data using a single key and multiple keys (see the sketch below)
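
A compact PyMongo version of this exercise might look like the following; the collection, fields and index keys are illustrative.

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017/")
orders = client["shop"]["orders"]

# Aggregation pipeline: group by customer, then sort, skip and limit.
pipeline = [
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
    {"$skip": 0},
    {"$limit": 5},
]
for doc in orders.aggregate(pipeline):
    print(doc)

# A single-key index and a compound index on two keys.
orders.create_index([("customer", ASCENDING)])
orders.create_index([("customer", ASCENDING), ("date", ASCENDING)])
```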

MongoDB Security

Understanding database security risks, MongoDB security concepts and approach, and MongoDB integration with Java and Robomongo

Hands-on Exercise: MongoDB integration with Java and Robomongo

Working with Unstructured Data

Implementing techniques to work with a variety of unstructured data like images, videos and log data, and understanding the GridFS MongoDB file system for storing data

Hands-on Exercise: Work with a variety of unstructured data like images, videos and log data

What projects will I be working on in this MongoDB training?

Project: Working with the MongoDB Java Driver

Industry: General

Problem Statement: How to create a table for video insertion using Java

Topics: In this project, you will work with MongoDB Java Driver and become proficient in creating a table for inserting video using Java programming. You will work with collections and documents and understand the read and write basics of MongoDB database and the Java virtual machine libraries.

Highlights:

  • Setting up the MongoDB Java Driver
  • Connecting to the database
  • Java virtual machine libraries

AWS Big Data (Live Course)

Introduction to Big Data and Data Collection

  • Introduction to Big Data
  • Big Data tools available in AWS
  • Why Big Data on AWS?
  • What is AWS Kinesis?
  • How does Kinesis work?
  • Features of AWS Kinesis
  • AWS Kinesis Components
  • Kinesis Data Streams
  • Enhanced Fan-Out in AWS Kinesis
  • Kinesis Data Firehose
  • Amazon SQS
  • AWS Data Pipeline

Hands-on Exercise:

Creating, deleting and managing an AWS Kinesis stream (see the sketch below).
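
A boto3 sketch of the stream lifecycle covered in this exercise; the stream name and region are placeholders, and AWS credentials are assumed to be configured already.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Create the stream and wait until it becomes ACTIVE.
kinesis.create_stream(StreamName="demo-stream", ShardCount=1)
kinesis.get_waiter("stream_exists").wait(StreamName="demo-stream")

# Push a record into the stream.
kinesis.put_record(
    StreamName="demo-stream",
    Data=json.dumps({"sensor": "s1", "value": 42}).encode("utf-8"),
    PartitionKey="s1",
)

# Clean up the stream created above.
kinesis.delete_stream(StreamName="demo-stream")
```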

Introduction to Cloud Computing & AWS

  • What is Cloud Computing?
  • Cloud Service & Deployment Models
  • How AWS is the leader in the cloud domain
  • Various cloud computing products offered by AWS
  • Introduction to AWS S3, EC2, VPC, EBS, ELB, AMI
  • AWS architecture and the AWS Management Console, virtualization in AWS (Xen hypervisor)
  • What is auto-scaling?
  • AWS EC2 best practices and costs involved

Hands-on Exercise – Setting up an AWS account, launching an EC2 instance, hosting a website and launching a Linux virtual machine using an AWS EC2 instance.

Elastic Compute and Storage Volumes

  • Introduction to EC2
  • Regions & Availability Zones (AZs)
  • Pre-EC2, EC2 instance types
  • Comparing Public IP and Elastic IP
  • Demonstrating how to launch an AWS EC2 instance
  • Introduction to AMIs, Creating and Copying an AMI
  • Introduction to EBS
  • EBS volume types
  • EBS Snapshots
  • Introduction to EFS
  • Instance tenancy – Reserved and Spot Instances
  • Pricing and Design Patterns.

Hands-on Exercise –

  • Launching an EC2 instance
  • Creating an AMI of the launched instance
  • Copying the AMI to another region
  • Creating an EBS volume
  • Attaching the EBS volume with an instance
  • Taking backup of an EBS volume
  • Creating an EFS volume and mounting the EFS volume to two instances.

Virtual Private Cloud

  • What is Amazon VPC?
  • VPC as a networking layer for EC2
  • IP addresses and CIDR notations
  • Components of VPC – network interfaces, route tables, internet gateway, NAT
  • Security in VPC – security groups and NACL
  • Types of VPC, what is a subnet, VPC peering with scenarios
  • VPC endpoints, VPC pricing and design patterns

Hands-on Exercise –

  • Creating a VPC and subnets
  • Creating a 3-tier architecture with security groups
  • NACL, Internet gateway and NAT gateway
  • Creating a complete VPC architecture

Storage – Simple Storage Service (S3)

  • Introduction to AWS storage
  • Pre-S3 – online cloud storage
  • API, S3 consistency models
  • Storage hierarchy, buckets in S3
  • Objects in S3, metadata and storage classes
  • Object versioning and object lifecycle management
  • Cross-region replication
  • Data encryption
  • Connecting using VPC endpoint
  • S3 pricing

Hands-on Exercise –

  • Creating an S3 bucket
  • Uploading objects to the S3 bucket
  • Enabling object versioning in the S3 bucket
  • Setting up lifecycle management for only a few objects
  • Setting up lifecycle management for all objects with the same tag
  • Static website hosting using S3.
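
The bucket-level parts of this exercise can be scripted with boto3 roughly as follows; the bucket name, prefix and rule values are placeholders (and the bucket name must be globally unique).

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Create a bucket (outside us-east-1 a LocationConstraint would also be required).
s3.create_bucket(Bucket="my-demo-bucket-12345")

# Upload an object and turn on versioning, as in the exercise above.
s3.put_object(Bucket="my-demo-bucket-12345", Key="data/sample.txt", Body=b"hello s3")
s3.put_bucket_versioning(
    Bucket="my-demo-bucket-12345",
    VersioningConfiguration={"Status": "Enabled"},
)

# A lifecycle rule that expires only objects under a given prefix after 30 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-demo-bucket-12345",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```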

Databases and In-Memory DataStores

  • What is a database, types of databases, databases on AWS
  • Introduction to Amazon RDS
  • Multi-AZ deployments, features of RDS
  • Read replicas in RDS, reserved DB instances
  • RDS pricing and design patterns
  • Introduction to Amazon Aurora, benefits of Aurora, Aurora pricing and design patterns
  • Introduction to DynamoDB, components of DynamoDB, DynamoDB pricing and design patterns
  • What is Amazon Redshift, advantages of Redshift
  • What is ElastiCache and why use it?

Hands-on Exercise –

  • Launching a MySQL RDS instance
  • Modifying an RDS instance
  • Connecting to the DB instance from your machine
  • Creating a Multi-AZ deployment
  • Creating an Aurora DB cluster
  • Creating an Aurora replica
  • Creating a DynamoDB table.

Data Storage

  • What is S3 Glacier?
  • Accessing Amazon S3 Glacier
  • Glacier Vaults
  • Glacier Archives
  • What is Amazon DynamoDB?
  • How does DynamoDB work?
  • Accessing DynamoDB through Portal and CLI
  • DynamoDB Tables and Items
  • DynamoDB Indexes
  • DynamoDB Streams and Replication
  • Dynamo Backup and Restore
  • DynamoDB Best Practices
  • Introduction to RDS
  • Basics of RDS

Hands-on Exercise:

Creating a table and loading data, replicating data to another table and backing it up, and creating a MySQL database (see the sketch below).
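
A boto3 sketch of the DynamoDB portion of this exercise; the table name, key schema and sample item are illustrative, and credentials are assumed to be configured.

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

# Create an on-demand table keyed on "title" and wait for it to become active.
table = dynamodb.create_table(
    TableName="Movies",
    KeySchema=[{"AttributeName": "title", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "title", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Load an item and read it back.
table.put_item(Item={"title": "Inception", "year": 2010})
print(table.get_item(Key={"title": "Inception"})["Item"])
```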

Data Processing

  • Amazon EMR
  • Apache Hadoop
  • Hue with EMR
  • HBase with EMR
  • Spark with EMR
  • AWS Lambda for Big Data Ecosystem
  • HCatalog
  • Glue
  • Glue Lab

Hands-on Exercise:

Creating an EMR cluster, adding steps to EMR, using Hue with EMR, using HBase with EMR, using Spark with EMR, using HCatalog with Hive on EMR and using Glue.

Data Analysis

  • What is Amazon Redshift?
  • Data Warehouse System Architecture
  • Redshift Concepts
  • Designing tables
  • Loading Data to Redshift
  • Redshift Workload Management
  • Tuning Query Performance
  • Best Practices using Redshift
  • Amazon Machine Learning
  • Amazon ML Key Concepts
  • Using Amazon ML
  • What is Amazon Athena?
  • When should you use Athena?
  • Running Queries using Athena
  • What Is Amazon Elasticsearch Service?
  • Features of Amazon Elasticsearch Service
  • ES Domains

Hands-on Exercise:

Creating a Redshift cluster, creating read replicas, loading data into the cluster, running queries using the Redshift Query Editor, backing up the cluster, running a sample ML model in Amazon ML, and creating a database in Athena and running queries (see the sketch below).
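
The Athena part of this exercise can also be driven programmatically with boto3, roughly as sketched below; the database, table and results bucket are placeholders created beforehand.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Kick off a query against a pre-existing Athena database and table.
response = athena.start_query_execution(
    QueryString="SELECT * FROM sales LIMIT 10",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
query_id = response["QueryExecutionId"]

# Poll for completion, then fetch the result set.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print(row)
```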

Data Visualization and Data Security

  • What is Amazon QuickSight?
  • How does Amazon QuickSight work?
  • QuickSight SPICE
  • Setting up Amazon QuickSight
  • Data Sources and Data Sets
  • Creating your own Analysis in QuickSight
  • QuickSight Visualization
  • QuickSight Dashboards
  • Security Best Practices
  • EMR Security
  • Redshift Security
  • Introduction to Microstrategy

Hands-on Exercise:

Setting up Amazon QuickSight, creating a data set in QuickSight, creating various visualizations using the data set and creating a QuickSight dashboard of the created visuals.

What are the projects I will be working on during this AWS Big Data certification training?

Project 1: Integration of Big Data with AWS

Domain: Cloud

Problem Statement: Integrate your organization’s Big Data with the AWS cloud to make it accessible

Topics: ML Regression Model, Redshift, and Glacier

Highlights:

  • Implementing ML regression Model
  • Adapting Redshift
  • Using Glacier

Project 2: Big Data Analysis

Domain: Big Data

Problem Statement: As an IT professional, analyze the Big Data of your organization

Topics: Kinesis Data Streams, Apache Hadoop, Amazon QuickSight

Highlights:

  • Understanding Kinesis Data Streams
  • Utilizing Apache Hadoop
  • Handling Amazon QuickSight visualization

Hadoop Testing (Self-paced)

Introduction to Hadoop and Its Ecosystem, MapReduce and HDFS

Introduction to Hadoop and its constituent ecosystem, understanding MapReduce and HDFS, Big Data, factors constituting Big Data, Hadoop and Hadoop Ecosystem, MapReduce: concepts of Map, Reduce, ordering, concurrency, shuffle and reducing, Hadoop Distributed File System (HDFS) concepts and its importance, deep dive into MapReduce, execution framework, partitioner, combiner, data types, key pairs, HDFS deep dive: architecture, data replication, name node, data node, dataflow, parallel copying with DISTCP and Hadoop archives

Hands-on Exercises:

Installing Hadoop in pseudo-distributed mode, understanding important configuration files, their properties and daemon threads, accessing HDFS from the command line, MapReduce: basic exercises, understanding the Hadoop ecosystem, introduction to Sqoop, use cases and installation, introduction to Hive, use cases and installation, introduction to Pig, use cases and installation, introduction to Oozie, use cases and installation, introduction to Flume, use cases and installation, and introduction to YARN

Mini Project: Importing MySQL data using Sqoop and querying it using Hive

MapReduce

How to develop a MapReduce application, writing unit tests, and best practices for developing, writing and debugging MapReduce applications

Introduction to Pig and Its Features

What is Pig, Pig’s features, Pig use cases, interacting with Pig, basic data analysis with Pig, Pig Latin Syntax, loading data, simple data types, field definitions, data output, viewing the schema, filtering and sorting data and commonly-used functions

Hands-on Exercise: Using Pig for ETL processing

Introduction to Hive

What is Hive, Hive schema and data storage, comparing Hive to traditional databases, Hive vs. Pig, Hive use cases, interacting with Hive, relational data analysis with Hive, Hive databases and tables, Basic HiveQL Syntax, data types, joining data sets and common built-in functions

Hands-on Exercise: Running Hive queries on the Shell, Scripts and Hue

Hadoop Stack Integration Testing

Why Hadoop testing is important, unit testing, integration testing, performance testing, diagnostics, nightly QA test, benchmark and end-to-end tests, functional testing, release certification testing, security testing, scalability testing, commissioning and decommissioning of data nodes testing, reliability testing and release testing

Roles and Responsibilities of Hadoop Testing

Understanding the requirements; preparing the testing estimation, test cases, test data and test bed; test execution; defect reporting; defect retesting; daily status report delivery; test completion; ETL testing at every stage (HDFS, Hive and HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, including data verification, reconciliation, and user authorization and authentication testing (groups, users, privileges, etc.); reporting defects to the development team or manager and driving them to closure; consolidating all the defects and creating defect reports; and validating new features and issues in core Hadoop

Framework Called MRUnit for Testing of MapReduce Programs

Reporting defects to the development team or manager and driving them to closure, consolidating all the defects and creating defect reports, validating new features and issues in core Hadoop, and working with the MRUnit framework for testing MapReduce programs

Unit Testing

Automation testing using Oozie and data validation using the QuerySurge tool

Test Execution of Hadoop: Customized

Test plan for HDFS upgrade, test automation and results

Test Plan Strategy and Test Cases of Hadoop Testing

How to test installation and configuration

What projects will I be working on in this Hadoop Testing training?

Project Works

Project 1: Working with MapReduce, Hive and Sqoop

Problem Statement: Import MySQL data using Sqoop, query it using Hive and run the WordCount MapReduce job.

Project 2: Testing Hadoop Using MRUnit

Industry: General

Problem Statement: How to test the Hadoop application using MRUnit testing

Topics: This project involves working with MRUnit for testing the Hadoop application without spinning up a cluster. You will learn how to do the map and reduce tests in an application.

Highlights:

  • Hadoop testing in isolation using MRUnit
  • Crafting the test input and pushing it through the mapper and reducer
  • Deploying the MapReduce driver

Apache Storm (Self-paced)

Understanding the Architecture of Storm

Big Data characteristics, understanding Hadoop distributed computing, the Bayesian Law, deploying Storm for real-time analytics, Apache Storm features, comparing Storm with Hadoop, Storm execution and learning about Tuple, Spout and Bolt.

Installation of Apache Storm

Installing Apache Storm and various types of run modes of Storm.

Introduction to Apache Storm

Understanding Apache Storm and the data model.

Apache Kafka Installation

Installation of Apache Kafka and its configuration.

Apache Storm Advanced

Understanding advanced Storm topics like Spouts, Bolts, Stream Groupings and Topology and its life cycle and learning about guaranteed message processing

Storm Topology

Various grouping types in Storm, reliable and unreliable messages, Bolt structure and life cycle, understanding Trident topology for failure handling and processing, and call log analysis topology for analyzing call logs of calls made from one number to another.

Overview of Trident

Understanding Trident spouts and their different types, various Trident spout interfaces and components, familiarizing with Trident filters, aggregators and functions, and a practical, hands-on use case on solving the call log problem using Storm Trident

Storm Components and Classes

Various components, classes and interfaces in Storm, such as the BaseRichBolt class, IRichBolt interface, IRichSpout interface and BaseRichSpout class, and various methodologies of working with them.

Cassandra Introduction

Understanding Cassandra, its core concepts, its strengths and deployment.

Bootstrapping

Twitter bootstrapping, detailed understanding of bootstrapping, concepts of Storm and the Storm development environment.

What projects will I be working on in this Apache Storm training?

Project 1: Call Log Analysis Using Storm Trident

Topics: In this project, you will work on call logs to decipher the data and gather valuable insights using Apache Storm Trident. You will extensively work with data about calls made from one number to another. The aim of this project is to resolve call log issues with Trident stream processing and low-latency distributed querying. You will gain hands-on experience in working with Spouts and Bolts, along with various Trident functions, filters, aggregation, joins and grouping.

Project 2: Twitter Data Analysis Using Trident

Topics: This is a project that involves working with Twitter data and processing it to extract patterns out of it. The Apache Storm Trident is the perfect framework for the real-time analysis of tweets. While working with Trident, you will be able to simplify the task of live Twitter feed analysis. In this project, you will gain real-world experience of working with Spouts, Bolts and Trident filters, joins, aggregation, functions and grouping.

Project 3: The US Presidential Election Results Analysis Using Trident DRPC Query

Topics: This is a project that lets you work on the US presidential election results and predict who is leading and trailing on a real-time basis. For this, you exclusively work with the Trident distributed remote procedure call (DRPC) server. After the completion of the project, you will know how to access data residing in a remote computer or network and deploy it for real-time processing, analysis and prediction.

Apache Kafka (Self-paced)

What is Kafka – An Introduction

Understanding what Apache Kafka is, the various components and use cases of Kafka, and implementing Kafka on a single node.

Multi Broker Kafka Implementation

Learning about the Kafka terminology, deploying a single-node Kafka with an independent ZooKeeper, adding replication in Kafka, working with partitioning and brokers, understanding Kafka consumers, the Kafka writes terminology and various failure-handling scenarios in Kafka.

Multi Node Cluster Setup

Introduction to multi-node cluster setup in Kafka, the various administration commands, leadership balancing and partition rebalancing, graceful shutdown of Kafka brokers and tasks, working with the Partition Reassignment Tool, cluster expansion, assigning custom partitions, removing a broker and improving the replication factor of partitions.

Integrate Flume with Kafka

Understanding the need for Kafka Integration, successfully integrating it with Apache Flume, steps in integration of Flume with Kafka as a Source.

Kafka API

Detailed understanding of the Kafka and Flume Integration, deploying Kafka as a Sink and as a Channel, introduction to PyKafka API and setting up the PyKafka Environment.

Producers & Consumers

Connecting Kafka using PyKafka, writing your own Kafka Producers and Consumers, writing a random JSON Producer, writing a Consumer to read the messages from a topic, writing and working with a File Reader Producer, writing a Consumer to store topics data into a file.
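
A minimal PyKafka sketch of the producer and consumer flow described above; it assumes a local broker and an existing topic named test.

```python
from pykafka import KafkaClient

# Assumes a local Kafka broker and a pre-created topic named "test".
client = KafkaClient(hosts="127.0.0.1:9092")
topic = client.topics[b"test"]

# Producer: write a few JSON-style messages to the topic.
with topic.get_sync_producer() as producer:
    for i in range(3):
        producer.produce(f'{{"event_id": {i}}}'.encode("utf-8"))

# Consumer: read the messages back, stopping after 5 seconds of inactivity.
consumer = topic.get_simple_consumer(consumer_timeout_ms=5000)
for message in consumer:
    if message is not None:
        print(message.offset, message.value)
```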

What projects will I be working on in this Kafka training?

Type: Multi-Broker Kafka Implementation

Topics: In this project, you will learn about Apache Kafka, a platform for handling real-time data feeds. You will exclusively work with Kafka brokers; understand partitioning, Kafka consumers, the terminology used for Kafka writes and failure handling in Kafka; and learn how to deploy a single-node Kafka with an independent ZooKeeper. Upon completion of the project, you will have gained considerable experience of a real-world scenario for processing streaming data within an enterprise infrastructure.

Apache Cassandra (Self-paced)

Advantages and Usage of Cassandra

Introduction to Cassandra, its strengths and deployment areas

CAP Theorem and NoSQL Database

Significance of NoSQL, RDBMS replication, key challenges, types of NoSQL, benefits and drawbacks, salient features of NoSQL databases, CAP Theorem and consistency.

Cassandra Fundamentals, Data Model, Installation and Setup

Installation and introduction to Cassandra, key concepts and deployment of a non-relational, column-oriented database, and the data model – column and column family

Cassandra Configuration

Token calculation, Configuration overview, Node tool, Validators, Comparators, Expiring column, QA

Summarization, node tool commands, clusters, indexes, Cassandra & MapReduce and installing OpsCenter

How Cassandra modelling varies from relational database modelling, Cassandra modelling steps, introduction to time series modelling, comparing column family vs. super column family, counter column family, partitioners, partitioner strategies, replication, gossip protocols, read operation, consistency and comparison

Multi Cluster setup

Creation of multi node cluster, node settings, Key and Row cache, System Key space, understanding of Read Operation, Cassandra Commands overview, VNodes, Column family

Thrift/Avro/JSON/Hector Client

JSON, Hector client, Avro, Thrift, Java code writing methods and Hector tags

DataStax Installation and Secondary Index

Cassandra management, node tool commands, MapReduce and Cassandra, secondary indexes and DataStax installation

Advance Modelling

Rules of Cassandra data modelling, increasing data writes and duplication to reduce data reads, modelling data around queries and creating tables for data queries (see the sketch below)
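
As a small illustration of modelling data around queries, the sketch below uses the Python cassandra-driver (rather than the Java client used elsewhere in this track) to create a query-driven table; the keyspace, table and column names are illustrative.

```python
from cassandra.cluster import Cluster

# Assumes a local Cassandra node.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.set_keyspace("demo")

# Model the table around the query "latest readings for a sensor": the sensor id
# is the partition key and the reading time is the clustering column.
session.execute(
    "CREATE TABLE IF NOT EXISTS readings ("
    "  sensor_id text, reading_time timestamp, value double,"
    "  PRIMARY KEY (sensor_id, reading_time)"
    ") WITH CLUSTERING ORDER BY (reading_time DESC)"
)

session.execute(
    "INSERT INTO readings (sensor_id, reading_time, value) VALUES (%s, toTimestamp(now()), %s)",
    ("s1", 21.5),
)
for row in session.execute("SELECT * FROM readings WHERE sensor_id = %s LIMIT 5", ("s1",)):
    print(row)

cluster.shutdown()
```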

Deploying the IDE for Cassandra applications

Understanding the Java application creation methodology, learning about key drivers, deploying the IDE for Cassandra applications, cluster connection and data query implementation

Cassandra Administration

Learning about Node Tool Utility, cluster management using Command Line Interface, Cassandra management and monitoring via DataStax Ops Center.

Cassandra API and Summarization and Thrift

Cassandra client connectivity, connection pool internals, API, important features and concepts of the Hector client, Thrift, Java code and summarization.

What projects will I be working on in this Cassandra training?

Type: Deploying the IDE for Cassandra Applications

Topics: This project gives you hands-on experience in installing and working with Apache Cassandra, a high-performance and extremely scalable database for distributed data with no single point of failure. You will deploy the Java Integrated Development Environment for running Cassandra, learn about the key drivers, work with Cassandra applications in a cluster setup and implement data querying techniques.

Java (Self-paced)

Core Java Concepts

Introduction to Java Programming, Defining Java, Need for Java, Platform Independence in Java, Defining JRE, JVM and JDK, Important Features and Evolution of Java

Writing Java Programs using Java Principles

Overview of Coding basics, Setting up the required environment, Knowing the available IDEs, Writing a Basic-level Java Program, Define Package, What are Java Comments?, Understanding the concept of Reserved Words, Introduction to Java Statements, What are Blocks in Java, Explain a Class, Different Methods

Language Conceptuals

Overview of the Language, Defining Identifiers, What are Constraints and Variables, What is an Encoding Set?, Concept of Separators, Define Primitives, How to make Primitive Conversions?, Various Operators in Java

Operating with Java Statements

Module Overview, Learn how to write If Statement, Understanding While Statement, Working with Do-while Statement, How to use For Statement?, Using Break Statement, What is Continue Statement, Working of Switch Statement

Concept of Objects and Classes

General Review of the Module, Defining Object and Classes in Java, What are Encapsulation, Static Members and Access Control?, Use and importance of ‘this’ Keyword, Defining Method Overloading with an example, ‘By Value’ vs. ‘By Reference’, Loading, Defining Initialization and Linking, How to Compare Objects in Java?, What is Garbage Collector?

Introduction to Core Classes

General Review, Concept of Object in Java, Define Core Class, What is System?, Explain String Classes, How do Arrays work?, Concept of Boxing & Unboxing, Use of ‘varargs’, ‘format’ and ‘printf’ Methods

Inheritance in Java

Introduction, Define Inheritance with an example, Accessibility concept, Method Overriding, Learning how to call a Superclass’ Constructor, What is Type Casting?, Familiarity with ’instanceof’ Keyword

Exception Handling in Detail

Getting started with Exception Handling, Defining an Exception, How to use Constructs to deal with exceptions?, Classification of exceptions, Throw Exceptions, How to create an exception class?, Stack Trace Analysis

Getting started with Interfaces and Abstract Classes

General Review, Defining Interface, How to Use and Create an Interface, Concept of Extending interfaces, How to implement multiple interfaces?, What are abstract classes?, How to create and use abstract classes?, Comparison between interface and abstract classes, Concept of Nested Classes, What are Nested Classes?, Nested Classes Types, Working of an Inner Class, What is a Local Inner Class?, Anonymous Classes in Java, What is a Static Nested Class?

Overview of Nested Classes

What are Nested Classes?, Types of Nested Classes, What is an Inner Class?, Understanding local inner class, Anonymous Inner Class, Nested Class – Static

Getting started with Java Threads

What is a Thread?, How to create and start a Thread?, States of a Thread, Blocking the Execution of a Thread, Concept of Sleep Thread, Understanding the priorities in a thread, Synchronisation in Java Threads, Interaction between threads

Overview of Java Collections

Introduction to Collection Framework, Preeminent Interfaces, What are Comparable and Comparator?, Working with Lists, Working with Maps, Working with Sets, Working with Queues

Understanding JDBC

Define JDBC, Different types of Drivers, How to access the drivers?, What is Connection in Java?, What is a Statement?, Explaining CRUD Operations with examples, Prepared Statement and Callable Statement

Java Generics

Overview of important topics included, Important and Frequently-Used Features, Defining Generic List, What is Generic Map in Java?, Java Generic Classes & Methods, For Loop Generic, What is Generic Wild Card?

Input/Output in Java

Brief Introduction, Learning about Input and Output Streams in Java, Concept of Byte-Oriented Streams, Defining Character-Oriented Streams, Explain Object Serialisation, Input and Output Based on Channel

Getting started with Java Annotations

Introduction and Definition of Annotations, How are they useful for Java programmers?, Placements in Annotations, What are Built-in Java Annotations?, Defining Custom Annotations

Reflection and its Usage

Getting started, Define Java Reflection?, What is a Class Object?, Concept of Constructors, Using Fields, Applying Methods, Implementing Annotations in Your Java Program

What projects will I be working on in this Java training?

Project – Library Management System

Problem Statement – Create a library management system project that includes the following functionalities:

Add Book, Add Member, Issue Book, Return Book, Available Books, etc.