Big Data Training in Chennai
Are you looking for training in big data analytics? BITA Academy provides Big Data Training in Chennai, giving you an in-depth understanding of data analysis. You will learn to analyze large data sets, extract insights such as hidden patterns, correlations, market trends, and customer preferences, and then help individuals and businesses make decisions based on those findings.
What is Big Data Analytics?
Big data describes data sets too big or intricate for conventional data-processing application software to handle. Big data analytics is the application of cutting-edge analytical methods to massive, heterogeneous data sets that comprise structured, semi-structured, and unstructured information. These data sets can range in size from terabytes to zettabytes and come from many sources.
Roles and Responsibilities of a Big Data Analyst
- Locating, gathering, analyzing, visualizing, and communicating market data to inform future business decisions
- Extracting data from primary and secondary sources using automated tools
- Cleaning up corrupted data and resolving coding and other data-quality issues
- Rearranging data into a usable form by creating and maintaining databases and data systems
- Analyzing data to determine its value and quality
- Reviewing reports and performance indicators to filter data and to find and fix coding issues
- Finding, analyzing, and interpreting patterns and trends in large, complex data sets using statistical tools, to support diagnosis and forecasting
- Creating reports for management that cover projections, trends, and patterns based on the relevant data
Syllabus of Big Data Training in Chennai
PART 1 : INTRODUCTION
- CCA 175 Spark and Hadoop Developer
PART 2 : GET STARTED
- Introduction and Curriculum
- How to Set up Environment in different ways
  - Options
  - Local
  - Cloudera Quickstart VM
  - Using Windows
    - Putty and WinSCP
    - Cygwin
- HDFS Quick Preview
- YARN Quick Preview
- Setup Data Sets
PART 3 : BIG DATA HADOOP CONCEPTS
- Hadoop Commands
PART 4 : FUNDAMENTALS OF SCALA
- Introduction and Setting up of Scala
- How to Setup Scala on Windows
- Understand Basic Programming Constructs
- Know about Functions
- Object-Oriented Concepts
- Types of Collections – Sequence, Set and Map
- Understand how to filter and sort data with map-reduce style operations (a short sketch follows this section)
- Set up Data Sets for Basic I/O Operations
- Basic I/O Operations
- How to use Scala Collections APIs
- Understand the concepts of Tuples
- Understand Development Cycle
  - Develop source code
  - Compile the source code to a jar using SBT
  - Set up SBT on Windows
  - Compile changes and run the jar with arguments
  - Set up IntelliJ with Scala
  - Develop a Scala application using SBT in IntelliJ
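To give a feel for the collections, filtering, sorting, and tuple topics above, here is a minimal, self-contained Scala sketch. The product records are invented purely for illustration.

```scala
object CollectionsPreview extends App {
  // Invented (name, category, price) tuples used only for illustration
  val products = List(
    ("Tennis Racquet", "Sports", 120.0),
    ("Running Shoes", "Footwear", 80.0),
    ("Yoga Mat", "Sports", 25.0)
  )

  // Filter: keep only the Sports category
  val sports = products.filter { case (_, category, _) => category == "Sports" }

  // Sort: order by price, highest first
  val byPriceDesc = products.sortBy { case (_, _, price) => -price }

  // Map + sum: total price across all products
  val totalPrice = products.map { case (_, _, price) => price }.sum

  println(sports)
  println(byPriceDesc)
  println(s"Total price: $totalPrice")
}
```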
PART 5 : DATA INGESTION – APACHE SQOOP
- Introduction and Objective
- How to Access Sqoop Documentation
- Preview of MySQL on labs
- Sqoop connect string and validate using list commands
- Run queries in MySQL using eval
- Sqoop Import
  - Simple Import
  - Execution Life Cycle
  - Manage Directories
  - Use split by
  - Auto reset to one mapper
  - Different file formats
  - How to Use compression
  - How to Use Boundary Query
  - Columns and query
  - Delimiters and handling nulls
  - Incremental Loads
- Sqoop Import – Hive (see the sketch after this section)
  - Create Hive Database
  - Simple Hive Import
  - Managing Hive tables
  - Import all tables
- Role of Sqoop in a typical data processing life cycle
- Sqoop Export
  - Simple export with delimiters
  - Let's understand export behaviour
  - Column Mapping
  - Update and insert
  - Stage Tables
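Sqoop itself is driven from the command line, so there is no Scala API to show for the import steps above. As a rough Scala stand-in for the same ingest step (a MySQL table landing in Hive), the sketch below uses Spark's JDBC reader instead of Sqoop; the host, database, table, and credentials are placeholders, not details from the course.

```scala
import org.apache.spark.sql.SparkSession

object JdbcIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JdbcIngestSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read the source table over JDBC (placeholder connection details)
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/retail_db")
      .option("dbtable", "orders")
      .option("user", "retail_user")
      .option("password", "change_me")
      .load()

    // Land the data in a Hive table (default database),
    // similar in spirit to a Sqoop Hive import
    orders.write.mode("overwrite").saveAsTable("orders_import")

    spark.stop()
  }
}
```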
PART 6 : TRANSFORM, STAGE, STORE – SPARK
- Introduction to Spark
- How to Set up Spark on Windows
- Overview of Spark documentation
- Initialise a Spark job using spark-shell
- Create and preview data from Resilient Distributed Data Sets (RDD)
- How to Read different file formats
- Transformations Overview
- Manipulate Strings
- Row-level transformations using map and flatMap
- How to Filter the data
- How to Join data sets
- Aggregations
  - Getting Started
  - Using actions (reduce and countByKey)
  - Understanding combiner
  - Least preferred API for aggregations
  - How to use reduceByKey
  - How to use aggregateByKey
- Sort data using sortByKey
- Global Ranking
- Key Ranking
- How to get topNPrices and topNPricedProducts
- How to get topNProducts by category using groupByKey, flatMap and a Scala function
- Set Operations – union, intersect, distinct and minus
- Save data in Text Input Format using Compression
- Save data in standard file formats
- Revision of Problem Statement and Design the solution
- Steps for Solution – Get Daily Revenue per Product (see the sketch after this section)
  - Launch Spark Shell
  - Read and join orders and order_items
  - Compute daily revenue per product id
  - Read products data and create RDD
  - Sort and save to HDFS
  - Add Spark dependencies to sbt
  - Develop as a Scala-based application
  - Run on localhost using spark-submit
  - Ship and run it on a big data cluster
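A rough spark-shell sketch of the "Get Daily Revenue per Product" steps above: read orders and order_items, join them, aggregate with reduceByKey, sort, and save to HDFS. The HDFS paths and field positions assume the common retail_db layout used in CCA175-style exercises and may differ in your environment.

```scala
// Assumed paths and field positions (retail_db layout)
val orders = sc.textFile("/public/retail_db/orders")
val orderItems = sc.textFile("/public/retail_db/order_items")

// Keep completed/closed orders as (order_id, order_date)
val ordersMap = orders.map(_.split(",")).
  filter(o => o(3) == "COMPLETE" || o(3) == "CLOSED").
  map(o => (o(0).toInt, o(1)))

// Order items as (order_id, (product_id, subtotal))
val orderItemsMap = orderItems.map(_.split(",")).
  map(oi => (oi(1).toInt, (oi(2).toInt, oi(4).toFloat)))

// Join on order_id, re-key by (date, product_id), and sum the revenue
val dailyRevenuePerProduct = ordersMap.join(orderItemsMap).
  map { case (_, (orderDate, (productId, subtotal))) =>
    ((orderDate, productId), subtotal)
  }.
  reduceByKey(_ + _).
  sortByKey()

// Save as comma-separated text in HDFS
dailyRevenuePerProduct.
  map { case ((date, productId), revenue) => s"$date,$productId,$revenue" }.
  saveAsTextFile("/user/training/daily_revenue_per_product")
```

The same logic can then be packaged with SBT and shipped to the cluster with spark-submit, as the last items in the list describe.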
PART 7 : DATA ANALYSIS – SPARK SQL OR HQL
- How to run Hive queries through different interfaces
- Create Hive tables
- How to load data in text file format
- How to load data in ORC file format
- How to Use spark-shell
- Functions
- How to Manipulate Strings and Dates in Functions
- How to Use Aggregation and CASE in Functions
- Understand the ways to do Row level transformations
- Joins
- Aggregations
- Sorting
- Analytics Functions
- Windowing Functions
- Create a Data Frame and Register it as a Temp Table
- How to Write Spark SQL Applications
- Data Frame Operations for Analytics (see the sketch after this section)
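A minimal sketch of the temp-table and Spark SQL workflow covered in this part. The input path, header option, and column names are assumptions based on a headered copy of the retail_db order_items data.

```scala
import org.apache.spark.sql.SparkSession

object RevenuePerOrderSQL {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RevenuePerOrderSQL").getOrCreate()

    // Assumed: order_items stored as CSV with a header row
    val orderItems = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/public/retail_db_with_headers/order_items")

    // Register a temporary view and query it with SQL
    orderItems.createOrReplaceTempView("order_items")

    val revenuePerOrder = spark.sql(
      """SELECT order_item_order_id,
        |       round(sum(order_item_subtotal), 2) AS order_revenue
        |FROM order_items
        |GROUP BY order_item_order_id
        |ORDER BY order_revenue DESC""".stripMargin)

    revenuePerOrder.show(10)
    spark.stop()
  }
}
```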
PART 8 : DATA INGEST – REAL TIME, NEAR REAL TIME AND STREAMING ANALYTICS
- Introduction
- Overview of Flume
- Flume – Web Server Logs to HDFS
  - Introduction
  - Setup Data
  - Source execution
  - Deep dive into the memory channel
- Flume – Web Server Logs to HDFS – Sink HDFS
  - Getting Started
  - Customize properties
- High Level Architecture of Kafka
- Flume and Kafka in Streaming analytics
- Spark Streaming
  - Overview
  - Set up netcat
  - Develop Word Count program (see the sketch after this section)
  - Ship and run the word count program on the cluster
  - Data Structure (DStream) and APIs overview
- How to stream data pipelines (Project Demo)
- Flume and Kafka integration
  - Develop configuration file
  - Run and validate
- Know about Kafka and Spark Streaming
  - Add dependencies
  - Develop and build application
  - Run and Validate
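The netcat-based word count exercise above can be sketched as a small DStream application. The host and port are assumptions; netcat would be started separately, for example with `nc -lk 9999`.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // Two local threads: one for receiving, one for processing
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Read lines from a netcat server (assumed host and port)
    val lines = ssc.socketTextStream("localhost", 9999)

    // Classic word count over each 10-second batch
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```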
PART 9 : SAMPLE SCENARIOS WITH SOLUTIONS
- Introduction to Sample Scenarios and Solutions
- Problem Statements
- Initialise the job
- How to Get the crime count per type per month
  - Let's Understand the Data
  - Implement Core API and Data Frames logic
  - Validate the Output
- How to Get inactive customers (see the sketch after this section)
  - How to Use Core Spark API (left outer join)
  - How to Use Data Frames and SQL
- Top 3 crimes in RESIDENCE
  - How to Use Core Spark API
  - How to Use Data Frames and SQL
- Convert NYSE data from text file format to parquet file format
- Get word count – with custom control arguments, num keys and file format
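As one worked example of these scenarios, here is a rough spark-shell sketch of the "inactive customers" problem using the Core Spark API's left outer join. The paths and field positions again assume the retail_db layout and are not taken from the course material.

```scala
// Customers as (customer_id, (last_name, first_name)); field positions assumed
val customers = sc.textFile("/public/retail_db/customers").
  map(_.split(",")).
  map(c => (c(0).toInt, (c(2), c(1))))

// Orders keyed by customer id
val orders = sc.textFile("/public/retail_db/orders").
  map(_.split(",")).
  map(o => (o(2).toInt, 1))

// Left outer join keeps every customer; None on the order side means no orders placed
val inactiveCustomers = customers.leftOuterJoin(orders).
  filter { case (_, (_, order)) => order.isEmpty }.
  map { case (_, ((lastName, firstName), _)) => s"$lastName, $firstName" }.
  sortBy(identity)

inactiveCustomers.take(10).foreach(println)
```

The same result can be produced with Data Frames and SQL, as the syllabus item notes, by joining the two tables and filtering for customers whose order id is null.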