The Big Data and Hadoop training course from Tonar Systems is designed to enhance your knowledge and skills to become a successful Hadoop developer. In-depth knowledge of core concepts will be covered in the course along with implementation on varied industry use-cases.
Who should go for this course?
Hadoop has become a core skill for business technology professionals. To stay ahead, the following professionals should know it:
1. Analytics professionals
2. BI/ETL/DW professionals
3. Project managers
4. Testing professionals
5. Mainframe professionals
6. Software developers and architects
7. Graduates aiming to build a successful career around Big Data
Why learn Big Data and Hadoop?
CIOs are making Hadoop their platform of choice in 2015. For better career prospects, bigger job opportunities and financial growth, Hadoop is a must-know.
By the end of the course, you will:
1. Master the concepts of HDFS and the MapReduce framework
2. Understand Hadoop 2.x architecture
3. Set up a Hadoop cluster and write complex MapReduce programs
4. Learn data loading techniques using Sqoop and Flume
5. Perform data analytics using Pig, Hive and YARN
6. Implement HBase and MapReduce integration
7. Implement Advanced Usage and Indexing
8. Schedule jobs using Oozie
9. Implement best practices for Hadoop development
10. Work on a real
Introduction to Big Data and Hadoop
What is Big Data?
Why are all industries talking about Big Data?
What are the issues in Big Data?
o Storage: what are the challenges of storing big data?
o Processing: what are the challenges of processing big data?
Which technologies support big data?
o Hadoop
o Databases
What is Hadoop?
History of Hadoop
Advantages and disadvantages of Hadoop
Importance of the different components of the Hadoop ecosystem
Importance of integration with other Big Data solutions
HDFS(Hadoop Distributed File System)
Name Node
o Importance of name node
o What are the roles of name node
o What are the drawbacks in name node
Secondary Name Node
o Importance of secondary name node
o What are the roles of secondary name node
o What are the drawbacks in secondary name node
Data Node
o Importance of data node
o What are the roles of data node
o What are the drawbacks in data node
Data storage in HDFS
o How blocks are stored in data nodes
o How replication works in data nodes
o How to write files in HDFS
o How to read files in HDFS
HDFS Block Size
o Importance of HDFS block size
o Why is the block size so large?
o How it is related to MapReduce split size
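Illustration for the block size and split size topics above: a minimal Python sketch of the arithmetic, not Hadoop code. It assumes a 128 MB block size and a replication factor of 3 (common Hadoop 2.x defaults, both configurable), and it assumes the split size is derived from the block size clamped between a minimum and a maximum. The function names are hypothetical, and details such as the tolerance Hadoop allows on the final split are ignored.

```python
MB = 1024 * 1024


def block_count(file_size, block_size=128 * MB):
    """Number of HDFS blocks for a file; the last block may be partially filled."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (file_size + block_size - 1) // block_size


def raw_storage(file_size, replication=3):
    """Bytes consumed across the cluster when every block is replicated."""
    return file_size * replication


def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Assumed rule: block size clamped to [min_size, max_size]."""
    return max(min_size, min(max_size, block_size))


if __name__ == "__main__":
    size = 1000 * MB
    print("blocks:", block_count(size))
    print("raw storage (MB):", raw_storage(size) // MB)
    print("split size (MB):", split_size(128 * MB) // MB)
```

With the defaults, one split corresponds to one block, so the number of map tasks for a large file roughly follows the number of blocks. Large blocks keep the name node's metadata small and make seek time a small part of the total read time.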
HDFS Commands
o Importance of each command
o How to execute the command
How to overcome the drawbacks in HDFS
o Name node failures
o Secondary name node failures
o Data node failures
o Exploring the Apache HDFS Web
How to configure the Hadoop cluster
o How to add new nodes
o How to remove existing nodes
o How to verify the dead nodes
o How to start the dead nodes
Hadoop 2.x.x version features
o Introduction to namenode
o Introduction to namenode high availability
o Difference between Hadoop 1.x.x and Hadoop 2.x.x versions
Map Reduce architecture
JobTracker
o Importance of JobTracker
o What are the roles of JobTracker
o What are the drawbacks in JobTracker
TaskTracker
o Importance of TaskTracker
o What are the roles of TaskTracker
o What are the drawbacks in TaskTracker
Data types in Hadoop
o What are the data types in MapReduce
o Why these are important in MapReduce
o Can we write custom data types in MapReduce
Input formats in MapReduce
Output formats in MapReduce
What is a mapper in a MapReduce job? Why do we need a mapper?
What are the advantages and disadvantages of a mapper?
Writing mapper programs
What is a reducer in a MapReduce job? Why do we need a reducer?
What are the advantages and disadvantages of a reducer?
Writing reducer programs
What is a combiner in a MapReduce job? Why do we need a combiner?
What are the advantages and disadvantages of a combiner?
Writing combiner programs
What is a partitioner in a MapReduce job? Why do we need a partitioner?
What are the advantages and disadvantages of a partitioner?
Writing partitioner programs
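Illustration for the mapper, reducer, combiner and partitioner topics above: a minimal in-memory word count sketch in Python. It is not Hadoop code and uses no Hadoop API; the function names and `run_job` are hypothetical, and `zlib.crc32` is only a stand-in for a stable key hash. Each inner list passed to `run_job` plays the role of one input split handled by one map task.

```python
from collections import defaultdict
import zlib


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate one map task's output locally to reduce shuffle traffic."""
    partial = defaultdict(int)
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def partitioner(key, num_reducers):
    """Choose the reducer for a key; the same key always lands on the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Sum all partial counts received for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    # One bucket per reducer, mapping key -> list of values (the "shuffle").
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        mapped = [pair for line in split for pair in mapper(line)]
        for key, value in combiner(mapped):
            buckets[partitioner(key, num_reducers)][key].append(value)
    result = {}
    for bucket in buckets:
        for key in sorted(bucket):  # reducers see their keys in sorted order
            word, total = reducer(key, bucket[key])
            result[word] = total
    return result


if __name__ == "__main__":
    splits = [["big data big"], ["data hadoop", "big"]]
    print(run_job(splits))
```

The combiner here works only because addition is associative and commutative, so partial sums can be merged in any order. Changing `num_reducers` changes which bucket a key lands in but not the final counts.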
What is the distributed cache in a MapReduce job?
Importance of the distributed cache in a MapReduce job
What are the advantages and disadvantages of the distributed cache?
What is a counter in a MapReduce job?
Map side join
o What is the importance of map side join
o Where it is used
Reduce side join
o What is the importance of reduce side join
o Where it is used
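Illustration for the two join topics above: a minimal Python sketch contrasting the two strategies on in-memory data. It is not Hadoop code; the function names and the sample customer and order records are hypothetical. The map side join assumes one dataset is small enough to hold in memory as a lookup table, while the reduce side join tags records by source, groups them by key, and joins each group.

```python
from collections import defaultdict


def map_side_join(orders, customers_by_id):
    """Join during the map step using an in-memory lookup of the small table."""
    joined = []
    for customer_id, amount in orders:
        name = customers_by_id.get(customer_id)
        if name is not None:  # inner join: drop orders without a customer
            joined.append((customer_id, name, amount))
    return joined


def reduce_side_join(orders, customers):
    """Tag records by source, group them by key, then join inside each group."""
    groups = defaultdict(lambda: {"customer": [], "order": []})
    for customer_id, name in customers:
        groups[customer_id]["customer"].append(name)
    for customer_id, amount in orders:
        groups[customer_id]["order"].append(amount)
    joined = []
    for customer_id in sorted(groups):
        for name in groups[customer_id]["customer"]:
            for amount in groups[customer_id]["order"]:
                joined.append((customer_id, name, amount))
    return joined


if __name__ == "__main__":
    customers = [(1, "Ana"), (2, "Bo")]
    orders = [(1, 50), (2, 20), (1, 30), (3, 99)]
    print(map_side_join(orders, dict(customers)))
    print(reduce_side_join(orders, customers))
```

Both functions return the same rows in a different order. The map side version avoids moving the large dataset through a shuffle, which is why it is usually preferred when one side fits in memory; the reduce side version handles two large datasets at the cost of grouping all records by key.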
Importance of compression techniques in production environment
Compression codecs
o Default, Gzip, Bzip2, Snappy and LZO
Enabling and disabling these techniques
Map Reduce Schedulers
Map Reduce programming model
How to write MapReduce jobs in Java
Running MapReduce jobs in local mode
Running MapReduce jobs in pseudo mode
Running MapReduce jobs in cluster mode
Debugging MapReduce jobs
How to debug MapReduce jobs in local mode
How to debug MapReduce jobs in remote mode
YARN (Next Generation Map Reduce)
What is YARN?
What is the importance of YARN?
What is the difference between YARN and MapReduce?
What is data locality?
Does Hadoop follow data locality?
What is speculative execution?
Does Hadoop follow speculative execution?
Introduction to Apache Pig
MapReduce vs Apache Pig
SQL vs Apache Pig
Modes of execution in Pig
o MapReduce mode
Apache Hive
SQL vs HiveQL
Hive installation and configuration
Introduction to ZooKeeper
Pseudo mode installations
Introduction to Sqoop
MySQL client and server installation
How to connect to a relational database using Sqoop
Sqoop commands and examples of import and export
Introduction to Flume
Flume agent usage and running Flume examples
Advanced and new technologies architecture discussions
Mahout (Machine Learning Algorithms)
Storm (Real time data streaming)
Cassandra (NoSQL database)
MongoDB (NoSQL database)
Ganglia (Monitoring Tools)
Cloudera, Hortonworks, MapR, Amazon EMR (Distributions)