Big Data Computing on Hadoop

OVERVIEW

In this three-day, hands-on Big Data and Apache Hadoop course, you will learn how the different components of the Big Data ecosystem fit together. Through an overview of use cases and a comparison of current Big Data technologies, by the end of the course you will be able to reason about how to solve your Big Data problems and which tools to use for them.

The topics in the course were explained clearly. The use cases and the hands-on practice related to real-world problems, and the course was well conceptualised.

S. P. JAIN – Mumbai – Student


MODULES

Big Data and Hadoop: the Hadoop ecosystem overview, distributed computing workshop

HDFS and MapReduce, HDFS deep dive, MapReduce design patterns

YARN, Hadoop’s resource manager

MapReduce applications using the Hadoop Streaming API
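
A minimal sketch of what a Streaming job can look like, assuming a word-count style aggregation; file names and paths are illustrative. Hadoop Streaming pipes input records to the mapper on stdin and expects tab-separated key/value pairs on stdout; Hadoop then sorts by key before the reducer runs.

```python
#!/usr/bin/env python3
# mapper.py -- illustrative word-count mapper for Hadoop Streaming.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        # Emit "word<TAB>1" for every token; Hadoop groups these by key.
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- illustrative reducer; counts for the same word arrive
# as consecutive lines because Hadoop sorts mapper output by key.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Scripts like these are typically submitted with the hadoop-streaming jar, passing `-input`, `-output`, `-mapper` and `-reducer` options.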

SQL-based solutions: Hive, HCatalog, interoperability with Spark SQL
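
As a hedged illustration of the interoperability covered here: a Spark session with Hive support enabled can query tables registered in the Hive metastore directly from Spark SQL. The database and table names below are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark SQL see tables registered in the Hive metastore.
spark = (
    SparkSession.builder
    .appName("hive-spark-sql-interop")
    .enableHiveSupport()
    .getOrCreate()
)

# Query a (hypothetical) Hive table; the result comes back as a DataFrame.
page_visits = spark.sql("""
    SELECT page, COUNT(*) AS visits
    FROM weblogs.page_views
    GROUP BY page
    ORDER BY visits DESC
""")
page_visits.show()
```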

Spark overview, the Spark DataFrame API and Spark SQL
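
A minimal sketch of the DataFrame API alongside the equivalent Spark SQL query, assuming a hypothetical CSV of events:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-example").getOrCreate()

# Hypothetical input: a CSV of (user_id, event_type, duration_ms).
events = spark.read.csv("events.csv", header=True, inferSchema=True)

# DataFrame API: average duration per event type.
by_type = (
    events.groupBy("event_type")
          .agg(F.avg("duration_ms").alias("avg_duration_ms"))
          .orderBy(F.desc("avg_duration_ms"))
)

# The same query expressed in Spark SQL against a temporary view.
events.createOrReplaceTempView("events")
by_type_sql = spark.sql(
    "SELECT event_type, AVG(duration_ms) AS avg_duration_ms "
    "FROM events GROUP BY event_type ORDER BY avg_duration_ms DESC"
)

by_type.show()
by_type_sql.show()
```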

Big Data in the cloud: cloud computing basics, Amazon Web Services, and EMR (Hadoop in the cloud)
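
As an illustration of Hadoop in the cloud, the sketch below launches a transient EMR cluster with boto3. The region, release label, instance types, IAM roles and log bucket are placeholder assumptions, not course-provided values.

```python
import boto3

# Region, release label, roles and log bucket below are placeholders.
emr = boto3.client("emr", region_name="eu-west-1")

response = emr.run_job_flow(
    Name="course-demo-cluster",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        # Let the cluster terminate itself once all steps have finished.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    LogUri="s3://my-emr-logs/",          # hypothetical bucket
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster id:", response["JobFlowId"])
```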

ETL and orchestration systems, and the Luigi framework
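
A minimal sketch of a Luigi pipeline with two dependent tasks, assuming local files as targets; the paths and data are illustrative.

```python
import luigi


class ExtractLogs(luigi.Task):
    """Illustrative extract step: writes raw records to a local file."""
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"data/raw/{self.date}.txt")

    def run(self):
        with self.output().open("w") as out:
            out.write("user=alice action=login\n")  # placeholder data


class CountActions(luigi.Task):
    """Depends on ExtractLogs; Luigi runs upstream tasks first."""
    date = luigi.DateParameter()

    def requires(self):
        return ExtractLogs(date=self.date)

    def output(self):
        return luigi.LocalTarget(f"data/counts/{self.date}.txt")

    def run(self):
        with self.input().open() as src, self.output().open("w") as out:
            out.write(f"{sum(1 for _ in src)}\n")


if __name__ == "__main__":
    # e.g. python pipeline.py CountActions --date 2024-01-01 --local-scheduler
    luigi.run()
```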

Technical Big Data Architecture use cases