Big Data Computing on Hadoop

OVERVIEW

In this three-day, hands-on Big Data and Apache Hadoop course, you will learn how the different components of the Big Data ecosystem fit together. The course combines an overview of use cases with a comparison of current Big Data technologies. By the end, you will be able to reason about how to solve your Big Data problems and which tools to use for them.

MODULES

Based on your requirements, we tailor a course covering some or all of the following topics:

  • Big Data and Hadoop: an overview of the Hadoop ecosystem; distributed computing workshop
  • HDFS and MapReduce: HDFS deep dive, MapReduce design patterns
  • YARN, Hadoop’s resource manager
  • MapReduce applications using the Hadoop Streaming API
  • SQL-based solutions: Hive, HCatalog, and interoperability with Spark SQL
  • Spark overview: the Spark DataFrame API and Spark SQL
  • Big Data in the cloud: cloud computing basics, Amazon Web Services (AWS), Amazon EMR, and Hadoop in the cloud
  • ETL and orchestration systems, and the Luigi framework
  • Technical Big Data architecture use cases