Big Data Analytics on Apache Spark

Overview

This 3-day training will teach you how to get the most out of the latest version of Apache Spark for both Spark development and data analytics.

The course is entirely hands-on: you will work through many exercises and workshops covering both programming and analytics use cases. The training is designed for both developers and analysts. Participants need basic programming skills, but no prior Big Data or Spark experience is required.

The recommended length of this training is three days. Based on your specific requests, we can extend it to four days to cover more advanced topics.

Objectives

By the end of this course, you will understand and have hands-on experience with:

Spark’s capabilities and its place in the Big Data ecosystem

Spark SQL, DataFrames and Datasets

Real-time, scalable data analytics with Spark Streaming

Machine learning using Spark

Writing performant Spark applications by understanding Spark’s internals and optimisations

S. P. JAIN – Mumbai – Student

The topics in the course were explained clearly. The use cases and course content related to real-world problems, and the course was well conceptualised.


MODULES

Spark Overview

Data Analytics with Spark:

DataFrames

Using Spark SQL and interconnecting it with DataFrames

Unified analysis of data coming from different sources and formats

Programming Spark:

Using the RDD API

Using the Spark UI to evaluate Spark applications:

Analyzing Spark internal mechanics through the Spark UI

Writing performant Spark applications

Debugging Spark Applications

Tungsten and Catalyst: Advanced Optimisations

Spark Streaming:

Real-time data processing with Spark

Implementing Continuous Applications with Structured Streaming

Machine Learning with Spark:

Implementing machine learning pipelines with Spark

Supervised learning: Model Building, Predictions and Validation

Spark in the cloud:

How to configure and use Spark on Amazon Web Services.

OPTIONAL MODULES

Scala Tutorial

Introduction to Machine Learning

Advanced Spark programming

Spark Deployment techniques

Advanced Machine Learning with H2O. Integrating H2O into Spark.

PySpark and the IPython notebook

Using Spark in R