Big Data Analytics on Apache Spark


This three-day training teaches you how to get the most out of the latest version of Apache Spark, for both Spark development and data analytics.

The course is entirely hands-on: you will work through many exercises and workshops covering both programming and analytics use cases. It is designed for developers and analysts alike. You need basic programming skills, but no prior Big Data or Spark experience.

The recommended length of this training is three days. On request, we can extend it to four days to cover more advanced topics.


By the end of this course, you will understand and have hands-on experience with:

  • Spark’s capabilities and its place in the Big Data ecosystem
  • Spark SQL, DataFrames and Datasets
  • Scalable real-time data analytics with Spark Streaming
  • Machine learning with Spark
  • Writing performant Spark applications by understanding Spark’s internals and optimisations

Course Outline

  • Spark Overview
  • Data Analytics with Spark
    • DataFrames
    • Using Spark SQL and combining it with DataFrames
    • Unified analysis of data coming from different sources and formats
  • Programming Spark
    • Using the RDD API
    • Using the Spark UI to evaluate Spark applications
    • Analysing Spark’s internal mechanics through the Spark UI
    • Writing performant Spark applications
    • Debugging Spark applications
    • Tungsten and Catalyst: advanced optimisations
  • Spark Streaming
    • Real-time data processing with Spark
    • Implementing continuous applications with Structured Streaming
  • Machine Learning with Spark
    • Implementing machine learning pipelines with Spark
    • Supervised learning: model building, prediction and validation
  • Spark in the Cloud
    • Configuring and using Spark on Amazon Web Services

Optional Topics

  • Scala Tutorial
  • Introduction to Machine Learning
  • Advanced Spark programming
  • Spark deployment techniques
  • Advanced machine learning with H2O and integrating H2O into Spark
  • PySpark and the IPython notebook
  • Using Spark in R