Databricks Certified Spark Training

We deliver certified public and private courses on behalf of Databricks. You can always find our latest offerings on the Databricks Training Site.

OVERVIEW

This one-day course is for data engineers, analysts, and architects; software engineers; IT operations; and technical managers interested in a brief hands-on overview of Apache Spark.

The course covers core APIs for using Spark, basic internals of the framework, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs. Each topic includes slide and lecture content along with hands-on use of a Spark cluster through a web-based notebook environment.

Spark’s capabilities and its place in the big data ecosystem

Spark SQL, DataFrames, and Datasets

Real-time, scalable data analytics with Spark Streaming

Machine learning using Spark

Writing performant Spark applications by understanding Spark’s internals and optimizations

OBJECTIVES

After taking this class, you will be able to:

Experiment with use cases for Spark and Databricks, including extract-transform-load operations, data analytics, data visualization, batch analysis, machine learning, graph processing, and stream processing.

Identify Spark and Databricks capabilities appropriate to your business needs.

Communicate with team members and engineers using appropriate terminology.

Build data pipelines and query large data sets using Spark SQL and DataFrames.

Execute and modify extract-transform-load (ETL) jobs to process big data using the Spark API, DataFrames, and Resilient Distributed Datasets (RDDs).

Analyze Spark jobs using the administration UIs and logs inside Databricks.

Find answers to common Spark and Databricks questions using the documentation and other resources.

MODULES

Spark Overview

RDD Fundamentals

Spark SQL and DataFrames

Spark Job Execution

Intro to Spark Streaming

Machine Learning Basics