We deliver certified public and private courses on behalf of Databricks. You can always find our latest offerings on the Databricks Training Site.
This one-day course is for data engineers, analysts, and architects; software engineers; IT operations; and technical managers interested in a brief hands-on overview of Apache Spark.
The course covers core APIs for using Spark, basic internals of the framework, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs. Each topic includes slide and lecture content along with hands-on use of a Spark cluster through a web-based notebook environment.
Spark’s capabilities and its place in the big data ecosystem
Spark SQL, DataFrames, and Datasets
Real-time, scalable data analytics with Spark Streaming
Machine learning using Spark
Writing performant Spark applications by understanding Spark’s internals and optimizations
After taking this class, you will be able to:
Experiment with use cases for Spark and Databricks, including extract-transform-load operations, data analytics, data visualization, batch analysis, machine learning, graph processing, and stream processing.
Identify Spark and Databricks capabilities appropriate to your business needs.
Communicate with team members and engineers using appropriate terminology.
Build data pipelines and query large data sets using Spark SQL and DataFrames.
Execute and modify extract-transform-load (ETL) jobs to process big data using the Spark API, DataFrames, and Resilient Distributed Datasets (RDDs).
Analyze Spark jobs using the administration UIs and logs inside Databricks.
Find answers to common Spark and Databricks questions using the documentation and other resources.
Spark Overview
RDD Fundamentals
Spark SQL and DataFrames
Spark Job Execution
Intro to Spark Streaming
Machine Learning Basics