New in Databricks Q1 2023: DATAPAO’s Round-Up of Databricks Features (Part 1)

Balázs Aszódi
17 Apr 2023 · 9 min read

Azure Databricks provides a range of tools and features to help businesses and data teams derive insights from their data. With each release, Databricks adds new enhancements and capabilities that make it easier to build and deploy data-driven applications. 

Today, we’ll look at three new features from Q1 2023, as seen in Databricks Runtime LTS 12.2 and Beta 13.0. For data leaders and data practitioners, it’s essential to know the latest tools and technologies to derive insights from your data, automate machine learning workflows, and make your data engineering pipelines more robust.

In this guide, we’ll explore:

  1. Serverless Warehouse
  2. Model Serving
  3. Databricks Extension for VS Code

Meanwhile, in the second part of our Databricks Features Round-up for 2023, you can read about updates to workflows, governance and runtime.



What Is the Serverless Warehouse?

SQL Warehouse (formerly known as SQL Endpoints) is a scalable compute resource that lets you run SQL commands on data objects within your Lakehouse in the Databricks environment. 

A compute resource contains multiple clusters, depending on your chosen Warehouse size – sizes follow so-called t-shirt sizing, with different instance types on Azure, AWS, and GCP.

A Serverless Warehouse serves your business intelligence and SQL workloads and lets you create, store, and visualize queries and build your own dashboards. 

Note that there is no lock-in. You can connect your business intelligence tools, such as Power BI, Tableau, or Looker to empower your teams to analyze the data on their own. 
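As an illustration of that "no lock-in" point, a SQL Warehouse can also be queried straight from Python via the open-source `databricks-sql-connector` package. The hostname, HTTP path, token, and table below are placeholders – this is a minimal sketch of the connection pattern, not a complete integration.

```python
# Minimal sketch: querying a Databricks SQL Warehouse from Python.
# Requires: pip install databricks-sql-connector
# All connection details below are placeholders for your own workspace.

QUERY = "SELECT order_id, amount FROM sales.orders LIMIT 10"  # hypothetical table

def run_query(server_hostname: str, http_path: str, access_token: str, query: str = QUERY):
    """Open a connection to the Warehouse, run one query, and return the rows."""
    from databricks import sql  # imported lazily so this file loads without the package

    with sql.connect(
        server_hostname=server_hostname,  # e.g. "adb-1234.5.azuredatabricks.net"
        http_path=http_path,              # shown on the Warehouse's connection details page
        access_token=access_token,        # a personal access token
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()
```

The same connection details (hostname and HTTP path) are what you would paste into Power BI or Tableau when wiring up those tools.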

The SQL Warehouse Ecosystem

Types of Warehouses in Databricks

Databricks differentiates three types of Warehouses: Classic, Pro, and Serverless.

  1. The Classic SQL Warehouse is the traditional, self-managed SQL Warehouse, including the Databricks SQL editor, the record-setting Photon engine, Query History & Profile, Data Explorer, and Managed Delta Sharing.
  2. The Pro Warehouse provides a better experience than Classic, with additional performance features such as Predictive I/O, and integration features such as Workflows that enable SQL ETL use cases. More features that expand the SQL experience for machine learning and data science on the Lakehouse are expected soon.
  3. Even more advanced is the Serverless Warehouse, which combines many key benefits of the above with instant compute and easier scaling. It is also the most flexible option for scaling and is fully managed by Databricks.

Classic and Pro Warehouses are self-managed, meaning the compute resources run in your own cloud account. You handle capacity management yourself, and it can take several minutes before a Warehouse becomes available. 

To avoid waiting on initialization, you need to keep warehouses running long-term (while still paying for that idle time), and they also offer reduced concurrency. 

To solve these pain points, the SQL Serverless Warehouse can be the best option for many. Let’s explore why.

Advantages of the SQL Serverless Warehouse

SQL Serverless provides an excellent customer experience with a fully managed, instant, and elastic serverless data warehouse:

Instant, elastic compute

  • Fastest query execution with instant compute: no need to worry about idle time and cost; you only pay when your users run reports or queries.
  • Scales fast and intelligently thanks to Intelligent Workload Management: Databricks SQL learns from the history of your workloads and uses it to decide whether a new query should run immediately or whether the warehouse should scale up to run it without disrupting running queries.
  • High concurrency built in, with automatic load balancing using dual queues to prevent large queries from blocking small ones.
  • Faster reads from cloud storage.

Zero management

  • Databricks manages the capacity and the pools; the compute runs in the Databricks account. 
  • The compute resource is created in the same cloud region as the Databricks resource.

Which SQL Warehouse Type Should You Use?

  • Use Classic if you have relational data, or for development purposes when concurrency, idle time, and cost don’t matter and Serverless is unavailable in your region.
  • Use Pro for optimized cloud-based workloads, including ETL and production workloads, when you want to rely on the features mentioned above.
  • Use Serverless for the best experience across both types of workloads, with instant compute and the best available performance. Rely on Databricks’ management and industry-leading solutions to focus on what you are really good at: building workloads and deriving insights.
The type of workloads and data you have will define the best warehouse type for you.
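To make the decision criteria above concrete, they can be condensed into a small helper function. This is purely illustrative – the criteria are simplified flags, not an official Databricks sizing rule:

```python
def recommend_warehouse(serverless_available: bool,
                        production_workload: bool,
                        needs_instant_compute: bool) -> str:
    """Illustrative mapping of the decision guidance to code (simplified criteria)."""
    if serverless_available and needs_instant_compute:
        return "Serverless"   # best performance, instant compute, managed by Databricks
    if production_workload:
        return "Pro"          # optimized cloud-based workloads, ETL/production
    return "Classic"          # development, relational data, cost-insensitive

print(recommend_warehouse(True, True, True))   # Serverless
```

In practice the choice also depends on regional availability and pricing, so treat this as a starting point for the conversation, not a rulebook.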

What Is Model Serving and What Is It For?

A new capability on Databricks, Model Serving offers a production-ready environment for serving your ML models. Previously known as Serverless Real-Time Inference, this service exposes your machine learning models as scalable REST API endpoints and provides highly available and low-latency services for deploying models.

Productionizing ML models has its pain points. In fact, most machine learning models don’t actually make it into production. There are several challenges that come with building real-time ML models, including:

  1. ML infrastructure is hard.
  • Real-time ML systems require fast and scalable serving infrastructure, which can be costly to build and maintain.
  2. Deploying real-time models requires specific tools.
  • Data teams use diverse tools to develop models.
  • Some organizations use separate platforms for data, ML, and serving, adding complexity and cost.
  3. Operating production ML requires expert resources.
  • Deployment tools have a steep learning curve.
  • Model deployment is bottlenecked by limited engineering resources, which limits the ability to scale AI.
Machine Learning Pipeline Lifecycle



The Benefits of Databricks Model Serving

Databricks’ Model Serving can help you solve many challenges of productionizing machine learning models. You don’t need to build or manage a scalable serving infrastructure yourself, deploying models does not require dedicated engineering resources, and it is very simple to serve models.

  • Production-grade serving: highly available, low-latency, scalable serving that works for both small and large workloads.
  • Accelerated deployments with Lakehouse integration: fully integrates with other products on your Lakehouse and provides automatic feature lookups, monitoring, and unified governance.
  • Simplified deployment: simple and flexible deployment through the UI or API; create model endpoints with a few clicks and deploy your models to them.
Benefits of model serving
Databricks Model Serving
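Since Model Serving exposes models as REST endpoints, querying one boils down to a JSON POST. The sketch below shows the request shape; the workspace URL, endpoint name, and feature columns are hypothetical placeholders, and the actual network call is left commented out.

```python
import json

# Hypothetical workspace and endpoint names, for illustration only.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
ENDPOINT_NAME = "churn-model"

# Model Serving endpoints accept JSON input; "dataframe_records" is one accepted format,
# where each record is a dict of feature-column values.
invocation_url = f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations"
payload = {
    "dataframe_records": [
        {"tenure_months": 12, "monthly_spend": 49.90},  # hypothetical feature columns
    ]
}
body = json.dumps(payload)

# To actually send the request (requires a valid personal access token):
# import urllib.request
# req = urllib.request.Request(
#     invocation_url, data=body.encode(),
#     headers={"Authorization": "Bearer <token>", "Content-Type": "application/json"})
# response = urllib.request.urlopen(req)
```

Any HTTP client works here – the point is that consuming a served model needs nothing more exotic than building this URL and payload.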

Model Serving Modes

Model Serving can be implemented with various modes, depending on the specific requirements and constraints of your application.

  • As code / containers: ensures reproducibility in other systems and resolves compatibility and environment issues.
  • Batch & streaming: batch serving is great for high-latency scenarios, leverages databases or object storage, and offers fast retrieval of stored predictions; streaming serving enables stream processing and fast scoring on new data.
  • Real-time (low latency): low-latency scoring with high availability, usually over REST.
  • Embedded (device, edge): special-case deployments, great for limited connectivity with cloud services.
Available model serving modes
The Four Model Serving Modes of Databricks

What Is the Databricks Extension for Visual Studio Code?

The Databricks Visual Studio Code extension is a powerful tool that allows you to develop and manage your Databricks projects directly from your local Visual Studio Code environment. 

The Databricks Extension for VS Code was announced on Valentine’s Day 2023, and what a lovely day it was!

With this extension, you can easily synchronize your local code with code in remote workspaces, and securely connect and manage remote clusters by starting and stopping them. 

You can also run local Python code files on Databricks clusters in remote workspaces and run local Python, SQL, R and Scala notebooks as automated Databricks workflow jobs in remote workspaces.

The Databricks VS Code extension provides a native experience that allows you to write, test, and run your code. This means you can minimize context switching and use your current workflow to develop your projects. You can also run your unit tests locally – for example, Python unit tests using pytest – apply software engineering best practices, and utilize VS Code’s native capabilities for editing, refactoring, testing, and CI/CD for your data and AI applications.
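As a concrete example of that "run unit tests locally" workflow, a tiny pytest-discoverable test file might look like this. The transformation function is hypothetical, standing in for whatever local helpers your pipeline code uses:

```python
# test_cleaning.py – a pytest-discoverable test for a local helper.
# Run locally from VS Code's terminal with: pytest test_cleaning.py

def normalize_country(raw: str) -> str:
    """Hypothetical helper that cleans a country column before ingestion."""
    return raw.strip().upper().replace(" ", "_")

def test_normalize_country():
    assert normalize_country("  united kingdom ") == "UNITED_KINGDOM"
    assert normalize_country("Hungary") == "HUNGARY"
```

Because the helper is plain Python, the test runs entirely on your machine; only code that needs a cluster has to go to the remote workspace.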

The idea is to provide the best experience for teams who rely on IDEs for their development process. Additional support will be rolling out for other IDEs and additional tools, such as the Databricks Connect V2, which we will cover in our Q2 round-up.

Summing Up

The new Databricks features we’ve looked into above can enhance collaboration, productivity and flexibility across the board. 

By taking advantage of ways to ingest, store and query your data more efficiently and with better compatibility, you can work better, reduce strain on resources, and do even more with your data.

For those looking to take advantage of the vast potential of machine learning and artificial intelligence at large in their operations, the Model Serving feature will prove a key tool. 

We hope you’ve enjoyed our breakdown of these Databricks updates, and make sure you also join us for the second part of this exploration, available here.