Databricks Updates for Q2 2023: What’s New and What’s Hot

Balázs Aszódi
14 Jul 2023 · 7 min read

Databricks is ever-improving and ever-updating, committed to frequently releasing new features that support organizational infrastructures and improve productivity. In this guide, we’re going to look at the most important, useful, and innovative Databricks features released in Q2 2023 (April 1 to June 30), including the Databricks Runtime 13.0 and 13.1 updates. Of course, it would be all but impossible to list everything that’s changed and improved.

However, these are the updates we’ve been most excited about here at DATAPAO, and we’re presenting them below. Without further ado, let’s dive in.

Databricks Runtime 13.0 and 13.1

Generally Available since April 2023 on AWS, Azure, and Google Cloud Platform, and followed by a 13.2 Beta in late June, Databricks Runtime 13.0 boasts a bounty of features.

Major updates in 13.0 include:

  • Apache Spark 3.4: Including Spark Connect, new SQL functionality, and streaming improvements.
  • Databricks Runtime 13.0 for ML: Machine learning enablement via popular ML libraries, including TensorFlow, AutoML, XGBoost, and PyTorch.
  • Delta Lake upgrade to 2.3.0: Zero-copy convert to Delta from Iceberg tables, support for Shallow Clone and CREATE TABLE LIKE, and more (a short conversion sketch follows this list).
  • New features & support for predictive I/O: Support for sliding frames, CSV, AVRO, etc. 
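
To illustrate the zero-copy Iceberg conversion, here is a minimal sketch run from a Databricks notebook (where `spark` is predefined); the table path is a placeholder, and the exact behavior may vary by Delta version:

```python
# Minimal sketch: in-place (zero-copy) conversion of an Iceberg table to
# Delta via Delta Lake 2.3's CONVERT TO DELTA. The path is a placeholder.
spark.sql("""
    CONVERT TO DELTA iceberg.`s3://my-bucket/warehouse/events`
""")
```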

Updates in 13.1 include:

  • Read Kafka with SQL (see the sketch after this list)
  • Add, change or delete data in streaming tables
  • New SQL built-in functions
  • Unity Catalog support for cluster-scoped Python libraries 
  • Delta Clone for Unity Catalog
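
As a quick taste of the SQL Kafka reader, here is a minimal sketch using the `read_kafka` table-valued function from a notebook. The broker address and topic are placeholders, and the option names mirror the familiar Kafka source options:

```python
# Minimal sketch: reading a Kafka topic with the read_kafka table-valued
# function (DBR 13.1+). Broker and topic names are placeholders.
df = spark.sql("""
    SELECT value::string AS payload, timestamp
    FROM read_kafka(
        bootstrapServers => 'broker-1:9092',
        subscribe        => 'events',
        startingOffsets  => 'earliest'
    )
""")
df.show(truncate=False)
```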

Availability: Databricks Runtime 13.0 is Generally Available on AWS, Azure, and GCP.

Cluster Metrics

This new, native cluster metrics tool enables the gathering of key hardware and Spark metrics. 

Until this update, cluster metrics were only available via Ganglia, recorded in 15-minute blocks, and externally accessible only on a limited basis. Now embedded into the Databricks UI, the new tool is customizable, offers various filters and node-level views, and provides a variety of CPU, Spark, and GPU metrics charts.

Availability: Generally Available on AWS and Azure.

CERTIFIED TO THE HIGHEST STANDARDS, DATAPAO IS A PREFERRED DATABRICKS PARTNER, IDEALLY PLACED TO BOOST YOUR DATA INFRASTRUCTURE. 

DISCOVER OUR ONE-STOP MIGRATION SOLUTION TO LEARN HOW TO MIGRATE SMARTER

Workspace Files

Formerly called “Files in Repos,” workspace files are enabled everywhere by default on Databricks Runtime 11.3 LTS and above (but can be disabled if required).

For many workspace file types, Databricks provides functionality similar to local development: files can be created and edited via a built-in file editor, and you can manage access to them.
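
One practical consequence is that code in a notebook can reference files stored next to it with plain relative paths, much like local development. A minimal sketch (the config file name is hypothetical):

```python
# Minimal sketch: with workspace files enabled (DBR 11.3 LTS+), a notebook
# can read a file stored alongside it via a relative path.
# "config.json" is a hypothetical file committed next to this notebook.
import json

with open("./config.json") as f:
    config = json.load(f)

print(config)
```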

Availability: This feature is Generally Available on AWS, Google Cloud Platform, and Azure.

File-Based SQL Queries in Workflows

Conveniently, you can now retrieve SQL queries from a remote Git repo when adding an SQL task to a Databricks job. 

You also have the option to clone your repository into a Databricks repo, if you prefer to use Databricks to host your source code instead.
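
Under the hood, this is standard Jobs API configuration. Here is a hedged sketch of creating such a job via the Jobs API 2.1; the workspace URL, token, repo URL, file path, and warehouse ID are all placeholders:

```python
# Minimal sketch: creating a job whose SQL task pulls a query file from a
# remote Git repo (Jobs API 2.1). All identifiers below are placeholders.
import requests

host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"

payload = {
    "name": "daily-report",
    "git_source": {
        "git_url": "https://github.com/acme/analytics",
        "git_provider": "gitHub",
        "git_branch": "main",
    },
    "tasks": [{
        "task_key": "run_report",
        "sql_task": {
            "file": {"path": "queries/daily_report.sql", "source": "GIT"},
            "warehouse_id": "<warehouse-id>",
        },
    }],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"job_id": 123}
```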

Availability: The file-based SQL queries in Workflows feature is Generally Available on AWS and Microsoft Azure.

[Figure: An example of cluster metrics over the last 24 hours]

Workspace-Catalog Binding 

With this update, first introduced on AWS in May, you can limit access to a catalog in Unity Catalog to specific workspaces in your account, thus streamlining user access control.

This is a useful tool for those who use workspaces to isolate user data access: in practice, you share the catalog only with selected workspaces attached to the current metastore. Admins or catalog owners can set this up using either the Data Explorer or the Unity Catalog REST API.
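
For the REST route, here is a hedged sketch against the workspace-bindings endpoint; the host, token, catalog name, and workspace ID are placeholders, and the catalog's isolation mode must be set to ISOLATED first:

```python
# Minimal sketch: binding a Unity Catalog catalog to specific workspaces
# via the workspace-bindings REST API. All identifiers are placeholders,
# and the catalog should already have isolation_mode set to ISOLATED.
import requests

host = "https://<your-workspace>.cloud.databricks.com"
token = "<admin-access-token>"
catalog = "finance"

resp = requests.patch(
    f"{host}/api/2.1/unity-catalog/workspace-bindings/catalogs/{catalog}",
    headers={"Authorization": f"Bearer {token}"},
    json={"assign_workspaces": [1234567890]},  # workspace IDs to allow
)
resp.raise_for_status()
print(resp.json())
```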

Availability: The update is Generally Available on AWS and Google Cloud Platform, and in Public Preview on Azure.

Databricks Notebooks in SQL Warehouses

As SQL is the second most popular language in notebooks, Databricks has focused on expanding its support for it. SQL warehouses deliver better price-performance for SQL execution than all-purpose clusters, so Databricks notebooks can now be attached to SQL warehouses.

Keep in mind that while attached to an SQL warehouse, only the SQL cells in your notebook will execute – not Python or Scala cells or cells in any other language. 

Availability: This new feature is in Public Preview on AWS, Azure, and Google Cloud Platform.

[Figure: The notebook attach dialog with a SQL warehouse selected]

Databricks Connect V2 

With the handy client library dubbed Databricks Connect V2, you can connect various IDEs and other custom applications to Databricks clusters. 

To do so, you’ll be writing jobs using Spark APIs and running them remotely on a Databricks cluster instead of a local Spark session. As a result, you can iterate quickly when developing libraries, shut down idle clusters without losing work, run large-scale Spark jobs from any Python application, and debug code in your IDE even when working with a remote cluster.
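
A minimal sketch of the connection step in Python follows; the host, token, and cluster ID are placeholders, and once the session is created, the `spark` object behaves like a regular SparkSession while execution happens remotely:

```python
# Minimal sketch: Databricks Connect V2 (pip install databricks-connect).
# Credentials below are placeholders; a Databricks CLI profile also works.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host="https://<your-workspace>.cloud.databricks.com",
    token="<personal-access-token>",
    cluster_id="<cluster-id>",
).getOrCreate()

# Runs on the remote Databricks cluster, not a local Spark session.
df = spark.range(10).selectExpr("id", "id * 2 AS doubled")
df.show()
```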

Availability: This feature was made Generally Available for Python in early June and can be used across AWS, GCP, and Azure.

Usage Monitoring via System Tables

System tables can be used across your account to observe and assess historical data. There are currently three types of system tables hosted on Databricks: audit logs, billable usage logs, and table and column lineage.

Once admins have set up the appropriate permissions, monitoring system tables gives you more insight into an account’s operational data: easily accessed, account-wide, and Databricks-hosted.
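
Once enabled, the tables live under the `system` catalog and can be queried like any other table. A hedged sketch aggregating billable usage (table and column names follow the Public Preview docs and may evolve):

```python
# Minimal sketch: querying the billable-usage system table.
# Schema and column names follow the Public Preview and may change.
usage = spark.sql("""
    SELECT sku_name, usage_date, SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    GROUP BY sku_name, usage_date
    ORDER BY usage_date DESC
""")
usage.show()
```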

Availability: This feature is in Public Preview on AWS and Azure.

Databricks Marketplace

Also warmly welcomed in Q2 has been the vast ecosystem of the Databricks Marketplace, touted as “an open forum for exchanging data products” and constituting a bridge between data providers and data consumers to help facilitate the discovery and delivery of datasets.

Powered by Delta Sharing, it provides a wide array of data products, all in a convenient and safe location. The data solutions available there, including datasets, notebooks, ML models, and more, do not need to live on the Databricks platform. As a result, consumers can avoid vendor lock-in, while providers can broaden their reach.

Announced in late April, the Databricks Marketplace has been well-received across the board, as it helps address some of the most common pain points experienced both by data providers and consumers.

Availability: Generally Available on AWS, GCP, and Microsoft Azure.

[Figure: The Databricks Marketplace]

Other Key Updates

Last but not least, we wanted to give you a list of additional Q2 2023 updates that we were excited to see but don’t have space to elaborate on here. Of course, should you want to discuss any of these further, the best thing to do is to get in touch with the DATAPAO team right away.

  • Load data using a Unity Catalog external location [in Public Preview on AWS, Azure, GCP]
  • Share notebooks using Delta Sharing [Generally Available on AWS, Azure, GCP]
  • Run a job as a service principal [Public Preview on AWS, Azure, GCP]
  • Databricks command line interface (CLI) [Public Preview on AWS, Azure, GCP]
  • Govern models from Unity Catalog [Public Preview on AWS, Azure, GCP]
  • Databricks SDK for Python (Beta) and for Go (Beta) (see the sketch after this list)
  • Unified schema browser to view data from notebooks, SQL editor, and Data Explorer [Public Preview on AWS, Azure, GCP]
  • Unified navigation [Generally Available on AWS, Azure, GCP]
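
To give a flavor of the new Python SDK, here is a minimal sketch that lists the clusters in a workspace; authentication falls back to standard environment variables or a Databricks CLI profile:

```python
# Minimal sketch: the Databricks SDK for Python (pip install databricks-sdk).
# WorkspaceClient picks up credentials from env vars or a CLI profile.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

for cluster in w.clusters.list():
    print(cluster.cluster_name, cluster.state)
```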

Q3 and Beyond

We hope you’ve enjoyed our rundown of the biggest, most useful, and most exciting new features, updates, and changes in Databricks over the second quarter of 2023, which are already making handling data, workloads, and admin more efficient, productive, and easy. 

Stay tuned as we’ll continue our quarterly presentations of Databricks updates and what they mean for those of us who have embraced the platform. Also, keep in mind that because releases are staged, your Databricks account can take a week or more to receive each release.

At DATAPAO, we’ve found these updates to be the most relevant for us and our clients, and we would love to discuss them with you further. Whether you already are on Databricks or are considering migrating, contact us today.