Databricks January 2024 Feature Recap: Key Updates for Data & AI Teams

Péter Kaszt
20 Feb 2024 · 11 min read

Databricks ships new features at a steady pace, with the aim of supporting organizations' data infrastructure and improving productivity.

In this guide, we’re going to look at the most important, useful, and innovative features Databricks released in January 2024 in four categories:

  • The main headliners, including the Databricks Runtime 14.3 LTS and 14.3 ML LTS
  • Marketplace and Delta Sharing-related features
  • Convenience and UI updates
  • Monitoring-related updates

For the full breakdown of every feature and their respective links, scroll to the bottom of the article.

The main headliners

Databricks Runtime 14.3 LTS and 14.3 ML LTS

Availability: AWS, Azure, GCP

Note: generally available (no longer in beta) since early February.

Databricks Runtime 14.3 LTS was released as a beta in January. This runtime version brings a number of improvements and new features, along with the usual library updates and various Spark fixes. It is the first Long-Term Support version this year, which means it receives full support from Databricks for the next 3 years.

Some important examples of the new features:

Native XML file format support (Public Preview)

Availability: AWS, Azure, GCP

Native XML file format support is now in Public Preview. It enables ingestion, querying, and parsing of XML data for batch processing or streaming. It comes with schema inference and evolution with Auto Loader and a data rescue capability, and it delivers all of this without any external dependencies (no external jars or Python libraries needed). This elevates XML to a first-class citizen in the Databricks ecosystem.

This can be a game-changer for you if you have a lot of XML format data or need to integrate with systems that output XML.
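
As a rough illustration of the batch and streaming cases described above, here is a minimal sketch. It assumes a Databricks Runtime 14.3 cluster where a `spark` session is already available; the helper names, paths, and the `book` row tag are hypothetical and only serve the example.

```python
def read_xml_batch(spark, path, row_tag):
    """Batch-read XML files; each <row_tag> element becomes one row."""
    return (
        spark.read.format("xml")
        .option("rowTag", row_tag)  # which XML element maps to a row
        .load(path)
    )


def read_xml_stream(spark, path, row_tag, schema_location):
    """Incrementally ingest XML files with Auto Loader (cloudFiles)."""
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "xml")
        # Auto Loader stores the inferred and evolved schema here.
        .option("cloudFiles.schemaLocation", schema_location)
        .option("rowTag", row_tag)
        .load(path)
    )


# Hypothetical usage in a notebook, where `spark` is predefined:
# books = read_xml_batch(spark, "/Volumes/main/raw/books/", "book")
# books.printSchema()
```

The functions take `spark` as a parameter so the sketch does not depend on a particular notebook or job context.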

Databricks Marketplace and Delta Sharing-related updates

Another Marketplace. What’s in it for me?

Image by the author – generated with Copilot

Imagine that you are tasked with developing a new machine-learning model or analytics solution that uses your company’s internal wealth of data and may also need some external data (e.g. weather, Bloomberg or other financial data, scraped data from the web, etc.).

You will face several important decision points:

  • Should we develop a new model from the ground up or use something already available?
  • What data do we need for training, evaluation, and inference?
  • Do we need external datasets to make our model more precise or advanced?
  • If we need external data, how do we obtain it (buying ready-made data or scraping it, and is it even publicly available)?

You can go the hard way and try to develop a solution from scratch. You can build an elaborate web scraping solution to obtain publicly available data, but why reinvent the wheel?

You can save a lot of development time and effort, along with operational costs (not to mention the headaches) by checking if there is a sufficiently good similar solution available. If there’s one, just roll with that. Or you can build your solution on an existing model, fine-tune it for your use case, or train it with additional data.

Let’s say you are on the other side: You might have developed a game-changing model for forecasting the stock market or crypto movement, or you have already scraped the web for different datasets that you have in a neat tabular format. So, how do you monetize it?

That’s where Databricks Marketplace comes in. It provides a secure platform built on top of open-source sharing protocols to share data, models, notebooks, and even complete solutions with other companies.

Share AI models using Databricks Marketplace (Public Preview)

Availability: AWS, Azure, GCP

You can now use Marketplace to share models registered in Unity Catalog. This can be a great way to monetize your existing AI models or securely use models developed and trained by trusted companies from across the Databricks ecosystem.

Share AI models using Delta Sharing (Public Preview)

Availability: AWS, Azure, GCP

Sharing AI models using Delta Sharing is now in Public Preview. This feature will facilitate the sharing and collaboration of AI models among different teams and organizations.
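
On the consumer side, a shared model would typically be referenced like any other Unity Catalog model, by its three-level name. The sketch below assumes the share has already been mounted as a catalog and that MLflow is installed; the catalog, schema, and model names are hypothetical.

```python
def uc_model_uri(catalog, schema, model, version):
    """Build an MLflow model URI for a model registered in Unity Catalog."""
    return f"models:/{catalog}.{schema}.{model}/{int(version)}"


def load_shared_model(catalog, schema, model, version):
    """Load the model as a generic Python function model via MLflow."""
    # Imported lazily so the URI helper above works without MLflow installed.
    import mlflow

    # Point the MLflow client at the Unity Catalog model registry.
    mlflow.set_registry_uri("databricks-uc")
    return mlflow.pyfunc.load_model(uc_model_uri(catalog, schema, model, version))


# Hypothetical usage:
# model = load_shared_model("shared_catalog", "ml", "churn_model", 1)
# predictions = model.predict(input_df)
```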

Databricks Marketplace supports volume sharing

Availability: AWS, Azure, GCP

Databricks Marketplace now supports volume sharing. Providers can publish Unity Catalog volumes, which hold non-tabular data, and consumers can access them through the Marketplace.

This is interesting if you have curated non-tabular datasets for ML training that you want to share. Conversely, if you need such data, it might be obtainable as a Data Product on the Databricks Marketplace.

A note on the Marketplace and Delta Sharing-related features above: both the provider and consumer workspaces must be enabled for Unity Catalog to participate in model or volume sharing. (Since November 8 and 9, 2023, all new workspaces should be Unity Catalog enabled by default on AWS and Azure.)

Convenience and UI updates

Workspace file size limit is now 500MB

Availability: AWS, Azure, GCP

The workspace file size limit has been increased to 500MB. This will allow you to work with larger files directly within your Databricks workspace.

Workspace path update

Availability: AWS, Azure, GCP

Historically, users were required to include the /Workspace path prefix for some Databricks APIs (%sh) but not for others (%run, REST API inputs). Now, you can provide workspace paths with the /Workspace prefix everywhere in Databricks.
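
If you have existing code that mixes both styles, one option is to normalize workspace paths to the prefixed form in a single place. The helper below is a hypothetical sketch, not a Databricks API, and it assumes its input really is a workspace path (not a DBFS or volume path).

```python
WORKSPACE_PREFIX = "/Workspace"


def with_workspace_prefix(path: str) -> str:
    """Return the workspace path with a leading /Workspace prefix."""
    # Already prefixed: leave unchanged.
    if path == WORKSPACE_PREFIX or path.startswith(WORKSPACE_PREFIX + "/"):
        return path
    # Otherwise prepend the prefix, avoiding a doubled slash.
    return WORKSPACE_PREFIX + "/" + path.lstrip("/")


print(with_workspace_prefix("/Users/someone@example.com/etl/config.yaml"))
print(with_workspace_prefix("/Workspace/Users/someone@example.com/etl/config.yaml"))
```

Both calls print the same prefixed path.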

URI path-based access to Unity Catalog external volumes

Availability: AWS, Azure, GCP

You can now use cloud storage URIs for path-based access to data governed by Unity Catalog and stored in external volumes.
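
To make the relationship between the two addressing styles concrete, the sketch below maps a `/Volumes/<catalog>/<schema>/<volume>/...` path to the corresponding cloud URI. The helper is hypothetical, and the storage location is something you would look up for your own external volume; the bucket name here is made up.

```python
def volume_path_to_uri(volume_path: str, storage_location: str) -> str:
    """Translate a /Volumes/... path into a URI under the volume's storage location."""
    parts = volume_path.strip("/").split("/")
    # Expect: Volumes/<catalog>/<schema>/<volume>[/<relative path>]
    if len(parts) < 4 or parts[0] != "Volumes":
        raise ValueError(f"not a volume path: {volume_path!r}")
    relative = "/".join(parts[4:])
    base = storage_location.rstrip("/")
    return f"{base}/{relative}" if relative else base


uri = volume_path_to_uri(
    "/Volumes/main/raw/landing/2024/01/events.json",
    "s3://example-bucket/landing",  # assumed storage location of the volume
)
print(uri)

# Hypothetical usage on Databricks, where access stays governed by Unity Catalog:
# df = spark.read.format("json").load(uri)
```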

Updated UI for notebook cells (Public Preview)

Availability: AWS, Azure, GCP

The UI for notebook cells has been updated and is now in Public Preview. The update aims to improve the experience of working with notebooks in Databricks. Some features seem quite handy, while others take a bit of getting used to if you have been using the old UI for a long time.

Quick Fix helps with syntax errors in the notebook

Availability: AWS, Azure, GCP

Databricks now provides Quick Fix help with syntax errors in the notebook. This feature will be a boon for developers and data scientists who spend a significant amount of time coding in notebooks, bringing the notebook even closer to an IDE-like experience.

Monitoring-related updates

Monitor GPU model serving workloads using inference tables

Availability: AWS, Azure

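You can now monitor your GPU model serving workloads using inference tables, which log the requests and responses of a serving endpoint to a Delta table in Unity Catalog. This is particularly useful if you serve models on GPUs and want to track their inputs and outputs over time.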

Marketplace listing events system table now available (Public Preview)

Availability: AWS, Azure

You can use this table to monitor consumer actions on your Marketplace listings.

Warehouse events system table is now available (Public Preview)

Availability: AWS, Azure

You can use this table to monitor the SQL Warehouses in your workspaces.

System tables are a Databricks-hosted analytical store of your account’s operational data found in the system catalog. System tables can be used for historical observability across your account.
They can be instrumental if you want to build a solution for monitoring a host of different aspects of your Databricks environment: for example, costs, table lineage, or audit-related logs.
See official docs for more details.
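
As a starting point for such a monitoring solution, the sketch below counts recent SQL warehouse events per warehouse and event type. It assumes system tables are enabled for your account and that the table is exposed as `system.compute.warehouse_events` with `event_time`, `event_type`, and `warehouse_id` columns; verify the names against the official docs, since the helper itself is hypothetical.

```python
def warehouse_event_counts(spark, days=7):
    """Count warehouse events per warehouse and event type over the last `days` days."""
    days = int(days)  # keep the interpolated value strictly numeric
    return (
        spark.table("system.compute.warehouse_events")
        .where(f"event_time >= current_timestamp() - INTERVAL {days} DAYS")
        .groupBy("warehouse_id", "event_type")
        .count()
    )


# Hypothetical usage in a notebook, where `spark` is predefined:
# warehouse_event_counts(spark, days=30).show()
```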

Removed Features

Feature removal notice for legacy Git integration in Databricks

Affects: AWS, Azure, GCP

A notice has been issued for removing the legacy Git integration feature in Databricks. Users are advised to update their workflows to avoid any disruptions.

The complete update table:

Feature | Azure | AWS | GCP
Databricks Runtime 14.3 LTS (Updated: Generally Available since 01 Feb) | Available | Available | Available
Native XML file format support | Public Preview | Public Preview | Public Preview
Share AI models using Databricks Marketplace | Public Preview | Public Preview | Public Preview
Updates for network security group rules | Available
Workspace path update | Available | Available | Available
Support for Azure Storage firewall from serverless compute | Available
Streamlined creation of Azure Databricks/Databricks jobs | Available | Available | Available
Monitor GPU model serving workloads using inference tables | Available | Available
Support for Databricks managed service principals | Available
URI path-based access to Unity Catalog external volumes | Available | Available | Available
Access controls lists can be enabled on upgraded workspaces | Available | Available | Available
Marketplace listing events system table now available | Public Preview | Public Preview
Updated UI for notebook cells | Public Preview | Public Preview | Public Preview
Quick Fix help with syntax errors in the notebook | Available | Available | Available
Share AI models using Delta Sharing | Public Preview | Public Preview | Public Preview
Databricks Marketplace supports volume sharing | Available | Available | Available
Create widgets from the Databricks UI | Available | Available | Available
Libraries now supported in compute policies | Public Preview
Warehouse events system table is now available | Public Preview | Public Preview
UI experience for OAuth app registration | Available
Reuse subnets across workspaces for customer-managed VPCs | Available
Workspace file size limit is now 500MB | Available | Available | Available
Feature removal notice for legacy Git integration in Databricks | End-of-Life | End-of-Life | End-of-Life
Databricks ODBC driver 2.7.7 | Available | Available | Available
AI assistive features are enabled by default | Available

That’s all for the January 2024 updates. Stay tuned for more updates in the coming months!