Databricks December 2023 Feature Recap: Key Updates for Data & AI Teams

Péter Kaszt
15 Jan 2024 · 9 min read

With 2023 in the books, we have one last Databricks feature recap to dissect. In December, Databricks announced its latest round of updates, with some features in Public Preview and others already Generally Available. Here’s the quick breakdown:

  • AI and ML updates
  • Delta Sharing updates
  • Init script updates

At the end of this article, we’ve also included a complete feature update table with links to the cloud-specific documentation pages. Let’s dive in!

AI and ML-related updates

Here are a few new features released in December that make your work in the AI & ML space much easier.

Feature & Function Serving is in Public Preview

Availability: AWS, Azure

Imagine you’re a data scientist or a machine learning engineer. You’ve spent countless hours developing and fine-tuning your machine learning models. Now you want to integrate these models into your applications and systems. But there’s a catch: you need a way to serve the features and functions that your models rely on.

This is where Feature & Function Serving comes into play.

Here are the perks of Feature & Function Serving:

  • Simplicity. No more juggling (and copying data) between different platforms. With a single API call, Databricks creates a production-ready serving environment for you.
  • High availability and scalability. Your features and functions are always ready to be served, whether you’re serving a small team or a large enterprise.
  • Security. Your data is always protected: it is served by dedicated compute within a secure network boundary, all handled and managed for you by Databricks.
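To make this concrete, here is a minimal sketch of querying a Feature & Function Serving endpoint over REST once it exists. The endpoint name, lookup key, and workspace URL below are hypothetical placeholders, not values from this release.

```python
# Minimal sketch: query a Feature & Function Serving endpoint over REST.
# The endpoint name and lookup key are hypothetical placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

response = requests.post(
    f"{DATABRICKS_HOST}/serving-endpoints/customer-features/invocations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    # Look up precomputed features (and evaluate functions) for a primary key
    json={"dataframe_records": [{"customer_id": 42}]},
)
response.raise_for_status()
print(response.json())  # feature values, computed and served by Databricks
```

A single call like this replaces the custom lookup services you would otherwise have to build and operate yourself.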

Foundation Model APIs are in Public Preview

Availability: AWS, Azure

Access and query state-of-the-art open models like Llama 2, BGE Large, MPT, and even their fine-tuned variants – all without building and configuring a GPU-accelerated development machine or spinning up your own GPU-accelerated cluster in your preferred cloud.

As Databricks summarizes it, there are numerous use cases where you can benefit from this flexibility:

  • Query a generalized LLM to verify a project’s validity before investing more resources.
  • Query a generalized LLM to create a quick proof-of-concept for an LLM-based application before investing in training and deploying a custom model.
  • Use a foundation model, along with a vector database, to build a chatbot using retrieval augmented generation (RAG).
  • Replace proprietary models with open alternatives to optimize for cost and performance.
  • Efficiently compare LLMs to see which is the best candidate for your use case, or swap a production model with a better-performing one.
  • Build an LLM application for development or production on top of a scalable, SLA-backed LLM serving solution that can support your production traffic spikes.
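If you want to experiment with one of these pay-per-token endpoints, here is a minimal sketch using the MLflow Deployments client. It assumes mlflow 2.9+ with Databricks authentication configured; the endpoint name reflects the Llama 2 chat model mentioned above, but check your workspace’s Serving page for the exact names available to you.

```python
# Minimal sketch: query a pay-per-token Foundation Model APIs endpoint.
# Assumes mlflow>=2.9 and Databricks credentials are already configured.
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

response = client.predict(
    endpoint="databricks-llama-2-70b-chat",  # check your workspace for exact names
    inputs={
        "messages": [
            {"role": "user", "content": "Summarize Delta Sharing in one sentence."}
        ],
        "max_tokens": 128,
    },
)
print(response["choices"][0]["message"]["content"])
```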

External models support in Model Serving is in Public Preview

Availability: AWS, Azure

Another improvement from the governance point of view is external model support in Databricks Model Serving. With this, you can now access, manage, and govern third-party hosted models, referred to as external models. This support allows you to add endpoints for accessing models hosted outside of Databricks, for example Azure OpenAI GPT models, Anthropic Claude, or models served via AWS Bedrock. Once configured, you can grant teams and applications access to these models, letting them query via a standard interface without exposing the underlying credentials.

Key benefits:

  • A unified endpoint for handling LLM requests across providers.
  • Centralized credential management.
  • Rate limiting, where supported.
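As an illustration, here is a sketch of registering Anthropic’s Claude as an external model endpoint using the MLflow Deployments client. The endpoint name and the secret scope/key holding the API key are placeholders.

```python
# Sketch: register a third-party hosted model as an external model endpoint.
# The endpoint name and secret scope/key are hypothetical placeholders.
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="claude-chat",
    config={
        "served_entities": [
            {
                "external_model": {
                    "name": "claude-2",
                    "provider": "anthropic",
                    "task": "llm/v1/chat",
                    "anthropic_config": {
                        # The API key stays in Databricks secrets, never in client code
                        "anthropic_api_key": "{{secrets/my_scope/anthropic_key}}",
                    },
                }
            }
        ]
    },
)
```

Teams can then query `claude-chat` like any other serving endpoint, without ever handling the Anthropic credentials themselves.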

Delta Sharing-related updates

What is Delta Sharing and why is it important?

Delta Sharing’s role in the data architecture. Source: databricks.com

The official documentation summarizes the “What” quite nicely:

“Delta Sharing is an open protocol developed by Databricks for secure data sharing with other organizations regardless of the computing platforms they use.”

So, why is this important?

Imagine that you are part of a bigger – even enterprise-scale – organization. You have data products in multiple geographic regions or multiple clouds, or you are planning to use your datasets across multiple business units. Sooner or later, you will run into some common questions:

  • How do you securely share data between multiple regions or clouds without constantly copying and duplicating it (which would increase your cloud costs)?
  • How do you share data with users or clients who are not using Databricks?
  • How do you control data governance for the shared data in an efficient centralized way?

Utilizing Delta Sharing coupled with Unity Catalog can be the answer to these questions. And this December, some interesting new features extend the possibilities for Delta Sharing.

Note: Although Delta Sharing can be used to securely share data with recipients outside of the Databricks platform, both of these new features require Databricks-to-Databricks sharing. Other requirements, such as the minimum DBR version, may differ for each cloud provider.

Share volumes using Delta Sharing is in Public Preview

Availability: AWS, Azure, GCP

Previously, you could share tables and, if both you and the receiving party used a Unity Catalog-enabled workspace, even notebook files. But there was no similar built-in way to share unstructured data (e.g. images or other files to train your machine learning models on).

With this new feature entering the Public Preview phase in December, you can now use Delta Sharing to share volumes between Databricks workspaces on different Unity Catalog metastores (including workspaces on different Databricks accounts and different clouds).

Volumes are Unity Catalog objects that represent a logical volume of storage in a cloud object storage location. They primarily provide governance over non-tabular data assets. Delta Sharing on Databricks provides a native integration with Unity Catalog that allows you to manage, govern, audit, and track the usage of shared volumes of data.
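Here is a minimal sketch of sharing a volume, meant to be run from a Databricks notebook (where `spark` is predefined). The share, volume, and recipient names are placeholders, and this assumes Unity Catalog and Delta Sharing are already enabled.

```python
# Minimal sketch: share a Unity Catalog volume via Delta Sharing.
# Run in a Databricks notebook, where `spark` is predefined.
# Share, volume, and recipient names are hypothetical placeholders.
spark.sql("CREATE SHARE IF NOT EXISTS ml_assets")
spark.sql("ALTER SHARE ml_assets ADD VOLUME main.training.images")
spark.sql("GRANT SELECT ON SHARE ml_assets TO RECIPIENT partner_workspace")
```

The recipient workspace then mounts the share as a catalog and reads the files in place, with no copies to keep in sync.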

WANT TO GET THE MOST OUT OF THE LATEST DATABRICKS FEATURES?

BOOK A CALL WITH OUR EXPERT TO LEARN MORE.

Share dynamic views using Delta Sharing is in Public Preview

Availability: AWS, Azure, GCP

With this new Public Preview feature, you can use Delta Sharing to share dynamic views that restrict access to certain table data based on recipient properties. This is especially useful if you need to enforce security or data privacy restrictions (for example, GDPR or CCPA), or if you want to apply data masking that differs per recipient based on geography, business unit, or other criteria. Both row- and column-level restrictions can be configured easily, as the sketch below illustrates.
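Here is a sketch of such a view, assuming each recipient was created with hypothetical `country` and `pii_access` properties; the catalog, schema, table, and share names are placeholders. The `WHERE` clause handles row-level filtering, the `CASE` expression column-level masking.

```python
# Sketch: a dynamic view that filters and masks shared data per recipient.
# Assumes recipients carry 'country' and 'pii_access' properties; all object
# names are hypothetical placeholders. Run in a Databricks notebook.
spark.sql("""
    CREATE VIEW main.sales.orders_by_recipient AS
    SELECT
        order_id,
        -- Column-level masking: only recipients flagged for PII see emails
        CASE WHEN current_recipient('pii_access') = 'true'
             THEN customer_email ELSE 'REDACTED' END AS customer_email,
        amount,
        country
    FROM main.sales.orders
    -- Row-level filtering: each recipient sees only its own geography
    WHERE country = current_recipient('country')
""")
spark.sql("ALTER SHARE sales_share ADD VIEW main.sales.orders_by_recipient")
```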

Init scripts on DBFS and legacy global & cluster-named init scripts are end-of-life

Affects: AWS, Azure, GCP

Although this is not a new feature, it is important to note that if you have been using Databricks for a long time (i.e. you created your workspace before February 21, 2023), you might have custom init scripts that are stored on DBFS or fall into the “legacy global and cluster-named” category.

These are now officially end-of-life for several reasons, most importantly security. We therefore advise you to audit your Databricks workspaces to discover whether such scripts are still in use and, if so, migrate them as soon as possible to prevent future disruptions in your affected pipelines. If you need help with this or any other Databricks security, cost optimization, or performance-related topic, our experts at Datapao are happy to help!

General recommendations for init script locations:

| Environment | Recommendation |
| --- | --- |
| Databricks Runtime 13.3 LTS and above with Unity Catalog | Store init scripts in Unity Catalog volumes. |
| Databricks Runtime 11.3 LTS and above without Unity Catalog | Store init scripts as workspace files (200 MB file size limit). |
| Databricks Runtime 10.4 LTS and below | Store init scripts in cloud object storage. |
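As part of the migration, here is a sketch of creating a cluster whose init script lives in a Unity Catalog volume, via the Clusters API. The workspace URL, token, node type, and volume path are hypothetical placeholders.

```python
# Sketch: point a new cluster at an init script stored in a Unity Catalog
# volume (the recommended location on DBR 13.3 LTS+ with Unity Catalog).
# Workspace URL, token, node type, and paths are hypothetical placeholders.
import requests

payload = {
    "cluster_name": "etl-cluster",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "init_scripts": [
        {"volumes": {"destination": "/Volumes/main/default/scripts/install_libs.sh"}}
    ],
}
response = requests.post(
    "https://<your-workspace>/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)
response.raise_for_status()
print(response.json()["cluster_id"])
```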

The Complete Update Table

| Feature | Azure | AWS | GCP |
| --- | --- | --- | --- |
| Share dynamic views using Delta Sharing | Public Preview | Public Preview | Public Preview |
| Share volumes using Delta Sharing | Public Preview | Public Preview | Public Preview |
| Entity Relationship Diagram for primary keys and foreign keys | Available | Available | Available |
| Unity Catalog volume file upload size limit increase | Available | Available | Available |
| New notebook cell results rendering | Public Preview | Public Preview | Public Preview |
| Notebook editor themes | Available | Available | Available |
| External models support in Model Serving | Public Preview | Public Preview | – |
| Databricks Online Tables | Public Preview | Public Preview | – |
| Repos & Git Integration Settings UI now correctly notes support for GitHub Enterprise Server | Available | Available | – |
| Databricks JDBC driver 2.6.36 | Available | Available | – |
| Support for referencing workspace files from init scripts | Available | Available | – |
| Feature & Function Serving | Public Preview | Public Preview | – |
| Foundation Model APIs | Public Preview | Public Preview | – |
| New unified admin settings UI | Available | Available | Available |
| Init scripts on DBFS are end-of-life | End-of-Life | End-of-Life | End-of-Life |
| Legacy global and cluster-named init scripts are end-of-life | End-of-Life | End-of-Life | – |
| Compute created in the UI now uses the “Auto” availability zone by default | – | Available | – |

Wrap up

Thanks for tuning in for the December roundup of Databricks features! If you are interested in future articles on new Databricks releases, follow us on LinkedIn so you don’t miss the January update.