
New in Databricks Q2 2023: DATAPAO’s Round-Up of Updates to Workflows, Governance and Runtime (Part 2)
Our quarterly report on new Databricks features.
Databricks is ever-improving and ever-updating, committed to frequently releasing new features that support organizational infrastructure and boost productivity. In this guide, we look at the most important, useful, and innovative Databricks features released in Q2 2023 (April 1 to June 30), including the Databricks Runtime 13.0 and 13.1 updates and more. Of course, it would be all but impossible to list everything that’s changed and improved.
However, these are the updates we’ve been most excited about here at DATAPAO, and we’re presenting them below. Without further ado, let’s dive in.
Generally available since April 2023 on AWS, Azure, and Google Cloud Platform, and followed by Databricks Runtime 13.1 and a 13.2 Beta in late June, Databricks Runtime 13.0 boasts a bounty of features.
Major updates in 13.0 include:
Updates in 13.1 include:
Availability: Databricks Runtime 13.0 is Generally Available on AWS, Azure, and GCP.
This new, native cluster metrics tool enables the gathering of key hardware and Spark metrics.
Until this update, cluster metrics were only available via Ganglia, recorded in 15-minute blocks, and accessible externally only on a limited basis. Now embedded in the Databricks UI, the new tool is customizable, offers various filters, works at node level, and provides a variety of CPU, Spark, and GPU metrics charts.
Availability: Generally Available on AWS and Azure.
Formerly called “Files in Repos,” Workspace files are enabled everywhere by default for 11.3 LTS and above (but can be disabled if required).
For many workspace file types, Databricks now provides functionality similar to local development: you can create and edit files via a built-in file editor, and you can manage access to them.
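As a minimal sketch of what this enables (the file path below is hypothetical), a notebook can read a workspace file with ordinary Python file I/O, using a path relative to the notebook:

```python
# Read a JSON config stored next to the notebook as a workspace file.
# "config/settings.json" is a hypothetical relative path.
import json

with open("config/settings.json") as f:
    settings = json.load(f)

print(settings.get("environment"))
```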
Availability: This feature is Generally Available on AWS, Google Cloud Platform, and Azure.
Conveniently, you can now retrieve SQL queries from a remote Git repo when adding an SQL task to a Databricks job.
You also have the option to clone your repository into a Databricks repo, if you prefer to use Databricks to host your source code instead.
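For illustration, here’s a minimal sketch of creating such a job through the Jobs 2.1 REST API with Python; the workspace host, token, repo URL, query path, and warehouse ID are all placeholders:

```python
# Create a job whose SQL task pulls a query file from a remote Git repo.
# All identifiers below (host, token, repo, paths, warehouse) are placeholders.
import requests

payload = {
    "name": "nightly-report",
    "git_source": {
        "git_url": "https://github.com/acme/analytics",  # hypothetical repo
        "git_provider": "gitHub",
        "git_branch": "main",
    },
    "tasks": [
        {
            "task_key": "run_query",
            "sql_task": {
                "file": {"path": "queries/daily_revenue.sql"},  # path inside the repo
                "warehouse_id": "<warehouse-id>",
            },
        }
    ],
}

resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # returns the new job_id
```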
Availability: The file-based SQL queries in Workflows feature is Generally Available on AWS and Microsoft Azure.
With this update, first introduced on AWS in May, you can limit Unity Catalog access to specific workspaces in your account, thus streamlining user access control.
This is a useful tool for those who use workspaces to isolate user data access. In effect, you bind the catalog to specific workspaces attached to the current metastore, and only those workspaces can access it. Admins or catalog owners can set this up using either the Data Explorer or the Unity Catalog REST API.
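As a minimal sketch of the REST route (workspace host, token, catalog name, and workspace IDs are placeholders), assigning workspaces to a catalog looks roughly like this; note that the catalog’s isolation mode must also be switched away from the default “all workspaces” setting:

```python
# Bind a catalog to specific workspaces via the Unity Catalog
# workspace-bindings API. Host, token, catalog, and IDs are placeholders.
import requests

HOST = "https://<workspace-host>"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# 1. Mark the catalog as isolated so it is no longer visible everywhere.
requests.patch(
    f"{HOST}/api/2.1/unity-catalog/catalogs/sales_catalog",
    headers=HEADERS,
    json={"isolation_mode": "ISOLATED"},
).raise_for_status()

# 2. Assign only the workspaces that should see the catalog.
requests.patch(
    f"{HOST}/api/2.1/unity-catalog/workspace-bindings/catalogs/sales_catalog",
    headers=HEADERS,
    json={"assign_workspaces": [1234567890123456]},  # numeric workspace IDs
).raise_for_status()
```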
Availability: The update is Generally Available on AWS and Google Cloud Platform and in Public Preview on Azure.
As SQL is the second most popular language in notebooks, Databricks has focused on expanding its support for it. SQL warehouses deliver better price-performance for SQL execution than all-purpose clusters, so Databricks notebooks can now run on SQL warehouses.
Keep in mind that while a notebook is attached to a SQL warehouse, only its SQL cells will execute – not Python or Scala cells or cells in any other language.
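To make the behavior concrete (the sample table name below is hypothetical), here’s how a mixed notebook behaves while attached to a SQL warehouse:

```python
# Notebook attached to a SQL warehouse: only SQL cells execute.

# Cell 1 -- a SQL cell, which runs on the warehouse:
# %sql
# SELECT pickup_zip, avg(fare_amount)
# FROM samples.nyctaxi.trips
# GROUP BY pickup_zip;

# Cell 2 -- a Python cell, which is skipped until the notebook
# is reattached to an all-purpose cluster:
print("this cell does not run on a SQL warehouse")
```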
Availability: This new feature is in Public Preview on AWS, Azure, and Google Cloud Platform.
With the handy client library dubbed Databricks Connect V2, you can connect various IDEs and other custom applications to Databricks clusters.
To do so, you’ll be writing jobs using Spark APIs and running them remotely on a Databricks cluster instead of a local Spark session. As a result, you can iterate quickly when developing libraries, shut down idle clusters without losing work, run large-scale Spark jobs from any Python application, and debug code in your IDE even when working with a remote cluster.
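Here’s a minimal sketch of what that looks like with the Python client (the `databricks-connect` package, version 13.0 or later); the host, token, and cluster ID are placeholders:

```python
# Run Spark code from a local IDE against a remote Databricks cluster.
# Host, token, and cluster ID below are placeholders.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host="https://<workspace-host>",
    token="<personal-access-token>",
    cluster_id="<cluster-id>",
).getOrCreate()

# This DataFrame is defined locally but evaluated on the remote cluster.
df = spark.range(10).toDF("n")
print(df.filter("n % 2 = 0").count())  # 5
```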
Availability: This feature was made Generally Available for Python in early June and can be used across AWS, GCP, and Azure.
System tables can be used across your account to observe and assess historical data. There are currently three types of system tables hosted on Databricks: audit logs, billable usage logs, and table and column lineage.
Once admins have set up the appropriate permissions, monitoring system tables gives you deeper insight into your account’s operational data – easily accessed, account-wide, and Databricks-hosted.
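As a minimal sketch (assuming an admin has enabled the billing schema and granted you access; column names are as we understand the preview), you can query a system table like any other table from a notebook:

```python
# Summarize the last 30 days of billable usage from the system tables.
# Assumes access to system.billing.usage has been granted.
recent_usage = spark.sql("""
    SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= current_date() - INTERVAL 30 DAYS
    GROUP BY usage_date, sku_name
    ORDER BY usage_date DESC
""")
recent_usage.show()
```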
Availability: This feature is in Public Preview on AWS and Azure.
Also warmly welcomed in Q2 has been the vast ecosystem of the Databricks Marketplace, touted as “an open forum for exchanging data products” and serving as a bridge between data providers and data consumers that facilitates the discovery and delivery of data sets.
Powered by Delta Sharing, it provides a wide array of data products, all in a convenient and safe location. The data solutions available on this platform – including datasets, notebooks, ML models, and more – do not need to be on the Databricks platform. As a result, consumers can avoid vendor lock-in, while providers can broaden their reach.
Announced in late April, the Databricks Marketplace has been well-received across the board, as it helps address some of the most common pain points experienced both by data providers and consumers.
Availability: Generally Available on AWS, GCP, and Microsoft Azure.
Last but not least, we wanted to give you a list of additional Q2 2023 updates that we were excited to see but don’t have space to elaborate on. Of course, should you want to discuss any of these further, the best thing to do is to get in touch with the DATAPAO team right away.
We hope you’ve enjoyed our rundown of the biggest, most useful, and most exciting new features, updates, and changes in Databricks over the second quarter of 2023 – changes that are already making data handling, workloads, and administration more efficient and productive.
Stay tuned as we continue our quarterly presentations of Databricks updates and what they mean for those of us who have embraced the platform. Also, keep in mind that releases are staged, so your Databricks account can take a week or more to receive each update.
At DATAPAO, we’ve found these updates to be the most relevant for us and our clients, and we would love to discuss them with you further. Whether you already are on Databricks or are considering migrating, contact us today.