
Smart Migration to the Cloud: Everything You Want to Know
Cloud data infrastructure can boost innovation and optimize operations for any company. But how do you migrate smartly?
Azure Databricks provides a range of tools and features to help businesses and data teams derive insights from their data. With each release, Databricks adds new enhancements and capabilities that make it easier to build and deploy data-driven applications.
Today, we’ll look at three new features from Q1 2023, as seen in Databricks Runtime 12.2 LTS and 13.0 (Beta). For data leaders and data practitioners, it’s essential to know the latest tools and technologies to derive insights from your data, automate machine learning workflows, and make your data engineering pipelines more robust.
In this guide, we’ll explore SQL Serverless Warehouses, Model Serving, and the Databricks extension for Visual Studio Code.
Meanwhile, in the second part of our Databricks Features Round-up for 2023, you can read about updates to workflows, governance and runtime.
SQL Warehouse (formerly known as SQL Endpoints) is a scalable compute resource that lets you run SQL commands on data objects within your Lakehouse in the Databricks environment.
A warehouse comprises one or more clusters, depending on your chosen warehouse size – sizes follow so-called t-shirt sizing and map to different instance types on Azure, AWS, and GCP.
A Serverless Warehouse serves your business intelligence and SQL workloads and lets you create, store, and visualize your own queries and build your own dashboards.
Note that there is no lock-in. You can connect your business intelligence tools, such as Power BI, Tableau, or Looker to empower your teams to analyze the data on their own.
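As an illustration, here is a minimal sketch of querying a warehouse from Python with the open-source `databricks-sql-connector` package. The hostname, HTTP path, table name, and token are placeholders you would take from your own warehouse’s connection details.

```python
# Sketch: querying a SQL Warehouse from Python, assuming the open-source
# databricks-sql-connector package (pip install databricks-sql-connector).
# All connection values below are placeholders.
def top_rows_query(table: str, limit: int = 10) -> str:
    """Build a simple preview query for a table."""
    return f"SELECT * FROM {table} LIMIT {limit}"

def fetch_top_rows(table: str, limit: int = 10) -> list:
    """Run the preview query against a SQL Warehouse and return the rows."""
    # Imported lazily so the sketch can be read without the package installed.
    from databricks import sql

    with sql.connect(
        server_hostname="adb-1234567890123456.7.azuredatabricks.net",
        http_path="/sql/1.0/warehouses/abcdef1234567890",
        access_token="<personal-access-token>",
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(top_rows_query(table, limit))
            return cursor.fetchall()
```

The same connection details work from BI tools such as Power BI or Tableau, so the warehouse serves both programmatic and dashboard workloads.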
Databricks differentiates three types of Warehouses: Classic, Pro, and Serverless.
Classic and Pro warehouses are self-managed: the compute resources run in your own cloud account, so you handle capacity management yourself, and it can take several minutes before a warehouse becomes available. To avoid idle time during initialization (time you still pay for), you need to keep warehouses running long-term, and both types offer reduced concurrency.
To solve these pain points, a SQL Serverless Warehouse can be the best option for many teams. Let’s explore why.
SQL Serverless provides an excellent customer experience with a fully managed, instant, and elastic serverless data warehouse:
Instant, elastic compute
Zero management
| Use Classic | Use Pro | Use Serverless |
|---|---|---|
| For relational data or development purposes, when concurrency, idle time, and cost don’t matter and Serverless is unavailable in your region. | For optimized cloud-based workloads, including ETL and production workloads, when you want to rely on the features above. | For the best experience across both types of workloads: instant compute with the best available performance. Rely on Databricks’ management so you can focus on what you are really good at – building workloads and deriving insights. |
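For teams that script their infrastructure, warehouses can also be created programmatically. The sketch below uses the Databricks SQL Warehouses REST API (`POST /api/2.0/sql/warehouses`); the workspace URL and token are placeholders, and note that a serverless warehouse is created as the `PRO` type with serverless compute enabled.

```python
# Sketch: creating a Serverless SQL Warehouse through the Databricks
# Warehouses REST API. Workspace URL and token are placeholders.
import json
from urllib import request

def warehouse_payload(name: str, size: str = "Small") -> dict:
    """Build the request body for a serverless warehouse."""
    return {
        "name": name,
        "cluster_size": size,            # t-shirt size, e.g. "Small"
        "warehouse_type": "PRO",         # serverless uses the PRO type
        "enable_serverless_compute": True,
        "auto_stop_mins": 10,            # stop when idle to save cost
    }

def create_warehouse(host: str, token: str, payload: dict) -> None:
    """POST the payload to the workspace; succeeds with the new warehouse id."""
    req = request.Request(
        f"{host}/api/2.0/sql/warehouses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    request.urlopen(req)
```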
A new capability on Databricks, Model Serving offers a production-ready environment for serving your ML models. Previously known as Serverless Real-Time Inference, this service exposes your machine learning models as scalable REST API endpoints and provides highly available and low-latency services for deploying models.
Productionizing ML models has its pain points. In fact, most machine learning models don’t actually make it into production. There are several challenges that come with building real-time ML models, including:
Databricks’ Model Serving can help you solve many challenges of productionizing machine learning models. You don’t need to build or manage a scalable serving infrastructure yourself, deploying models requires no dedicated engineering resources, and serving models is very simple.
| Production-grade serving | Accelerate deployments with Lakehouse integration | Simplified deployment |
|---|---|---|
| Highly available, low-latency, scalable serving that works for both small and large workloads. | Fully integrates with other products on your Lakehouse; provides automatic feature lookups, monitoring, and unified governance. | Simple, flexible deployment through the UI or API; create model endpoints and deploy your models to them in a few clicks. |
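To make the API route concrete, here is a hedged sketch of creating an endpoint through the Serving Endpoints REST API (`POST /api/2.0/serving-endpoints`). The endpoint name, registered model name, and version are placeholders for your own MLflow model.

```python
# Sketch: creating a Model Serving endpoint via the Serving Endpoints
# REST API. Model name/version and workspace details are placeholders.
import json
from urllib import request

def endpoint_config(endpoint_name: str, model_name: str, model_version: str) -> dict:
    """Build the request body for a new serving endpoint."""
    return {
        "name": endpoint_name,
        "config": {
            "served_models": [
                {
                    "model_name": model_name,        # registered MLflow model
                    "model_version": model_version,
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,   # no cost while idle
                }
            ]
        },
    }

def create_endpoint(host: str, token: str, config: dict) -> None:
    """POST the config to the workspace to create the endpoint."""
    req = request.Request(
        f"{host}/api/2.0/serving-endpoints",
        data=json.dumps(config).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    request.urlopen(req)
```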
Model Serving can be implemented with various modes, depending on the specific requirements and constraints of your application.
| As code / containers | Batch & streaming | Real-time (low latency) | Embedded (device, edge) |
|---|---|---|---|
| Ensures reproducibility in other systems; resolves compatibility and environment issues. | Batch: great for high-latency scenarios; leverages databases or object storage for fast retrieval of stored predictions. Streaming: stream processing with fast scoring on new data. | Low-latency scoring and high availability, usually over REST. | Special-case deployments; great for limited connectivity to cloud services. |
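As a sketch of the real-time mode, the snippet below scores records against a serving endpoint over REST using the `dataframe_records` request format that Databricks Model Serving accepts. The host, token, and endpoint name are placeholders.

```python
# Sketch: real-time scoring against a Model Serving endpoint over REST.
# Host, token, and endpoint name are placeholders.
import json
from urllib import request

def score_payload(records: list) -> bytes:
    """Encode rows in the dataframe_records format the endpoint accepts."""
    return json.dumps({"dataframe_records": records}).encode("utf-8")

def score(host: str, token: str, endpoint: str, records: list):
    """POST the records to the endpoint's invocations URL and return predictions."""
    req = request.Request(
        f"{host}/serving-endpoints/{endpoint}/invocations",
        data=score_payload(records),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["predictions"]
```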
Databricks Visual Studio Code extension is a powerful tool that allows you to develop and manage your Databricks projects directly from your local Visual Studio Code environment.
The Databricks Extension for VS Code was announced on Valentine’s Day 2023, and what a lovely day it was!
With this extension, you can easily synchronize your local code with code in remote workspaces, and securely connect and manage remote clusters by starting and stopping them.
You can also run local Python code files on Databricks clusters in remote workspaces, and run local Python, SQL, R, and Scala notebooks as automated Databricks jobs in remote workspaces.
The Databricks VS Code extension provides a native experience for writing, testing, and running your code, so you can minimize context switching and keep your current development workflow. You can also run your unit tests locally – for example, Python unit tests with pytest – apply software engineering best practices, and use VS Code’s native capabilities for editing, refactoring, testing, and CI/CD in your data and AI applications.
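To illustrate the local-testing workflow, here is a small pytest-style unit test you could run from VS Code before syncing code to a remote workspace. The `cleanse_emails` helper is a hypothetical function invented for the example.

```python
# Sketch: a pytest-style unit test runnable locally (e.g. `pytest` in VS Code)
# before the code is synced to a remote Databricks workspace.
# cleanse_emails is a hypothetical transformation for illustration.
def cleanse_emails(emails: list) -> list:
    """Normalize, deduplicate, and filter a list of e-mail addresses."""
    return sorted({e.strip().lower() for e in emails if "@" in e})

def test_cleanse_emails():
    raw = [" Ada@Example.com", "ada@example.com", "not-an-email"]
    # Duplicates collapse after normalization; invalid entries are dropped.
    assert cleanse_emails(raw) == ["ada@example.com"]
```

Because the test has no Databricks dependency, it runs in plain pytest locally, while the same file can still be executed on a remote cluster through the extension.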
The idea is to provide the best experience for teams who rely on IDEs for their development process. Additional support will be rolling out for other IDEs and additional tools, such as the Databricks Connect V2, which we will cover in our Q2 round-up.
The new Databricks features we’ve looked into above can enhance collaboration, productivity and flexibility across the board.
By taking advantage of ways to ingest, store and query your data more efficiently and with better compatibility, you can work better, reduce strain on resources, and do even more with your data.
For those looking to take advantage of the vast potential of machine learning and artificial intelligence at large in their operations, the Model Serving feature will prove a key tool.
We hope you’ve enjoyed our breakdown of these Databricks updates – and make sure you also join us for the second part of this exploration.