2.3x faster using the Go plugin to replace Lua virtual machine

Abstract: We’re excited to share the latest update on our open-source project Talaria. In our efforts to improve performance and overcome infrastructure limitations, we’ve made significant strides by implementing a Go plugin to replace the Lua VM. Our team has found that the Go plugin is roughly 2.3x faster and uses 2.3x less memory […]

Building Data Applications on the Lakehouse With the Databricks SQL Driver for Node.js

We are excited to announce the general availability of the Databricks SQL Driver for Node.js. This follows the recent general availability of the Databricks SQL Driver for Go and the earlier Databricks SQL Connector for Python. Node.js developers can now easily build data applications on the lakehouse in pure JavaScript or TypeScript. The Node.js driver offers […]

Announcing the Public Preview of Predictive I/O for Updates

Previously, we’ve shown you how a new technology called Predictive I/O could improve selective reads by up to 35x for CDW customers without any knobs. Today, we are excited to announce the public preview of another innovative leap, Predictive I/O for Updates, providing you with up to 10x faster MERGE, UPDATE, and DELETE query performance. […]

Announcing the General Availability of Predictive I/O for Reads

Today, we are excited to announce the general availability of Predictive I/O for Databricks SQL (DB SQL): a machine learning powered feature to make your point lookups faster and cheaper. Predictive I/O leverages the years of experience Databricks has in building large AI/ML systems to make the Lakehouse the smartest data warehouse with no additional […]

Actioning Customer Reviews at Scale with Databricks SQL AI Functions

Every morning Susan walks straight into a storm of messages, and doesn’t know where to start! Susan is a customer success specialist at a global retailer, and her primary objective is to ensure customers are happy and receive personalised service whenever they encounter issues. Overnight the company receives hundreds of reviews and feedback across multiple […]

Unifying Your Data Ecosystem with Delta Lake Integration

As organizations mature their data infrastructure and accumulate more data than ever before in their data lakes, open and reliable table formats such as Delta Lake become a critical necessity. Thousands of companies are already using Delta Lake in production, and open-sourcing all of Delta Lake (as announced in June 2022) has further increased […]

Securing Databricks cluster init scripts

This blog was co-authored by Elia Florio, Sr. Director of Detection & Response at Databricks, and Florian Roth and Marius Bartholdy, security researchers with SEC-Consult. Protecting the Databricks platform and continuously raising the bar with security improvements is the mission of our Security team and the main reason why we invest in our bug […]

Safer deployment of streaming applications

The Flink framework has gained popularity as a real-time stateful stream processing solution for distributed stream and batch data processing. Flink also provides data distribution, communication, and fault tolerance for distributed computations over data streams. To fully leverage Flink’s features, Coban, Grab’s real-time data platform team, has adopted Flink as part of our service offerings. In […]

eBay’s Blazingly Fast Billion-Scale Vector Similarity Engine

Introduction: Often, ecommerce marketplaces provide buyers with listings similar to those previously visited by the buyer, as well as a personalized shopping experience based on profiles, past shopping histories and behavior signals such as clicks, views and additions to cart. These are vital to the shopping experience, and so it’s equally vital that we continuously […]

Databricks ❤️ Hugging Face – The Databricks Blog

Generative AI has been taking the world by storm. As the data and AI company, we have been on this journey with the release of the open source large language model Dolly, as well as the internally crowdsourced dataset licensed for research and commercial use that we used to fine-tune it, the databricks-dolly-15k. Both the […]
