Introducing Lakehouse Federation Capabilities in Unity Catalog

Data teams face many challenges to quickly access the right data primarily due to data fragmentation, time and cost involved in consolidating data, and difficulties in managing data governance across many systems. That’s why today at Data+AI Summit, we are thrilled to announce Lakehouse Federation capabilities in Unity Catalog that allow organizations to build a […]

Continue Reading

Project Lightspeed Update – Advancing Apache Spark Structured Streaming

In this blog post, we will review the advancements in Spark Structured Streaming since we announced Project Lightspeed a year ago, from performance improvements to ecosystem expansion and beyond. Before we discuss specific innovations, let’s review a bit of background on how we arrived at the need for Project Lightspeed in the first place. Background […]

Continue Reading

Announcing Delta Lake 3.0 with New Universal Format and Liquid Clustering

We are excited to announce Delta Lake 3.0, the next major release of the Linux Foundation open source Delta Lake Project, available in preview now. We extend our sincere appreciation to the Delta Lake community for their invaluable contributions to this release. Delta Lake 3.0 introduces the following powerful features: Delta Universal Format (UniForm) enables […]

Continue Reading

Introducing English as the New Programming Language for Apache Spark

Introduction We are thrilled to unveil the English SDK for Apache Spark, a transformative tool designed to enrich your Spark experience. Apache Spark™, celebrated globally with over a billion annual downloads from 208 countries and regions, has significantly advanced large-scale data analytics. With the innovative application of Generative AI, our English SDK seeks to expand […]

Continue Reading

Databricks Expands Brickbuilder Solutions for Manufacturing

The combination of scalable, cloud-based advanced analytics with Edge compute is rapidly changing real-time decision-making for Industry 4.0 or Intelligent Manufacturing use cases. When implemented correctly, this combination lowers analytics costs, eliminates data transfer latency and enables higher business impact across the manufacturing value chain. Today, we’re excited to announce that Databricks has collaborated with […]

Continue Reading

Seamlessly Migrate Your Apache Parquet Data Lake to Delta Lake

Apache Parquet is one of the most popular open source file formats in the big data world today. Being column-oriented, Apache Parquet allows for efficient data storage and retrieval, and this has led many organizations over the past decade to adopt it as an essential way to store data in data lakes. Some of these […]

Continue Reading

Adaptive Query Execution in Structured Streaming

In Databricks Runtime, Adaptive Query Execution (AQE) is a performance feature that continuously re-optimizes batch queries using runtime statistics during query execution. Starting from Databricks Runtime 13.1, real-time streaming queries that use the ForeachBatch Sink will also leverage AQE for dynamic re-optimizations as part of Project Lightspeed. Limitations with Static Planning and Statistics At Databricks, […]

Continue Reading

eBay’s Common Automation Solution for Platform Evolution

For any large online business, the platform is a foundational piece. eBay’s platform contains software frameworks and infrastructure in its backend. Because the platform is so important, updates are essential to keeping the applications — including fundamental operations like search and checkout — stable and reliable. At eBay, there are more than 3,000 site applications […]

Continue Reading