Introducing Materialized Views and Streaming Tables for Databricks SQL

We are thrilled to announce that materialized views and streaming tables are now publicly available in Databricks SQL on AWS and Azure. Streaming tables provide incremental ingest from cloud storage and message queues. Materialized views are automatically and incrementally updated as new data arrives. Together, these two capabilities enable infrastructure-free data pipelines that are […]

Introducing Lakehouse Federation Capabilities in Unity Catalog

Data teams face many challenges in quickly accessing the right data, primarily due to data fragmentation, the time and cost involved in consolidating data, and the difficulty of managing data governance across many systems. That’s why today at Data+AI Summit, we are thrilled to announce Lakehouse Federation capabilities in Unity Catalog that allow organizations to build […]

Project Lightspeed Update – Advancing Apache Spark Structured Streaming

In this blog post, we will review the advancements in Spark Structured Streaming since we announced Project Lightspeed a year ago, from performance improvements to ecosystem expansion and beyond. Before we discuss specific innovations, let’s review a bit of background on how we arrived at the need for Project Lightspeed in the first place. […]

Announcing Delta Lake 3.0 with New Universal Format and Liquid Clustering

We are excited to announce Delta Lake 3.0, the next major release of the Linux Foundation open source Delta Lake Project, available in preview now. We extend our sincere appreciation to the Delta Lake community for their invaluable contributions to this release. Delta Lake 3.0 introduces the following powerful features: Delta Universal Format (UniForm) […]

Introducing English as the New Programming Language for Apache Spark

We are thrilled to unveil the English SDK for Apache Spark, a transformative tool designed to enrich your Spark experience. Apache Spark™, celebrated globally with over a billion annual downloads from 208 countries and regions, has significantly advanced large-scale data analytics. With the innovative application of Generative AI, our English SDK seeks to […]

Databricks Expands Brickbuilder Solutions for Manufacturing

The combination of scalable, cloud-based advanced analytics with Edge compute is rapidly changing real-time decision-making for Industry 4.0 or Intelligent Manufacturing use cases. When implemented correctly, this combination lowers analytics costs, eliminates data transfer latency and enables higher business impact across the manufacturing value chain. Today, we’re excited to announce that Databricks has collaborated […]

Seamlessly Migrate Your Apache Parquet Data Lake to Delta Lake

Apache Parquet is one of the most popular open source file formats in the big data world today. Because it is column-oriented, Apache Parquet allows for efficient data storage and retrieval, which has led many organizations over the past decade to adopt it as an essential way to store data in data lakes. Some of […]

Adaptive Query Execution in Structured Streaming

In Databricks Runtime, Adaptive Query Execution (AQE) is a performance feature that continuously re-optimizes batch queries using runtime statistics during query execution. Starting from Databricks Runtime 13.1, real-time streaming queries that use the ForeachBatch Sink will also leverage AQE for dynamic re-optimizations as part of Project Lightspeed. Limitations with Static Planning and Statistics At […]
