Databricks Expands Brickbuilder Solutions for Manufacturing

The combination of scalable, cloud-based advanced analytics with Edge compute is rapidly changing real-time decision-making for Industry 4.0 or Intelligent Manufacturing use cases. When implemented correctly, this combination lowers analytics costs, eliminates data transfer latency and enables higher business impact across the manufacturing value chain. Today, we’re excited to announce that Databricks has collaborated with […]

Seamlessly Migrate Your Apache Parquet Data Lake to Delta Lake

Apache Parquet is one of the most popular open source file formats in the big data world today. Being column-oriented, Apache Parquet allows for efficient data storage and retrieval, and this has led many organizations over the past decade to adopt it as an essential way to store data in data lakes. Some of these […]

Adaptive Query Execution in Structured Streaming

In Databricks Runtime, Adaptive Query Execution (AQE) is a performance feature that continuously re-optimizes batch queries using runtime statistics during query execution. Starting from Databricks Runtime 13.1, real-time streaming queries that use the ForeachBatch Sink will also leverage AQE for dynamic re-optimizations as part of Project Lightspeed. Limitations with Static Planning and Statistics: At Databricks, […]

eBay’s Common Automation Solution for Platform Evolution

For any large online business, the platform is a foundational piece. eBay’s platform contains software frameworks and infrastructure in its backend. Because the platform is so important, updates are essential to keeping the applications — including fundamental operations like search and checkout — stable and reliable. At eBay, there are more than 3,000 site applications […]

Performance bottlenecks of Go application on Kubernetes with non-integer (floating) CPU allocation

Grab’s real-time data platform team, Coban, has been running its stream processing framework on Kubernetes, as detailed in Plumbing at scale. We’ve also written another article (Scaling Kafka consumers) about vertical pod autoscaling (VPA) and the benefits of using it. In this article, we cover the performance bottlenecks and other issues we came across for […]

How we improved our iOS CI infrastructure with observability tools

Note: Timestamps used in this article are in UTC+8 Singapore time, unless stated otherwise. Background: When we upgraded to Xcode 13.1 in April 2022, we noticed a few issues, such as unstable CI tests and other problems related to the switch to Xcode 13.1. After taking a step back, we investigated this issue […]

Understanding Caching in Databricks SQL: UI, Result, and Disk Caches

Caching is an essential technique for improving the performance of data warehouse systems by avoiding the need to recompute or fetch the same data multiple times. In Databricks SQL, caching can significantly speed up query execution and minimize warehouse usage, resulting in lower costs and more efficient resource utilization. This article will explore the benefits […]

Announcing the General Availability of Databricks SQL Serverless!

Today, we are thrilled to announce that serverless compute for Databricks SQL is Generally Available on AWS and Azure! Databricks SQL (DB SQL) Serverless provides the best performance with instant and elastic compute, lowers costs, and enables you to focus on delivering the most value to your business rather than managing infrastructure. With GA, you […]

Latency goes subsecond in Apache Spark Structured Streaming

Apache Spark Structured Streaming is the leading open source stream processing platform. It is also the core technology that powers streaming on the Databricks Lakehouse Platform and provides a unified API for batch and stream processing. As the adoption of streaming is growing rapidly, diverse applications want to take advantage of it for real time […]