Adaptive Query Execution in Structured Streaming

In Databricks Runtime, Adaptive Query Execution (AQE) is a performance feature that continuously re-optimizes batch queries using runtime statistics during query execution. Starting from Databricks Runtime 13.1, real-time streaming queries that use the ForeachBatch Sink will also leverage AQE for dynamic re-optimizations as part of Project Lightspeed. Limitations with Static Planning and Statistics: At Databricks, […]

eBay’s Common Automation Solution for Platform Evolution

For any large online business, the platform is a foundational piece. eBay’s platform contains software frameworks and infrastructure in its backend. Because the platform is so important, updates are essential to keeping the applications — including fundamental operations like search and checkout — stable and reliable. At eBay, there are more than 3,000 site applications […]

Performance bottlenecks of Go application on Kubernetes with non-integer (floating) CPU allocation

Grab’s real-time data platform team, Coban, has been running its stream processing framework on Kubernetes, as detailed in Plumbing at scale. We’ve also written another article (Scaling Kafka consumers) about vertical pod autoscaling (VPA) and the benefits of using it. In this article, we cover the performance bottlenecks and other issues we came across for […]

How we improved our iOS CI infrastructure with observability tools

Note: Timestamps used in this article are in UTC+8 Singapore time, unless stated otherwise. Background: When we upgraded to Xcode 13.1 in April 2022, we noticed a few issues such as instability of the CI tests and other problems related to the switch to Xcode 13.1. After taking a step back, we investigated this issue […]

Understanding Caching in Databricks SQL: UI, Result, and Disk Caches

Caching is an essential technique for improving the performance of data warehouse systems by avoiding the need to recompute or fetch the same data multiple times. In Databricks SQL, caching can significantly speed up query execution and minimize warehouse usage, resulting in lower costs and more efficient resource utilization. This article will explore the benefits […]

Announcing the General Availability of Databricks SQL Serverless!

Today, we are thrilled to announce that serverless compute for Databricks SQL is Generally Available on AWS and Azure! Databricks SQL (DB SQL) Serverless provides the best performance with instant and elastic compute, lowers costs, and enables you to focus on delivering the most value to your business rather than managing infrastructure. With GA, you […]

Latency goes subsecond in Apache Spark Structured Streaming

Apache Spark Structured Streaming is the leading open source stream processing platform. It is also the core technology that powers streaming on the Databricks Lakehouse Platform and provides a unified API for batch and stream processing. As the adoption of streaming is growing rapidly, diverse applications want to take advantage of it for real time […]

2.3x faster using the Go plugin to replace Lua virtual machine

Abstract: We’re excited to share with you the latest update on our open-source project Talaria. In our efforts to improve performance and overcome infrastructure limitations, we’ve made significant strides by implementing the Go plugin to replace Lua VM. Our team has found that the Go plugin is roughly 2.3x faster and uses 2.3x less memory […]

Building Data Applications on the Lakehouse With the Databricks SQL Driver for Node.js

We are excited to announce the general availability of the Databricks SQL Driver for Node.js. This follows the recent general availability of Databricks SQL Driver for Go and the earlier Databricks SQL Connector for Python. Node.js developers can now easily build data applications on the lakehouse in pure JavaScript or TypeScript. The Node.js driver offers […]
