Engineering Archives - Page 19 of 30

Introducing Lakehouse Federation Capabilities in Unity Catalog

Posted by admin on July 6, 2023

Data teams face many challenges in quickly accessing the right data, primarily due to data fragmentation, the time and cost involved in consolidating data, and the difficulty of managing data governance across many systems. That’s why today at Data+AI Summit, we are thrilled to announce Lakehouse Federation capabilities in Unity Catalog that allow organizations to build a […]

Project Lightspeed Update – Advancing Apache Spark Structured Streaming

Posted by admin on July 6, 2023

In this blog post, we will review the advancements in Spark Structured Streaming since we announced Project Lightspeed a year ago, from performance improvements to ecosystem expansion and beyond. Before we discuss specific innovations, let’s review a bit of background on how we arrived at the need for Project Lightspeed in the first place. […]

Announcing Delta Lake 3.0 with New Universal Format and Liquid Clustering

Posted by admin on July 6, 2023

We are excited to announce Delta Lake 3.0, the next major release of the Linux Foundation open source Delta Lake Project, available in preview now. We extend our sincere appreciation to the Delta Lake community for their invaluable contributions to this release. Delta Lake 3.0 introduces the following powerful features: Delta Universal Format (UniForm) enables […]

Introducing English as the New Programming Language for Apache Spark

Posted by admin on July 6, 2023

We are thrilled to unveil the English SDK for Apache Spark, a transformative tool designed to enrich your Spark experience. Apache Spark™, celebrated globally with over a billion annual downloads from 208 countries and regions, has significantly advanced large-scale data analytics. With the innovative application of Generative AI, our English SDK seeks to expand […]

Go module proxy at Grab

Posted by admin on July 6, 2023

At Grab, we rely heavily on a large Go monorepo for backend development, which offers benefits like code reusability and discoverability. However, as we continue to grow, managing a large monorepo brings about its own set of unique challenges. As an example, using Go commands such as go get and go list can be incredibly slow when fetching Go […]

Databricks Expands Brickbuilder Solutions for Manufacturing

Posted by admin on June 7, 2023

The combination of scalable, cloud-based advanced analytics with Edge compute is rapidly changing real-time decision-making for Industry 4.0 or Intelligent Manufacturing use cases. When implemented correctly, this combination lowers analytics costs, eliminates data transfer latency and enables higher business impact across the manufacturing value chain. Today, we’re excited to announce that Databricks has collaborated with […]

Seamlessly Migrate Your Apache Parquet Data Lake to Delta Lake

Posted by admin on June 6, 2023

Apache Parquet is one of the most popular open source file formats in the big data world today. Being column-oriented, Apache Parquet allows for efficient data storage and retrieval, and this has led many organizations over the past decade to adopt it as an essential way to store data in data lakes. Some of these […]

Adaptive Query Execution in Structured Streaming

Posted by admin on June 2, 2023

In Databricks Runtime, Adaptive Query Execution (AQE) is a performance feature that continuously re-optimizes batch queries using runtime statistics during query execution. Starting from Databricks Runtime 13.1, real-time streaming queries that use the ForeachBatch Sink will also leverage AQE for dynamic re-optimizations as part of Project Lightspeed. […]

PII masking for privacy-grade machine learning

Posted by admin on June 1, 2023

At Grab, data engineers work with large sets of data on a daily basis. They design and build advanced machine learning models that provide strategic insights using all of the data that flow through the Grab Platform. This enables us to provide a better experience to our users, for example by increasing the supply of […]

eBay’s Common Automation Solution for Platform Evolution

Posted by admin on May 30, 2023

For any large online business, the platform is a foundational piece. eBay’s platform contains software frameworks and infrastructure in its backend. Because the platform is so important, updates are essential to keeping the applications — including fundamental operations like search and checkout — stable and reliable. At eBay, there are more than 3,000 site applications […]

Category: Engineering
