Enabling near real-time data analytics on the data lake

Introduction: In the domain of data processing, data analysts run their ad hoc queries on the data lake. The lake serves as an interface between our analytics and production environments, preventing downstream queries from impacting upstream data ingestion pipelines. To ensure efficient data processing in the data lake, choosing appropriate storage formats is crucial. The […]

The journey of building a comprehensive attribution platform

The Grab superapp offers a comprehensive array of services, from ride-hailing and food delivery to financial services. This creates multifaceted user journeys, traversing homepages, product pages, checkouts, and interactions with diverse content, including advertisements and promo codes. Background: Why ads and attribution matter in our superapp. Ads are crucial for Grab in driving user engagement […]

Meet the Winners of the 5th eBay University Machine Learning Challenge

Five university students are headed to eBay for summer internships this year after claiming top prize at the 2023 eBay University Machine Learning Challenge. The annual competition asks students to dream up innovative solutions to real-world ecommerce problems, rewarding top teams with valuable work experience while recruiting promising new talent to the company. This year’s […]

Databricks adds new migration Brickbuilder Solutions to help customers succeed with AI

For the past two years, Databricks has collaborated with leading consulting partners to build innovative solutions for industry, migration, and data and AI use cases. Based on a foundation of proven customer deployments, Databricks Brickbuilder Solutions and Accelerators package together the experience and knowledge of our partners to help businesses unlock the full potential of […]

Lauren Wilcox Named 2023 ACM Distinguished Member

Today, we’re pleased to share that Lauren Wilcox, Sr. Director of Responsible AI at eBay, was named a 2023 ACM Distinguished Member by the Association for Computing Machinery (ACM). This prestigious recognition is awarded to those who have made significant contributions to the field of computing. Wilcox was nominated based on her research contributions in […]

Grab’s approach to content moderation

In the fast-paced world of on-demand delivery, maintaining safe marketplaces is a complex undertaking. Grab, a leading superapp in Southeast Asia, operates GrabFood and GrabMart, two popular marketplaces that connect consumers with a wide range of food and daily necessities. With more than 100k listings for different items updated daily by our merchants across eight different countries, Grab […]

Rethinking Stream Processing: Data Exploration

Introduction: In this digital age, companies collect multitudes of data that enable the tracking of business metrics and performance. Over the years, data analytics tools for data storage and processing have evolved from the days of Excel sheets and macros to more advanced MapReduce-model tools like Spark, Hadoop, and Hive. This evolution has […]

Announcing Ray Autoscaling support on Databricks and Apache Spark™

Ray is an open-source unified compute framework that simplifies scaling AI and Python workloads in a distributed environment. Since we introduced support for running Ray on Databricks, we’ve witnessed numerous customers successfully deploying their machine learning use cases, which range from forecasting and deep reinforcement learning to fine-tuning LLMs. With the release of Ray version […]

LLM Training and Inference with Intel Gaudi 2 AI Accelerators

At Databricks, we want to help our customers build and deploy generative AI applications on their own data without sacrificing data privacy or control. For customers who want to train a custom AI model, we help them do so easily, efficiently, and at a low cost. One lever we have to address this challenge is […]

Parameterized queries with PySpark

PySpark has always provided wonderful SQL and Python APIs for querying data. As of Databricks Runtime 12.1 and Apache Spark 3.4, parameterized queries support safe and expressive ways to query data with SQL using Pythonic programming paradigms. This post explains how to make parameterized queries with PySpark and when this is a good design pattern […]
