Rethinking Stream Processing: Data Exploration

Introduction: In this digital age, companies collect vast amounts of data to track business metrics and performance. Over the years, tools for data storage and processing have evolved from Excel sheets and macros to more advanced MapReduce-model tools like Spark, Hadoop, and Hive. This evolution has […]

Announcing Ray Autoscaling support on Databricks and Apache Spark™

Ray is an open-source unified compute framework that simplifies scaling AI and Python workloads in a distributed environment. Since we introduced support for running Ray on Databricks, we’ve witnessed numerous customers successfully deploying their machine learning use cases, which range from forecasting and deep reinforcement learning to fine-tuning LLMs. With the release of Ray version […]

LLM Training and Inference with Intel Gaudi 2 AI Accelerators

At Databricks, we want to help our customers build and deploy generative AI applications on their own data without sacrificing data privacy or control. For customers who want to train a custom AI model, we help them do so easily, efficiently, and at a low cost. One lever we have to address this challenge is […]

Parameterized queries with PySpark

PySpark has always provided wonderful SQL and Python APIs for querying data. As of Databricks Runtime 12.1 and Apache Spark 3.4, parameterized queries support safe and expressive ways to query data with SQL using Pythonic programming paradigms. This post explains how to make parameterized queries with PySpark and when this is a good design pattern […]

Kafka on Kubernetes: Reloaded for fault tolerance

Introduction: Coban – Grab’s real-time data streaming platform – has been operating Kafka on Kubernetes with Strimzi in production for about two years. In a previous article (Zero trust with Kafka), we explained how we leveraged Strimzi to enhance the security of our data streaming offering. In this article, we are going to describe how […]

Introducing Mixtral 8x7B with Databricks Model Serving

Today, Databricks is excited to announce support for Mixtral 8x7B in Model Serving. Mixtral 8x7B is a sparse Mixture of Experts (MoE) open language model that outperforms or matches many state-of-the-art models. It has the ability to handle long context lengths of up to 32k tokens (approximately 50 pages of text), and its MoE architecture […]

New Social Caption Generator Uses AI to Help Sellers Post More Easily

For eBay sellers, social media can be an invaluable tool for publicizing, promoting, and popularizing their listings — but figuring out the best phrasing can feel challenging. A new eBay feature, powered by generative AI, makes the process of drafting a post as easy as clicking a button. We’re excited to announce AI-generated social media […]

Grab’s bug bounty programme in 2023

Launched in 2015, Grab’s Security bug bounty programme has achieved remarkable success and forged strong partnerships within a thriving bounty community. By holding quarterly campaigns with HackerOne, Grab has remained dedicated to security and to giving back to the global security community to support further research. Over the years, Grab has paid over $700,000 in cumulative payments […]

Offline LLM Evaluation: Step-by-Step GenAI Application Assessment on Databricks

Background: In an era where Retrieval-Augmented Generation (RAG) is revolutionizing the way we interact with AI-driven applications, ensuring the efficiency and effectiveness of these systems has never been more essential. Databricks and MLflow are at the forefront of this innovation, offering streamlined solutions for the critical evaluation of GenAI applications. This blog post guides you […]

Sliding window rate limits in distributed systems

Like many other companies, Grab uses marketing communications to notify users of promotions or other news. If a user receives these notifications from multiple companies, it would be a form of information overload and they might even start considering these communications as spam. Over time, this could lead to some users revoking their consent to receive […]
