Building the Lakehouse for Healthcare and Life Sciences – Processing DICOM images at scale with ease

One of the biggest challenges in understanding patient health status and disease progression is unlocking insights from the vast amounts of semi-structured and unstructured data types in healthcare. DICOM, which stands for Digital Imaging and Communications in Medicine, is the standard for the communication and management of medical imaging information. Medical images, encompassing modalities like […]

How eBay Made Its New Accessibility Tool — And Made It Available to All

There is sometimes a fundamental gap between the engineering and design teams when creating a new product. Designers want their work to be accessible, but many of the available tools are cumbersome, confusing, and come with processes that aren’t well-defined. This can lead to designers delivering their work to engineers without fully baked accessibility, which […]

Unsupervised Outlier Detection on Databricks

Kakapo (KAH-kə-poh) implements a standard set of APIs for outlier detection at scale on Databricks. It integrates the vast PyOD library of outlier detection algorithms with MLflow for tracking and packaging models and with Hyperopt for exploring vast, complex, and heterogeneous search spaces. The views expressed in this article are privately […]

Migrating from Role to Attribute-based Access Control

Grab has always regarded security as one of our top priorities; this is especially important for data platform teams. We need to control access to data and resources in order to protect our consumers and ensure compliance with various, continuously evolving security standards. Additionally, we want to keep the process convenient, simple, and easily scalable […]

Scalable Spark Structured Streaming for REST API Destinations

Spark Structured Streaming is the widely used open source engine at the foundation of data streaming on the Databricks Lakehouse Platform. It can elegantly handle diverse logical processing at volumes ranging from small-scale ETL to the largest Internet services. This power has led to adoption in many use cases across industries. Another strength of Structured Streaming […]

Securing GitOps pipelines

Grab’s real-time data platform team, Coban, has been managing infrastructure resources via Infrastructure-as-code (IaC). Through the IaC approach, Terraform is used to maintain infrastructure consistency, automation, and ease of deployment of our streaming infrastructure. With Grab’s exponential growth, there needs to be a better way to scale infrastructure automatically. Moving towards GitOps processes […]

Announcing Ray support on Databricks and Apache Spark Clusters

Ray is a prominent compute framework for running scalable AI and Python workloads, offering a variety of distributed machine learning tools, large-scale hyperparameter tuning capabilities, reinforcement learning algorithms, model serving, and more. Similarly, Apache Spark™ provides a wide variety of high-performance algorithms for distributed machine learning through Spark MLlib and deep integrations with machine learning […]

New Maven Dependency Resolution Algorithm

Maven is widely used as a Java project build tool at eBay. As an essential component of Maven, maven-resolver resolves declared dependencies, calculates dependency graphs, mediates conflicts, and forms the classpaths for compilation and deployment. This is the so-called dependency resolution process. One of the main impediments to fast iterations of software development was […]

New zoom freezing feature for Geohash plugin

Geohash is an encoding system that assigns a unique identifier to each region on the planet, so every geohash unit corresponds to its own string of digits and letters. Geohash is also a plugin built by Grab that is available in the Java OpenStreetMap Editor (JOSM) tool, which comes in handy for those who […]

Accelerate your model development with the new MLflow Experiments UI

MLflow is the premier platform for model development and experimentation. Thousands of data scientists use MLflow Experiment Tracking every day to find the best candidate models through a powerful GUI-based experience which allows them to view, filter, and sort models based on parameters, performance metrics, and source information. Today, we are thrilled to announce several […]
