Determine the best technology stack for your web-based projects

In the current technology landscape, startups are developing rapidly. This usually leads to an increase in the number of engineers in teams, with the goal of increasing the speed of product development and delivery frequency. However, this growth often leads to a diverse selection of technology stacks being used by different teams within the same […]

Fine-Tuning Large Language Models with Hugging Face and DeepSpeed

Large language models (LLMs) are currently in the spotlight following the sensational release of ChatGPT. Many are wondering how to take advantage of models like this in their own applications. However, this is merely one of several advances in transformer-based models, many others of which are open and readily available for tasks like translation, classification, […]

Building the Lakehouse for Healthcare and Life Sciences – Processing DICOM images at scale with ease

One of the biggest challenges in understanding patient health status and disease progression is unlocking insights from the vast amounts of semi-structured and unstructured data types in healthcare. DICOM, which stands for Digital Imaging and Communications in Medicine, is the standard for the communication and management of medical imaging information. Medical images, encompassing modalities like […]

How eBay Made Its New Accessibility Tool — And Made It Available to All

There is sometimes a fundamental gap between the engineering and design teams when creating a new product. Designers want their work to be accessible, but many of the available tools are cumbersome, confusing, and come with processes that aren’t well-defined. This can lead to designers delivering their work to engineers without fully baked accessibility, which […]

Unsupervised Outlier Detection on Databricks

Kakapo (KAH-kə-poh) implements a standard set of APIs for outlier detection at scale on Databricks. It integrates the extensive PyOD library of outlier detection algorithms with MLflow for tracking and packaging models, and with hyperopt for exploring vast, complex, and heterogeneous search spaces. The views expressed in this article are privately […]

Migrating from Role to Attribute-based Access Control

Grab has always regarded security as one of our top priorities; this is especially important for data platform teams. We need to control access to data and resources in order to protect our consumers and ensure compliance with various, continuously evolving security standards. Additionally, we want to keep the process convenient, simple, and easily scalable […]

Scalable Spark Structured Streaming for REST API Destinations

Spark Structured Streaming is the widely used open-source engine at the foundation of data streaming on the Databricks Lakehouse Platform. It can elegantly handle diverse logical processing at volumes ranging from small-scale ETL to the largest Internet services. This power has led to adoption in many use cases across industries. Another strength of Structured Streaming […]

Securing GitOps pipelines

Grab’s real-time data platform team, Coban, has been managing infrastructure resources via infrastructure as code (IaC). Through the IaC approach, Terraform is used to maintain infrastructure consistency, automation, and ease of deployment of our streaming infrastructure, notably: With Grab’s exponential growth, there needs to be a better way to scale infrastructure automatically. Moving towards GitOps processes […]

Announcing Ray support on Databricks and Apache Spark Clusters

Ray is a prominent compute framework for running scalable AI and Python workloads, offering a variety of distributed machine learning tools, large-scale hyperparameter tuning capabilities, reinforcement learning algorithms, model serving, and more. Similarly, Apache Spark™ provides a wide variety of high-performance algorithms for distributed machine learning through Spark MLlib and deep integrations with machine learning […]

New Maven Dependency Resolution Algorithm

Maven is widely used as a Java project build tool at eBay. As an essential component of Maven, maven-resolver resolves declared dependencies, calculates dependency graphs, mediates conflicts, and forms the classpaths for compilation and deployment. This is the so-called dependency resolution process. One of the main impediments to fast iterations of software development was […]
