Arrow-optimized Python UDFs in Apache Spark™ 3.5

In Apache Spark™, Python User-Defined Functions (UDFs) are among the most popular features. They empower users to craft custom code tailored to their unique data processing needs. However, the current Python UDFs, which rely on cloudpickle for serialization and deserialization, encounter performance bottlenecks, particularly when dealing with large data inputs and outputs. In Apache Spark […]
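The serialization bottleneck described above is exactly what the Arrow optimization targets. As a rough, hedged sketch (assuming a PySpark 3.5+ environment with pyarrow installed, and a throwaway `shout` UDF invented here purely for illustration), Arrow execution can be requested per UDF via the `useArrow` flag or session-wide via a SQL configuration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf

spark = SparkSession.builder.appName("arrow-udf-sketch").getOrCreate()

# Per-UDF opt-in: useArrow=True (new in Spark 3.5) asks Spark to exchange
# UDF input/output with the Python worker as Arrow batches instead of
# pickled rows.
@udf(returnType="string", useArrow=True)
def shout(s):  # hypothetical example UDF, not from the original post
    return s.upper() if s is not None else None

# Session-wide opt-in for UDFs that do not set useArrow explicitly.
spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")

df = spark.createDataFrame([("spark",), ("arrow",)], ["word"])
df.select(shout(col("word")).alias("loud")).show()
```

With Arrow enabled, UDF inputs and outputs move between the JVM and the Python worker as columnar Arrow batches, which is where the speedup for large inputs and outputs comes from.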


eBay’s first Chief AI Officer Nitzan Mekel-Bobrov Recognized in Insider’s AI 100 List

Insider recently compiled its first AI 100 list, a compilation of some of the most important, innovative and influential leaders in the world of artificial intelligence. The list includes representatives from many top-tier technology companies as well as startups, research organizations and labs.  eBay’s Chief AI Officer, Nitzan Mekel-Bobrov, was included on the list of […]


eBay Exec on How Artificial Intelligence Will Bring a ‘Paradigm Shift’ to Ecommerce

Insider recently published a story analyzing AI’s role in the evolution of ecommerce, sharing insight from our own Chief AI Officer Nitzan Mekel-Bobrov. Nitzan says that a larger paradigm shift is on its way, and that our platform’s massive data scale is helping eBay take the lead in generative AI for ecommerce. Nitzan discussed the […]


Announcing MLflow 2.8 LLM-as-a-judge metrics and Best Practices for LLM Evaluation of RAG Applications, Part 2

Today we’re excited to announce that MLflow 2.8 supports our LLM-as-a-judge metrics, which can help save time and costs while providing an approximation of human-judged metrics. In our previous report, we discussed how the LLM-as-a-judge technique helped us boost efficiency, cut costs, and maintain over 80% consistency with human scores in the Databricks Documentation AI Assistant, […]
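For readers who want a feel for what this looks like in code, here is a hedged sketch (assuming MLflow 2.8+, an OpenAI API key in the environment, and a tiny question/answer DataFrame invented purely for illustration) of building an LLM-judged metric and passing it to `mlflow.evaluate`:

```python
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_correctness

# Toy evaluation set (illustrative only, not from the announcement).
eval_data = pd.DataFrame(
    {
        "inputs": ["How are metrics recorded in MLflow?"],
        "predictions": ["Metrics are logged as key-value pairs, optionally per step."],
        "ground_truth": ["MLflow records metrics as key-value pairs over steps."],
    }
)

# answer_correctness() builds an LLM-as-a-judge metric; the judge model URI
# here ("openai:/gpt-4") is an assumption and can point at another provider.
correctness = answer_correctness(model="openai:/gpt-4")

# Evaluate a static dataset: the "predictions" column holds model outputs,
# "ground_truth" the references the judge compares against.
results = mlflow.evaluate(
    data=eval_data,
    predictions="predictions",
    targets="ground_truth",
    extra_metrics=[correctness],
)
print(results.metrics)
```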


Big Book of MLOps Updated for Generative AI

Last year, we published the Big Book of MLOps, outlining guiding principles, design considerations, and reference architectures for Machine Learning Operations (MLOps). Since then, Databricks has added key features simplifying MLOps, and Generative AI has brought new requirements to MLOps platforms and processes. We are excited to announce a new version of the Big Book […]


Databricks Workspace Administration – Best Practices for Account, Workspace and Metastore Admins

This blog is part of our Admin Essentials series, where we discuss topics relevant to Databricks administrators. Other blogs include our Workspace Management Best Practices, DR Strategies with Terraform, and many more! Keep an eye out for more content coming soon. In past admin-focused blogs, we have discussed how to establish and maintain a strong […]


Training LLMs at Scale with AMD MI250 GPUs

Figure 1: Training performance of LLM Foundry and MPT-7B on a multi-node AMD MI250 cluster. As we increase the number of GPUs from 4 x MI250 to 128 x MI250, we see near-linear scaling of training performance (TFLOP/s) and throughput (tokens/sec).

Introduction: Four months ago, we shared how AMD had emerged as a capable platform […]


LLM-powered data classification for data entities at scale

Introduction: At Grab, we deal with petabyte-scale data and manage countless data entities ranging from database tables to Kafka message schemas. Understanding the data inside them is crucial for us, as it not only streamlines data access management to safeguard the data of our users, drivers and merchant-partners, but also improves the data discovery process […]


Scaling marketing for merchants with targeted and intelligent promos

Introduction: A promotional campaign is a marketing effort that aims to increase sales, customer engagement, or brand awareness for a product, service, or company. The goal is to drive more orders and sales by assigning promos to consumers within a given budget during the campaign period.

Figure 1 – Merchant feedback on marketing

From our […]


A Pattern for the Lightweight Deployment of Distributed XGBoost and LightGBM Models

A common challenge data scientists encounter when developing machine learning solutions is training a model on a dataset that is too large to fit into a server’s memory. We encounter this when we wish to train a model to predict customer churn or propensity and need to deal with tens of millions of unique customers. […]
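As one concrete, hedged illustration of such a pattern (not the exact code from the post), the sketch below assumes xgboost >= 1.7 with its xgboost.spark module, an active SparkSession, and a hypothetical Spark DataFrame `train_df` with a "features" vector column and a binary "churn" label. It trains in a distributed fashion, then extracts the plain booster for lightweight single-node scoring:

```python
from xgboost.spark import SparkXGBClassifier
import xgboost as xgb

# Distributed training: each Spark task trains on a partition of the data,
# so the full dataset never has to fit in a single server's memory.
classifier = SparkXGBClassifier(
    features_col="features",
    label_col="churn",
    num_workers=8,   # number of parallel XGBoost workers (assumption)
    max_depth=6,
)
spark_model = classifier.fit(train_df)  # train_df is a hypothetical DataFrame

# Lightweight deployment: pull out the underlying XGBoost booster and save it,
# so scoring can run in a plain Python process without a Spark cluster.
booster = spark_model.get_booster()
booster.save_model("churn_model.json")

# Later, score anywhere vanilla XGBoost runs.
scorer = xgb.Booster()
scorer.load_model("churn_model.json")
```

The appeal of this kind of pattern is that the Spark cluster is only needed during training; the saved booster can be served from a single lightweight process.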
