Introducing Python User-Defined Table Functions (UDTFs)

November 7, 2023 | Posted by admin

Apache Spark™ 3.5 and Databricks Runtime 14.0 have brought an exciting feature to the table: Python user-defined table functions (UDTFs). In this blog post, we’ll dive into what UDTFs are, why they are powerful, and how you can use them. What are Python user-defined table functions (UDTFs)? A Python user-defined table function (UDTF) is a […]

Arrow-optimized Python UDFs in Apache Spark™ 3.5

November 6, 2023 | Posted by admin

In Apache Spark™, Python User-Defined Functions (UDFs) are among the most popular features. They empower users to craft custom code tailored to their unique data processing needs. However, the current Python UDFs, which rely on cloudpickle for serialization and deserialization, encounter performance bottlenecks, particularly when dealing with large data inputs and outputs. In Apache Spark […]

eBay’s first Chief AI Officer Nitzan Mekel-Bobrov Recognized in Insider’s AI 100 List

November 1, 2023 | Posted by admin

Insider recently compiled its first AI 100 list, a compilation of some of the most important, innovative and influential leaders in the world of artificial intelligence. The list includes representatives from many top-tier technology companies as well as startups, research organizations and labs. eBay’s Chief AI Officer, Nitzan Mekel-Bobrov, was included on the list of […]

eBay Exec on How Artificial Intelligence Will Bring a ‘Paradigm Shift’ to Ecommerce

November 1, 2023 | Posted by admin

Insider recently published a story analyzing AI’s role in the evolution of ecommerce, sharing insight from our own Chief AI Officer Nitzan Mekel-Bobrov. Nitzan says that a larger paradigm shift is on its way, and that our platform’s massive data scale is helping eBay take the lead in generative AI for ecommerce. Nitzan discussed the […]

Announcing MLflow 2.8 LLM-as-a-judge metrics and Best Practices for LLM Evaluation of RAG Applications, Part 2

October 31, 2023 | Posted by admin

Today we’re excited to announce that MLflow 2.8 supports our LLM-as-a-judge metrics, which can help save time and costs while providing an approximation of human-judged metrics. In our previous report, we discussed how the LLM-as-a-judge technique helped us boost efficiency, cut costs, and maintain over 80% consistency with human scores in the Databricks Documentation AI Assistant, […]

Big Book of MLOps Updated for Generative AI

October 30, 2023 | Posted by admin

Last year, we published the Big Book of MLOps, outlining guiding principles, design considerations, and reference architectures for Machine Learning Operations (MLOps). Since then, Databricks has added key features simplifying MLOps, and Generative AI has brought new requirements to MLOps platforms and processes. We are excited to announce a new version of the Big Book […]

Databricks Workspace Administration – Best Practices for Account, Workspace and Metastore Admins

October 30, 2023 | Posted by admin

This blog is part of our Admin Essentials series, where we discuss topics relevant to Databricks administrators. Other blogs include our Workspace Management Best Practices, DR Strategies with Terraform, and many more! Keep an eye out for more content coming soon. In past admin-focused blogs, we have discussed how to establish and maintain a strong […]

Training LLMs at Scale with AMD MI250 GPUs

October 30, 2023 | Posted by admin

Figure 1: Training performance of LLM Foundry and MPT-7B on a multi-node AMD MI250 cluster. As we increase the number of GPUs from 4 x MI250 to 128 x MI250, we see near-linear scaling of training performance (TFLOP/s) and throughput (tokens/sec). Introduction: Four months ago, we shared how AMD had emerged as a capable platform […]

LLM-powered data classification for data entities at scale

October 24, 2023 | Posted by admin

Introduction: At Grab, we deal with petabyte-level data and manage countless data entities ranging from database tables to Kafka message schemas. Understanding the data inside is crucial for us, as it not only streamlines data access management to safeguard the data of our users, drivers and merchant-partners, but also improves the data discovery process […]

Scaling marketing for merchants with targeted and intelligent promos

October 11, 2023 | Posted by admin

Introduction: A promotional campaign is a marketing effort that aims to increase sales, customer engagement, or brand awareness for a product, service, or company. The target is to have more orders and sales by assigning promos to consumers within a given budget during the campaign period. Figure 1 – Merchant feedback on marketing. From our […]

Category: Engineering