Graph modelling guidelines

Introduction: Graph modelling is a highly effective technique for representing and analysing complex and interconnected data across various domains. By deciphering relationships between entities, graph modelling can reveal insights that might otherwise be difficult to identify using traditional data modelling approaches. In this article, we will explore what graph modelling is and guide you through […]

Introducing Python User-Defined Table Functions (UDTFs)

Apache Spark™ 3.5 and Databricks Runtime 14.0 have brought an exciting feature to the table: Python user-defined table functions (UDTFs). In this blog post, we’ll dive into what UDTFs are, why they are powerful, and how you can use them. What are Python user-defined table functions (UDTFs)? A Python user-defined table function (UDTF) is a […]

Arrow-optimized Python UDFs in Apache Spark™ 3.5

In Apache Spark™, Python User-Defined Functions (UDFs) are among the most popular features. They empower users to craft custom code tailored to their unique data processing needs. However, the current Python UDFs, which rely on cloudpickle for serialization and deserialization, encounter performance bottlenecks, particularly when dealing with large data inputs and outputs. In Apache Spark […]

eBay’s first Chief AI Officer Nitzan Mekel-Bobrov Recognized in Insider’s AI 100 List

Insider recently compiled its first AI 100 list, a compilation of some of the most important, innovative and influential leaders in the world of artificial intelligence. The list includes representatives from many top-tier technology companies as well as startups, research organizations and labs. eBay’s Chief AI Officer, Nitzan Mekel-Bobrov, was included on the list of […]

eBay Exec on How Artificial Intelligence Will Bring a ‘Paradigm Shift’ to Ecommerce

Insider recently published a story analyzing AI’s role in the evolution of ecommerce, sharing insight from our own Chief AI Officer Nitzan Mekel-Bobrov. Nitzan says that a larger paradigm shift is on its way, and that our platform’s massive data scale is helping eBay take the lead in generative AI for ecommerce. Nitzan discussed the […]

Announcing MLflow 2.8 LLM-as-a-judge metrics and Best Practices for LLM Evaluation of RAG Applications, Part 2

Today we’re excited to announce that MLflow 2.8 supports our LLM-as-a-judge metrics, which can help save time and costs while providing an approximation of human-judged metrics. In our previous report, we discussed how the LLM-as-a-judge technique helped us boost efficiency, cut costs, and maintain over 80% consistency with human scores in the Databricks Documentation AI Assistant, […]

Big Book of MLOps Updated for Generative AI

Last year, we published the Big Book of MLOps, outlining guiding principles, design considerations, and reference architectures for Machine Learning Operations (MLOps). Since then, Databricks has added key features simplifying MLOps, and Generative AI has brought new requirements to MLOps platforms and processes. We are excited to announce a new version of the Big Book […]

Databricks Workspace Administration – Best Practices for Account, Workspace and Metastore Admins

This blog is part of our Admin Essentials series, where we discuss topics relevant to Databricks administrators. Other blogs include our Workspace Management Best Practices, DR Strategies with Terraform, and many more! Keep an eye out for more content coming soon. In past admin-focused blogs, we have discussed how to establish and maintain a strong […]

Training LLMs at Scale with AMD MI250 GPUs

Figure 1: Training performance of LLM Foundry and MPT-7B on a multi-node AMD MI250 cluster. As we increase the number of GPUs from 4 x MI250 to 128 x MI250, we see near-linear scaling of training performance (TFLOP/s) and throughput (tokens/sec). Introduction: Four months ago, we shared how AMD had emerged as a capable platform […]
