Databricks Workspace Administration – Best Practices for Account, Workspace and Metastore Admins

This blog is part of our Admin Essentials series, where we discuss topics relevant to Databricks administrators. Other blogs include our Workspace Management Best Practices, DR Strategies with Terraform, and many more! Keep an eye out for more content coming soon. In past admin-focused blogs, we have discussed how to establish and maintain a strong […]

Continue Reading

Training LLMs at Scale with AMD MI250 GPUs

Figure 1: Training performance of LLM Foundry and MPT-7B on a multi-node AMD MI250 cluster. As we increase the number of GPUs from 4 x MI250 to 128 x MI250, we see near-linear scaling of training performance (TFLOP/s) and throughput (tokens/sec).

Introduction

Four months ago, we shared how AMD had emerged as a capable platform […]
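As a quick illustration of what "near-linear scaling" means, the efficiency of a scaled-up run can be expressed as the ratio of achieved speedup to ideal (linear) speedup. The GPU counts below come from the excerpt; the throughput numbers are purely hypothetical placeholders, not measured results:

```python
def scaling_efficiency(base_gpus, base_tput, scaled_gpus, scaled_tput):
    """Fraction of ideal (linear) speedup actually achieved when scaling out."""
    ideal_speedup = scaled_gpus / base_gpus      # e.g. 128 / 4 = 32x
    actual_speedup = scaled_tput / base_tput     # measured throughput ratio
    return actual_speedup / ideal_speedup

# Hypothetical throughputs (tokens/sec) at 4 and 128 MI250 GPUs:
eff = scaling_efficiency(4, 30_000, 128, 900_000)
print(f"{eff:.0%}")  # 30x actual vs 32x ideal -> 94%
```

An efficiency near 100% is what "near-linear" scaling looks like; communication overhead typically pulls it slightly below 1.0 as node count grows.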

Continue Reading

LLM-powered data classification for data entities at scale

Introduction

At Grab, we deal with petabyte-scale data and manage countless data entities, ranging from database tables to Kafka message schemas. Understanding this data is crucial for us: it not only streamlines data access management to safeguard the data of our users, drivers and merchant-partners, but also improves the data discovery process […]
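To make the idea of LLM-powered classification concrete, a sketch of the kind of prompt such a pipeline might assemble for a data entity is shown below. The tag set, prompt wording, and entity names are illustrative assumptions, not Grab's actual taxonomy or implementation:

```python
# Hypothetical sensitivity tags for illustration only.
SENSITIVITY_TAGS = ["PII", "FINANCIAL", "INTERNAL", "PUBLIC"]

def build_classification_prompt(entity_name, column_names):
    """Assemble a prompt asking an LLM to tag each column of a data entity."""
    columns = "\n".join(f"- {c}" for c in column_names)
    return (
        f"Classify each column of the entity '{entity_name}' with exactly one "
        f"tag from {SENSITIVITY_TAGS}.\n"
        f"Columns:\n{columns}\n"
        "Answer as 'column: tag' lines."
    )

prompt = build_classification_prompt("drivers", ["driver_id", "phone_number", "city"])
print(prompt)
```

The LLM's structured reply could then be parsed into tags and attached to the entity's metadata, which is the general shape of the workflow the post describes.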

Continue Reading

Scaling marketing for merchants with targeted and intelligent promos

Introduction

A promotional campaign is a marketing effort that aims to increase sales, customer engagement, or brand awareness for a product, service, or company. The goal is to drive more orders and sales by assigning promos to consumers within a given budget during the campaign period.

Figure 1 – Merchant feedback on marketing

From our […]
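The core problem the excerpt describes, assigning promos to consumers within a fixed budget, can be sketched with a simple greedy heuristic: rank consumers by estimated uplift per dollar of promo cost and assign until the budget runs out. The scoring inputs and numbers below are hypothetical, and this greedy rule is an illustrative baseline, not the actual targeting model:

```python
def assign_promos(consumers, budget):
    """Greedily assign promos within a campaign budget.

    consumers: list of (consumer_id, expected_uplift, promo_cost) tuples.
    Returns (assigned_ids, total_spend).
    """
    # Rank by estimated uplift per dollar spent, highest first.
    ranked = sorted(consumers, key=lambda c: c[1] / c[2], reverse=True)
    assigned, spent = [], 0.0
    for consumer_id, uplift, cost in ranked:
        if spent + cost <= budget:
            assigned.append(consumer_id)
            spent += cost
    return assigned, spent

# Hypothetical (id, expected_uplift, promo_cost) rows:
chosen, spent = assign_promos(
    [("c1", 8.0, 2.0), ("c2", 5.0, 1.0), ("c3", 9.0, 4.0)], budget=3.0
)
print(chosen, spent)  # ['c2', 'c1'] 3.0
```

Real campaign targeting would replace the hand-written uplift numbers with model predictions and could use a proper knapsack or optimization solver, but the budget constraint works the same way.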

Continue Reading

A Pattern for the Lightweight Deployment of Distributed XGBoost and LightGBM Models

A common challenge data scientists encounter when developing machine learning solutions is training a model on a dataset that is too large to fit into a server’s memory. We encounter this when we wish to train a model to predict customer churn or propensity and need to deal with tens of millions of unique customers. […]

Continue Reading

A Personalized User-Based Ranking Model

Recommender systems help users browse the vast inventories found on modern ecommerce websites more efficiently. A core component of many recommender systems is a ranker: a machine learning model that orders candidate items so that users see the items they are most likely to want first. Rankers are used to sort candidate items […]
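A minimal sketch of the ranker component the excerpt describes: a scoring function assigns each candidate item a relevance score, and candidates are shown in descending score order. The linear scorer and feature names here are illustrative assumptions, not the production model:

```python
def rank_candidates(candidates, score_fn):
    """Return candidate items sorted from most to least relevant."""
    return sorted(candidates, key=score_fn, reverse=True)

# Hypothetical item features: (item_id, click_rate, price_match)
items = [("a", 0.10, 0.9), ("b", 0.30, 0.2), ("c", 0.20, 0.8)]

# Toy linear scorer; a real ranker would be a learned model.
score = lambda item: 2.0 * item[1] + 1.0 * item[2]

print([i[0] for i in rank_candidates(items, score)])  # ['c', 'a', 'b']
```

A personalized ranker, as in the post's title, would additionally feed user-specific features into the scoring function so the ordering differs per user.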

Continue Reading

Deploy Private LLMs using Databricks Model Serving

We are excited to announce the public preview of GPU and LLM optimization support for Databricks Model Serving! With this launch, you can deploy open-source or your own custom AI models of any type, including LLMs and vision models, on the Lakehouse Platform. Databricks Model Serving automatically optimizes your model for LLM serving, providing best-in-class performance […]

Continue Reading

Announcing the Public Preview of Lakeview Dashboards!

We are excited to announce the public preview of the next generation of Databricks SQL dashboards, dubbed Lakeview dashboards. Available today, this new dashboarding experience is optimized for ease of use, broad distribution, governance and security. Lakeview provides four major improvements over previous-generation dashboards. Improved Visualizations: A new visualization engine delivers beautiful, interactive […]

Continue Reading

How Multimodal Embeddings Elevate eBay’s Product Recommendations

Introduction

eBay is committed to providing a seamless and enjoyable buying experience for its customers. One area that we're continuously looking to improve is the quality of our listings, particularly with regard to images and text. In the past, the presence of low-quality images could lead to inaccurate product representation and, in a worst-case scenario, […]

Continue Reading

Making Spark Accessible: My Databricks Summer Internship

My summer internship on the PySpark team was a whirlwind of exciting events. The PySpark team develops the Python APIs of the open source Apache Spark library and Databricks Runtime. Over the course of the 12 weeks, I drove a project to implement a new built-in PySpark test framework. I also contributed to an open […]

Continue Reading