Improve your RAG application response quality with real-time structured data

Retrieval Augmented Generation (RAG) is an efficient mechanism to provide relevant data as context in Gen AI applications. Most RAG applications use vector indexes to search for relevant context from unstructured data such as documentation, wikis, and support tickets. Yesterday, we announced the Public Preview of Databricks Vector Search, which helps with exactly that. However, Gen […]

Introducing Databricks Vector Search Public Preview

Following yesterday's announcement about Retrieval Augmented Generation (RAG), today we’re excited to announce the public preview of Databricks Vector Search. We announced a private preview for a limited set of customers at the Data + AI Summit in June; Vector Search is now available to all our customers. Databricks Vector Search enables developers […]

Building High Quality RAG Applications with Databricks

Retrieval-Augmented Generation (RAG) has quickly emerged as a powerful way to incorporate proprietary, real-time data into Large Language Model (LLM) applications. Today we are excited to launch a suite of RAG tools to help Databricks users build high-quality, production LLM apps using their enterprise data. LLMs offered a major breakthrough in the ability to rapidly prototype […]

eBay VP Ishita Majumdar Featured in Diversity Woman Media’s Power 100 List

Diversity Woman Media, which seeks to advocate for diversity, equity and inclusion, operates publications, workshops and conferences to further the goal of empowering all women. These values are also part of eBay’s DNA, so we’re very proud to share that Ishita Majumdar, eBay’s VP of Data Analytics Platforms, has been recognized in this year’s Diversity […]

An elegant platform

Coban is Grab’s real-time data streaming platform team. As a platform team, we thrive on providing internal users across all verticals with self-serve data-streaming resources, such as Kafka topics, Flink and Change Data Capture (CDC) pipelines, various Kafka Connect connectors, and Apache Zeppelin notebooks, so that they can effortlessly leverage real-time data to build intelligent applications […]

Creating a bespoke LLM for AI-generated documentation

We recently announced our AI-generated documentation feature, which uses large language models (LLMs) to automatically generate documentation for tables and columns in Unity Catalog. We have been humbled by the reception of this feature among our customers. Today, more than 80% of the table metadata updates on Databricks are AI-assisted. In this blog post, we […]

How We Export Billion-Scale Graphs on Transactional Graph Databases

eBay’s graph database, NuGraph, helps many of eBay’s internal teams make real-time business decisions through relationship analysis. But as the graph dataset grows, it becomes increasingly challenging to validate graph data quality, check the relationship topology, and derive insights from the graph. For example, eBay’s largest internal graph has more than 15 billion […]

Python Dependency Management in Spark Connect

Managing an application's environment in a distributed computing setting can be challenging. Ensuring that all nodes have the environment needed to execute code, and determining the actual location of the user’s code, are complex tasks. Apache Spark™ offers various methods, such as Conda, venv, and PEX; see also How to Manage Python Dependencies […]