eBay VP Ishita Majumdar Featured in Diversity Woman Media’s Power 100 List

Diversity Woman Media, which seeks to advocate for diversity, equity and inclusion, operates publications, workshops and conferences to further the goal of empowering all women. These values are also part of eBay’s DNA, so we’re very proud to share that Ishita Majumdar, eBay’s VP of Data Analytics Platforms, has been recognized in this year’s Diversity […]

An elegant platform

Coban is Grab’s real-time data streaming platform team. As a platform team, we thrive on providing our internal users from all verticals with self-served data-streaming resources, such as Kafka topics, Flink and Change Data Capture (CDC) pipelines, various kinds of Kafka-Connect connectors, as well as Apache Zeppelin notebooks, so that they can effortlessly leverage real-time data to build intelligent applications […]

Creating a bespoke LLM for AI-generated documentation

We recently announced our AI-generated documentation feature, which uses large language models (LLMs) to automatically generate documentation for tables and columns in Unity Catalog. We have been humbled by the reception of this feature among our customers. Today, more than 80% of the table metadata updates on Databricks are AI-assisted. In this blog post, we […]

How We Export Billion-Scale Graphs on Transactional Graph Databases

eBay’s graph database, NuGraph, supports real-time business decisions for many of eBay’s internal teams through relationship analysis. But as the graph dataset grows, it becomes increasingly challenging to validate graph data quality, check the relationship topology and draw insights from the graph. For example, eBay’s biggest internal graph has more than 15 billion […]

Python Dependency Management in Spark Connect

Managing the environment of an application in a distributed computing environment can be challenging. Ensuring that all nodes have the necessary environment to execute code and determining the actual location of the user’s code are complex tasks. Apache Spark™ offers various methods such as Conda, venv, and PEX; see also How to Manage Python Dependencies […]

Graph modelling guidelines

Graph modelling is a highly effective technique for representing and analysing complex and interconnected data across various domains. By deciphering relationships between entities, graph modelling can reveal insights that might be otherwise difficult to identify using traditional data modelling approaches. In this article, we will explore what graph modelling is and guide you through […]

Introducing Python User-Defined Table Functions (UDTFs)

Apache Spark™ 3.5 and Databricks Runtime 14.0 have brought an exciting feature to the table: Python user-defined table functions (UDTFs). In this blog post, we’ll dive into what UDTFs are, why they are powerful, and how you can use them. What are Python user-defined table functions (UDTFs)? A Python user-defined table function (UDTF) is a […]
