How eBay Created a Language Model With Three Billion Item Titles

Introduction: When shoppers come to eBay, our goal is to help them easily find a product they’ll love. Our newly launched recommendation model, commonly referred to as a “ranker,” now provides more relevant product recommendations by leveraging deep learning Natural Language Processing (NLP) techniques to encode item titles as semantic embeddings via a Bidirectional Encoder […]

Building Geospatial Data Products – The Databricks Blog

Geospatial data has been driving innovation for centuries, through the use of maps, cartography and, more recently, digital content. For example, the oldest known map was found etched in a piece of mammoth tusk and dates to approximately 25,000 BC. This makes geospatial data one of the oldest data sources used by society to make decisions. […]

Accelerating SIEM Migrations With the SPL to PySpark Transpiler

In this blog post, we introduce transpiler, a Databricks Labs open-source project that automates the translation of Splunk Search Processing Language (SPL) queries into scalable PySpark dataframe operations. This tool was developed in partnership with a large financial services customer to accelerate the migration of cybersecurity workloads into Databricks. SPL is a query language used […]

How eBay’s Notification Platform Used Fault Injection in New Ways

Background: It might sound paradoxical to deliberately break something we’re trying to fix, but sometimes that’s the most efficient way to do it. Fault injection is the process of deliberately introducing faults into a system. We can then observe the system’s behavior with the injected faults to identify its weaknesses. Within […]

Spatial Analytics at Any Scale With H3 and Photon

H3’s global grid indexing system is driving new patterns for spatial analytics across a variety of geospatial use-cases. Recently, Databricks added built-in support for H3 expressions to give customers the most performant H3 API available, powered by Photon and ready to tackle these use-cases at any scale. In this blog, you will learn that when […]

Kubernetes Node Upgrade Using Operators

At Databricks, we run our compute infrastructure on AWS, Azure, and GCP. We orchestrate containerized services using Kubernetes clusters. We develop and manage our own OS images that bootstrap cloud VMs into Kubernetes nodes. These OS images include critical components for Kubernetes, such as the kubelet, container runtime, and kube-proxy. They also contain OS-level […]

Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Retries

Editor’s note: In December 2021, we made the source code available to the community at github.com/ebay/block-aggregator under Apache License 2.0. In a real-time data ingestion pipeline for analytical processing, efficient and fast data loading to a columnar database, such as ClickHouse[1], favors large blocks over individual rows. Therefore, applications often rely on some buffering mechanism, […]

eBay’s Enhanced Advertising Dashboard

At eBay, we’re constantly improving our advertising tools for sellers. Promoted Listings (PL) creates powerful opportunities for sellers to put their items in front of more buyers, increasing visibility of their inventory across the global eBay marketplace. The latest enhancement in eBay’s advertising portfolio is the improved Advertising dashboard, which combines a technological upgrade with […]

Simplifying Shipping Signals on eBay

Shipping speed and cost are key factors influencing buyers’ purchasing decisions. A transparent and easily understandable shipping signal is a strong conversion driver that makes shopping easier for buyers, improves sales velocity for sellers and builds trust in the marketplace. Over the past decade, eBay’s Fast ’N Free shipping indicator has been worn as a […]
