Multiple Stateful Operators in Structured Streaming

In the world of data engineering, there are operations that have been used since the birth of ETL. You filter. You join. You aggregate. Finally, you write the result. While these data operations have remained the same over time, the range of latency and throughput requirements has changed dramatically. Processing a few events at a […]

Smooth Sailing Ahead | Databricks Blog

The Databricks Container Infra team builds cloud-agnostic infrastructure and tooling for building, storing and distributing container images. Recently, the team worked on scaling Harbor, an open-source container registry. Request loads on Harbor are read-heavy and bursty, and Harbor is a critical component of Databricks’ serverless product – anytime new serverless VMs are provisioned, Harbor gets […]

Unsupervised graph anomaly detection – Catching new fraudulent behaviours

Earlier in this series, we covered the importance of graph networks, graph concepts, graph visualisation, and graph-based fraud detection methods. In this article, we will discuss how to automatically detect new types of fraudulent behaviour and swiftly take action on them. One of the challenges in fraud detection is that fraudsters are incentivised to always […]

Announcing the MLflow AI Gateway

Large Language Models (LLMs) unlock a wide spectrum of potential use cases to deliver business value, from analyzing the sentiment of text data stored in a SQL warehouse to deploying real-time chat bots that answer nuanced questions about your products. However, democratizing access to powerful SaaS and open source LLMs for these applications comes with […]

eBay Execs Talk Generative AI and Computer Vision at VentureBeat Transform Conference

Chief AI Officer Nitzan Mekel-Bobrov and Vice President of Seller Experience Xiaodi Zhang appeared at the VentureBeat Transform conference on Tuesday, July 11th, to discuss generative AI, how eBay has been building AI infrastructure for many years and ways that recent evolutions in the technology can help sellers, buyers and employees. Xiaodi also spoke as […]

Celebrating the 15th Anniversary of eBay’s Mobile App

On July 10, 2008, Apple opened the doors on a totally new kind of store. In one announcement, Apple enabled an entire ecosystem of apps, allowing third-party developers to create applications specifically for iOS. Because eBay was a launch partner for the App Store’s historic debut, it’s also the fifteenth anniversary of […]

Zero traffic cost for Kafka consumers

Introduction Coban, Grab’s real-time data streaming platform team, has been building an ecosystem around Kafka, serving all Grab verticals. Along with stability and performance, one of our priorities is also cost efficiency. In this article, we explain how the Coban team has substantially reduced Grab’s annual cost for data streaming by enabling Kafka consumers to […]

eBay Chief Technology Officer Mazen Rawashdeh Talks AI, Embracing Tech Disruption on Bloomberg Podcast

This week, Bloomberg Intelligence’s “Tech Disruptors” podcast featured an interview with eBay Chief Technology Officer Mazen Rawashdeh. Mazen discussed the importance and prevalence of AI to eBay’s future, saying, “AI, in my opinion, is not a product; it’s an ecosystem. It touches every corner of our company.” Reflecting that ubiquity, Mazen noted eBay’s unique position […]

A guide to Databricks SQL and Data Warehousing talks at Data + AI Summit 2023

It’s been only 18 months since we announced Databricks SQL general availability – the serverless data warehouse on the Lakehouse – and we are thrilled and humbled by the adoption and impact it has gained in the community. With thousands of customers worldwide, and already over $100 million in recurring revenue, Databricks SQL is one […]

Introducing Materialized Views and Streaming Tables for Databricks SQL

We are thrilled to announce that materialized views and streaming tables are now publicly available in Databricks SQL on AWS and Azure. Streaming tables provide incremental ingest from cloud storage and message queues. Materialized views are automatically and incrementally updated as new data arrives. Together, these two capabilities enable infrastructure-free data pipelines that are simple […]
