How Collective Health uses Delta Live Tables and Structured Streaming for Data Integration

Collective Health is not an insurance company. We’re a technology company that’s fundamentally making health insurance work better for everyone, starting with the 155M+ Americans covered by their employer. We’ve created a powerful, flexible infrastructure to simplify employer-led healthcare and started a movement that prioritizes the human experience within health benefits. We’ve built smarter technology […]

Synthetic Data for Better Machine Learning

You’ve likely tried the buzziest advances in generative AI in the past year, tools like ChatGPT and DALL-E. They consume complex data and generate more data in ways that feel startlingly like something intelligent. These and other new ideas (diffusion models, generative adversarial networks or GANs) are entertaining, even frightening to play with. However, the […]

How eBay Modernized the Most Important Page on Our Platform

eBay’s View Item page lives at the center of our e-commerce platform. Our customers load this page over 250 million times each day, and stringent budgets on site speed and availability guarantee the quality of their experience. And yet, this page had its last intentional rewrite ten years ago. A decade of rapid iteration […]

Pandas-Profiling Now Supports Apache Spark

Data profiling is the process of collecting statistics and summaries of data to assess its quality and other characteristics. It is an essential step in both data discovery and the data science lifecycle because it helps us ensure quality data flows from which we can derive trustworthy and actionable insights. Profiling involves analyzing data across […]

Saving Mothers with ML: How MLOps Improves Healthcare in High-Risk Obstetrics

In the United States, roughly 7 out of every 1000 mothers suffer from both pregnancy and delivery complications each year¹. Of those mothers with pregnancy complications, 700 die, but 60% of those deaths are preventable with the right medical attention, according to the CDC. Even among the 3.7 million successful births, 19% have either low […]

How eBay’s New Search Feature Was Inspired By Window Shopping

We live in a world of discovery where visual appetite reigns supreme. Window shopping, infinite scroll lists, and micro engagements using simple visual cues are the norm. Search engines traditionally interpret a textual query input and match items and/or documents ranked by their relevance to the input query. The relevance of the retrieved results is […]

Evolution of quality at Grab

To achieve our vision of becoming the leading superapp in Southeast Asia, we constantly need to balance development velocity with maintaining the high quality of the Grab app. Like most tech companies, we started out with the traditional software development lifecycle (SDLC), but as our app evolved, we soon noticed several challenges, like a high number of feature bugs and […]

Determine the best technology stack for your web-based projects

In the current technology landscape, startups grow rapidly. This usually leads to an increase in the number of engineers on teams, with the goal of speeding up product development and increasing delivery frequency. However, this growth often leads to a diverse selection of technology stacks being used by different teams within the same […]

Fine-Tuning Large Language Models with Hugging Face and DeepSpeed

Large language models (LLMs) are currently in the spotlight following the sensational release of ChatGPT. Many are wondering how to take advantage of models like this in their own applications. However, ChatGPT is merely one of several advances in transformer-based models, many of which are open and readily available for tasks like translation, classification, […]

Building the Lakehouse for Healthcare and Life Sciences – Processing DICOM images at scale with ease

One of the biggest challenges in understanding patient health status and disease progression is unlocking insights from the vast amounts of semi-structured and unstructured data types in healthcare. DICOM, which stands for Digital Imaging and Communications in Medicine, is the standard for the communication and management of medical imaging information. Medical images, encompassing modalities like […]
