A Pattern for the Lightweight Deployment of Distributed XGBoost and LightGBM Models

A common challenge data scientists encounter when developing machine learning solutions is training a model on a dataset that is too large to fit into a server’s memory. We encounter this when we wish to train a model to predict customer churn or propensity and need to deal with tens of millions of unique […]

A Personalized User-Based Ranking Model

Recommender systems help users browse the vast inventories found on modern ecommerce websites in a more efficient manner. A core component of many recommender systems is a ranker, which is a machine learning technique that sorts candidate items to show users the items they will like the most. Rankers are used to sort candidate […]

Deploy Private LLMs using Databricks Model Serving

We are excited to announce the public preview of GPU and LLM optimization support for Databricks Model Serving! With this launch, you can deploy open-source or your own custom AI models of any type, including LLMs and vision models, on the Lakehouse Platform. Databricks Model Serving automatically optimizes your model for LLM Serving, providing best-in-class […]

Announcing the Public Preview of Lakeview Dashboards!

We are excited to announce the public preview of the next generation of Databricks SQL dashboards, dubbed Lakeview dashboards. Available today, this new dashboarding experience is optimized for ease of use, broad distribution, governance and security. Lakeview provides four major improvements compared to previous-generation dashboards: Improved Visualizations: A new visualization engine delivers beautiful, […]

How Multimodal Embeddings Elevate eBay’s Product Recommendations

eBay is committed to providing a seamless and enjoyable buying experience for its customers. One area that we’re continuously looking to improve is the quality of our listings, particularly with regard to images and text. In the past, the presence of low-quality images could lead to inaccurate product representation and, in a worst-case […]

Making Spark Accessible: My Databricks Summer Internship

My summer internship on the PySpark team was a whirlwind of exciting events. The PySpark team develops the Python APIs of the open source Apache Spark library and Databricks Runtime. Over the course of 12 weeks, I drove a project to implement a new built-in PySpark test framework. I also contributed to an […]

Stepping up marketing for advertisers: Scalable lookalike audience

The advertising industry is constantly evolving, driven by advancements in technology and changes in consumer behaviour. One of the key challenges in this industry is reaching the right audience: the people who are most likely to be interested in your product or service. This is where the concept of a lookalike audience comes into […]

Apache Spark ❤️ Apache DataSketches: New Sketch-Based Approximate Distinct Counting

In this blog post, we’ll explore a set of advanced SQL functions available within Apache Spark that leverage the HyperLogLog algorithm, enabling you to count unique values, merge sketches, and estimate distinct counts with precision and efficiency. These implementations use the Apache DataSketches library for consistency with the open source community and easy […]

Introducing the Support of Lateral Column Alias

We are thrilled to introduce support for a new SQL feature in Apache Spark and Databricks: Lateral Column Alias (LCA). This feature simplifies complex SQL queries by allowing users to reuse an expression specified earlier in the same SELECT list, eliminating the need to use nested subqueries and Common Table Expressions (CTEs) in […]

Introducing Apache Spark™ 3.5 | Databricks Blog

Today, we are happy to announce the availability of Apache Spark™ 3.5 on Databricks as part of Databricks Runtime 14.0. We extend our sincere appreciation to the Apache Spark community for their invaluable contributions to the Spark 3.5 release. Aligned with our mission to make Spark more accessible, versatile, and efficient than ever before, […]
