Memory Profiling in PySpark – The Databricks Blog

Many factors affect a PySpark program’s performance. PySpark supports various profiling tools that expose tight loops in your program and help you decide where to improve performance. However, memory, one of the key factors in a program’s performance, had been missing from PySpark profiling. A PySpark program on the Spark […]

Build Reliable and Cost Effective Streaming Data Pipelines With Delta Live Tables’ Enhanced Autoscaling

This year we announced the general availability of Delta Live Tables (DLT), the first ETL framework to use a simple, declarative approach to building reliable data pipelines. Since the launch, Databricks has continued to expand DLT with new capabilities. Today we are excited to announce that Enhanced Autoscaling for DLT is now generally […]

How facial recognition technology keeps you safe

Facial recognition technology is one of the many modern technologies that previously only appeared in science fiction movies. The roots of this technology can be traced back to the 1960s and have since grown dramatically due to the rise of deep learning techniques and accelerated digital transformation in recent years. In this blog post, we […]

Graph Networks – 10X investigation with Graph Visualisations

Introduction: Detecting fraud schemes used to require investigations using large amounts and varying types of data that come from many different anti-fraud systems. Investigators then need to combine the different types of data and use statistical methods to uncover suspicious claims, which is time consuming and inefficient in most cases. We are always looking for […]

How we automated FAQ responses at Grab

Overview and initial analysis: Knowledge management is often one of the biggest internal challenges companies face. Teams spend several working hours either searching inefficiently for information or repeatedly asking colleagues about information that is already documented somewhere. A lot of time is spent on the internal employee communication channels (in our case, Slack) simply […]

How we store and process millions of orders daily

In the real world, after a passenger places a GrabFood order from the Grab App, the merchant-partner will prepare the order. A driver-partner will then collect the food and deliver it to the passenger. Have you ever wondered what happens in the backend system? The Grab Order Platform is a distributed system that processes millions […]

Automatic rule backtesting with large quantities of data

Introduction: Analysts need to analyse and simulate a rule on historical data to check the performance and accuracy of the rule. Backtesting enables analysts to run simulations of the rules and manage the results from the rule engine UI. Backtesting helps analysts to: define the desired impact of the rule for our business and users; evaluate […]

Using mobile sensor data to encourage safer driving

“Telematics”, a cross between the words telecommunications and informatics, was coined in the late 1970s to refer to the use of communication technologies to facilitate the exchange of information. In the modern day, such technologies may include cloud platforms, mobile networks, and wireless transmissions (e.g., Bluetooth). Although the initial intention was for a more general scope, […]
