Data Engineering, Latency and Processing - Technology Performance Pulse

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profiles exposure. This nuanced integration of data and technology empowers us to offer bespoke content recommendations. This queue ensures we are consistently capturing raw events from our global userbase.

Tuning

Tuning Latency Efficiency Storage

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

by Jun He , Yingyi Zhang , and Pawan Dixit Incremental processing is an approach to process new or changed data in workflows. The key advantage is that it only incrementally processes data that are newly added or updated to a dataset, instead of re-processing the complete dataset.

Processing

Processing Big Data Efficiency Engineering

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

The voice service then constructs a message for the device and places it on the message queue, which is then processed and sent to Pushy to deliver to the device. The previous version of the message processor was a Mantis stream-processing job that processed messages from the message queue.

Latency

Latency Cache Tuning Efficiency

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

There are several benefits of such optimizations like saving on storage, faster query time, cheaper downstream processing, and an increase in developer productivity by removing additional ETLs written only for query performance improvement. Some of the optimizations are prerequisites for a high-performance data warehouse.

Storage

Storage Latency Efficiency Data Engineering

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Leverage tiered storage systems that dynamically offload data based on priority.

IoT

IoT Energy Logistics Latency

Choosing an OLAP Engine for Financial Risk Management: What To Consider?

DZone

AUGUST 16, 2023

From a data engineer's point of view, financial risk management is a series of data analysis activities on financial data. The financial sector imposes its unique requirements on data engineering. Before they adopted an OLAP engine, they were using Kettle to collect data.

FinTech

FinTech Engineering Data Engineering Latency

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Data: Fast Data Our main data lake is hosted on S3, organized as Apache Iceberg tables. For ETL and other heavy lifting of data, we mainly rely on Apache Spark. In addition to Spark, we want to support last-mile data processing in Python, addressing use cases such as feature transformations, batch inference, and training.

Systems

Systems Media Cache Open Source

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

This entertaining romp through the tech stack serves as an introduction to how we think about and design systems, the Netflix approach to operational challenges, and how other organizations can apply our thought processes and technologies. We explore all the systems necessary to make and stream content from Netflix.

AWS

AWS Entertainment Open Source Benchmarking

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

JULY 21, 2022

Since memory management is not something one usually associates with classification problems, this blog focuses on formulating the problem as an ML problem and the data engineering that goes along with it. We now explore each of these components individually, while highlighting the nuances of the data pipeline and pre-processing.

Big Data

Big Data Cache Engineering Data Engineering

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

In this way, no human intervention is required in the remediation process. We used feature hashing to process the non-numeric values because they come from a high cardinality and dynamic set of values. Upon further profiling, we found that most of the latency came from the candidate generated step (i.e., user name).

Tuning

Tuning Efficiency Big Data Engineering

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

This entertaining romp through the tech stack serves as an introduction to how we think about and design systems, the Netflix approach to operational challenges, and how other organizations can apply our thought processes and technologies. We explore all the systems necessary to make and stream content from Netflix.

AWS

AWS Entertainment Open Source Benchmarking

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

This entertaining romp through the tech stack serves as an introduction to how we think about and design systems, the Netflix approach to operational challenges, and how other organizations can apply our thought processes and technologies. We explore all the systems necessary to make and stream content from Netflix.

AWS

AWS Entertainment Open Source Benchmarking

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

In recent times, in order to gain valuable insights or to develop the data-driven products companies such as Netflix, Spotify, Uber, AirBnB have built internal data pipelines. If built correctly, data pipelines can offer strategic advantages to the business. Depending on frameworks, data processing units (a.k.a

Latency

Latency Analytics Scalability Engineering

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as go to container orchestration platform for data engineering teams. In 2018, a widespread adaptation of Kubernetes for big data processing is anitcipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Custom schedulers.

Big Data

Big Data Storage Benchmarking Hardware

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture. A unified data management (UDM) system combines the best of data warehouses, data lakes, and streaming without expensive and error-prone ETL.

Big Data

Big Data Artificial Intelligence Storage Hardware

Part 3: A Survey of Analytics Engineering Work at Netflix

The Netflix TechBlog

JANUARY 6, 2025

Data is a tool that enhances decision-making, complementing the deep expertise and industry knowledge of our creative professionals. Although there was already a process for creating and comparing budgets for new productions against similar past projects, it was highly manual.

Analytics

Analytics Engineering Cache Entertainment

Cloud Efficiency at Netflix

The Netflix TechBlog

DECEMBER 17, 2024

This diverse technological landscape generates extensive and rich data from various infrastructure entities, from which, data engineers and analysts collaborate to provide actionable insights to the engineering organization in a continuous feedback loop that ultimately enhances the business.

Efficiency

Efficiency Cloud Analytics Infrastructure

Technology Performance Pulse

Introducing Impressions at Netflix

Incremental Processing using Netflix Maestro and Apache Iceberg

Trending Sources

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

Optimizing data warehouse storage

These 7 Edge Data Challenges Will Test Companies the Most in 2025

Choosing an OLAP Engine for Financial Risk Management: What To Consider?

Supporting Diverse ML Systems at Netflix

Netflix at AWS re:Invent 2019

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

Netflix at AWS re:Invent 2019

Netflix at AWS re:Invent 2019

Friends don't let friends build data pipelines

Kubernetes for Big Data Workloads

5 data integration trends that will define the future of ETL in 2018

Part 3: A Survey of Analytics Engineering Work at Netflix

Cloud Efficiency at Netflix

Stay Connected