Every image you hover over isn't just a visual placeholder; it's a critical data point that fuels our sophisticated personalization engine. Once the raw impression events are collected and queued, a stateless Apache Flink job takes charge, meticulously filtering and enriching this data.
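For a sense of the shape of such a stage, here is a minimal sketch using Flink's DataStream API; the `ImpressionEvent` type, the bot filter, and the enrichment step are hypothetical stand-ins, not the actual Netflix pipeline:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ImpressionEnrichmentJob {

    // Hypothetical shape of a raw impression event (Flink-style POJO).
    public static class ImpressionEvent {
        public String userId;
        public String titleId;
        public boolean fromBot;
        public ImpressionEvent() {}
        public ImpressionEvent(String userId, String titleId, boolean fromBot) {
            this.userId = userId; this.titleId = titleId; this.fromBot = fromBot;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The real pipeline would consume from the impression queue (e.g. a Kafka source);
        // inline elements keep the sketch self-contained.
        DataStream<ImpressionEvent> raw = env.fromElements(
                new ImpressionEvent("u1", "t42", false),
                new ImpressionEvent("crawler", "t42", true));

        raw.filter(e -> !e.fromBot)                  // drop bot/invalid impressions
           .map(e -> e.userId + "|" + e.titleId)     // enrich/reshape for downstream consumers
           .returns(Types.STRING)
           .print();

        env.execute("impression-filter-enrich");
    }
}
```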
From a data engineer's point of view, financial risk management is a series of data analysis activities on financial data. The financial sector imposes its own unique requirements on data engineering. Before adopting an OLAP engine, they were using Kettle to collect data.
Dynomite is a Netflix open source wrapper around Redis that provides a few additional features like auto-sharding and cross-region replication, and it provided Pushy with low latency and easy record expiry, both of which are critical for Pushy’s workload. As Pushy’s portfolio grew, we experienced some pain points with Dynomite.
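Record expiry of this kind is typically implemented with Redis key TTLs, which Dynomite exposes via the Redis protocol. A minimal sketch against a plain Redis instance using the Jedis client; the key layout and TTL value are invented for illustration, not Pushy's actual schema:

```java
import redis.clients.jedis.Jedis;

public class RegistryExpiryExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // SETEX writes the value and a time-to-live in one atomic call,
            // so stale device registrations expire without a cleanup job.
            jedis.setex("pushy:device:abc123", 3600, "connection-node-17");

            // TTL reports the remaining lifetime in seconds (-2 once expired).
            System.out.println("ttl = " + jedis.ttl("pushy:device:abc123"));
        }
    }
}
```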
Some of the optimizations are prerequisites for a high-performance data warehouse. Data engineers sometimes write downstream ETLs on ingested data to optimize the data/metadata layouts and make other ETL processes cheaper and faster. This covers both automatic (event-driven) and manual (ad-hoc) optimization.
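One concrete example of such a layout optimization is small-file compaction, where a downstream ETL rewrites an ingested partition into fewer, larger files so later scans are cheaper. A hedged sketch of that idea with Spark's Java API; the paths and file count are illustrative:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CompactPartition {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("compact-ingested-partition")
                .getOrCreate();

        // Read a freshly ingested partition that may consist of many small files.
        Dataset<Row> events = spark.read().parquet("s3://warehouse/events/dt=2024-01-01/");

        // Rewrite it as a handful of large files; coalesce avoids a full shuffle.
        events.coalesce(8)
              .write()
              .mode(SaveMode.Overwrite)
              .parquet("s3://warehouse/events_compacted/dt=2024-01-01/");

        spark.stop();
    }
}
```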
by Shefali Vyas Dalal. AWS re:Invent is a couple of weeks away, and our engineers & leaders are thrilled to be in attendance yet again this year! Technology advancements in content creation and consumption have also increased its data footprint. We've compiled our speaking events below so you know what we've been working on. Please stop by our "Living Room" for an opportunity to connect or reconnect with Netflixers.
Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. As data streams grow in complexity, processing efficiency can decline.
It also improves engineering productivity by simplifying existing pipelines and unlocking new patterns. Whether in analyzing A/B tests, optimizing studio production, training algorithms, investing in content acquisition, detecting security breaches, or optimizing payments, well-structured and accurate data is foundational.
Without these integrations, projects would be stuck at the prototyping stage, or they would have to live as one-off outliers outside the systems our engineering teams maintain, incurring unsustainable operational overhead. Importantly, all the use cases were engineered by practitioners themselves.
Since memory management is not something one usually associates with classification problems, this blog focuses on formulating the problem as an ML problem and the data engineering that goes along with it. Some nuances while creating this dataset come from the on-field domain knowledge of our engineers.
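As one hypothetical illustration of the framing, a continuous memory signal can be bucketed into discrete tiers so the configuration choice becomes a multi-class label. Everything below (field names, thresholds, tiers) is invented for the example, not the actual dataset:

```java
import java.util.List;

public class MemoryTierLabeler {
    // Hypothetical feature row: job features plus the observed peak memory in GiB.
    record JobRun(String jobId, double inputGiB, int taskCount, double peakMemGiB) {}

    // Bucket the continuous peak-memory signal into discrete tiers so the
    // problem becomes multi-class classification rather than regression.
    static String label(double peakMemGiB) {
        if (peakMemGiB <= 4) return "SMALL";
        if (peakMemGiB <= 16) return "MEDIUM";
        return "LARGE";
    }

    public static void main(String[] args) {
        List<JobRun> runs = List.of(
                new JobRun("daily_agg", 120.0, 200, 16.0),
                new JobRun("backfill", 900.0, 50, 28.0));
        for (JobRun r : runs) {
            System.out.printf("features=%s label=%s%n", r, label(r.peakMemGiB()));
        }
    }
}
```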
The rule-based classifier classifies job errors based on a set of predefined rules and provides insights for schedulers to decide whether to retry the job and for engineers to diagnose and remediate the job failure. The Rule Execution Engine is responsible for matching the collected logs against the set of predefined rules.
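In its simplest form, such a rule execution engine is an ordered list of (pattern, classification) pairs applied to log lines, with the first match winning. A minimal sketch; the rules and error categories here are invented, not the production rule set:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class RuleExecutionEngine {
    // Ordered rules: the first matching pattern wins. Patterns are illustrative.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("OutOfMemoryError"), "RESOURCE: retry with more memory");
        RULES.put(Pattern.compile("Connection refused|timed out"), "TRANSIENT: safe to retry");
        RULES.put(Pattern.compile("ParseException|schema mismatch"), "USER: do not retry");
    }

    public static String classify(String logLine) {
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(logLine).find()) {
                return rule.getValue();
            }
        }
        return "UNKNOWN: route to an engineer for diagnosis";
    }

    public static void main(String[] args) {
        System.out.println(classify("java.lang.OutOfMemoryError: GC overhead limit exceeded"));
    }
}
```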
Mei-Chin Tsai and Vinod Sridharan discuss the internal architecture of Azure Cosmos DB and how it achieves high availability, low latency, and scalability. By Mei-Chin Tsai, Vinod Sridharan
In recent times, to gain valuable insights or to develop data-driven products, companies such as Netflix, Spotify, Uber, and Airbnb have built internal data pipelines. If built correctly, data pipelines can offer strategic advantages to the business. They can be used to power new analytics, insights, and product features.
LinkedIn introduced Couchbase as a centralized caching tier for scaling member profile reads to handle increasing traffic that had outgrown their existing database cluster. The new solution achieved an over 99% hit rate, helped reduce tail latencies by more than 60%, and cut costs by 10% annually. By Rafal Gancarz
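The standard shape for a centralized caching tier like this is a read-through (cache-aside) path in front of the source-of-truth database. The sketch below shows that pattern with generic stand-ins rather than the actual Couchbase SDK calls; all names are illustrative:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ProfileReadPath {
    // Stand-ins for the caching tier and the backing database.
    interface Cache { Optional<String> get(String key); void put(String key, String value); }
    interface ProfileDb { String load(String memberId); }

    static String readProfile(String memberId, Cache cache, ProfileDb db) {
        // Serve from the cache when possible; at a >99% hit rate the database
        // only sees a small fraction of read traffic.
        return cache.get(memberId).orElseGet(() -> {
            String profile = db.load(memberId);   // cache miss: fall through to the DB
            cache.put(memberId, profile);         // populate for subsequent reads
            return profile;
        });
    }

    public static void main(String[] args) {
        Map<String, String> store = new ConcurrentHashMap<>();
        Cache cache = new Cache() {
            public Optional<String> get(String k) { return Optional.ofNullable(store.get(k)); }
            public void put(String k, String v) { store.put(k, v); }
        };
        ProfileDb db = id -> "{\"memberId\":\"" + id + "\"}";
        System.out.println(readProfile("m42", cache, db)); // miss, then cached
        System.out.println(readProfile("m42", cache, db)); // hit
    }
}
```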
Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next.
A unified data management (UDM) system combines the best of data warehouses, data lakes, and streaming without expensive and error-prone ETL. It offers the reliability and performance of a data warehouse, the real-time and low-latency characteristics of a streaming system, and the scale and cost-efficiency of a data lake.
At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company's products. He specifically delved into Venice DB, the NoSQL data store used for feature persistence. By Rafal Gancarz
This article is the last in a multi-part series sharing a breadth of Analytics Engineering work at Netflix, recently presented as part of our annual internal Analytics Engineering conference. As an analytics engineer accustomed to crafting quick solutions, I underestimated the effort required to deploy a production-grade analytics API.
Our ecosystem enables engineering teams to run applications and services at scale, utilizing a mix of open-source and proprietary solutions. One crucial way in which we do this is through the democratization of highly curated data sources that sunshine usage and cost patterns across Netflix's services and teams.