Data Engineering, Design and Latency - Technology Performance Pulse

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits. This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture.

Storage

Storage Latency Efficiency Data Engineering

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

To support this growth, we’ve revisited Pushy’s past assumptions and design decisions with an eye towards both Pushy’s future role and future stability. KeyValue is an abstraction over the storage engine itself, which allows us to choose the best storage engine that meets our SLO needs.

Latency

Latency Cache Tuning Efficiency

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. As data streams grow in complexity, processing efficiency can decline.

IoT

IoT Energy Logistics Latency

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm, NFX 202: A day in the life of a Netflix Engineer. Dave Hahn, SRE Engineering Manager. Abstract: Netflix is a large, ever-changing ecosystem serving millions of customers across the globe through cloud-based systems and a globally distributed CDN. We explore all the systems necessary to make and stream content from Netflix.

AWS

AWS Entertainment Open Source Benchmarking

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

Whether in analyzing A/B tests, optimizing studio production, training algorithms, investing in content acquisition, detecting security breaches, or optimizing payments, well structured and accurate data is foundational. Users configure the workflow to read the data in a window (e.g. data arrives too late to be useful).

Processing

Processing Big Data Efficiency Engineering

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow, an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems.

Systems

Systems Media Cache Open Source

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

One downside of the iterative design of the optimizer is that any bottleneck can block completion and cause a timeout, which we initially observed in a non-trivial number of cases. Upon further profiling, we found that most of the latency came from the candidate generation step (i.e.,

Tuning

Tuning Efficiency Big Data Engineering

How LinkedIn Serves Over 4.8 Million Member Profiles per Second

InfoQ

JULY 3, 2023

LinkedIn introduced Couchbase as a centralized caching tier for scaling member profile reads to handle increasing traffic that has outgrown their existing database cluster. The new solution achieved a hit rate of over 99%, helped reduce tail latencies by more than 60%, and cut costs by 10% annually. By Rafal Gancarz

Cache

Cache Latency Traffic Database

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

Unfortunately, building data pipelines remains a daunting, time-consuming, and costly activity. Not everyone operates a data engineering function at Netflix or Spotify scale. Companies often underestimate the effort and cost involved in building and maintaining data pipelines.

Latency

Latency Analytics Scalability Engineering

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A unified data management (UDM) system combines the best of data warehouses, data lakes, and streaming without expensive and error-prone ETL. It offers reliability and performance of a data warehouse, real-time and low-latency characteristics of a streaming system, and scale and cost-efficiency of a data lake.

Big Data

Big Data Artificial Intelligence Storage Hardware

QCon London: Lessons Learned From Building LinkedIn’s AI/ML Data Platform

InfoQ

APRIL 15, 2024

At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. He specifically delved into Venice DB, the NoSQL data store used for feature persistence. By Rafal Gancarz

Artificial Intelligence

Artificial Intelligence Big Data Data Engineering Latency

Part 3: A Survey of Analytics Engineering Work at Netflix

The Netflix TechBlog

JANUARY 6, 2025

Dashboard Design Tips. Rina Chang, Susie Lu. What is design, and why does it matter? Often people think design is about how things look, but design is actually about how things work. Everything is designed, because we're all making choices about how things work, but not everything is designed well.

Analytics

Analytics Engineering Cache Entertainment
