Data Engineering, Engineering and Systems - Technology Performance Pulse

Our First Netflix Data Engineering Summit

The Netflix TechBlog

DECEMBER 14, 2023

Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines. The result was a series of talks which we are now sharing with the rest of the Data Engineering community!

Data Engineering

Data Engineering Engineering Software Engineering Best Practices

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty and Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. What is late-arriving data?

Data Engineering

Data Engineering Engineering Processing Games

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Data Engineers of Netflix - Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Data Engineers of Netflix - Interview with Pallavi Phadnis. This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Chaos Data Engineering Manifesto: 5 Laws for Successful Failures

DZone

FEBRUARY 27, 2023

During every major election, the wave would crest and crash against our overwhelmed systems before receding, allowing us to assess the damage. It's midnight in the dim and cluttered office of The New York Times, currently serving as the "situation room." A powerful surge of traffic is inevitable.

Data Engineering

Data Engineering Engineering Traffic Systems

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow, an open-source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems.

Systems

Systems Media Cache Open Source

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Every image you hover over isn't just a visual placeholder; it's a critical data point that fuels our sophisticated personalization engine. It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile's exposure.

Tuning

Tuning Latency Efficiency Storage

Choosing an OLAP Engine for Financial Risk Management: What To Consider?

DZone

AUGUST 16, 2023

From a data engineer's point of view, financial risk management is a series of data analysis activities on financial data. The financial sector imposes its unique requirements on data engineering. Before they adopted an OLAP engine, they were using Kettle to collect data.

FinTech

FinTech Engineering Data Engineering Latency

Analytics at Netflix: Who we are and what we do

The Netflix TechBlog

SEPTEMBER 18, 2020

An Introduction to Analytics and Visualization Engineering at Netflix, by Molly Jackman & Meghana Reddy. Across nearly every industry, there is recognition that data analytics is key to driving informed business decision-making.

Analytics

Analytics Engineering Data Engineering Tuning

How TripleLift Built an Adtech Data Pipeline Processing Billions of Events Per Day

High Scalability

JUNE 15, 2020

This is a guest post by Eunice Do, Data Engineer at TripleLift, a technology company leading the next generation of programmatic advertising. What is the name of your system and where can we find out more about it? The system is the data pipeline at TripleLift. Why did you decide to build this system?

Processing

Processing Data Engineering Engineering Efficiency

Ready-to-go sample data pipelines with Dataflow

The Netflix TechBlog

DECEMBER 3, 2022

The dataflow migration command is a special feature, developed single-handedly by Stephen Huenneke, to fully automate the communication and tracking of data warehouse table changes. Running code against a production database can be slow, especially with the overhead required for distributed data processing systems like Apache Spark.

Best Practices

Best Practices Code Testing Data Engineering

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Now, imagine yourself in the role of a software engineer responsible for a microservice which publishes data consumed by a few critical customer-facing services (e.g. …). You are about to make structural changes to the data and want to know who and what downstream of your service will be impacted.

Infrastructure

Infrastructure Big Data Transportation Architecture

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

Netflix’s engineering culture is predicated on Freedom & Responsibility, the idea that everyone (and every team) at Netflix is entrusted with a core responsibility and is free to operate as they see fit to satisfy their mission.

Infrastructure

Infrastructure Cloud Scalability AWS

Data pipeline asset management with Dataflow

The Netflix TechBlog

FEBRUARY 9, 2022

The problem of managing scheduled workflows and their assets is as old as the use of the cron daemon in early Unix operating systems. The design of a cron job is simple: you take some system command, pick the schedule to run it on, and you are done. Manually constructed continuous delivery system.

Storage

Storage Data Engineering Testing Code

Article: Design Pattern Proposal for Autoscaling Stateful Systems

InfoQ

JANUARY 25, 2023

In this article, Rogerio Robetti discusses the challenges in auto-scaling stateful storage systems and proposes an opinionated design solution to automatically scale up (vertical) and scale out (horizontal) from a single node up to several nodes in a cluster with minimum configuration and interference of the operator.

Design

Design Systems Storage Data Engineering

What is IT automation?

Dynatrace

JULY 6, 2022

As organizations continue to adopt multicloud strategies, the complexity of these environments grows, increasing the need to automate cloud engineering operations to ensure organizations can enforce their policies and architecture principles. This requires significant data engineering efforts, as well as work to build machine-learning models.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

JULY 21, 2022

Since memory management is not something one usually associates with classification problems, this blog focuses on formulating the problem as an ML problem and the data engineering that goes along with it. Some nuances while creating this dataset come from the on-field domain knowledge of our engineers. Labeling the data?

Big Data

Big Data Cache Engineering Data Engineering

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

By Shefali Vyas Dalal. AWS re:Invent is a couple of weeks away and our engineers & leaders are thrilled to be in attendance yet again this year! Technology advancements in content creation and consumption have also increased its data footprint. We’ve compiled our speaking events below so you know what we’ve been working on.

AWS

AWS Entertainment Open Source Benchmarking

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

Due to its popularity, the number of workflows managed by the system has grown exponentially. The scheduler on-call has to closely monitor the system during non-business hours. As the usage increased, we had to vertically scale the system to keep up and were approaching AWS instance type limits.

Java

Java Scalability Traffic Architecture

5 key areas for tech leaders to watch in 2020

O'Reilly

FEBRUARY 18, 2020

This year’s growth in Python usage was buoyed by its increasing popularity among data scientists and machine learning (ML) and artificial intelligence (AI) engineers. There’s a Python library for virtually anything a developer or data scientist might need to do. In aggregate, data engineering usage declined 8% in 2019.

Software Architecture

Software Architecture DevOps Data Engineering Architecture

SIEM Volume Spike Alerts Using ML

DZone

JANUARY 31, 2024

SIEM systems enable early detection of security threats and suspicious activities by analyzing vast amounts of log data in real time. Correlation Engine: SIEM systems analyze and correlate the collected data to identify patterns, anomalies, and potential security incidents.

Data Engineering

Data Engineering Storage Network Infrastructure
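The volume-spike idea in the SIEM excerpt above can be illustrated with a deliberately simple statistical baseline. This is a minimal sketch, not the ML approach the DZone article describes: it assumes per-interval log event counts are already aggregated, and the function name `volume_spikes`, the window size, and the 3-sigma threshold are hypothetical choices.

```python
from statistics import mean, stdev
from typing import List


def volume_spikes(counts: List[int], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Return indices whose count exceeds the rolling baseline by more than `threshold` std devs."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a standard deviation")
    spikes: List[int] = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]  # the preceding `window` intervals
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat baseline: flag any increase at all.
            if counts[i] > mu:
                spikes.append(i)
        elif (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Hypothetical per-minute event counts with one burst at index 6.
print(volume_spikes([100, 102, 98, 101, 99, 100, 500, 101]))  # [6]
```

A rolling z-score is easy to reason about; a learned model would typically replace the mean/stdev estimate with a forecast that accounts for regular patterns such as daily traffic cycles.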

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

We found several use cases where a system like AutoOptimize can bring tons of value. Some of the optimizations are prerequisites for a high-performance data warehouse. Sometimes Data Engineers write downstream ETLs on ingested data to optimize the data/metadata layouts to make other ETL processes cheaper and faster.

Storage

Storage Latency Efficiency Data Engineering
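One common layout optimization of the kind this excerpt alludes to is merging many small files into fewer, larger ones. The sketch below assumes file sizes in MB and a hypothetical 512 MB target; `plan_compaction` is an illustrative name, not part of AutoOptimize, and first-fit decreasing packing is a swapped-in heuristic rather than Netflix's documented method.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 512) -> List[List[int]]:
    """Greedily group small files into batches no larger than target_mb.

    Files already at or above target_mb are left alone.
    """
    small = sorted((s for s in file_sizes_mb if s < target_mb), reverse=True)
    batches: List[List[int]] = []
    totals: List[int] = []
    for size in small:
        # First-fit decreasing: place the file in the first batch with room.
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                batches[i].append(size)
                totals[i] += size
                break
        else:
            batches.append([size])
            totals.append(size)
    # Only batches that merge two or more files actually reduce the file count.
    return [b for b in batches if len(b) > 1]


print(plan_compaction([600, 300, 200, 100, 50, 20]))  # [[300, 200], [100, 50, 20]]
```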

3. Psyberg: Automated end to end catch up

The Netflix TechBlog

NOVEMBER 14, 2023

Data Load Type: the ETL can either load the missed/new data specifically or reload the entire specified range. This helps overwrite data only when required and minimizes unnecessary reprocessing. As seen above, by chaining these Psyberg workflows, we could automate the catchup for late-arriving data from hours 2 and 6.

Processing

Processing Tuning C++ Efficiency
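The two data load types in the Psyberg excerpt can be sketched as a planning step that decides which hourly partitions an ETL run should write. This assumes integer hour partitions and an already-known set of late-arriving hours; `plan_load` and the mode names `incremental` and `reload` are hypothetical and not Psyberg's actual interface.

```python
from typing import Iterable, List, Set


def plan_load(mode: str, hours: Iterable[int], processed: Set[int],
              late_arriving: Set[int]) -> List[int]:
    """Return the hourly partitions an ETL run should (re)write.

    Hypothetical sketch: 'incremental' loads only missed or late-arriving
    hours; 'reload' rewrites every hour in the requested range.
    """
    requested = sorted(set(hours))
    if mode == "reload":
        return requested
    if mode == "incremental":
        # Missed hours were never processed; late hours received new data afterwards.
        return [h for h in requested if h not in processed or h in late_arriving]
    raise ValueError(f"unknown load mode: {mode!r}")


# Hours 0-7 were processed, but hours 2 and 6 later received late-arriving data.
processed = set(range(8))
print(plan_load("incremental", range(8), processed, {2, 6}))  # [2, 6]
```

The incremental mode overwrites only the affected partitions, which is the "minimizes unnecessary reprocessing" trade-off the excerpt mentions; a full reload is simpler but rewrites the whole range.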

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Data security: with edge devices dispersed across various locations, securing data from creation to consumption has become a critical challenge. Unlike centralized systems, where data resides in a single, well-protected environment, edge computing increases the attack surface, making systems vulnerable to breaches.

IoT

IoT Energy Logistics Latency

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

At Netflix, hundreds of thousands of workflows and millions of jobs are running per day across multiple layers of the big data platform. The Rule Execution Engine is responsible for matching the collected logs against a set of predefined rules. (… the scheduler, job orchestrator, and compute clusters). Unclassified errors.

Tuning

Tuning Efficiency Big Data Engineering
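A rule execution engine that matches collected logs against predefined rules can be sketched with an ordered list of regular expressions. The rule names, patterns, and the `retryable` flag below are hypothetical examples; the actual rules and remediation actions in Netflix's system are not described in this excerpt.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    name: str        # hypothetical error-class label
    pattern: str     # regular expression searched for in each log line
    retryable: bool  # hypothetical remediation hint


# Hypothetical rules, listed in priority order (first match wins).
RULES: List[Rule] = [
    Rule("OUT_OF_MEMORY", r"java\.lang\.OutOfMemoryError|Container killed .* memory", True),
    Rule("PERMISSION_DENIED", r"AccessDenied|Permission denied", False),
]


def classify(log_lines: List[str], rules: List[Rule] = RULES) -> Optional[Rule]:
    """Return the highest-priority rule matching any log line, or None if unclassified."""
    for rule in rules:
        regex = re.compile(rule.pattern)
        if any(regex.search(line) for line in log_lines):
            return rule
    return None


logs = ["Stage 3 failed", "java.lang.OutOfMemoryError: Java heap space"]
match = classify(logs)
print(match.name if match else "UNCLASSIFIED")  # OUT_OF_MEMORY
```

Returning `None` corresponds to the unclassified-error case, the kind of gap a learned classifier could help close.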

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

Sample system diagram for an Alexa voice command. The other main use case was RENO, the Rapid Event Notification System mentioned above. Rewriting always comes with a risk, and it’s never the first solution we reach for, particularly when working with a system that’s in place and working well.

Latency

Latency Cache Tuning Efficiency

Sponsored Post: Fauna, Sisu, Educative, PA File Sight, Etleap, PerfOps, Triplebyte, Stream

High Scalability

NOVEMBER 12, 2019

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Apply here. Stateful JavaScript Apps. Generous free tier.

Education

Education Java Software Engineering Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

… has hours of system design content. They also do live system design discussions every week. Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). This is going to be a challenging journey for any backend engineer! Who's Hiring?

Education

Education Software Engineering Scalability Engineering

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 3, 2020

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Apply here. Advertise your job here! Cool Products and Services.

Education

Education Engineering Games Java

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 24, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems : large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc.

Education

Education Software Engineering Engineering Big Data

Organise your engineering teams around the work by reteaming

Abhishek Tiwari

JULY 20, 2019

When it comes to organising engineering teams, a popular view has been to organise your teams based on either Spotify's agile model (i.e. squads, chapters, tribes, and guilds) or simply follow Amazon's two-pizza team model. One thing that stands out to me is being intentional and practical about your engineering organisation design.

Engineering

Engineering Retail Airlines Healthcare

Microservices Adoption in 2020

O'Reilly

JULY 15, 2020

Almost one-third (29%) of respondents say their employers are migrating or implementing a majority of their systems (over 50%) using microservices. Software engineers comprise the survey audience’s single largest cluster, over one quarter (27%) of respondents (Figure 1). Adopters are betting big on microservices. What does that mean?

Database

Database Architecture Education Systems

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 17, 2020

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Apply here. Advertise your job here! Cool Products and Services.

Education

Education Engineering Java Servers
