Data Engineering, Efficiency and Processing - Technology Performance Pulse

Our First Netflix Data Engineering Summit

The Netflix TechBlog

DECEMBER 14, 2023

Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines. The result was a series of talks which we are now sharing with the rest of the Data Engineering community! In this video, Sr.

Data Engineering

Data Engineering Engineering Software Engineering Best Practices

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty , Bharath Mummadisetty At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. What is late-arriving data? Let’s dive in!

Data Engineering

Data Engineering Engineering Processing Games

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024 The Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Data Engineers of Netflix?—?Interview Interview with Pallavi Phadnis This post is part of our “ Data Engineers of Netflix ” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Automated Testing in Data Engineering: An Imperative for Quality and Efficiency

DZone

JANUARY 9, 2024

In the data-driven landscape of today, automation has become indispensable across industries, not just to maximize efficiency but, more importantly, to ensure quality. This holds true for the critical field of data engineering as well.

Data Engineering

Data Engineering Efficiency Engineering Testing

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profiles exposure. This nuanced integration of data and technology empowers us to offer bespoke content recommendations. This queue ensures we are consistently capturing raw events from our global userbase.

Tuning

Tuning Latency Efficiency Storage

2. Diving Deeper into Psyberg: Stateless vs Stateful Data Processing

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty , Bharath Mummadisetty In the inaugural blog post of this series, we introduced you to the state of our pipelines before Psyberg and the challenges with incremental processing that led us to create the Psyberg framework within Netflix’s Membership and Finance data engineering team.

Processing

Processing Data Engineering Efficiency Analytics

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

by Jun He , Yingyi Zhang , and Pawan Dixit Incremental processing is an approach to process new or changed data in workflows. The key advantage is that it only incrementally processes data that are newly added or updated to a dataset, instead of re-processing the complete dataset.

Processing

Processing Big Data Efficiency Engineering

How TripleLift Built an Adtech Data Pipeline Processing Billions of Events Per Day

High Scalability

JUNE 15, 2020

This is a guest post by Eunice Do , Data Engineer at TripleLift , a technology company leading the next generation of programmatic advertising. The system is the data pipeline at TripleLift. TripleLift is an adtech company, and like most companies in this industry, we deal with high volumes of data on a daily basis.

Processing

Processing Data Engineering Engineering Efficiency

Data Engineers of Netflix?—?Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Samuel Setegne Samuel Setegne This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

What is IT automation?

Dynatrace

JULY 6, 2022

At its most basic, automating IT processes works by executing scripts or procedures either on a schedule or in response to particular events, such as checking a file into a code repository. Adding AIOps to automation processes makes the volume of data that applications and multicloud environments generate much less overwhelming.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

3. Psyberg: Automated end to end catch up

The Netflix TechBlog

NOVEMBER 14, 2023

In the previous installments of this series, we introduced Psyberg and delved into its core operational modes: Stateless and Stateful Data Processing. Pipelines After Psyberg Let’s explore how different modes of Psyberg could help with a multistep data pipeline. In this case, the minimum hour to process the data is hour 2.

Processing

Processing Tuning C++ Efficiency

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

There are several benefits of such optimizations like saving on storage, faster query time, cheaper downstream processing, and an increase in developer productivity by removing additional ETLs written only for query performance improvement. Some of the optimizations are prerequisites for a high-performance data warehouse.

Storage

Storage Latency Efficiency Data Engineering

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency By: Di Lin , Girish Lingappa , Jitender Aswani Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard about to make a critical business decision but pausing to ask a question?—?“Can

Infrastructure

Infrastructure Big Data Transportation Architecture

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

As a micro-service owner, a Netflix engineer is responsible for its innovation as well as its operation, which includes making sure the service is reliable, secure, efficient and performant. In the Efficiency space, our data teams focus on transparency and optimization.

Infrastructure

Infrastructure Cloud Scalability AWS

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

DZone

DECEMBER 27, 2023

Welcome to the first post in our exciting series on mastering offline data pipeline's best practices, focusing on the potent combination of Apache Airflow and data processing engines like Hive and Spark. Working together, they form the backbone of many modern data engineering solutions.

Best Practices

Best Practices Data Engineering Big Data Games

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Leverage tiered storage systems that dynamically offload data based on priority.

IoT

IoT Energy Logistics Latency

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. In this way, no human intervention is required in the remediation process. Multi-objective optimizations. user name).

Tuning

Tuning Efficiency Big Data Engineering

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

The voice service then constructs a message for the device and places it on the message queue, which is then processed and sent to Pushy to deliver to the device. With these clear benefits, we continued to build out this functionality for more devices, enabling the same efficiency wins. It served Pushy’s needs well for many years.

Latency

Latency Cache Tuning Efficiency

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Data: Fast Data Our main data lake is hosted on S3, organized as Apache Iceberg tables. For ETL and other heavy lifting of data, we mainly rely on Apache Spark. In addition to Spark, we want to support last-mile data processing in Python, addressing use cases such as feature transformations, batch inference, and training.

Systems

Systems Media Cache Open Source

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

This entertaining romp through the tech stack serves as an introduction to how we think about and design systems, the Netflix approach to operational challenges, and how other organizations can apply our thought processes and technologies. In 2019, Netflix moved thousands of container hosts to bare metal.

AWS

AWS Entertainment Open Source Benchmarking

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

In the data domain, it is common to have a super large number of jobs within a single workflow. For example, a workflow to backfill hourly data for the past five years can lead to 43800 jobs (24 * 365 * 5), each of which processes data for an hour. But sometimes, it is not efficient.

Java

Java Scalability Traffic Architecture

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

The Netflix TechBlog

MARCH 2, 2021

At Netflix, our data scientists span many areas of technical specialization, including experimentation, causal inference, machine learning, NLP, modeling, and optimization. Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group.

Analytics

Analytics C++ Innovation Engineering

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 3, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Games Java

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 18, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Games Java

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 17, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Java Servers

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 9, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Games Java

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 24, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Software Engineering Engineering Big Data

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 30, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Software Engineering Engineering Big Data

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

This entertaining romp through the tech stack serves as an introduction to how we think about and design systems, the Netflix approach to operational challenges, and how other organizations can apply our thought processes and technologies.

AWS

AWS Entertainment Open Source Benchmarking

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

This entertaining romp through the tech stack serves as an introduction to how we think about and design systems, the Netflix approach to operational challenges, and how other organizations can apply our thought processes and technologies.

AWS

AWS Entertainment Open Source Benchmarking

Data Pipelines: The Hammer for Every Nail

Abhishek Tiwari

JULY 7, 2023

In the era of big data and complex data processing, data pipelines have emerged as a popular solution for managing and manipulating data. They provide a systematic approach to extract, transform, and load (ETL) data from various sources, enabling organizations to derive valuable insights.

Logistics

Logistics Transportation Scalability Data Engineering

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

In recent times, in order to gain valuable insights or to develop the data-driven products companies such as Netflix, Spotify, Uber, AirBnB have built internal data pipelines. If built correctly, data pipelines can offer strategic advantages to the business. Depending on frameworks, data processing units (a.k.a

Latency

Latency Analytics Scalability Engineering

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A common theme across all these trends is to remove the complexity by simplifying data management as a whole. In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture.

Big Data

Big Data Artificial Intelligence Storage Hardware

ETL Workflow Modeling

Abhishek Tiwari

APRIL 14, 2018

Developing Extract–transform–load (ETL) workflow is a time-consuming activity yet a very important component of data warehousing process. The process to develop ETL workflow is often ad-hoc, complex, trial and error based. It has been suggested that formal modeling of ETL process can alleviate most of these pain points.

Best Practices

Best Practices Database Design Efficiency

Reimagining Experimentation Analysis at Netflix

The Netflix TechBlog

SEPTEMBER 10, 2019

Our data scientists faced numerous challenges in our previous infrastructure. Complex business logic was embedded directly into the ETL pipelines by data engineers. In order to replicate results, scientists had to delve deep into the data, code, and documentation.

Metrics

Metrics Architecture Infrastructure Innovation

Sustainability at AWS re:Invent 2022 All the talks and videos I could find…

Adrian Cockcroft

FEBRUARY 13, 2023

This is the AWS Professional Services built tooling that customers can use to track the carbon footprint of their operations and processes, along with a customer example. STP213 Scaling global carbon footprint management — Blake Blackwell Persefoni Manager Data Engineering and Michael Floyd AWS Head of Sustainability Solutions.

AWS

AWS Energy Architecture Programming

Part 3: A Survey of Analytics Engineering Work at Netflix

The Netflix TechBlog

JANUARY 6, 2025

This article is the last in a multi-part series sharing a breadth of Analytics Engineering work at Netflix, recently presented as part of our annual internal Analytics Engineering conference. Data is a tool that enhances decision-making, complementing the deep expertise and industry knowledge of our creative professionals.

Analytics

Analytics Engineering Cache Entertainment

Cloud Efficiency at Netflix

The Netflix TechBlog

DECEMBER 17, 2024

Our ecosystem enables engineering teams to run applications and services at scale, utilizing a mix of open-source and proprietary solutions. In turn, our self-serve platforms allow teams to create and deploy, sometimes custom, workloads more efficiently. The standardized data model and processing promotes scalability and consistency.

Efficiency

Efficiency Cloud Analytics Infrastructure

Our First Netflix Data Engineering Summit

1. Streamlining Membership Data Engineering at Netflix with Psyberg

Trending Sources

A Recap of the Data Engineering Open Forum at Netflix

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

Automated Testing in Data Engineering: An Imperative for Quality and Efficiency

Introducing Impressions at Netflix

2. Diving Deeper into Psyberg: Stateless vs Stateful Data Processing

Incremental Processing using Netflix Maestro and Apache Iceberg

How TripleLift Built an Adtech Data Pipeline Processing Billions of Events Per Day

Data Engineers of Netflix?—?Interview with Samuel Setegne

What is IT automation?

3. Psyberg: Automated end to end catch up

Optimizing data warehouse storage

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

These 7 Edge Data Challenges Will Test Companies the Most in 2025

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

Supporting Diverse ML Systems at Netflix

Netflix at AWS re:Invent 2019

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Netflix at AWS re:Invent 2019

Netflix at AWS re:Invent 2019

Data Pipelines: The Hammer for Every Nail

Friends don't let friends build data pipelines

5 data integration trends that will define the future of ETL in 2018

ETL Workflow Modeling

Reimagining Experimentation Analysis at Netflix

Sustainability at AWS re:Invent 2022 All the talks and videos I could find…

Part 3: A Survey of Analytics Engineering Work at Netflix

Cloud Efficiency at Netflix

Stay Connected