Data Engineering, Design and Processing - Technology Performance Pulse

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty , Bharath Mummadisetty At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. What is late-arriving data? Let’s dive in!

Data Engineering

Data Engineering Engineering Processing Games

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024 The Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Data Engineers of Netflix?—?Interview Interview with Pallavi Phadnis This post is part of our “ Data Engineers of Netflix ” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Data Engineers of Netflix?—?Interview with Kevin Wylie

The Netflix TechBlog

JULY 15, 2021

Data Engineers of Netflix?—?Interview Interview with Kevin Wylie This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin, what drew you to data engineering?

Data Engineering

Data Engineering Engineering Entertainment Big Data

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

by Jun He , Yingyi Zhang , and Pawan Dixit Incremental processing is an approach to process new or changed data in workflows. The key advantage is that it only incrementally processes data that are newly added or updated to a dataset, instead of re-processing the complete dataset.

Processing

Processing Big Data Efficiency Engineering

Data Engineers of Netflix?—?Interview with Dhevi Rajendran

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Dhevi Rajendran Dhevi Rajendran This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix.

Data Engineering

Data Engineering Engineering Software Engineering Big Data

Data Engineers of Netflix?—?Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Samuel Setegne Samuel Setegne This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

Ready-to-go sample data pipelines with Dataflow

The Netflix TechBlog

DECEMBER 3, 2022

Obviously not all tools are made with the same use case in mind, so we are planning to add more code samples for other (than classical batch ETL) data processing purposes, e.g. Machine Learning model building and scoring. The main workflow definition file holds the logic of a single run, in this case one day-worth of data.

Best Practices

Best Practices Code Testing Data Engineering

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Finally, imagine yourself in the role of a data platform reliability engineer tasked with providing advanced lead time to data pipeline (ETL) owners by proactively identifying issues upstream to their ETL jobs. Design a flexible data model ? —?Represent Enable seamless integration?—? push or pull.

Infrastructure

Infrastructure Big Data Transportation Architecture

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

There are several benefits of such optimizations like saving on storage, faster query time, cheaper downstream processing, and an increase in developer productivity by removing additional ETLs written only for query performance improvement. Some of the optimizations are prerequisites for a high-performance data warehouse.

Storage

Storage Latency Efficiency Data Engineering

Data pipeline asset management with Dataflow

The Netflix TechBlog

FEBRUARY 9, 2022

see “data pipeline” Intro The problem of managing scheduled workflows and their assets is as old as the use of cron daemon in early Unix operating systems. The design of a cron job is simple, you take some system command, you pick the schedule to run it on and you are done. workflow ?—?see Example: 0 0 * * MON /home/alice/backup.sh

Storage

Storage Data Engineering Testing Code

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

The Netflix TechBlog

MARCH 2, 2021

At Netflix, our data scientists span many areas of technical specialization, including experimentation, causal inference, machine learning, NLP, modeling, and optimization. Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group.

Analytics

Analytics C++ Innovation Engineering

5 key areas for tech leaders to watch in 2020

O'Reilly

FEBRUARY 18, 2020

This year’s growth in Python usage was buoyed by its increasing popularity among data scientists and machine learning (ML) and artificial intelligence (AI) engineers. The shift to cloud native design is transforming both software architecture and infrastructure and operations. Coincidence?

Software Architecture

Software Architecture DevOps Data Engineering Architecture

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Leverage tiered storage systems that dynamically offload data based on priority.

IoT

IoT Energy Logistics Latency

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow , an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems.

Systems

Systems Media Cache Open Source

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm NFX 202 A day in the life of a Netflix Engineer Dave Hahn , SRE Engineering Manager Abstract : Netflix is a large, ever-changing ecosystem serving millions of customers across the globe through cloud-based systems and a globally distributed CDN.

AWS

AWS Entertainment Open Source Benchmarking

How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions

InfoQ

NOVEMBER 29, 2023

HubSpot adopted routing messages over multiple Kafka topics (called swimlanes) for the same producer to avoid the build-up in the consumer group lag and prioritize the processing of real-time traffic.

Processing

Processing Traffic Data Engineering Scalability

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

The voice service then constructs a message for the device and places it on the message queue, which is then processed and sent to Pushy to deliver to the device. To support this growth, we’ve revisited Pushy’s past assumptions and design decisions with an eye towards both Pushy’s future role and future stability.

Latency

Latency Cache Tuning Efficiency

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

In the data domain, it is common to have a super large number of jobs within a single workflow. For example, a workflow to backfill hourly data for the past five years can lead to 43800 jobs (24 * 365 * 5), each of which processes data for an hour. All the requests are processed via distributed queues for message passing.

Java

Java Scalability Traffic Architecture

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 3, 2020

Level up on in-demand technologies and prep for your interviews on Educative.io, featuring popular courses like the bestselling Grokking the System Design Interview. Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Try the 30-day free trial!

Education

Education Engineering Games Java

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

has hours of system design content. They also do live system design discussions every week. Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). this is going to be a challenging journey for any backend engineer! Learn to balance architecture trade-offs and design scalable enterprise-level software.

Education

Education Software Engineering Scalability Engineering

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 18, 2020

Level up on in-demand technologies and prep for your interviews on Educative.io, featuring popular courses like the bestselling Grokking the System Design Interview. Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Try the 30-day free trial!

Education

Education Engineering Games Java

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

has hours of system design content. They also do live system design discussions every week. Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). this is going to be a challenging journey for any backend engineer! Learn to balance architecture trade-offs and design scalable enterprise-level software.

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 30, 2020

has hours of system design content. They also do live system design discussions every week. Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). this is going to be a challenging journey for any backend engineer! The result is a process that doesn't get you the best employees you could.

Education

Education Software Engineering Engineering Big Data

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 24, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems : large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc.

Education

Education Software Engineering Engineering Big Data

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 17, 2020

Level up on in-demand technologies and prep for your interviews on Educative.io, featuring popular courses like the bestselling Grokking the System Design Interview. Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Try the 30-day free trial!

Education

Education Engineering Java Servers

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 9, 2020

Level up on in-demand technologies and prep for your interviews on Educative.io, featuring popular courses like the bestselling Grokking the System Design Interview. Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Try the 30-day free trial!

Education

Education Engineering Games Java

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

In this way, no human intervention is required in the remediation process. We used feature hashing to process the non-numeric values because they come from a high cardinality and dynamic set of values. Multi-objective optimizations. Auto Remediation generates recommendations by considering both performance (i.e., user name).

Tuning

Tuning Efficiency Big Data Engineering

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm NFX 202 A day in the life of a Netflix Engineer Dave Hahn , SRE Engineering Manager Abstract : Netflix is a large, ever-changing ecosystem serving millions of customers across the globe through cloud-based systems and a globally distributed CDN.

AWS

AWS Entertainment Open Source Benchmarking

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm NFX 202 A day in the life of a Netflix Engineer Dave Hahn , SRE Engineering Manager Abstract : Netflix is a large, ever-changing ecosystem serving millions of customers across the globe through cloud-based systems and a globally distributed CDN.

AWS

AWS Entertainment Open Source Benchmarking

The death of Agile?

O'Reilly

MARCH 2, 2020

When a project is going off track because some requirement wasn’t understood properly, you need to fix that as soon as possible—not after a year-long development process. This has to do with the concept of bounded context from Domain Driven Design.). When processes change, who wins, who loses, and why? Coincidence?

Artificial Intelligence

Artificial Intelligence Software Architecture Programming C++

Data Pipelines: The Hammer for Every Nail

Abhishek Tiwari

JULY 7, 2023

In the era of big data and complex data processing, data pipelines have emerged as a popular solution for managing and manipulating data. They provide a systematic approach to extract, transform, and load (ETL) data from various sources, enabling organizations to derive valuable insights.

Logistics

Logistics Transportation Scalability Data Engineering

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

In recent times, in order to gain valuable insights or to develop the data-driven products companies such as Netflix, Spotify, Uber, AirBnB have built internal data pipelines. If built correctly, data pipelines can offer strategic advantages to the business. Depending on frameworks, data processing units (a.k.a

Latency

Latency Analytics Scalability Engineering

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A common theme across all these trends is to remove the complexity by simplifying data management as a whole. In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture.

Big Data

Big Data Artificial Intelligence Storage Hardware

Organise your engineering teams around the work by reteaming

Abhishek Tiwari

JULY 20, 2019

One thing stand-out to me is being intentional and practical about your engineering organisation design. First and foremost, being intentional about organisation design requires good and honest discussions about all possible option. Specialisation could be around products, business process, or technologies. Certainly not.

Engineering

Engineering Retail Airlines Healthcare

ETL Workflow Modeling

Abhishek Tiwari

APRIL 14, 2018

Developing Extract–transform–load (ETL) workflow is a time-consuming activity yet a very important component of data warehousing process. The process to develop ETL workflow is often ad-hoc, complex, trial and error based. It has been suggested that formal modeling of ETL process can alleviate most of these pain points.

Best Practices

Best Practices Database Design Efficiency

Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day

InfoQ

AUGUST 7, 2024

Canva evaluated different data massaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, MKS, and Amazon KDS, and eventually chose the latter, primarily based on its much lower costs. The company compared many aspects of these solutions, like performance, maintenance effort, and cost.

AWS

AWS Analytics Performance Data Engineering

Top 20 Websites For Online Automation Testing Courses and Certifications

Testsigma

NOVEMBER 28, 2019

Udacity Udacity provides nanodegree programs on all automation languages like C++, Machine Learning, Data engineer, Robotics and more. They have diploma courses in Software Testing, Introduction to Python, Diploma in Design Thinking, Diploma in Python Programming. Test Management Improving the Test Process.

Website

Website Testing Programming Automotive

Symphonia at Velocity 2018, and more Serverless Insights

The Symphonia

JUNE 19, 2018

This type of talk is close to my heart since I strongly feel we need to do a lot of deep thinking (and subsequent knowledge sharing) of design and architecture patterns with Serverless. Over the past 9 months John’s been helping our friends at Beyondsoft with their new open-source Serverless data lake project?—?

Serverless

Serverless AWS DevOps Open Source

Reimagining Experimentation Analysis at Netflix

The Netflix TechBlog

SEPTEMBER 10, 2019

Our data scientists faced numerous challenges in our previous infrastructure. Complex business logic was embedded directly into the ETL pipelines by data engineers. In order to replicate results, scientists had to delve deep into the data, code, and documentation.

Metrics

Metrics Architecture Infrastructure Innovation

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

All Things Distributed

NOVEMBER 15, 2016

Previously, I wrote about Amazon QuickSight , a new service targeted at business users that aims to simplify the process of deriving insights from a wide variety of data sources quickly, easily, and at a low cost. Put simply, data is not always readily available and accessible to organizational end users.

Analytics

Analytics Availability Media Social Media

Part 3: A Survey of Analytics Engineering Work at Netflix

The Netflix TechBlog

JANUARY 6, 2025

Dashboard DesignTips Rina Chang , SusieLu What is design, and why does it matter? Often people think design is about how things look, but design is actually about how things work. Everything is designed, because were all making choices about how things work, but not everything is designed well.

Analytics

Analytics Engineering Cache Entertainment

1. Streamlining Membership Data Engineering at Netflix with Psyberg

A Recap of the Data Engineering Open Forum at Netflix

Trending Sources

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

Data Engineers of Netflix?—?Interview with Kevin Wylie

Incremental Processing using Netflix Maestro and Apache Iceberg

Data Engineers of Netflix?—?Interview with Dhevi Rajendran

Data Engineers of Netflix?—?Interview with Samuel Setegne

Ready-to-go sample data pipelines with Dataflow

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

Optimizing data warehouse storage

Data pipeline asset management with Dataflow

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

5 key areas for tech leaders to watch in 2020

These 7 Edge Data Challenges Will Test Companies the Most in 2025

Supporting Diverse ML Systems at Netflix

Netflix at AWS re:Invent 2019

How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

Netflix at AWS re:Invent 2019

Netflix at AWS re:Invent 2019

The death of Agile?

Data Pipelines: The Hammer for Every Nail

Friends don't let friends build data pipelines

5 data integration trends that will define the future of ETL in 2018

Organise your engineering teams around the work by reteaming

ETL Workflow Modeling

Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day

Top 20 Websites For Online Automation Testing Courses and Certifications

Symphonia at Velocity 2018, and more Serverless Insights

Reimagining Experimentation Analysis at Netflix

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

Part 3: A Survey of Analytics Engineering Work at Netflix

Stay Connected