Data Engineering and Development - Technology Performance Pulse

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty, Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. We will talk about this shortly.

Data Engineering

Data Engineering Engineering Processing Games

Data Engineers of Netflix - Interview with Kevin Wylie

The Netflix TechBlog

JULY 15, 2021

This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin, what drew you to data engineering?

Data Engineering

Data Engineering Engineering Entertainment Big Data

Data Engineers of Netflix - Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Data Engineers of Netflix - Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

Ready-to-go sample data pipelines with Dataflow

The Netflix TechBlog

DECEMBER 3, 2022

Dataflow is a command line utility built to improve experience and to streamline the data pipeline development at Netflix. The most commonly used one is dataflow project, which helps folks in managing their data pipeline repositories through creation, testing, deployment and a few other activities. test_sparksql_write.py

Best Practices

Best Practices Code Testing Data Engineering

Analytics at Netflix: Who we are and what we do

The Netflix TechBlog

SEPTEMBER 18, 2020

Full ownership often means building new data pipelines, navigating complex schemas and large data sets, developing or improving metrics for business performance, and creating intuitive visualizations and dashboards - always ... Others have grown into new areas as part of their professional development at Netflix.

Analytics

Analytics Engineering Data Engineering Tuning

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

As a micro-service owner, a Netflix engineer is responsible for its innovation as well as its operation, which includes making sure the service is reliable, secure, efficient and performant. How can we develop templated detection modules (rules- and ML-based) and data streams to increase speed of development?

Infrastructure

Infrastructure Cloud Scalability AWS

5 key areas for tech leaders to watch in 2020

O'Reilly

FEBRUARY 18, 2020

There’s plenty of security risks for business executives, sysadmins, DBAs, developers, etc., to be wary of. The laggard use case was Python-based web development frameworks, which grew by just 3% in usage, year over year. There’s a Python library for virtually anything a developer or data scientist might need to do.

Software Architecture

Software Architecture DevOps Data Engineering Architecture

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” Nonetheless, Netflix data landscape (see below) is complex and many teams collaborate effectively for sharing the responsibility of our data system management.

Infrastructure

Infrastructure Big Data Transportation Architecture

Data pipeline asset management with Dataflow

The Netflix TechBlog

FEBRUARY 9, 2022

Let’s define some requirements that we are interested in delivering to the Netflix data engineers or anyone who would like to schedule a workflow with some external assets in it. Conclusions: This new method available for Netflix data engineers makes workflow management easier, more transparent and more reliable.

Storage

Storage Data Engineering Testing Code

What is IT automation?

Dynatrace

JULY 6, 2022

Developing automation takes time. This requires significant data engineering efforts, as well as work to build machine-learning models. To learn more about how Dynatrace drives more efficient IT operations by automating IT processes, read the ebook, Developing an AIOps strategy for cloud observability.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. Some of the optimizations are prerequisites for a high-performance data warehouse.

Storage

Storage Latency Efficiency Data Engineering

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

The Netflix TechBlog

MARCH 2, 2021

At Netflix, our data scientists span many areas of technical specialization, including experimentation, causal inference, machine learning, NLP, modeling, and optimization. Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group.

Analytics

Analytics C++ Innovation Engineering

InfoQ Dev Summit in Boston: Two Days of Talks for Senior Developers

InfoQ

DECEMBER 20, 2023

This event is designed to help senior developers navigate their immediate development challenges, focusing exclusively on the technical aspects that matter right now. InfoQ is delighted to announce a new two-day conference, InfoQ Dev Summit Boston 2024, taking place June 24-25, 2024. By Artenisa Chatziou

Development

Development Design Data Engineering Scalability

Scaling Appsec at Netflix (Part 2)

The Netflix TechBlog

JUNE 6, 2022

including bug bounty, pentesting, PSIRT (product security incident response), security reviews, and developer security education - via a shared on-call rotation. As the team skews towards more software engineering focused talent, ramping up to support the shared Appsec-focused on-call has been challenging.

Software Engineering

Software Engineering Scalability Education Engineering

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Occasionally, these use cases involve terabytes of data, so we have to pay attention to performance. By targeting @titus, Metaflow tasks benefit from these battle-hardened features out of the box, with no in-depth technical knowledge or engineering required from the ML engineers or data scientist end.

Systems

Systems Media Cache Open Source

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

These challenges are currently addressed in suboptimal and less cost efficient ways by individual local teams to fulfill the needs, such as Lookback: This is a generic and simple approach that data engineers use to solve the data accuracy problem. Users configure the workflow to read the data in a window (e.g.

Processing

Processing Big Data Efficiency Engineering

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

3:15pm-4:15pm OPN 209 Netflix’s application deployment at scale. Andy Glover, Director Delivery Engineering & Paul Roberts, AWS. Abstract: Spinnaker is an open-source continuous-delivery platform created by Netflix to improve its developers’ efficiency and reduce the time it takes to get an application into production.

AWS

AWS Entertainment Open Source Benchmarking

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

JULY 21, 2022

Since memory management is not something one usually associates with classification problems, this blog focuses on formulating the problem as an ML problem and the data engineering that goes along with it. for us at Netflix, this is a combination of the device type, app session ID and software development kit version (SDK version).

Big Data

Big Data Cache Engineering Data Engineering

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

History & motivation: There were two main motivating use cases that drove Pushy’s initial development and usage. These pain points coincided with the introduction of KeyValue, which was a new offering from the CDE team that is roughly “HashMap as a service” for Netflix developers.

Latency

Latency Cache Tuning Efficiency

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

increasing at > 100% a year, the need for a scalable data workflow orchestrator has become paramount for Netflix’s business needs. After perusing the current landscape of workflow orchestrators, we decided to develop a next generation system that can scale horizontally to spread the jobs across the cluster consisting of 100’s of nodes.

Java

Java Scalability Traffic Architecture

The death of Agile?

O'Reilly

MARCH 2, 2020

The one thing I don’t see, and the one thing that more than anything else captures the value in Agile, is the ongoing conversation between the customer (however that’s conceived) and the developer. Agile is not, and never was, about getting developers to write software faster. This is important. Neckbeards? Geeks and nerds?

Artificial Intelligence

Artificial Intelligence Software Architecture Programming C++

Sponsored Post: Fauna, Sisu, Educative, PA File Sight, Etleap, PerfOps, Triplebyte, Stream

High Scalability

NOVEMBER 12, 2019

Etleap is analyst-friendly, enterprise-grade ETL-as-a-service, built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Java Software Engineering Engineering

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 3, 2020

Etleap is analyst-friendly, enterprise-grade ETL-as-a-service, built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Games Java

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc. Please apply here. Join more than 265,000 other learners.

Education

Education Software Engineering Scalability Engineering

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 18, 2020

Etleap is analyst-friendly, enterprise-grade ETL-as-a-service, built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Games Java

Sponsored Post: Fauna, Sisu, Educative, PA File Sight, Etleap, PerfOps, Triplebyte, Stream

High Scalability

OCTOBER 29, 2019

Etleap is analyst-friendly, enterprise-grade ETL-as-a-service, built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Java Software Engineering Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc. Please apply here. Join more than 265,000 other learners.

Education

Education Software Engineering Scalability Engineering

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 24, 2020

You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc. Please apply here. Try the 30-day free trial!

Education

Education Software Engineering Engineering Big Data

How Our Paths Brought Us to Data and Netflix

The Netflix TechBlog

SEPTEMBER 18, 2020

My work is typically developed in R or Python. I focus on improving experimentation methodology to test how well the newest files are working: do they need fewer bits to stream while providing a higher video quality? Do they cause fewer errors?

Analytics

Analytics Education Innovation Engineering

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 17, 2020

Etleap is analyst-friendly, enterprise-grade ETL-as-a-service, built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Java Servers

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 9, 2020

Etleap is analyst-friendly, enterprise-grade ETL-as-a-service, built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Games Java

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 30, 2020

You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc. Please apply here. Try the 30-day free trial!

Education

Education Software Engineering Engineering Big Data

Sponsored Post: Fauna, Sisu, Educative, PA File Sight, Etleap, PerfOps, Triplebyte, Stream

High Scalability

JANUARY 7, 2020

Etleap is analyst-friendly, enterprise-grade ETL-as-a-service, built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Java Software Engineering Engineering

Sponsored Post: Fauna, Sisu, Educative, PA File Sight, Etleap, PerfOps, Triplebyte, Stream

High Scalability

DECEMBER 12, 2019

Etleap is analyst-friendly, enterprise-grade ETL-as-a-service, built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Java Software Engineering Engineering

AI meets operations

O'Reilly

FEBRUARY 2, 2020

Source code is relatively less important compared to typical applications; the training data is what determines how the model behaves, and the training process is all about tweaking parameters in the application so that it delivers correct results most of the time. You need a repository for models and for the training data.

Software Architecture

Software Architecture Monitoring Software Engineering Code

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

For efficient error handling, Netflix developed an error classification service, called Pensive, which leverages a rule-based classifier for error classification. To address these challenges, we have developed a new feature, called Auto Remediation , which integrates the rule-based classifier with an ML service.

Tuning

Tuning Efficiency Big Data Engineering

Your technology architecture and engineering organization should coevolve as your startup grows

Abhishek Tiwari

FEBRUARY 26, 2020

Containers for local development and probably in UAT. Explore serverless functions to create Skills++: Induct Technical Architects, Developer Experience (DevX) 50-100 Engineers Focus: Finding new ways to add more value quickly for your customers by exploiting data. Test coverage (50-70%).

Technology

Technology Technology Architecture Engineering

Microservices Adoption in 2020

O'Reilly

JULY 15, 2020

Adding architects and engineers, we see that roughly 55% of the respondents are directly involved in software development. Technical roles represented in the “Other” category include IT managers, data engineers, DevOps practitioners, data scientists, systems engineers, and systems administrators.

Database

Database Architecture Education Systems
