Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines. The result was a series of talks that we are now sharing with the rest of the data engineering community! In this video, Sr.
Data Engineers of Netflix: Interview with Kevin Wylie. This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to data engineering @ Netflix. Kevin, what drew you to data engineering?
This holds true for the critical field of data engineering as well. As organizations gather and process astronomical volumes of data, manual testing is no longer feasible or reliable. Automated testing methodologies are now imperative to deliver speed, accuracy, and integrity.
Data Engineers of Netflix: Interview with Samuel Setegne. This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to data engineering @ Netflix. What drew you to Netflix?
This talk covers ways to leverage software engineering practices for data engineering and demonstrates how measuring key performance metrics can help build more robust and reliable data pipelines.
The most commonly used one is dataflow project, which helps folks manage their data pipeline repositories through creation, testing, deployment, and a few other activities. It lets you create YAML-formatted mock data files based on selected tables, columns, and a few rows of data from the Netflix data warehouse.
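A mock data file of this kind might look roughly like the following. The table name, columns, and values here are invented for illustration; the actual YAML schema produced by the dataflow tool may differ.

```yaml
# Hypothetical mock data sampled from a warehouse table (illustrative only)
table: playback_sessions
columns: [session_id, country, duration_sec]
rows:
  - [a1b2, US, 3600]
  - [c3d4, BR, 1820]
```

Keeping mocks as small, declarative files like this lets pipeline tests run against realistic column shapes without touching the warehouse itself.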
Key issues include: a shortage of edge-native data engineers and architects; limited understanding of edge-specific use cases among traditional IT teams; and high costs of training and retaining talent. The post These 7 Edge Data Challenges Will Test Companies the Most in 2025 appeared first on Volt Active Data.
Let’s define some requirements that we are interested in delivering to the Netflix data engineers, or anyone who would like to schedule a workflow with some external assets in it. How do you set up your deployment logic to know when to deploy the workflow to a test or dev environment? Is a single location enough? setup.py?
Testing automation can be painstaking. It’s also crucial to test frequently when automating IT operations so that you don’t automatically replicate mistakes. This requires significant data engineering effort, as well as work to build machine-learning models.
Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group. Learning through data is in Netflix’s DNA. We use A/B tests to introduce new product features, such as our daily Top 10 row that helps our members discover their next favorite show.
While our engineering teams have built and continue to build solutions to lighten this cognitive load (better guardrails, improved tooling, …), data and its derived products are critical elements to understanding, optimizing, and abstracting our infrastructure. What will be the cost of rolling out the winning cell of an A/B test to all users?
Since memory management is not something one usually associates with classification problems, this blog focuses on formulating the problem as an ML problem and the data engineering that goes along with it. Some nuances while creating this dataset come from the on-field domain knowledge of our engineers.
Introduction: Netflix relies on data to power its business in all phases. IPS enables users to continue to use the data processing patterns with minimal changes. Users configure the workflow to read the data in a window (e.g., the past 3 hours or 10 days), since data that arrives too late is no longer useful.
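A windowed read of this kind can be sketched in plain Python. The record shape, field names, and window length below are illustrative assumptions, not the actual IPS configuration:

```python
from datetime import datetime, timedelta, timezone

def in_window(records, hours):
    """Keep only records whose event_time falls within the past `hours` hours."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    return [r for r in records if r["event_time"] >= cutoff]

now = datetime.now(timezone.utc)
records = [
    {"id": 1, "event_time": now - timedelta(hours=1)},  # inside a 3-hour window
    {"id": 2, "event_time": now - timedelta(hours=5)},  # outside it
]
recent = in_window(records, hours=3)  # only record 1 survives
```

In a real pipeline the window would typically be pushed down to the storage layer (e.g., a partition predicate) rather than filtered in application code.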
I focus on improving experimentation methodology to test how well the newest files are working: do they need fewer bits to stream while providing higher video quality? When we test new encodes, we need effective data science methods to quickly and accurately understand whether customers are having a better experience.
On one hand, ops groups are in a good position to do this; they’re already heavily invested in testing, monitoring, version control, reproducibility, and automation. This has important implications for testing. In the last two decades, a tremendous amount of work has been done on testing and building test suites.
This article presents a list of some of the best online automation testing courses and certifications you can consider. We hope this answers some of the queries we receive asking for suggestions on online automation testing courses and certifications. There are various courses listed on Udemy that can help people learn automation testing.
Improve test automation (10-20%). Increase automated test coverage (30-40%) and start implementing property tests such as performance and load testing. Raise test coverage (50-70%). Introduce site-reliability engineering best practices (SLIs/SLOs). Skills++: induct site reliability engineers and data engineers.
Fetishizing unit testing. The most important challenge is discovering how to work with data science and artificial intelligence projects. Development timelines for these projects aren’t as predictable as traditional software; they stretch the meaning of “testing” in strange ways; they aren’t deterministic.
Operational automation, including but not limited to auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing, is key to the success of modern data platforms. This work would not have been possible without solid, in-depth collaborations.
Unfortunately, building data pipelines remains a daunting, time-consuming, and costly activity. Not everyone operates a data engineering function at Netflix or Spotify scale. Companies often underestimate the effort and cost involved in building and maintaining data pipelines.
There are shadow IT teams of developers or data engineers that spring up in areas like operations or marketing because the captive IT function is slow, if not outright incapable, of responding to internal customer demand. Iteration plans are commitments; unit tests are guarantees of quality. The scope taken out of the 1.0
Setting up a data warehouse is the first step towards fully utilizing big data analysis. Still, it is only one of many steps that need to be taken before you can generate value from the data you gather. An important step in that chain is data modeling and transformation.
Most (74%) respondents say their teams own the build-test-deploy-maintain phases of the software lifecycle. Technical roles represented in the “Other” category include IT managers, data engineers, DevOps practitioners, data scientists, systems engineers, and systems administrators. Success with containers.
Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next.
Depending on the work, you can choose a smaller team of similar expertise (for example, a team of mostly frontend engineers) or a smaller team of diverse expertise (a team with a balance of frontend, backend, and data engineers). Thirdly, let engineers themselves choose the delivery teams and organise them around the initiative.
Here we describe the role of Experimentation and A/B testing within the larger Data Science and Engineering organization at Netflix, including how our platform investments support running tests at scale while enabling innovation. Curious to learn what it’s like to be a Data Engineer at Netflix?
Toby Mao, Sri Sri Perangur, Colin McFarland. Another day, another custom script to analyze an A/B test? Not at Netflix. At any point a Netflix user is in many different A/B tests orchestrated through ABlaze. You can look at ABlaze (our centralized A/B testing platform) and take a quick look at how a test is performing.
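As an illustration of the kind of computation such an analysis script performs, here is a generic two-proportion z-test; the cell sizes and conversion counts are invented, and this is standard textbook statistics, not ABlaze's actual methodology:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z-statistic for the difference between two conversion rates (pooled SE)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical conversions for control (A) and treatment (B) cells.
z = two_proportion_z(success_a=480, n_a=10_000, success_b=540, n_b=10_000)
```

A |z| above roughly 1.96 corresponds to significance at the 5% level for a two-sided test, though in practice a platform would also account for multiple tests and sequential peeking.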
Batch processing data may provide a similar impact and take significantly less time. It’s easier to develop and maintain, and tends to be more familiar for analytics engineers, data scientists, and data engineers. Additionally, if you are developing a proof of concept, the upfront investment may not be worth it.
Entirely new paradigms rise quickly: cloud computing, data engineering, machine learning engineering, mobile development, and large language models. It’s less risky to hire adjunct professors with industry experience to fill teaching roles that have a vocational focus: mobile development, data engineering, and cloud computing.