Architecture, Data Engineering and Processing - Technology Performance Pulse

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty, Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. What is late-arriving data? Let’s dive in!

Data Engineering

Data Engineering Engineering Processing Games

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Data Engineers of Netflix: Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

By Jun He, Yingyi Zhang, and Pawan Dixit. Incremental processing is an approach to processing new or changed data in workflows. The key advantage is that it processes only the data that is newly added to or updated in a dataset, instead of re-processing the complete dataset.
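As a rough illustration of the idea in this excerpt, the sketch below keeps a watermark and folds in only rows that changed after it. It is not how Netflix Maestro or Apache Iceberg implement incremental processing; the `Row` type, the `updated_at` field, and the `incremental_process` function are hypothetical names, and the sketch assumes append-only changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: str
    value: int
    updated_at: int  # hypothetical, monotonically increasing change timestamp


def incremental_process(rows, totals, watermark):
    """Fold only rows changed after `watermark` into `totals`.

    Returns the new watermark. Rows at or before the watermark are skipped,
    so the complete dataset is never re-processed.
    """
    new_watermark = watermark
    for row in rows:
        if row.updated_at <= watermark:
            continue  # already handled in an earlier run
        totals[row.key] = totals.get(row.key, 0) + row.value
        new_watermark = max(new_watermark, row.updated_at)
    return new_watermark


dataset = [Row("a", 1, 10), Row("b", 2, 20)]
totals = {}
watermark = incremental_process(dataset, totals, watermark=0)

# A later run sees one newly added row; only that row is processed.
dataset.append(Row("a", 5, 30))
watermark = incremental_process(dataset, totals, watermark)
```

The second call still iterates over the list, but it does work only for the row past the watermark; a real system would also avoid reading the unchanged data in the first place.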

Processing

Processing Big Data Efficiency Engineering

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile’s exposure. This nuanced integration of data and technology empowers us to offer bespoke content recommendations. This queue ensures we are consistently capturing raw events from our global user base.

Tuning

Tuning Latency Efficiency Storage

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” Nonetheless, Netflix’s data landscape (see below) is complex, and many teams collaborate effectively to share responsibility for managing our data systems.

Infrastructure

Infrastructure Big Data Transportation Architecture

What is IT automation?

Dynatrace

JULY 6, 2022

At its most basic, automating IT processes works by executing scripts or procedures either on a schedule or in response to particular events, such as checking a file into a code repository. Adding AIOps to automation processes makes the volume of data that applications and multicloud environments generate much less overwhelming.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

5 key areas for tech leaders to watch in 2020

O'Reilly

FEBRUARY 18, 2020

This year’s growth in Python usage was buoyed by its increasing popularity among data scientists and machine learning (ML) and artificial intelligence (AI) engineers. Software architecture, infrastructure, and operations are each changing rapidly. Trends in software architecture, infrastructure, and operations.

Software Architecture

Software Architecture DevOps Data Engineering Architecture

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

There are several benefits of such optimizations like saving on storage, faster query time, cheaper downstream processing, and an increase in developer productivity by removing additional ETLs written only for query performance improvement. Some of the optimizations are prerequisites for a high-performance data warehouse.

Storage

Storage Latency Efficiency Data Engineering

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Leverage tiered storage systems that dynamically offload data based on priority.

IoT

IoT Energy Logistics Latency

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

This operational component places some cognitive load on our engineers, requiring them to develop deep understanding of telemetry and alerting systems, capacity provisioning process, security and reliability best practices, and a vast amount of informal knowledge about the cloud infrastructure.

Infrastructure

Infrastructure Cloud Scalability AWS

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

Meson was based on a single leader architecture with high availability. In the data domain, it is common to have a super large number of jobs within a single workflow. For example, a workflow to backfill hourly data for the past five years can lead to 43800 jobs (24 * 365 * 5), each of which processes data for an hour.
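The job count in this excerpt can be checked with a few lines of arithmetic. The sketch below also enumerates hourly partitions to show what "one job per hour" looks like; the start date and the `hourly_partitions` helper are hypothetical and not part of Maestro.

```python
from datetime import datetime, timedelta

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # the excerpt's figure ignores leap days
YEARS = 5

# One job per hour over five years.
job_count = HOURS_PER_DAY * DAYS_PER_YEAR * YEARS


def hourly_partitions(start, hours):
    """Yield one partition timestamp per hourly job."""
    for i in range(hours):
        yield start + timedelta(hours=i)


# Hypothetical start date, used only to show the shape of the partitions.
first_three = list(hourly_partitions(datetime(2018, 1, 1), 3))
```

Here `job_count` evaluates to 43800, matching the figure quoted above.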

Java

Java Scalability Traffic Architecture

How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions

InfoQ

NOVEMBER 29, 2023

HubSpot adopted routing messages over multiple Kafka topics (called swimlanes) for the same producer to avoid the build-up in the consumer group lag and prioritize the processing of real-time traffic.
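A minimal sketch of the swimlane idea follows, assuming a simple rule in which bulk traffic is diverted to a separate topic. The topic names, the `bulk` flag, and both functions are hypothetical; the excerpt does not describe HubSpot's actual routing logic, and no Kafka client is used here.

```python
from collections import defaultdict

# Hypothetical topic names; the excerpt does not give the real ones.
REALTIME_TOPIC = "workflow-actions-realtime"
OVERFLOW_TOPIC = "workflow-actions-overflow"


def choose_swimlane(message):
    """Pick a topic so bulk traffic cannot delay real-time traffic."""
    if message.get("bulk", False):
        return OVERFLOW_TOPIC
    return REALTIME_TOPIC


def route(messages):
    """Group message ids by swimlane; a real producer would send to Kafka."""
    lanes = defaultdict(list)
    for message in messages:
        lanes[choose_swimlane(message)].append(message["id"])
    return dict(lanes)


lanes = route([
    {"id": 1},
    {"id": 2, "bulk": True},
    {"id": 3},
])
```

Because each swimlane is its own topic, a backlog in the overflow lane does not add consumer lag to the real-time lane.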

Processing

Processing Traffic Data Engineering Scalability

Stream Processing: How it Works, Use Cases & Popular Frameworks

Simform

FEBRUARY 14, 2023

Stream processing has become a core part of enterprise data architecture today due to the explosive growth of data from sources such as IoT sensors, security logs, and web applications. This blog discusses the topic of stream processing in detail to help you navigate its landscape with ease.

Processing

Processing IoT Architecture Data Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Learn to balance architecture trade-offs and design scalable enterprise-level software. The result is a process that doesn't get you the best employees you could. Apply here.

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Learn to balance architecture trade-offs and design scalable enterprise-level software. The result is a process that doesn't get you the best employees you could. Apply here.

Education

Education Software Engineering Scalability Engineering

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

In this way, no human intervention is required in the remediation process. We used feature hashing to process the non-numeric values because they come from a high cardinality and dynamic set of values. Multi-objective optimizations. Auto Remediation generates recommendations by considering both performance (i.e., user name).
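Feature hashing, mentioned in this excerpt, maps values from a large or changing vocabulary into a fixed number of buckets. The sketch below is a generic illustration, not the Auto Remediation implementation; the bucket count, the choice of MD5, and the example strings are assumptions.

```python
import hashlib


def hash_feature(value, num_buckets=16):
    """Map a string to a stable bucket index in [0, num_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hashed_vector(values, num_buckets=16):
    """Build a fixed-length count vector from arbitrary string values."""
    vector = [0] * num_buckets
    for value in values:
        vector[hash_feature(value, num_buckets)] += 1
    return vector


# Hypothetical non-numeric feature values.
vector = hashed_vector(["user=alice", "queue=etl", "user=alice"])
```

The vector length stays fixed no matter how many distinct values appear, which is why the technique suits high-cardinality, dynamic inputs; the trade-off is that unrelated values can collide in the same bucket.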

Tuning

Tuning Efficiency Big Data Engineering

The death of Agile?

O'Reilly

MARCH 2, 2020

When a project is going off track because some requirement wasn’t understood properly, you need to fix that as soon as possible—not after a year-long development process. When processes change, who wins, who loses, and why? Software architecture, infrastructure, and operations are each changing rapidly.

Artificial Intelligence

Artificial Intelligence Software Architecture Programming C++

AI meets operations

O'Reilly

FEBRUARY 2, 2020

A model isn’t source code, and it isn’t data; it’s an artifact built from the two. Given source code and the training data, you could re-produce a model, but it almost certainly wouldn’t be the same because of randomization in the training process. Models almost certainly react to incoming data; that’s their point.

Software Architecture

Software Architecture Monitoring Software Engineering Code

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A common theme across all these trends is to remove the complexity by simplifying data management as a whole. In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture.

Big Data

Big Data Artificial Intelligence Storage Hardware

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

Recently, companies such as Netflix, Spotify, Uber, and Airbnb have built internal data pipelines to gain valuable insights or to develop data-driven products. If built correctly, data pipelines can offer strategic advantages to the business. Depending on frameworks, data processing units (a.k.a

Latency

Latency Analytics Scalability Engineering

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Custom schedulers.

Big Data

Big Data Storage Benchmarking Hardware

Organise your engineering teams around the work by reteaming

Abhishek Tiwari

JULY 20, 2019

Specialisation could be around products, business process, or technologies. One way to create a Spotify model inspired engineering organisation is to organise long-lived squads by retail business process hubs - i.e. specialisation around business process. Let's take an example of retail as a domain of interest.

Engineering

Engineering Retail Airlines Healthcare

Symphonia at Velocity 2018, and more Serverless Insights

The Symphonia

JUNE 19, 2018

I was fortunate both to present a 2-day workshop (on AWS Serverless Architectures and Continuous Deployment) and to host a full-day Serverless track of talks. One of the catalysts for starting Symphonia was the massive interest in my article on “Serverless Architectures”, which is published on Martin Fowler’s site.

Serverless

Serverless AWS DevOps Open Source

Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day

InfoQ

AUGUST 7, 2024

Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, MSK, and Amazon KDS, and eventually chose KDS, primarily based on its much lower costs. The company compared many aspects of these solutions, like performance, maintenance effort, and cost.

AWS

AWS Analytics Performance Data Engineering

How machine learning is accelerating data integration?

Abhishek Tiwari

DECEMBER 23, 2017

Data integration generally requires in-depth domain knowledge and a strong understanding of data schemas and underlying relationships. This can be time-consuming and a bit challenging if you are dealing with hundreds of data sources and thousands of event types (see my recent article on ELT architecture).

Architecture

Architecture Engineering Processing Code

Reimagining Experimentation Analysis at Netflix

The Netflix TechBlog

SEPTEMBER 10, 2019

Our data scientists often want to apply their knowledge of the business and statistics to fully understand the outcome of an experiment. Instead of relying on engineers to productionize scientific contributions, we’ve made a strategic bet to build an architecture that enables data scientists to easily contribute.

Metrics

Metrics Architecture Infrastructure Innovation

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

All Things Distributed

NOVEMBER 15, 2016

Previously, I wrote about Amazon QuickSight, a new service targeted at business users that aims to simplify the process of deriving insights from a wide variety of data sources quickly, easily, and at a low cost. Put simply, data is not always readily available and accessible to organizational end users.

Analytics

Analytics Availability Media Social Media

Sustainability at AWS re:Invent 2022 All the talks and videos I could find…

Adrian Cockcroft

FEBRUARY 13, 2023

This is the AWS Professional Services built tooling that customers can use to track the carbon footprint of their operations and processes, along with a customer example. STP213 Scaling global carbon footprint management — Blake Blackwell Persefoni Manager Data Engineering and Michael Floyd AWS Head of Sustainability Solutions.

AWS

AWS Energy Architecture Programming

Part 3: A Survey of Analytics Engineering Work at Netflix

The Netflix TechBlog

JANUARY 6, 2025

A few questions to ask yourself when considering the information architecture of your app include: Do you have different user groups trying to accomplish different things? Data is a tool that enhances decision-making, complementing the deep expertise and industry knowledge of our creative professionals.

Analytics

Analytics Engineering Cache Entertainment

Educating a New Generation of Workers

O'Reilly

NOVEMBER 26, 2024

Entirely new paradigms rise quickly: cloud computing, data engineering, machine learning engineering, mobile development, and large language models. It’s less risky to hire adjunct professors with industry experience to fill teaching roles that have a vocational focus: mobile development, data engineering, and cloud computing.

Education

Education Azure AWS Java
