Architecture and Data Engineering - Technology Performance Pulse

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty, Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions.

Data Engineering

Data Engineering Engineering Processing Games

Data Engineers of Netflix - Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Architecture Overview: The first pivotal step in managing impressions begins with the creation of a Source-of-Truth (SOT) dataset. The enriched data is seamlessly accessible for both real-time applications via Kafka and historical analysis through storage in an Apache Iceberg table.

Tuning

Tuning Latency Efficiency Storage

Reimagining Experimentation Analysis at Netflix

The Netflix TechBlog

SEPTEMBER 10, 2019

Our data scientists often want to apply their knowledge of the business and statistics to fully understand the outcome of an experiment. Instead of relying on engineers to productionize scientific contributions, we’ve made a strategic bet to build an architecture that enables data scientists to easily contribute.

Metrics

Metrics Architecture Infrastructure Innovation

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” Nonetheless, the Netflix data landscape (see below) is complex, and many teams collaborate to share responsibility for managing our data systems.

Infrastructure

Infrastructure Big Data Transportation Architecture

5 key areas for tech leaders to watch in 2020

O'Reilly

FEBRUARY 18, 2020

This year’s growth in Python usage was buoyed by its increasing popularity among data scientists and machine learning (ML) and artificial intelligence (AI) engineers. Software architecture, infrastructure, and operations are each changing rapidly. Trends in software architecture, infrastructure, and operations.

Software Architecture

Software Architecture DevOps Data Engineering Architecture

Your technology architecture and engineering organization should coevolve as your startup grows

Abhishek Tiwari

FEBRUARY 26, 2020

The evolution of your technology architecture should depend on the size, culture, and skill set of your engineering organization. There are no hard-and-fast rules for working out the interdependency between technology architecture and engineering organization, but below is what I think can work well for a product startup.

Technology

Technology Technology Architecture Engineering

What is IT automation?

Dynatrace

JULY 6, 2022

As organizations continue to adopt multicloud strategies, the complexity of these environments grows, increasing the need to automate cloud engineering operations to ensure organizations can enforce their policies and architecture principles. How organizations benefit from automating IT practices.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

While our engineering teams have built and continue to build solutions to lighten this cognitive load (better guardrails, improved tooling, …), data and its derived products are critical elements to understanding, optimizing and abstracting our infrastructure. Give us a holler if you are interested in a thought exchange.

Infrastructure

Infrastructure Cloud Scalability AWS

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture. Some of the optimizations are prerequisites for a high-performance data warehouse.

Storage

Storage Latency Efficiency Data Engineering

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

Meson was based on a single leader architecture with high availability. It serves thousands of users, including data scientists, data engineers, machine learning engineers, software engineers, content producers, and business analysts, for various use cases. Figure 1 shows the high-level architecture.

Java

Java Scalability Traffic Architecture

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Inconsistent network performance affecting data synchronization. Introduce scalable microservices architectures to distribute computational loads efficiently. Key issues include: A shortage of edge-native data engineers and architects. Limited understanding of edge-specific use cases among traditional IT teams.

IoT

IoT Energy Logistics Latency

Experimentation is a major focus of Data Science across Netflix

The Netflix TechBlog

JANUARY 11, 2022

To learn about Analytics and Viz Engineering, have a look at Analytics at Netflix: Who We Are and What We Do by Molly Jackman & Meghana Reddy and How Our Paths Brought Us to Data and Netflix by Julie Beckley & Chris Pham. Curious to learn about what it’s like to be a Data Engineer at Netflix?

Innovation

Innovation Metrics Engineering Testing

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

These challenges are currently addressed in suboptimal and less cost-efficient ways by individual local teams to fulfill the needs, such as Lookback: This is a generic and simple approach that data engineers use to solve the data accuracy problem. Users configure the workflow to read the data in a window (e.g.

Processing

Processing Big Data Efficiency Engineering
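The Lookback approach described in the teaser above can be illustrated with a small sketch. This is not Netflix's implementation: the function name `lookback_partitions`, the daily partition granularity, and the 3-day window in the example are all assumptions chosen for illustration (the excerpt is cut off before it states a window size).

```python
from datetime import date, timedelta


def lookback_partitions(run_date: date, lookback_days: int) -> list[date]:
    """Return the daily partitions one run would re-read, oldest first.

    The window covers run_date plus the lookback_days days before it, so
    late-arriving data in those earlier partitions is picked up again.
    """
    if lookback_days < 0:
        raise ValueError("lookback_days must be non-negative")
    return [
        run_date - timedelta(days=offset)
        for offset in range(lookback_days, -1, -1)
    ]


# Hypothetical example: a run for 2023-11-20 with an assumed 3-day lookback.
partitions = lookback_partitions(date(2023, 11, 20), lookback_days=3)
print([p.isoformat() for p in partitions])
```

The trade-off the teaser hints at is visible here: every run re-reads the whole window, whether or not the older partitions actually changed.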

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

Summary: Providing Network Insight into the Cloud Network Infrastructure using VPC Flow Logs at hyper scale is made possible with the Sqooby architecture. After several iterations of this architecture and some tuning, Sqooby has proven to be able to scale.

Network

Network Tuning AWS Traffic

Microservices Adoption in 2020

O'Reilly

JULY 15, 2020

Software engineers comprise the survey audience’s single largest cluster, over one quarter (27%) of respondents (Figure 1). If you combine the different architectural roles—i.e., Adding architects and engineers, we see that roughly 55% of the respondents are directly involved in software development.

Database

Database Architecture Education Systems

AI meets operations

O'Reilly

FEBRUARY 2, 2020

Collaboration between AI developers and operations teams will lead to growing pains on both sides, especially since many data scientists and AI researchers have had limited exposure to, or knowledge of, software engineering. O’Reilly Strata Data & AI Conference, San Jose, March 15-18.

Software Architecture

Software Architecture Monitoring Software Engineering Code

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Learn to balance architecture trade-offs and design scalable enterprise-level software. Make your job search O(1), not O(n). Apply here. Need excellent people? Advertise your job here!

Education

Education Software Engineering Scalability Engineering

Data Ingestion: The First Step Towards a Flawless Data Pipeline

Simform

JANUARY 8, 2023

Data ingestion is the foremost layer in a data engineering pipeline, acting as a vital pillar in the overall analytics architecture. Thus, it is essential to implement data ingestion just right. Here is everything you need to know to take the first step toward a flawless data pipeline.

Data Engineering

Data Engineering Analytics Architecture Engineering

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Learn to balance architecture trade-offs and design scalable enterprise-level software. Make your job search O(1), not O(n). Apply here. Need excellent people? Advertise your job here!

Education

Education Software Engineering Scalability Engineering

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

Evolving to Auto Remediation: Service Architecture Methodology To address the above-mentioned challenges, our basic methodology is to integrate the rule-based classifier with an ML service to generate recommendations, and use a configuration service to apply the recommendations automatically: Generating recommendations.

Tuning

Tuning Efficiency Big Data Engineering

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

All Things Distributed

NOVEMBER 15, 2016

The reality is that many traditional BI solutions are built on top of legacy desktop and on-premises architectures that are decades old. They require teams of data engineers to spend months building complex data models and synthesizing the data before they can generate their first report.

Analytics

Analytics Availability Media Social Media

What is a Data Pipeline: Types, Architecture, Use Cases & more

Simform

JANUARY 25, 2023

Businesses can unlock the value of data only after it is transformed into actionable insights and when those insights are delivered promptly. But implementing such robust data pipelines can be complex and challenging. This blog discusses all the ins and outs of building data pipelines and how they can help strengthen businesses.

Architecture

Architecture Data Engineering Engineering

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A common theme across all these trends is to remove the complexity by simplifying data management as a whole. In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture.

Big Data

Big Data Artificial Intelligence Storage Hardware

Sustainability at AWS re:Invent 2022 All the talks and videos I could find…

Adrian Cockcroft

FEBRUARY 13, 2023

STP213 Scaling global carbon footprint management — Blake Blackwell Persefoni Manager Data Engineering and Michael Floyd AWS Head of Sustainability Solutions. SUS302 Optimizing architectures for sustainability — Katja Philipp AWS SA and Szymon Kochanski AWS SA. SUS209 — there was no talk with this code.

AWS

AWS Energy Architecture Programming

How Uber Sped Up SQL-based Data Analytics with Presto and Express Queries

InfoQ

NOVEMBER 18, 2024

Uber uses Presto, an open-source distributed SQL query engine, to provide analytics across several data sources, including Apache Hive, Apache Pinot, MySQL, and Apache Kafka. To improve its performance, Uber engineers explored the advantages of dealing with quick queries, a.k.a.

Analytics

Analytics Open Source Engineering Performance

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

Unfortunately, building data pipelines remains a daunting, time-consuming, and costly activity. Not everyone operates a data engineering function at Netflix or Spotify scale. Companies often underestimate the effort and cost involved in building and maintaining data pipelines.

Latency

Latency Analytics Scalability Engineering

Organise your engineering teams around the work by reteaming

Abhishek Tiwari

JULY 20, 2019

The engineering organisation described may not work for you because a team of 8-10 people is still a very big overhead. In this model, software architecture and code ownership are a reflection of the organisational model. Thirdly, let engineers themselves choose the delivery teams and organise them around the initiative.

Engineering

Engineering Retail Airlines Healthcare

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next.

Big Data

Big Data Storage Benchmarking Hardware

Presentation: Azure Cosmos DB: Low Latency and High Availability at Planet Scale

InfoQ

JULY 14, 2023

Mei-Chin Tsai and Vinod Sridharan discuss the internal architecture of Azure Cosmos DB and how it achieves high availability, low latency, and scalability. By Mei-Chin Tsai, Vinod Sridharan

Latency

Latency Azure Availability Scalability

Stream Processing: How it Works, Use Cases & Popular Frameworks

Simform

FEBRUARY 14, 2023

Stream processing has become a core part of enterprise data architecture today due to the explosive growth of data from sources such as IoT sensors, security logs, and web applications. This blog discusses the topic of stream processing in detail to help you navigate its landscape with ease.

Processing

Processing IoT Architecture Data Engineering

Symphonia at Velocity 2018, and more Serverless Insights

The Symphonia

JUNE 19, 2018

I was fortunate to be both presenting a 2-day workshop (on AWS Serverless Architectures and Continuous Deployment) as well as hosting a full-day Serverless track of talks. One of the catalysts for starting Symphonia was the massive interest in my article on “Serverless Architectures” that is published on Martin Fowler’s site.

Serverless

Serverless AWS DevOps Open Source

Zendesk Moves from DynamoDB to MySQL and S3 to Save over 80% in Costs

InfoQ

DECEMBER 29, 2023

Zendesk reduced its data storage costs by over 80% by migrating from DynamoDB to a tiered storage solution using MySQL and S3. The company considered different storage technologies and decided to combine the relational database and the object store to strike a balance between queryability and scalability while keeping the costs down.

Storage

Storage Scalability Database Technology

QCon London: Lessons Learned From Building LinkedIn’s AI/ML Data Platform

InfoQ

APRIL 15, 2024

At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. He specifically delved into Venice DB, the NoSQL data store used for feature persistence. By Rafal Gancarz

Artificial Intelligence

Artificial Intelligence Big Data Data Engineering Latency

A case for ELT

Abhishek Tiwari

DECEMBER 22, 2017

Cheap storage and on-demand compute in the cloud coupled with the emergence of new big data frameworks and tools are forcing us to rethink the whole ETL and data warehousing architecture. In addition, this approach is more tailored for both structured as well as unstructured data sets. Classic ETL. Stateless and elastic.

Big Data

Big Data Retail Storage Google

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Uber Engineering

AUGUST 14, 2019

Maintaining Uber’s large-scale data warehouse comes with an operational cost in terms of ETL functions and storage. Once identified, …

Efficiency

Efficiency Engineering Design Storage

Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day

InfoQ

AUGUST 7, 2024

Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, MSK, and Amazon KDS, and eventually chose the latter, primarily based on its much lower costs. The company compared many aspects of these solutions, like performance, maintenance effort, and cost.

AWS

AWS Analytics Performance Data Engineering

How machine learning is accelerating data integration?

Abhishek Tiwari

DECEMBER 23, 2017

Data integration generally requires in-depth domain knowledge and a strong understanding of data schemas and underlying relationships. This can be time-consuming and a bit challenging if you are dealing with hundreds of data sources and thousands of event types (see my recent article on ELT architecture).

Architecture

Architecture Engineering Processing Code

How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions

InfoQ

NOVEMBER 29, 2023

HubSpot adopted routing messages over multiple Kafka topics (called swimlanes) for the same producer to avoid the build-up in the consumer group lag and prioritize the processing of real-time traffic.

Processing

Processing Traffic Data Engineering Scalability
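The swimlane idea in the teaser above (several Kafka topics for the same producer, so slow work cannot hold up real-time traffic) can be sketched as a routing function. Everything specific here is an assumption for illustration: the topic names, the `is_bulk` flag, the per-customer burst limit, and the routing rule itself are hypothetical, since the excerpt does not say how HubSpot decides which lane a message takes. No Kafka client is used; the function only picks a topic name.

```python
from dataclasses import dataclass

# Hypothetical topic names; the excerpt does not name the actual topics.
REALTIME_TOPIC = "workflow-actions-realtime"
OVERFLOW_TOPIC = "workflow-actions-overflow"


@dataclass(frozen=True)
class Action:
    customer_id: str
    is_bulk: bool  # assumed flag marking bulk or backfill work


def choose_swimlane(
    action: Action,
    recent_counts: dict[str, int],
    burst_limit: int = 100,
) -> str:
    """Pick the topic (swimlane) for one action.

    Bulk work, and work from customers above the assumed burst limit, goes
    to the overflow lane so it cannot build up lag in the real-time lane.
    """
    over_limit = recent_counts.get(action.customer_id, 0) > burst_limit
    if action.is_bulk or over_limit:
        return OVERFLOW_TOPIC
    return REALTIME_TOPIC


# Hypothetical usage: one busy customer and one quiet customer.
counts = {"busy-co": 5000, "quiet-co": 3}
print(choose_swimlane(Action("quiet-co", is_bulk=False), counts))
print(choose_swimlane(Action("busy-co", is_bulk=False), counts))
```

Because each lane is its own topic, each gets its own consumer lag, which is what lets real-time traffic stay timely while the overflow lane drains at its own pace.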

InfoQ Dev Summit in Boston: Two Days of Talks for Senior Developers

InfoQ

DECEMBER 20, 2023

InfoQ is delighted to announce a new two-day conference, InfoQ Dev Summit Boston 2024, taking place June 24-25, 2024. This event is designed to help senior developers navigate their immediate development challenges, focusing exclusively on the technical aspects that matter right now. By Artenisa Chatziou

Development

Development Design Data Engineering Scalability

How LinkedIn Serves Over 4.8 Million Member Profiles per Second

InfoQ

JULY 3, 2023

LinkedIn introduced Couchbase as a centralized caching tier for scaling member profile reads to handle increasing traffic that has outgrown their existing database cluster. The new solution achieved over 99% hit rate, helped reduce tail latencies by more than 60% and costs by 10% annually. By Rafal Gancarz

Cache

Cache Latency Traffic Database
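A caching tier like the one in the teaser above is often built as a read-through cache, and the quoted hit rate is simply hits divided by total reads. The sketch below is a generic illustration under stated assumptions, not LinkedIn's design: `ProfileCache` is a hypothetical class, a plain dictionary stands in for Couchbase, and a lambda stands in for the profile database.

```python
from typing import Callable


class ProfileCache:
    """Read-through cache in front of a slower profile store.

    Both the cache and the store are in-memory stand-ins for illustration.
    """

    def __init__(self, load_from_db: Callable[[str], dict]) -> None:
        self._load_from_db = load_from_db
        self._entries: dict[str, dict] = {}
        self.hits = 0
        self.misses = 0

    def get(self, member_id: str) -> dict:
        if member_id in self._entries:
            self.hits += 1
            return self._entries[member_id]
        # Miss: fall back to the database and remember the result.
        self.misses += 1
        profile = self._load_from_db(member_id)
        self._entries[member_id] = profile
        return profile

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hypothetical usage: three reads of one member, one read of another.
cache = ProfileCache(lambda member_id: {"id": member_id})
for member in ["a", "a", "a", "b"]:
    cache.get(member)
print(cache.hit_rate())
```

A real deployment would also need eviction, expiry, and invalidation on profile updates, all of which this sketch leaves out.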
