A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.
Architecture Overview: The first pivotal step in managing impressions is the creation of a Source-of-Truth (SOT) dataset. The enriched data is accessible both to real-time applications via Kafka and for historical analysis through storage in an Apache Iceberg table.
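A minimal sketch of this dual-sink pattern, assuming a Spark Structured Streaming job; the topic and table names are illustrative, not taken from the source article:

    # Sketch: one enriched impression stream fanned out to Kafka (real-time)
    # and to an Apache Iceberg table (historical analysis). Topic and table
    # names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("impressions-sot").getOrCreate()

    enriched = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "raw-impressions")          # hypothetical input topic
        .load()
        .selectExpr("CAST(value AS STRING) AS payload")
        .withColumn("processed_at", F.current_timestamp())
    )

    # Real-time consumers read the enriched events from a Kafka topic.
    kafka_sink = (
        enriched.selectExpr("payload AS value")
        .writeStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("topic", "enriched-impressions")         # hypothetical output topic
        .option("checkpointLocation", "/checkpoints/kafka")
        .start()
    )

    # Historical analysis reads the same data from an Iceberg table.
    iceberg_sink = (
        enriched.writeStream.format("iceberg")
        .outputMode("append")
        .option("checkpointLocation", "/checkpoints/iceberg")
        .toTable("prod.impressions_sot")                 # hypothetical table name
    )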
As organizations continue to adopt multicloud strategies, the complexity of these environments grows, increasing the need to automate cloud engineering operations so that policies and architecture principles can be enforced. By tuning workflows, you can increase their efficiency and effectiveness.
We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” Nonetheless, Netflix's data landscape (see below) is complex, and many teams collaborate effectively to share responsibility for managing our data systems.
This article lists some of the use cases of AutoOptimize, discusses the design principles that help enhance efficiency, and presents the high-level architecture. Some of the optimizations are prerequisites for a high-performance data warehouse. Orient: gather tuning parameters for the particular table that changed.
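A hedged illustration of what an Orient step could look like, assuming a simple lookup of per-table tuning parameters; the parameter names, defaults, and table names are hypothetical and not AutoOptimize's actual API:

    # Hypothetical sketch of an "Orient" step: look up tuning parameters for a
    # table that just changed. Parameter names and defaults are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class TuningParams:
        target_file_size_mb: int = 512   # desired file size after compaction
        min_files_to_compact: int = 10   # skip tables with little fragmentation
        sort_columns: tuple = ()         # optional clustering columns

    # Per-table overrides; anything not listed falls back to the defaults above.
    TABLE_OVERRIDES = {
        "prod.impressions_sot": TuningParams(target_file_size_mb=1024,
                                             sort_columns=("event_date",)),
    }

    def orient(table_name: str) -> TuningParams:
        """Gather tuning parameters for a particular table that changed."""
        return TABLE_OVERRIDES.get(table_name, TuningParams())

    params = orient("prod.impressions_sot")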
Operational automation, including but not limited to auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing, is key to the success of modern data platforms. We have also noted great potential for further improvement through model tuning (see the Rollout in Production section).
To gain visibility into these logs, we need to ingest and enrich this data. It is easier to tune a large Spark job for a consistent volume of data; in other words, we can ensure that our Spark app does not “eat” more data than it was tuned to handle. We named this library Sqooby.
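A minimal sketch of one way to cap how much data a Spark job consumes per batch. The excerpt does not describe Sqooby's internals; the option shown is a standard Spark Structured Streaming rate limit for the Kafka source, and the topic name is hypothetical:

    # Sketch: capping per-batch input so the job never "eats" more data than it
    # was tuned for. This is a standard Spark rate-limit option, not Sqooby's API.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("log-ingest").getOrCreate()

    logs = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "service-logs")            # hypothetical topic
        .option("maxOffsetsPerTrigger", 5000000)        # hard cap per micro-batch
        .load()
    )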
Meson was based on a single-leader architecture with high availability. It serves thousands of users, including data scientists, data engineers, machine learning engineers, software engineers, content producers, and business analysts, for various use cases. Figure 1 shows the high-level architecture.
These challenges are currently addressed by individual teams in suboptimal and less cost-efficient ways, such as Lookback: a generic and simple approach that data engineers use to solve the data accuracy problem. Users configure the workflow to read the data in a window (e.g.
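A hedged sketch of the lookback pattern as described: each run re-reads a fixed trailing window to pick up late-arriving data. The window size and table name are illustrative only:

    # Sketch of a lookback read: re-process the last N days on every run.
    # Window size and table name are hypothetical, not from the source article.
    from datetime import date, timedelta
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    LOOKBACK_DAYS = 3                                   # illustrative window
    run_date = date.today()
    window_start = run_date - timedelta(days=LOOKBACK_DAYS)

    spark = SparkSession.builder.appName("lookback-job").getOrCreate()

    events = (
        spark.read.table("prod.events")                 # hypothetical table
        .where(F.col("event_date").between(window_start.isoformat(),
                                           run_date.isoformat()))
    )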
Unfortunately, building data pipelines remains a daunting, time-consuming, and costly activity. Not everyone operates a data engineering function at Netflix or Spotify scale. Companies often underestimate the effort and cost involved in building and maintaining data pipelines.
The engineering organisation described may not work for you, because a team of 8-10 people is still a very big overhead. In this model, software architecture and code ownership are a reflection of the organisational model. Because you are changing team composition, you need robust norms of conduct and engineering practices in place.
At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. He specifically delved into Venice DB, the NoSQL data store used for feature persistence. By Rafal Gancarz