Architecture, Data Engineering and Engineering - Technology Performance Pulse

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty, Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions.

Data Engineering

Data Engineering Engineering Processing Games

Data Engineers of Netflix: Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Every image you hover over isn't just a visual placeholder; it's a critical data point that fuels our sophisticated personalization engine. Architecture Overview: The first pivotal step in managing impressions begins with the creation of a Source-of-Truth (SOT) dataset.

Tuning

Tuning Latency Efficiency Storage

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Now, imagine yourself in the role of a software engineer responsible for a micro-service which publishes data consumed by a few critical customer-facing services (e.g. …). You are about to make structural changes to the data and want to know who and what downstream of your service will be impacted.

Infrastructure

Infrastructure Big Data Transportation Architecture

Your technology architecture and engineering organization should coevolve as your startup grows

Abhishek Tiwari

FEBRUARY 26, 2020

The evolution of your technology architecture should depend on the size, culture, and skill set of your engineering organization. There are no hard-and-fast rules to figure out the interdependency between technology architecture and engineering organization, but below is what I think can work really well for a product startup.

Technology

Technology Technology Architecture Engineering

5 key areas for tech leaders to watch in 2020

O'Reilly

FEBRUARY 18, 2020

This year’s growth in Python usage was buoyed by its increasing popularity among data scientists and machine learning (ML) and artificial intelligence (AI) engineers. Software architecture, infrastructure, and operations are each changing rapidly. Python libraries are no less useful for manipulating or engineering data, too.

Software Architecture

Software Architecture DevOps Data Engineering Architecture

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

Netflix’s engineering culture is predicated on Freedom & Responsibility, the idea that everyone (and every team) at Netflix is entrusted with a core responsibility and they are free to operate with freedom to satisfy their mission. All these micro-services are currently operated in AWS cloud infrastructure.

Infrastructure

Infrastructure Cloud Scalability AWS

What is IT automation?

Dynatrace

JULY 6, 2022

As organizations continue to adopt multicloud strategies, the complexity of these environments grows, increasing the need to automate cloud engineering operations to ensure organizations can enforce their policies and architecture principles. How organizations benefit from automating IT practices.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

Meson was based on a single leader architecture with high availability. Usability: Netflix is a data-driven company, where key decisions are driven by data insights, from the pixel color used on the landing page to the renewal of a TV series. Figure 1 shows the high-level architecture.

Java

Java Scalability Traffic Architecture

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Uber Engineering

AUGUST 14, 2019

In our experience, optimizing for operational efficiency requires answering one key question: for which tables does the maintenance cost supersede utility? Once identified, … The post Less is More: Engineering Data Warehouse Efficiency with Minimalist Design appeared first on Uber Engineering Blog.

Efficiency

Efficiency Engineering Design Storage

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

It also improves engineering productivity by simplifying the existing pipelines and unlocking new patterns. Users configure the workflow to read the data in a window (e.g. …). The window is set based on users’ domain knowledge so that users have high confidence that the late-arriving data will be included or will not matter (i.e. …).

Processing

Processing Big Data Efficiency Engineering

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture. Some of the optimizations are prerequisites for a high-performance data warehouse.

Storage

Storage Latency Efficiency Data Engineering

How Uber Sped Up SQL-based Data Analytics with Presto and Express Queries

InfoQ

NOVEMBER 18, 2024

Uber uses Presto, an open-source distributed SQL query engine, to provide analytics across several data sources, including Apache Hive, Apache Pinot, MySQL, and Apache Kafka. To improve its performance, Uber engineers explored the advantages of dealing with quick queries, a.k.a. …

Analytics

Analytics Open Source Engineering Performance

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Inconsistent network performance affecting data synchronization. Introduce scalable microservices architectures to distribute computational loads efficiently. Key issues include: A shortage of edge-native data engineers and architects. Limited understanding of edge-specific use cases among traditional IT teams.

IoT

IoT Energy Logistics Latency

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

The rule-based classifier classifies job errors based on a set of predefined rules and provides insights for schedulers to decide whether to retry the job and for engineers to diagnose and remediate the job failure. Rule Execution Engine is responsible for matching the collected logs against a set of predefined rules.

Tuning

Tuning Efficiency Big Data Engineering

Organise your engineering teams around the work by reteaming

Abhishek Tiwari

JULY 20, 2019

When it comes to organising engineering teams, a popular view has been to organise your teams based on either Spotify's agile model (i.e. squads, chapters, tribes, and guilds) or simply follow Amazon's two-pizza team model. One thing that stands out to me is being intentional and practical about your engineering organisation design.

Engineering

Engineering Retail Airlines Healthcare

Microservices Adoption in 2020

O'Reilly

JULY 15, 2020

Software engineers comprise the survey audience’s single largest cluster, over one quarter (27%) of respondents (Figure 1). If you combine the different architectural roles, i.e., … Adding architects and engineers, we see that roughly 55% of the respondents are directly involved in software development. Figure 1: Respondent roles.

Database

Database Architecture Education Systems

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

As with any sustainable engineering design, focusing on simplicity is very important. Summary: Providing Network Insight into the Cloud Network Infrastructure using VPC Flow Logs at hyper scale is made possible with the Sqooby architecture.

Network

Network Tuning AWS Traffic

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: a large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms and large datasets, creating a development platform for other company departments, etc.

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: a large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms and large datasets, creating a development platform for other company departments, etc.

Education

Education Software Engineering Scalability Engineering

The death of Agile?

O'Reilly

MARCH 2, 2020

This year’s growth in Python usage was buoyed by its increasing popularity among data scientists and machine learning (ML) and artificial intelligence (AI) engineers. Software architecture, infrastructure, and operations are each changing rapidly. Key survey results: The C-suite is engaged with data quality.

Artificial Intelligence

Artificial Intelligence Software Architecture Programming C++

AI meets operations

O'Reilly

FEBRUARY 2, 2020

Collaboration between AI developers and operations teams will lead to growing pains on both sides, especially since many data scientists and AI researchers have had limited exposure to, or knowledge of, software engineering. O’Reilly Strata Data & AI Conference, San Jose, March 15-18.

Software Architecture

Software Architecture Monitoring Software Engineering Code

Data Ingestion: The First Step Towards a Flawless Data Pipeline

Simform

JANUARY 8, 2023

Data ingestion is the foremost layer in a data engineering pipeline, acting as a vital pillar in the overall analytics architecture. Thus, it is essential to implement data ingestion just right. Here is everything you need to know to take the first step toward a flawless data pipeline.

Data Engineering

Data Engineering Analytics Architecture Engineering

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

In recent times, in order to gain valuable insights or to develop data-driven products, companies such as Netflix, Spotify, Uber, and Airbnb have built internal data pipelines. If built correctly, data pipelines can offer strategic advantages to the business. They can be used to power new analytics, insights, and product features.

Latency

Latency Analytics Scalability Engineering

Sustainability at AWS re:Invent 2022 All the talks and videos I could find…

Adrian Cockcroft

FEBRUARY 13, 2023

SUS206 Sustainability and AWS silicon: Kamran Khan, AWS Senior Product Manager Inferentia/Trainium/FPGA; David Chaiken, Pinterest Chief Architect; and Paul Mazurkiewicz, AWS Senior Principal Engineer. SUS302 Optimizing architectures for sustainability: Katja Philipp, AWS SA, and Szymon Kochanski, AWS SA.

AWS

AWS Energy Architecture Programming

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A common theme across all these trends is to remove the complexity by simplifying data management as a whole. In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture.

Big Data

Big Data Artificial Intelligence Storage Hardware

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next.

Big Data

Big Data Storage Benchmarking Hardware

What is a Data Pipeline: Types, Architecture, Use Cases & more

Simform

JANUARY 25, 2023

Businesses can unlock the value of data only after it is transformed into actionable insights and when those insights are delivered promptly. But implementing such robust data pipelines can be complex and challenging. This blog discusses all the ins and outs of building data pipelines and how they can help strengthen businesses.

Architecture

Architecture Data Engineering Engineering

Zendesk Moves from DynamoDB to MySQL and S3 to Save over 80% in Costs

InfoQ

DECEMBER 29, 2023

Zendesk reduced its data storage costs by over 80% by migrating from DynamoDB to a tiered storage solution using MySQL and S3. The company considered different storage technologies and decided to combine the relational database and the object store to strike a balance between queryability and scalability while keeping the costs down.

Storage

Storage Scalability Database Technology

Google Announces the General Availability of A2 Virtual Machines

InfoQ

APRIL 7, 2021

Recently, Google announced the general availability of A2 Virtual Machines (VMs), based on the NVIDIA Ampere A100 Tensor Core GPUs, in Compute Engine.

Virtualization

Virtualization Google Availability Engineering

Symphonia at Velocity 2018, and more Serverless Insights

The Symphonia

JUNE 19, 2018

I was fortunate to be both presenting a 2-day workshop (on AWS Serverless Architectures and Continuous Deployment) as well as hosting a full-day Serverless track of talks. One of the catalysts for starting Symphonia was the massive interest in my article on “Serverless Architectures” that is published on Martin Fowler’s site.

Serverless

Serverless AWS DevOps Open Source

Presentation: Azure Cosmos DB: Low Latency and High Availability at Planet Scale

InfoQ

JULY 14, 2023

Mei-Chin Tsai and Vinod Sridharan discuss the internal architecture of Azure Cosmos DB and how it achieves high availability, low latency, and scalability. By Mei-Chin Tsai, Vinod Sridharan

Latency

Latency Azure Availability Scalability

Stream Processing: How it Works, Use Cases & Popular Frameworks

Simform

FEBRUARY 14, 2023

Stream processing has become a core part of enterprise data architecture today due to the explosive growth of data from sources such as IoT sensors, security logs, and web applications. This blog discusses the topic of stream processing in detail to help you navigate its landscape with ease.

Processing

Processing IoT Architecture Data Engineering

QCon London: Lessons Learned From Building LinkedIn’s AI/ML Data Platform

InfoQ

APRIL 15, 2024

At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. He specifically delved into Venice DB, the NoSQL data store used for feature persistence. By Rafal Gancarz

Artificial Intelligence

Artificial Intelligence Big Data Data Engineering Latency

How machine learning is accelerating data integration?

Abhishek Tiwari

DECEMBER 23, 2017

Data integration generally requires in-depth domain knowledge and a strong understanding of data schemas and underlying relationships. This can be time-consuming and a bit challenging if you are dealing with hundreds of data sources and thousands of event types (see my recent article on ELT architecture).

Architecture

Architecture Engineering Processing Code

Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day

InfoQ

AUGUST 7, 2024

Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, MSK, and Amazon KDS, and eventually chose the latter, primarily based on its much lower costs. The company compared many aspects of these solutions, like performance, maintenance effort, and cost.

AWS

AWS Analytics Performance Data Engineering

A case for ELT

Abhishek Tiwari

DECEMBER 22, 2017

Cheap storage and on-demand compute in the cloud, coupled with the emergence of new big data frameworks and tools, are forcing us to rethink the whole ETL and data warehousing architecture. In addition, this approach is more tailored for both structured as well as unstructured data sets. Classic ETL. Stateless and elastic.

Big Data

Big Data Retail Storage Google

InfoQ Dev Summit in Boston: Two Days of Talks for Senior Developers

InfoQ

DECEMBER 20, 2023

InfoQ is delighted to announce a new two-day conference, InfoQ Dev Summit Boston 2024, taking place June 24-25, 2024. This event is designed to help senior developers navigate their immediate development challenges, focusing exclusively on the technical aspects that matter right now. By Artenisa Chatziou

Development

Development Design Data Engineering Scalability

How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions

InfoQ

NOVEMBER 29, 2023

HubSpot adopted routing messages over multiple Kafka topics (called swimlanes) for the same producer to avoid the build-up in the consumer group lag and prioritize the processing of real-time traffic.

Processing

Processing Traffic Data Engineering Scalability

How LinkedIn Serves Over 4.8 Million Member Profiles per Second

InfoQ

JULY 3, 2023

LinkedIn introduced Couchbase as a centralized caching tier for scaling member profile reads to handle increasing traffic that has outgrown their existing database cluster. The new solution achieved over 99% hit rate, helped reduce tail latencies by more than 60% and costs by 10% annually. By Rafal Gancarz

Cache

Cache Latency Traffic Database

AWS Launches General Availability of Amazon EC2 P5 Instances for AI/ML and HPC Workloads

InfoQ

AUGUST 3, 2023

AWS recently announced the general availability (GA) of Amazon EC2 P5 instances powered by the latest NVIDIA H100 Tensor Core GPUs suitable for users that require high performance and scalability in AI/ML and HPC workloads. The GA is a follow-up to the earlier announcement of the development of the infrastructure. By Steef-Jan Wiggers

AWS

AWS Availability Scalability Infrastructure
