Architecture, Data Engineering and Event - Technology Performance Pulse

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty, Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. It also becomes inefficient as the data scale increases.

Data Engineering

Data Engineering Engineering Processing Games

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Architecture Overview: The first pivotal step in managing impressions begins with the creation of a Source-of-Truth (SOT) dataset. Collecting Raw Impression Events: As Netflix members explore our platform, their interactions with the user interface spark a vast array of raw events.

Tuning

Tuning Latency Efficiency Storage

What is IT automation?

Dynatrace

JULY 6, 2022

At its most basic, automating IT processes works by executing scripts or procedures either on a schedule or in response to particular events, such as checking a file into a code repository. When monitoring tools release a stream of alerts, teams can easily identify which ones are false and assess whether an event requires human intervention.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” Nonetheless, the Netflix data landscape is complex, and many teams collaborate to share responsibility for managing our data systems.

Infrastructure

Infrastructure Big Data Transportation Architecture

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

While our engineering teams have built and continue to build solutions to lighten this cognitive load (better guardrails, improved tooling, …), data and its derived products are critical elements to understanding, optimizing and abstracting our infrastructure. Give us a holler if you are interested in a thought exchange.

Infrastructure

Infrastructure Cloud Scalability AWS

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture. Some of the optimizations are prerequisites for a high-performance data warehouse. Both automatic (event-driven) and manual (ad-hoc) optimization.

Storage

Storage Latency Efficiency Data Engineering

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

Meson was based on a single leader architecture with high availability. It serves thousands of users, including data scientists, data engineers, machine learning engineers, software engineers, content producers, and business analysts, for various use cases. Figure 1 shows the high-level architecture.

Java

Java Scalability Traffic Architecture

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

It is easier to tune a large Spark job for a consistent volume of data. As you may know, S3 can emit messages when events (such as file creation) occur, and these messages can be directed into an AWS SQS queue. These events represent a specific cut of data from the table.

Network

Network Tuning AWS Traffic

The death of Agile?

O'Reilly

MARCH 2, 2020

This year’s growth in Python usage was buoyed by its increasing popularity among data scientists and machine learning (ML) and artificial intelligence (AI) engineers. Software architecture, infrastructure, and operations are each changing rapidly. Key survey results: The C-suite is engaged with data quality.

Artificial Intelligence

Artificial Intelligence Software Architecture Programming C++

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

These challenges are currently addressed in suboptimal and less cost-efficient ways by individual local teams to fulfill their needs, such as Lookback: This is a generic and simple approach that data engineers use to solve the data accuracy problem. Users configure the workflow to read the data in a window (e.g.

Processing

Processing Big Data Efficiency Engineering

AI meets operations

O'Reilly

FEBRUARY 2, 2020

The creation and management of data pipelines isn’t something that operations groups are responsible for, though despite the proliferation of new titles like “data engineer” and “data ops,” in the future I suspect these jobs will be subsumed into “operations.”

Software Architecture

Software Architecture Monitoring Software Engineering Code

Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day

InfoQ

AUGUST 7, 2024

Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, Amazon MSK, and Amazon KDS, and eventually chose the latter, primarily based on its much lower costs. The company compared many aspects of these solutions, like performance, maintenance effort, and cost.

AWS

AWS Analytics Performance Data Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Learn to balance architecture trade-offs and design scalable enterprise-level software. Make your job search O(1), not O(n). Apply here. Need excellent people? Advertise your job here!

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Learn to balance architecture trade-offs and design scalable enterprise-level software. Make your job search O(1), not O(n). Apply here. Need excellent people? Advertise your job here!

Education

Education Software Engineering Scalability Engineering

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

A common theme across all these trends is to remove the complexity by simplifying data management as a whole. In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture.

Big Data

Big Data Artificial Intelligence Storage Hardware

InfoQ Dev Summit in Boston: Two Days of Talks for Senior Developers

InfoQ

DECEMBER 20, 2023

This event is designed to help senior developers navigate their immediate development challenges, focusing exclusively on the technical aspects that matter right now. InfoQ is delighted to announce a new two-day conference, InfoQ Dev Summit Boston 2024, taking place June 24-25, 2024. By Artenisa Chatziou

Development

Development Design Data Engineering Scalability

Organise your engineering teams around the work by reteaming

Abhishek Tiwari

JULY 20, 2019

The engineering organisation described may not work for you because a team of 8-10 people is still a very big overhead. In this model, software architecture and code ownership are a reflection of the organisational model. Thirdly, let engineers themselves choose the delivery teams and organise them around the initiative.

Engineering

Engineering Retail Airlines Healthcare

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next.

Big Data

Big Data Storage Benchmarking Hardware

Symphonia at Velocity 2018, and more Serverless Insights

The Symphonia

JUNE 19, 2018

This summer also marks the 4-yearly event that is La Copa Mundial (we only get Telemundo in my apartment, not Fox Sports Network) but since the good old US of A are absent from the men’s World Cup this year, football fever is distinctly frigid. It’s a great event full of deep technology experience, and a whole breadth of diversity.

Serverless

Serverless AWS DevOps Open Source

A case for ELT

Abhishek Tiwari

DECEMBER 22, 2017

Cheap storage and on-demand compute in the cloud coupled with the emergence of new big data frameworks and tools are forcing us to rethink the whole ETL and data warehousing architecture. In addition, this approach is more tailored for both structured and unstructured data sets. Classic ETL. Late transformation.

Big Data

Big Data Retail Storage Google

How machine learning is accelerating data integration?

Abhishek Tiwari

DECEMBER 23, 2017

Data integration generally requires in-depth domain knowledge and a strong understanding of data schemas and underlying relationships. This can be time-consuming and a bit challenging if you are dealing with hundreds of data sources and thousands of event types (see my recent article on ELT architecture).

Architecture

Architecture Engineering Processing Code

How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions

InfoQ

NOVEMBER 29, 2023

HubSpot adopted routing messages over multiple Kafka topics (called swimlanes) for the same producer to avoid the build-up in the consumer group lag and prioritize the processing of real-time traffic.

Processing

Processing Traffic Data Engineering Scalability

Experimentation is a major focus of Data Science across Netflix

The Netflix TechBlog

JANUARY 11, 2022

To learn about Analytics and Viz Engineering, have a look at Analytics at Netflix: Who We Are and What We Do by Molly Jackman & Meghana Reddy and How Our Paths Brought Us to Data and Netflix by Julie Beckley & Chris Pham. Curious to learn about what it’s like to be a Data Engineer at Netflix?

Innovation

Innovation Metrics Engineering Testing
