Design a photo-sharing platform similar to Instagram, where users can upload photos and share them with their followers. The post walks through the problem statement, high-level design, architecture, component design, data models, and API design, including the API design for posting an image on Instagram.
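To make the API-design step concrete, here is a minimal sketch of what a "create post" endpoint's request and response shapes might look like. The endpoint path, field names, and `create_post` handler are hypothetical, not the article's actual design:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from uuid import uuid4

# Hypothetical request/response shapes for a "create post" endpoint,
# e.g. POST /v1/users/{user_id}/posts
@dataclass
class CreatePostRequest:
    user_id: str
    image_url: str          # pre-uploaded image location (e.g. object storage)
    caption: str = ""

@dataclass
class CreatePostResponse:
    post_id: str
    created_at: str

def create_post(req: CreatePostRequest) -> CreatePostResponse:
    # In a real system this would validate the user, persist the post
    # metadata, and fan the post out to followers' feeds asynchronously.
    post_id = str(uuid4())
    created_at = datetime.now(timezone.utc).isoformat()
    return CreatePostResponse(post_id=post_id, created_at=created_at)

resp = create_post(CreatePostRequest(user_id="u123", image_url="s3://bucket/img.jpg"))
print(resp.post_id, resp.created_at)
```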
RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. This article outlines the key differences in architecture, performance, and use cases to help determine the best fit for your workload. What is RabbitMQ? What is Apache Kafka?
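As a rough illustration of the two models, here is a minimal publish sketch using the third-party `pika` and `kafka-python` clients; the broker addresses, queue, and topic names are placeholders, and local brokers are assumed:

```python
# RabbitMQ routes individual messages through exchanges and queues,
# while Kafka appends records to a partitioned, replayable log.
import pika                      # RabbitMQ client
from kafka import KafkaProducer  # Kafka client

# RabbitMQ: broker-side routing, per-message acknowledgements.
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.queue_declare(queue="orders", durable=True)
channel.basic_publish(exchange="", routing_key="orders", body=b"order-created")
conn.close()

# Kafka: high-throughput append to a partitioned log; consumers track
# their own offsets and can replay history.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("order-events", key=b"order-1", value=b"order-created")
producer.flush()
```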
By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.
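The excerpt does not detail Netflix's implementation, but the general idea behind low-contention distributed counting can be sketched as a sharded counter where each writer increments its own cell. A toy single-process version, purely illustrative:

```python
from collections import defaultdict

# Toy sharded counter: each node increments its own cell, and a read
# sums across cells. Illustrates the idea, not Netflix's implementation.
class ShardedCounter:
    def __init__(self) -> None:
        self._cells: dict[str, int] = defaultdict(int)

    def increment(self, node_id: str, delta: int = 1) -> None:
        # Writes touch only one cell, so nodes never contend on one row.
        self._cells[node_id] += delta

    def value(self) -> int:
        # A read aggregates all cells; in a distributed store this
        # aggregation can lag writes, trading accuracy for throughput.
        return sum(self._cells.values())

counter = ShardedCounter()
counter.increment("node-a")
counter.increment("node-b", 5)
print(counter.value())  # 6
```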
Many recommendation models are confined to a brief temporal window due to constraints in serving latency or training costs. This scenario underscored the need for a new recommender system architecture in which member preference learning is centralized, enhancing accessibility and utility across different models.
With the rise of microservices architecture, there has been a rapid acceleration in the modernization of legacy platforms, leveraging cloud infrastructure to deliver highly scalable, low-latency, and more responsive services. Traditional blocking architectures often struggle to maintain performance, especially under high load.
Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support for Non-Parallelizable Workloads, by Kostas Christidis. Timestone is a high-throughput, low-latency priority queueing system we built in-house to support the needs of Cosmos, our media encoding platform. Over the past 2.5
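Timestone itself is a distributed system; as a minimal single-process illustration of the priority-queueing semantics it provides, here is a sketch built on Python's `heapq`, with a counter as a FIFO tie-breaker (the item names and priorities are made up):

```python
import heapq
import itertools

# Minimal priority queue illustrating the semantics a system like
# Timestone provides at scale; this sketch is single-process only.
_counter = itertools.count()  # tie-breaker keeps FIFO order within a priority
queue: list[tuple[int, int, str]] = []

def enqueue(priority: int, item: str) -> None:
    heapq.heappush(queue, (priority, next(_counter), item))

def dequeue() -> str:
    priority, _, item = heapq.heappop(queue)
    return item

enqueue(2, "encode-trailer")
enqueue(1, "encode-feature")   # lower number = higher priority
print(dequeue())  # encode-feature
```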
This decoupling is crucial in modern architectures where scalability and fault tolerance are paramount. The architecture of RabbitMQ is meticulously designed for complex message routing, enabling dynamic and flexible interactions between producers and consumers.
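A small sketch of that routing flexibility, using the third-party `pika` client against a hypothetical local broker: a topic exchange pattern-matches routing keys, so producers stay decoupled from the queues that receive their messages.

```python
import pika  # assumes a RabbitMQ broker on localhost

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()

# A topic exchange routes on pattern-matched routing keys, decoupling
# producers from the queues that consume their messages.
channel.exchange_declare(exchange="events", exchange_type="topic")
channel.queue_declare(queue="eu-orders")
channel.queue_bind(queue="eu-orders", exchange="events", routing_key="order.eu.*")

# Matches the binding above and is delivered to eu-orders; a key like
# "order.us.created" would not be.
channel.basic_publish(exchange="events",
                      routing_key="order.eu.created",
                      body=b"...")
conn.close()
```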
Remote calls are never free; they impose extra latency, increase the probability of an error, and consume network bandwidth. When we process a request, it is often beneficial to know which fields the caller is interested in and which ones it ignores. How can we achieve similar functionality when designing our gRPC APIs?
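One common answer is a field mask the caller sends along with the request. A sketch assuming the `protobuf` package; the payload shape and the simplified `apply_mask` helper are illustrative, not the article's exact mechanism:

```python
from google.protobuf import field_mask_pb2  # assumes the `protobuf` package

# The caller declares which fields it wants; the server can then skip
# computing or serializing everything else.
mask = field_mask_pb2.FieldMask(paths=["title", "stats.view_count"])

def apply_mask(payload: dict, paths: list[str]) -> dict:
    # Simplified masking over a nested dict; real services typically use
    # protobuf's FieldMask helpers against generated message types.
    result: dict = {}
    for path in paths:
        src, dst = payload, result
        parts = path.split(".")
        for part in parts[:-1]:
            src = src[part]
            dst = dst.setdefault(part, {})
        dst[parts[-1]] = src[parts[-1]]
    return result

full = {"title": "Up", "stats": {"view_count": 42, "like_count": 7}}
print(apply_mask(full, list(mask.paths)))
# {'title': 'Up', 'stats': {'view_count': 42}}
```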
The service should be able to serve real-time (i.e., UI) applications, so CRUD and search operations should be achieved with low latency. Our service will be used by many internal UI applications, hence latency for CRUD and search operations must be low; search latency for generic text queries is in milliseconds.
These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination. Data Model: At its core, the KV abstraction is built around a two-level map architecture.
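A toy version of that two-level map, with a hash key selecting a record and a sorted inner map of item keys beneath it; this single-process sketch only illustrates the shape of the data model, not the real abstraction's sharding or pagination:

```python
from collections import defaultdict

# Two-level map: a hash key selects a record, and a sorted map of item
# keys lives under it (mirroring wide partitions with many rows).
class TwoLevelMap:
    def __init__(self) -> None:
        self._data: dict[str, dict[str, bytes]] = defaultdict(dict)

    def put(self, key: str, sort_key: str, value: bytes) -> None:
        self._data[key][sort_key] = value

    def get_range(self, key: str, start: str, end: str) -> list[tuple[str, bytes]]:
        # Range scans over the inner (sorted) level, like slicing a partition.
        row = self._data[key]
        return [(sk, row[sk]) for sk in sorted(row) if start <= sk <= end]

m = TwoLevelMap()
m.put("user:1", "2024-01-01", b"a")
m.put("user:1", "2024-02-01", b"b")
print(m.get_range("user:1", "2024-01-01", "2024-01-31"))  # [('2024-01-01', b'a')]
```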
With the rapid growth in the Netflix member base and the increasing complexity of our systems, our architecture has evolved into an asynchronous one that enables both online and offline computation. As shown in the diagram above, the RENO service can be broken down into the following components.
One approach to such a challenging scenario is stream processing, a computing paradigm and software architectural style for data-intensive software systems that emerged to cope with requirements for near-real-time processing of massive amounts of data. We designed experimental scenarios inspired by chaos engineering.
Table 1: Movie and File Size Examples. A simplified view of our initial cloud video processing pipeline is illustrated in the following diagram (Figure 1: A Simplified Video Processing Pipeline). With this architecture, chunk encoding is very efficient and processed in distributed cloud computing instances.
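The chunking idea can be sketched in a few lines: split the source into independent chunks, encode them in parallel, and reassemble in order. The byte-based splitting and the stand-in "encoder" below are illustrative only; real pipelines split on video frame boundaries:

```python
from concurrent.futures import ProcessPoolExecutor

# Toy chunked encoding: split a source into fixed-size chunks, encode
# them independently in parallel, then reassemble in order.
CHUNK_SIZE = 4  # bytes, tiny for illustration

def encode_chunk(chunk: bytes) -> bytes:
    return chunk[::-1]  # stand-in for an actual video encoder

def encode(source: bytes) -> bytes:
    chunks = [source[i:i + CHUNK_SIZE] for i in range(0, len(source), CHUNK_SIZE)]
    with ProcessPoolExecutor() as pool:
        encoded = list(pool.map(encode_chunk, chunks))  # map preserves order
    return b"".join(encoded)

if __name__ == "__main__":
    print(encode(b"abcdefgh"))  # b'dcbahgfe'
```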
Our approach to NN-based video downscaling: The deep downscaler is a neural network architecture designed to improve end-to-end video quality by learning a higher-quality video downscaler. We employed an adaptive network design that is applicable to the wide variety of resolutions we use for encoding.
Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.
As dynamic systems architectures increase in complexity and scale, IT teams face mounting pressure to track and respond to conditions and issues across their multi-cloud environments. The architects and developers who create the software must design it to be observed.
Retrieval-augmented generation emerges as the standard architecture for LLM-based applications Given that LLMs can generate factually incorrect or nonsensical responses, retrieval-augmented generation (RAG) has emerged as an industry standard for building GenAI applications.
We designed a unique concept called Annotation Operations, which allows teams to create data pipelines and easily write annotations without worrying about access patterns of their data from different applications. But we cannot search or serve low-latency retrievals from files.
Allegro experimented with different performance optimization options to improve Apache Kafka producer tail latency and eventually switched all its clusters to the XFS filesystem. The company used Kafka protocol sniffing, JVM profiling, and eBPF, which proved instrumental in identifying and eliminating performance bottlenecks.
We started seeing increased response latencies and leader servers running at dangerously high utilization. The original assumptions and architectural choices were no longer viable. The figure below depicts a simplified high-level architecture of a single Titus cluster.
We tried a few iterations of what this new service should look like, and eventually settled on a modern architecture that aimed to give more control of the API experience to the client teams. For us, it means that we now need to have ~15 MDN tabs open when writing routes :) Let’s briefly discuss the architecture of this microservice.
This architecture shift greatly reduced the processing latency and increased system resiliency. We expanded pipeline support to serve our studio/content-development use cases, which had different latency and resiliency requirements as compared to the traditional streaming use case.
It supports both high throughput services that consume hundreds of thousands of CPUs at a time, and latency-sensitive workloads where humans are waiting for the results of a computation. The subsystems all communicate with each other asynchronously via Timestone, a high-scale, low-latency priority queuing system.
Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.
Lambda’s highly efficient, on-demand computing environment aligns with today’s microservices-centric architectures, and readily integrates with other popular AWS offerings that an organization may already be using. AWS continues to improve how it handles latency issues. It helps SRE teams automate responses.
While data lakes and data warehousing architectures are commonly used modes for storing and analyzing data, a data lakehouse is an efficient third way to store and analyze data that unifies the two architectures while preserving the benefits of both. Data lakehouses deliver the query response with minimal latency.
Amazon DynamoDB: a Fast and Scalable NoSQL Database Service Designed for Internet-Scale Applications. Today is a very exciting day as we release Amazon DynamoDB, a fast, highly reliable and cost-effective NoSQL database service designed for internet-scale applications. Amazon DynamoDB offers low, predictable latencies at any scale.
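A minimal usage sketch with `boto3`, assuming AWS credentials are configured and a pre-created `movies` table with `title`/`year` as its primary key (both the table and key schema are hypothetical):

```python
import boto3  # assumes AWS credentials and an existing "movies" table

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("movies")

# Writes and reads address items by primary key, which is what lets
# DynamoDB keep latencies low and predictable at any scale.
table.put_item(Item={"title": "Up", "year": 2009, "rating": "PG"})
item = table.get_item(Key={"title": "Up", "year": 2009}).get("Item")
print(item)
```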
Today we have a wealth of tools, both OSS and commercial, all designed for cloud-native environments. To improve availability, we designed systems where components could fail separately and avoid single points of failure. There is a downside to fetching this data on-demand: this adds latency to the first request to a cluster.
Plus, the architecture of the Edge tier was evolving to a PaaS (platform as a service) model, and we had some tough decisions to make about how, and where, to handle identity tokens. The system architecture now takes the form shown in the diagram: notice that tokens never traverse past the Edge gateway / EAS boundary. We are serving over 2.5
At Netflix, we also heavily embrace a microservice architecture that emphasizes separation of concerns. The data warehouse is not designed to serve point requests from microservices with low latency. Therefore, we must efficiently move data from the data warehouse to a global, low-latency and highly-reliable key-value store.
If you use AWS cloud services to build and run your applications, you may be familiar with the AWS Well-Architected Framework: a set of best practices and guidelines that help you design and operate reliable, secure, efficient, cost-effective, and sustainable systems in the cloud.
LinkedIn was able to dramatically improve the scalability and performance of its Espresso database by migrating it from HTTP1.1 to HTTP2, resulting in a reduction in the number of connections, latency, and garbage collection times. To achieve these gains, the team had to optimize Netty’s default HTTP2 stack to make it fit their needs.
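The core HTTP/2 win, many in-flight requests multiplexed over a single reused connection, can be illustrated from the client side with the third-party `httpx` package (the URL is a placeholder):

```python
import httpx  # third-party client; pip install "httpx[http2]"

# With HTTP/2 enabled, one client reuses a single connection across
# requests and can multiplex them, instead of opening one connection
# per in-flight request as HTTP/1.1 clients often do.
with httpx.Client(http2=True) as client:
    responses = [client.get("https://example.org/") for _ in range(5)]
    for r in responses:
        print(r.http_version, r.status_code)
```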
For example, when we design a new version of VMAF, we need to effectively roll it out throughout the entire Netflix catalog of movies and TV shows. This article explains how we designed microservices and workflows on top of the Cosmos platform to bolster such video quality innovations. We call this system Cosmos.
System Setup: The following diagram summarizes the architecture (Figure 1: Event-sourcing architecture of the Device Management Platform). By the following morning, alerts were received regarding high memory consumption and GC latencies, to the point where the service was unresponsive to HTTP requests.
Today, I want to explore the Amazon ECS architecture and what this architecture enables. Below is a diagram of the basic components of Amazon ECS. This architecture affords Amazon ECS high availability, low latency, and high throughput because the data store is never pessimistically locked.
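The "never pessimistically locked" approach is typically optimistic concurrency control: a write carries the version it read and fails if the record has moved on. A toy single-process sketch of that idea, not ECS's actual data store:

```python
# Optimistic concurrency control: writes succeed only if the version
# they read is still current, so no reader ever holds a lock.
class VersionConflict(Exception):
    pass

class OptimisticStore:
    def __init__(self) -> None:
        self._data: dict[str, tuple[int, object]] = {}

    def read(self, key: str) -> tuple[int, object]:
        return self._data.get(key, (0, None))

    def write(self, key: str, expected_version: int, value: object) -> None:
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}")
        self._data[key] = (current_version + 1, value)

store = OptimisticStore()
version, _ = store.read("task-1")
store.write("task-1", version, "RUNNING")      # succeeds, now v1
try:
    store.write("task-1", version, "STOPPED")  # stale version, rejected
except VersionConflict as err:
    print(err)
```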
This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture. These principles reduce resource usage by being more efficient and effective while lowering end-to-end latency in data processing and remaining transparent to end users.
By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. To address increased latency during peak loads, introduce scalable microservices architectures that distribute computational loads efficiently.
ITOps refers to the process of acquiring, designing, deploying, configuring, and maintaining equipment and services that support an organization’s desired business outcomes. Performance includes response time, accuracy, speed, throughput, uptime, CPU utilization, and latency.
As organizations adopt microservices-based architecture , service-level objectives (SLOs) have become a vital way for teams to set specific, measurable targets that ensure users are receiving agreed-upon service levels. You can set SLOs based on individual indicators, such as batch throughput, request latency, and failures-per-second.
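As a concrete illustration, here is a sketch of evaluating a latency SLO and its remaining error budget over a window of requests; the 99%/300 ms target and the sample data are made up:

```python
# Sketch of evaluating a latency SLO: target 99% of requests under
# 300 ms over the window. Thresholds and data are illustrative.
SLO_TARGET = 0.99
LATENCY_THRESHOLD_MS = 300.0

def slo_status(latencies_ms: list[float]) -> tuple[float, float]:
    good = sum(1 for l in latencies_ms if l <= LATENCY_THRESHOLD_MS)
    compliance = good / len(latencies_ms)
    # Error budget remaining: share of the allowed bad requests unspent.
    allowed_bad = (1 - SLO_TARGET) * len(latencies_ms)
    bad = len(latencies_ms) - good
    budget_left = max(0.0, 1 - bad / allowed_bad) if allowed_bad else 0.0
    return compliance, budget_left

compliance, budget = slo_status([120.0] * 97 + [450.0] * 3)
print(f"compliance={compliance:.2%}, error budget left={budget:.0%}")
```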
RISELabs, those wonderfully innovative folks over at Berkeley, have uplifted their Anna database, a shared-nothing, thread-per-core architecture that achieves lightning-fast speeds by avoiding all coordination mechanisms, to become cloud-aware. While database neogenesis has slowed down considerably, it has not gone necrotic.
By collecting and analyzing key performance metrics of the service over time, we can assess the impact of the new changes and determine if they meet the availability, latency, and performance requirements. A/B testing is also a key technique in migrations where the updates to the architecture involve changing device contracts as well.
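A tiny sketch of the kind of comparison such an A/B test performs, checking a tail-latency percentile between control and treatment using only the standard library (the sample data is fabricated for illustration):

```python
from statistics import quantiles

# Compare a key percentile between control and treatment before
# shifting traffic; real tests also check significance, not just deltas.
def p99(samples: list[float]) -> float:
    return quantiles(samples, n=100)[98]  # 99th-percentile cut point

control   = [100.0] * 990 + [400.0] * 10
treatment = [ 90.0] * 990 + [380.0] * 10

print(f"control p99={p99(control):.0f} ms, treatment p99={p99(treatment):.0f} ms")
```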
In future posts, we will do an architectural deep dive into the several components of Netflix Drive. Fig 3: POSIX interface of Netflix Drive. The POSIX files-and-folders interface for Netflix Drive is designed as a layered system, with the FUSE implementation hooks forming the top layer.
Building general-purpose architectures has always been hard; there are often so many conflicting requirements that you cannot derive an architecture that will serve all of them, so we have often ended up focusing on one side of the requirements, which allows you to serve that area really well.