Architecture, Latency and Traffic - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.

Traffic

Traffic Latency Tuning Systems

Migrating Critical Traffic At Scale with No Downtime - Part 2

The Netflix TechBlog

JUNE 13, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic

Traffic Metrics Systems Strategy

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

This article outlines the key differences in architecture, performance, and use cases to help determine the best fit for your workload. RabbitMQ follows a message broker model with advanced routing, while Kafka's event streaming architecture uses partitioned logs for distributed processing. What is RabbitMQ? What is Apache Kafka?

Latency

Latency Analytics Architecture Storage

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Optimizing RabbitMQ performance through strategies such as keeping queues short, enabling lazy queues, and monitoring health checks is essential for maintaining system efficiency and effectively managing high traffic loads.

Best Practices

Best Practices Traffic Strategy Scalability

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Architecture Overview The first pivotal step in managing impressions begins with the creation of a Source-of-Truth (SOT) dataset. Impression Source-of-Truth architecture Ensuring High Quality Impressions Maintaining the highest quality of impressions is a top priority.

Tuning

Tuning Latency Efficiency Storage

Why Replace External Database Caches?

DZone

AUGUST 28, 2024

Putting an external cache in front of the database is commonly used to compensate for subpar latency stemming from various factors, such as inefficient database internals, driver usage, infrastructure choices, traffic spikes, and so on. This is a clear performance-oriented decision.

Cache

Cache Database Latency Traffic

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

Motivation With the rapid growth in Netflix member base and the increasing complexity of our systems, our architecture has evolved into an asynchronous one that enables both online and offline computation. This helps limit the outgoing traffic footprint considerably.

Systems

Systems Traffic Architecture Mobile

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

As more organizations embrace microservices-based architecture to deliver goods and services digitally, maintaining customer satisfaction has become exponentially more challenging. First, it helps to understand that applications and all the services and infrastructure that support them generate telemetry data based on traffic from real users.

Software

Software Software Benchmarking Latency

Lessons learned from enterprise service-level objective management

Dynatrace

MAY 19, 2022

Example 1: Architecture boundaries. First, they took a big step back and looked at their end-to-end architecture (Figure 2). SLO dashboard defined by architectural boundary. In their new dashboard, they added dimensions for load, latency, and open problems for each component. Not all attempts succeed on the first try.

Automotive

Automotive Latency Architecture Mobile

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

NOVEMBER 2, 2020

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure. By Manuel Correa, Arthur Gonigberg, and Daniel West. Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Logs and background requests are examples of this type of traffic.

Traffic

Traffic Metrics Infrastructure Architecture

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

The original assumptions and architectural choices were no longer viable. Overview The figure below depicts a simplified high-level architecture of a single Titus cluster (a.k.a With traffic growth, a single leader node handling all request volume started becoming overloaded.

Cache

Cache Latency Traffic Systems

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

By Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, and Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data (often reaching petabytes) with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Tuning

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination. Data Model At its core, the KV abstraction is built around a two-level map architecture. Useful for keeping “n-newest” or prefix path deletion.

Latency

Latency Storage Cache Servers

Seamlessly Swapping the API backend of the Netflix Android app

The Netflix TechBlog

SEPTEMBER 8, 2020

We tried a few iterations of what this new service should look like, and eventually settled on a modern architecture that aimed to give more control of the API experience to the client teams. For us, it means that we now need to have ~15 MDN tabs open when writing routes :) Let’s briefly discuss the architecture of this microservice.

Latency

Latency Cache Java Traffic

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

OCTOBER 17, 2018

To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks …

Big Data

Big Data Latency Transportation Traffic

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

Microservices-based architectures and software containers enable organizations to deploy and modify applications with unprecedented speed. At the lowest level, SLIs provide a view of service availability, latency, performance, and capacity across systems. However, cloud complexity has made software delivery challenging.

Best Practices

Best Practices DevOps Latency Metrics

Bending pause times to your will with Generational ZGC

The Netflix TechBlog

MARCH 5, 2024

Reduced tail latencies In both our GRPC and DGS Framework services, GC pauses are a significant source of tail latencies. Each of these errors is a canceled request resulting in a retry so this reduction further reduces overall service traffic by this rate: Errors rates per second. There is no best garbage collector.

Latency

Latency Java Tuning Efficiency

Rebuilding Netflix Video Processing Pipeline with Microservices

The Netflix TechBlog

JANUARY 10, 2024

This architecture shift greatly reduced the processing latency and increased system resiliency. We expanded pipeline support to serve our studio/content-development use cases, which had different latency and resiliency requirements as compared to the traditional streaming use case. divide the input video into small chunks 2.

Processing

Processing Media Latency Innovation

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.

Cache

Cache Latency Airlines Logistics

What is cloud migration?

Dynatrace

SEPTEMBER 30, 2021

In case of a spike in traffic, you can automatically spin up more resources, often in a matter of seconds. Likewise, you can scale down when your application experiences decreased traffic. For example, as traffic increases, costs will too. This can dramatically decrease network latency and its effect on the end-user experience.

Cloud

Cloud Traffic Best Practices Hardware

Zero Configuration Service Mesh with On-Demand Cluster Discovery

The Netflix TechBlog

AUGUST 29, 2023

In order for a service to talk to another, it needs to know two things: the name of the destination service, and whether or not the traffic should be secure. In this architecture, service to service communication no longer goes through the single point of failure of a load balancer.

Traffic

Traffic Latency Cloud C++

Towards a Reliable Device Management Platform

The Netflix TechBlog

AUGUST 30, 2021

System Setup Architecture The following diagram summarizes the architecture description: Figure 1: Event-sourcing architecture of the Device Management Platform. As such, we can see that the traffic load on the Device Management Platform’s control plane is very dynamic over time. million elements.

Latency

Latency Traffic Transportation Cloud

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

Azure Traffic Manager. Azure Front Door enables you to define, manage, and monitor the global routing for your web traffic by optimizing for best performance and quick global failover for high availability. Azure Batch. Azure DB for MariaDB. Azure DB for MySQL. Azure DB for PostgreSQL. Azure SQL Managed Instance. Azure HDInsight.

Azure

Azure Cloud Big Data Virtualization

So many bad takes - What is there to learn from the Prime Video microservices to monolith story

Adrian Cockcroft

MAY 6, 2023

Then they tried to scale it to cope with high traffic and discovered that some of the state transitions in their step functions were too frequent, and they had some overly chatty calls between AWS lambda functions and S3. They state in the blog that this was quick to build, which is the point.

Serverless

Serverless Lambda Best Practices Traffic

Achieving observability in async workflows

The Netflix TechBlog

MAY 14, 2021

Managing and operating asynchronous workflows can be difficult without the proper tools and architecture that puts observability, debugging, and tracing at the forefront. Prodicle Distribution: Our service is required to be elastic and handle bursty traffic. Written by Colby Callahan, Megha Manohara, and Mike Azar. Things got hairy.

Traffic

Traffic Java Latency Google

Telltale: Netflix Application Monitoring Simplified

The Netflix TechBlog

AUGUST 13, 2020

We also highlight interesting broader events such as regional traffic evacuations and nearby deployments, information that is vital to understanding health holistically. Regional traffic evacuations. For example, a latency increase is less critical than error rate increase and some error codes are less critical than others.

Monitoring

Monitoring Tuning Traffic Metrics

Automated observability, security, and reliability at scale

Dynatrace

JULY 18, 2023

This is especially crucial in microservice architectures, where the number of components can be overwhelming. As software development grows more complex, managing components using an automated onboarding process becomes increasingly important. Proper notifications or escalations are automated based on ownership information.

Best Practices

Best Practices Code Infrastructure Latency

Amazon DynamoDB - a Fast and Scalable NoSQL Database.

All Things Distributed

JANUARY 18, 2012

Web-based applications often encounter database scaling challenges when faced with growth in users, traffic, and data. Behind the scenes, Amazon DynamoDB automatically spreads the data and traffic for a table over a sufficient number of servers to meet the request capacity specified by the customer. Consistency. SimpleDB's

Scalability

Scalability Database Ecommerce Latency

Understanding operational 5G: a first measurement study on its coverage, performance and energy consumption

The Morning Paper

OCTOBER 4, 2020

We are standing on the eve of the 5G era… 5G, as a monumental shift in cellular communication technology, holds tremendous potential for spurring innovations across many vertical industries, with its promised multi-Gbps speed, sub-10 ms low latency, and massive connectivity. Throughput and latency. energy consumption).

Energy

Energy Latency Performance Network

Netflix Video Quality at Scale with Cosmos Microservices

The Netflix TechBlog

NOVEMBER 2, 2021

The Reloaded system is a well-matured and scalable system, but its monolithic architecture can slow down rapid innovation. This enables us to use our scale to increase throughput and reduce latencies. Here, based on the video length, the throughput and latency requirements, available scale etc., via bug fixes).

Media

Media Innovation Metrics Latency

Towards a Unified Theory of Web Performance

Alex Russell

FEBRUARY 28, 2022

Here are two renderings of the same Gmail inbox in different architectural styles: one based on Ajax, and the other on "basic" HTML : The Ajax version of Gmail loads 4.8MiB of resources, including 3.8MiB of JavaScript to load an inbox containing two messages. Today's web architecture debates (e.g.

Performance

Performance Latency Architecture Network

Comparisons of Proxies for MySQL

Percona

MARCH 20, 2023

When designing an architecture, many components need to be considered before deciding on the best solution. Let us also take a look at the latency: Here the situation starts to be a little bit more complicated. MySQL Router is the one that has the highest latency no matter what. That allows it to go a bit further. and ProxySQL 6.6k.

Games

Games Latency Traffic Cache

Data Movement in Netflix Studio via Data Mesh

The Netflix TechBlog

JULY 26, 2021

Netflix is known for its loosely coupled microservice architecture and with a global studio footprint, surfacing and connecting the data from microservices into a studio data catalog in real time has become more important than ever. Most of the business views created on top of the Iceberg tables can tolerate a few minutes of latency.

Big Data

Big Data Government Processing Analytics

Latency vs. Throughput: Navigating the Digital Highway

VoltDB

FEBRUARY 29, 2024

In this fast-paced ecosystem, two vital elements determine the efficiency of this traffic: latency and throughput. LATENCY: THE WAITING GAME Latency is like the time you spend waiting in line at your local coffee shop. All these moments combined represent latency – the time it takes for your order to reach your hands.

Latency

Latency Games Traffic Network

DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

DECEMBER 17, 2019

Nonetheless, we found a number of limitations that could not satisfy our requirements e.g. stalling the processing of log events until a dump is complete, missing ability to trigger dumps on demand, or implementations that block write traffic by using table locks. Blocking write traffic by locking tables. DBLog High Level Architecture.

Database

Database Traffic Transportation Open Source

Datadog Creates Scalable Data Ingestion Architecture

InfoQ

JUNE 16, 2023

Datadog created a dedicated data ingestion architecture offering exactly-once semantics for their third-generation event store, Husky. The event-driven architecture (EDA) can accommodate bursts in traffic in the multi-tenant platform with reasonable ingestion latency and acceptable operational costs. By Rafal Gancarz

Architecture

Architecture Scalability Latency Traffic

Expanding the Cloud: Enabling Globally Distributed Applications and Disaster Recovery

All Things Distributed

NOVEMBER 26, 2013

Cross Region Read Replicas also enable you to serve read traffic for your global customer base from regions that are nearest to them. Cross Region Read Replicas also make it even easier for our global customers to scale database deployments to meet the performance demands of high-traffic, globally dispersed applications.

Cloud

Cloud AWS Traffic Latency

Growth Engineering at Netflix - Automated Imagery Generation

The Netflix TechBlog

FEBRUARY 9, 2021

Server-generated assets, since client-side generation would require the retrieval of many individual images, which would increase latency and time-to-render. To reduce latency, assets should be generated in an offline fashion and not in real time. Here’s what the final architecture looked like.

Engineering

Engineering Storage Latency Entertainment

Handling user-initiated actions in an asynchronous, message-based architecture

O'Reilly Software

DECEMBER 11, 2017

A message-based microservices architecture offers many advantages, making solutions easier to scale and expand with new services. The asynchronous nature of interservice interactions inherent to this architecture, however, poses challenges for user-initiated actions such as create-read-update-delete (CRUD) requests on an object.

Architecture

Architecture Government Latency Efficiency

Most Common RabbitMQ Use Cases

Scalegrid

AUGUST 27, 2024

Wondering where RabbitMQ fits into your architecture? They utilize a routing key mechanism that ensures precise navigation paths for message traffic. Microservices Communication In the context of a microservices architecture that demands scalability and loose coupling among services, RabbitMQ serves as a critical component.

Ecommerce

Ecommerce IoT Games Scalability

Optimizing CDN Architecture: Enhancing Performance and User Experience

IO River

NOVEMBER 2, 2023

CDNs cache content on edge servers distributed globally, reducing the distance between users and the content they want. CDNs use load-balancing techniques to distribute incoming traffic across multiple servers called Points of Presence (PoPs), which distribute content closer to end-users and improve overall performance.

Architecture

Architecture Cache Performance Latency

Optimizing CDN Architecture: Enhancing Performance and User Experience

IO River

NOVEMBER 2, 2023

CDNs use load-balancing techniques to distribute incoming traffic across multiple servers called Points of Presence (PoPs), which distribute content closer to end-users and improve overall performance. What is CDN Architecture? CDN architecture serves as a blueprint or plan that guides the distribution of CDN provider PoPs.

Architecture

Architecture Cache Performance Latency

Looking back at 10 years of compartmentalization at AWS

All Things Distributed

MARCH 26, 2018

A concept that has changed infrastructure architecture is now at the core of both AWS and customer reliability and operations. Silo your traffic or not – you choose. When your architecture does stay within an Availability Zone as much as possible, there are more benefits.

AWS

AWS Latency Lambda Architecture
