Infrastructure, Latency and Traffic - Technology Performance Pulse

Behind the Streams: Live at Netflix. Part 1

The Netflix TechBlog

JULY 15, 2025

Load-Balancing Netflix Traffic at Global Scale. This means that we had a lot to build in order to make Live work well on Netflix. While UDP-based protocols can provide additional features like ultra-low latency, HTTPS has ubiquitous support among devices and compatibility with delivery and encoding systems.

Entertainment

Entertainment Traffic AWS Latency

Investigation of a Workbench UI Latency Issue

The Netflix TechBlog

OCTOBER 14, 2024

Using this approach, we observed latencies ranging from 1 to 10 seconds, averaging 7.4. However, when we captured packets on the ZeroMQ socket while reproducing the issue, we didn’t observe heavy traffic on this socket that could cause such blocking. Meanwhile, traffic from other ports, such as port 22 for SSH, remained unaffected.

Latency

Latency Virtualization Traffic Processing

Driving Content Delivery Efficiency Through Classifying Cache Misses

The Netflix TechBlog

JULY 2, 2025

Our ability to efficiently localize traffic, known as Content Delivery Efficiency, is a critical component of Open Connect’s service. Specifically, we classify the causes of traffic not being served from local servers, a phenomenon that we refer to as cache misses. ... and assesses its ability to serve additional traffic.

Cache

Cache Efficiency Traffic Latency

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

Its partitioned log architecture supports both queuing and publish-subscribe models, allowing it to handle large-scale event processing with minimal latency. Apache Kafka uses a custom TCP/IP protocol for high throughput and low latency. However, performance can decline under high traffic conditions.

Latency

Latency Analytics Architecture Storage

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

The Challenge of Title Launch Observability. As engineers, we’re wired to track system metrics like error rates, latencies, and CPU utilization, but what about metrics that matter to a title’s success? This approach provides a few advantages: Low burden on existing systems: Log processing imposes minimal changes to existing infrastructure.

Traffic

Traffic Scalability Strategy Monitoring

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Optimizing RabbitMQ performance through strategies such as keeping queues short, enabling lazy queues, and monitoring health checks is essential for maintaining system efficiency and effectively managing high traffic loads.

Best Practices

Best Practices Traffic Strategy Scalability

Citus for PostgreSQL: How to Scale Your Database Horizontally

Scalegrid

JULY 25, 2025

However, as application data grows exponentially and user traffic scales to millions, traditional PostgreSQL deployments begin to encounter performance and scalability constraints. More users, more data, and more traffic no longer necessitate database overhauls—just the addition of new nodes.

Database

Database Azure Analytics Open Source

How Synthetic Monitoring Can Warm Up Your CDN (and Why It Matters)

Dotcom-Monitor

JULY 26, 2025

For organizations operating at global scale, Content Delivery Networks (CDNs) have become indispensable infrastructure for delivering fast, reliable user experiences. When functioning optimally, a CDN serves content from the edge server closest to the requesting user, dramatically reducing network latency and improving page load times.

Monitoring

Monitoring Cache Strategy Metrics

Dotcom-Monitor’s Role in Ensuring SLA Compliance

Dotcom-Monitor

FEBRUARY 28, 2025

Whether you’re running a high-traffic e-commerce website, a critical business application, or an API serving millions of requests, your customers rely on the promises made in your SLA. Also, Dotcom-Monitor’s scalable infrastructure accommodates growing businesses by supporting extensive endpoint monitoring.

Website

Website Monitoring Mobile Website Performance

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

Now let’s look at how we designed the tracing infrastructure that powers Edgar. If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls.

Infrastructure

Infrastructure Transportation Storage Open Source

Why Replace External Database Caches?

DZone

AUGUST 28, 2024

Putting an external cache in front of the database is commonly used to compensate for subpar latency stemming from various factors, such as inefficient database internals, driver usage, infrastructure choices, traffic spikes, and so on. This is a clear performance-oriented decision.

Cache

Cache Database Latency Traffic

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

The control group’s traffic utilized the legacy Falcor stack, while the experiment population leveraged the new GraphQL client and was directed to the GraphQL Shim. To determine customer impact, we could compare various metrics such as error rates, latencies, and time to render.

Traffic

Traffic Latency Cache Metrics

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. Introduction. As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data (often reaching petabytes) with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Tuning

Dynatrace Managed turnkey Premium High Availability for globally distributed data centers (Early Adopter)

Dynatrace

JUNE 25, 2020

The network latency between cluster nodes should be around 10 ms or less. Minimized cross-data center network traffic. For Premium HA, this has been extended from 10 ms latency (in the same network region) to around 100 ms network latency due to asynchronous data replication between regions.

Availability

Availability Hardware Latency Traffic

What are quality gates? How to use quality gates to deliver better software at speed and scale

Dynatrace

FEBRUARY 21, 2024

To remain competitive in today’s fast-paced market, organizations must not only ensure that their digital infrastructure is functioning optimally but also that software deployments and updates are delivered rapidly and consistently. In this example, unlike latency, the remaining three signals did not receive a “pass.”

Speed

Speed Software Software Latency

Noisy Neighbor Detection with eBPF

The Netflix TechBlog

SEPTEMBER 10, 2024

The first step is determining whether the problem originates from the application or the underlying infrastructure. Learn how Linux kernel instrumentation can improve your infrastructure observability with deeper insights and enhanced monitoring. We then calculate the run queue latency by simply subtracting the timestamps.

Latency

Latency Metrics Programming Monitoring

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

SLOs can be a great way for DevOps and infrastructure teams to use data and performance expectations to make decisions, such as whether to release and where engineers should focus their time. Latency is the time that it takes a request to be served. SLOs aid decision making. SLOs promote automation. Define SLOs for each service.

Software

Software Software Benchmarking Latency

Service level objectives: 5 SLOs to get started

Dynatrace

JUNE 1, 2023

Note: you might hear the term latency used instead of response time. Both latency and response time are critical to ensure reliability. Latency typically refers to the time it takes for a single request to travel from its source to its destination. Latency primarily focuses on the time spent in transit.

Website

Website Latency Traffic Virtualization

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

NOVEMBER 2, 2020

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure. By Manuel Correa, Arthur Gonigberg, and Daniel West. Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. CRITICAL: This traffic affects the ability to play.

Traffic

Traffic Metrics Infrastructure Architecture

The Power of Caching: Boosting API Performance and Scalability

DZone

AUGUST 16, 2023

Benefits of Caching Improved performance: Caching eliminates the need to retrieve data from the original source every time, resulting in faster response times and reduced latency. Reduced server load: By serving cached content, the load on the server is reduced, allowing it to handle more requests and improving overall scalability.

Cache

Cache Scalability Performance Latency

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Vidhya Arvind, Rajasekhar Ummadisetty, Joey Lynch, Vinay Chella. Introduction. At Netflix, our ability to deliver seamless, high-quality streaming experiences to millions of users hinges on robust, global backend infrastructure. It also serves as central configuration of access patterns such as consistency or latency targets.

Latency

Latency Storage Cache Servers

Seamlessly Swapping the API backend of the Netflix Android app

The Netflix TechBlog

SEPTEMBER 8, 2020

The big difference from the monolith, though, is that this is now a standalone service deployed as a separate “application” (service) in our cloud infrastructure. In this step, a pipeline picks our candidate change, deploys the service, makes it publicly discoverable, and redirects a small percentage of production traffic to this new service.

Latency

Latency Cache Java Traffic

What is cloud migration?

Dynatrace

SEPTEMBER 30, 2021

Generally speaking, cloud migration involves moving from on-premises infrastructure to cloud-based services. In cloud computing environments, infrastructure and services are maintained by the cloud vendor, allowing you to focus on how best to serve your customers. However, it can also mean migrating from one cloud to another.

Cloud

Cloud Traffic Best Practices Hardware

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

How site reliability engineering affects organizations’ bottom line SRE applies the disciplines of software engineering to infrastructure management, both on-premises and in the cloud. There are now many more applications, tools, and infrastructure variables that impact an application’s performance and availability.

Best Practices

Best Practices DevOps Latency Metrics

Build and operate multicloud FaaS with enhanced, intelligent end-to-end observability

Dynatrace

APRIL 25, 2023

For example, to handle traffic spikes and pay only for what they use. These functions are executed by a serverless platform or provider (such as AWS Lambda, Azure Functions or Google Cloud Functions) that manages the underlying infrastructure, scaling and billing. Scale automatically based on the demand and traffic patterns.

Serverless

Serverless Lambda Azure AWS

Event-Based Autoscaling: Ensuring Smooth Operations on Your Peak Days

DZone

JANUARY 21, 2024

These organizations face a common challenge – how much infrastructure do they need to ensure optimal performance without overprovisioning – which can become very costly, very quickly. This incident serves as a stark illustration of insufficient infrastructure planning during a critical event.

Retail

Retail Games Latency Traffic

Bring Your Own Cloud (BYOC) vs. Dedicated Hosting at ScaleGrid

Scalegrid

APRIL 16, 2020

Each of these models is suitable for production deployments and high-traffic applications, and is available for all of our supported databases, including MySQL, PostgreSQL, Redis™ and MongoDB® database (Greenplum® database coming soon). Are you comfortable setting up your own cloud infrastructure through AWS or Azure? Expert Tip.

Cloud

Cloud Azure AWS Database

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.

Cache

Cache Latency Airlines Logistics

Automated observability, security, and reliability at scale

Dynatrace

JULY 18, 2023

While infrastructure has historically been treated as a bottleneck where proper scaling and compute power are applied to improve performance, these aspects are now typically addressed by hyperscalers that offer cloud-based infrastructure and infrastructure as a service.

Best Practices

Best Practices Code Infrastructure Software

Service level objective examples: 5 SLO examples for faster, more reliable apps

Dynatrace

JUNE 1, 2023

Note: you might hear the term latency used instead of response time. Both latency and response time are critical to ensure reliability. Latency typically refers to the time it takes for a single request to travel from its source to its destination. Latency primarily focuses on the time spent in transit.

Website

Website Traffic Latency Virtualization

Rebuilding Netflix Video Processing Pipeline with Microservices

The Netflix TechBlog

JANUARY 10, 2024

This architecture shift greatly reduced the processing latency and increased system resiliency. We expanded pipeline support to serve our studio/content-development use cases, which had different latency and resiliency requirements as compared to the traditional streaming use case. divide the input video into small chunks 2.

Processing

Processing Media Latency Innovation

Towards a Reliable Device Management Platform

The Netflix TechBlog

AUGUST 30, 2021

The Partner Infrastructure team at Netflix provides solutions to support these two significant efforts by enabling device management at scale. Together, they form the Device Management Platform, which is the infrastructural foundation for Netflix Test Studio (NTS). million elements.

Latency

Latency Traffic Transportation Cloud

Scale up your Dynatrace Managed software-intelligence deployment with self-healing insights

Dynatrace

JUNE 8, 2020

As a software intelligence platform, Dynatrace is woven into the fabric of your business systems, actively managing and providing self-healing capabilities for all aspects of your applications and vital infrastructure. Metrics are provided for general host info like CPU usage and memory consumption, OneAgent traffic, and network latency.

Software

Software Software Programming Metrics

Taming DORA compliance with AI, observability, and security

Dynatrace

AUGUST 27, 2024

Delivering financial services requires a complex landscape of applications, hybrid cloud infrastructure, and third-party vendors. It detects regressions and deviations from previously observed behavior, including latency, traffic, error rates, saturation, security coverage, vulnerability risk levels, and memory consumption.

Best Practices

Best Practices Government DevOps Analytics

How digital experience monitoring helps deliver business observability

Dynatrace

APRIL 26, 2022

Gartner estimates that by 2025, 70% of digital business initiatives will require infrastructure and operations (I&O) leaders to include digital experience metrics in their business reporting. With DEM solutions, organizations can operate over on-premise network infrastructure or private or public cloud SaaS or IaaS offerings.

Monitoring

Monitoring Social Media IoT Metrics

Telltale: Netflix Application Monitoring Simplified

The Netflix TechBlog

AUGUST 13, 2020

We also highlight interesting broader events such as regional traffic evacuations and nearby deployments, information that is vital to understanding health holistically. Regional traffic evacuations. Infrastructure change events. Especially during an incident. That is our Telltale vision. Mantis real-time streaming data.

Monitoring

Monitoring Tuning Traffic Metrics

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Berg, Romain Cledat, Kayla Seeley, Shashank Srikanth, Chaoying Wang, Darin Yu. Netflix uses data science and machine learning across all facets of the company, powering a wide range of business applications from our internal infrastructure and content demand modeling to media understanding.

Systems

Systems Media Cache Open Source

Achieving observability in async workflows

The Netflix TechBlog

MAY 14, 2021

Prodicle Distribution Our service is required to be elastic and handle bursty traffic. We are expected to process 1,000 watermarks for a single distribution in a minute, with non-linear latency growth as the number of watermarks increases. Things got hairy. We wanted a scalable service that was near real-time, 2.

Traffic

Traffic Java Latency Google

Expanding the Cloud – introducing the Asia Pacific (Sydney) Region.

All Things Distributed

NOVEMBER 12, 2012

This new Asia Pacific (Sydney) Region has been highly requested by companies worldwide, and it provides low latency access to AWS services for those who target customers in Australia and New Zealand. You can learn more about our growing global infrastructure footprint at [link].

Cloud

Cloud AWS Ecommerce Latency

Välkommen till Stockholm – An AWS Region is coming to the Nordics

All Things Distributed

APRIL 4, 2017

The new region will give Nordic-based businesses, government organisations, non-profits, and global companies with customers in the Nordics, the ability to leverage the AWS technology infrastructure from data centers in Sweden. They migrated their IT infrastructure, including mission-critical payments platforms, to AWS in just six weeks.

AWS

AWS Airlines Latency Games

Ciao Milano! – An AWS Region is coming to Italy!

All Things Distributed

NOVEMBER 13, 2018

Currently we have 57 Availability Zones across 19 technology infrastructure Regions. Lamborghini, the world-famous manufacturer of elite, luxury sports cars based in Italy, has been using AWS to reduce the cost of their infrastructure by 50 percent, while also achieving better performance and scalability. million unique visits.

AWS

AWS Energy Automotive Traffic

Expanding the Cloud – An AWS Region is coming to Hong Kong

All Things Distributed

JUNE 20, 2017

This enables customers to serve content to their end users with low latency, giving them the best application experience. In 2008, AWS opened a point of presence (PoP) in Hong Kong to enable customers to serve content to their end users with low latency. Since then, AWS has added two more PoPs in Hong Kong, the latest in 2016.

AWS

AWS Logistics Cloud Social Media

DevOps automation: From event-driven automation to answer-driven automation [with causal AI]

Dynatrace

JULY 24, 2023

When a server experiences an outage, the system promptly triggers an alert and initiates actions like restarting a server or redirecting traffic to a redundant server. Change impact analysis is an indispensable process for effectively managing changes within an organization’s infrastructure and applications.

DevOps

DevOps Traffic Efficiency Servers
