By Rajiv Shringi, Oleksii Tkachuk, and Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix's TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we're excited to present the Distributed Counter Abstraction.
What is RTT? Round-trip time (RTT) is basically a measure of latency: how long did it take to get from one endpoint to another and back again? RTT isn't a you-thing, it's a them-thing. This gives fascinating insights into the network topography of our visitors, and how much we might be impacted by high-latency regions.
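As a rough, hypothetical illustration (not from the article), you can estimate RTT yourself by timing a TCP handshake, which takes approximately one round trip to complete; the target host and port below are assumptions:

```python
# A minimal sketch: estimate RTT by timing a TCP handshake.
import socket
import time

def tcp_rtt_ms(host: str, port: int = 443) -> float:
    """Time to establish a TCP connection, roughly one round trip."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass  # connect() returns once the handshake completes
    return (time.perf_counter() - start) * 1000

print(f"RTT to example.com: {tcp_rtt_ms('example.com'):.1f} ms")
```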
It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile's exposure. In this multi-part blog series, we take you behind the scenes of our system that processes billions of impressions daily.
The Netflix video processing pipeline went live with the launch of our streaming service in 2007. Future blogs will provide deeper dives into each service, sharing insights and lessons learned from this process.
RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. RabbitMQ follows a message broker model with advanced routing, while Kafka's event streaming architecture uses partitioned logs for distributed processing. What is Apache Kafka?
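As a brief sketch of the partitioned-log model, using the kafka-python client (the broker address and topic name are illustrative assumptions, not from the article):

```python
# Messages with the same key always land in the same partition, which
# is what gives Kafka per-key ordering at high throughput.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("playback-events", key=b"device-42", value=b'{"action": "play"}')
producer.flush()  # block until the broker has acknowledged the send
```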
The Challenge of Title Launch Observability: As engineers, we're wired to track system metrics like error rates, latencies, and CPU utilization, but what about metrics that matter to a title's success? Option 1: Log Processing. Log processing offers a straightforward solution for monitoring and analyzing title launches.
Timestone: Netflix's High-Throughput, Low-Latency Priority Queueing System with Built-in Support for Non-Parallelizable Workloads, by Kostas Christidis. Introduction: Timestone is a high-throughput, low-latency priority queueing system we built in-house to support the needs of Cosmos, our media encoding platform. Over the past 2.5
Stream processing: One approach to such a challenging scenario is stream processing, a computing paradigm and software architectural style for data-intensive software systems that emerged to cope with requirements for near real-time processing of massive amounts of data. This significantly increases event latency.
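A minimal sketch of the idea, assuming a time-ordered event stream and tumbling windows (event shapes and names are illustrative, not from the article):

```python
# Consume an unbounded stream and emit per-window counts incrementally,
# instead of batching all the data first.
from collections import defaultdict
from typing import Iterable, Iterator

def windowed_counts(events: Iterable[dict], window_secs: int = 60) -> Iterator[dict]:
    """Tumbling-window counts per key; events must arrive ordered by 'ts'."""
    current, counts = None, defaultdict(int)
    for event in events:
        window = event["ts"] // window_secs
        if current is not None and window != current:
            yield {"window": current, "counts": dict(counts)}
            counts.clear()
        current = window
        counts[event["key"]] += 1
    if current is not None:
        yield {"window": current, "counts": dict(counts)}
```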
In this post, I'm going to break these processes down into each of their parts. Plotted on the same horizontal axis of 1.6s, the waterfalls speak for themselves: 201ms of cumulative latency and 109ms of cumulative download, versus 4,362ms of cumulative latency and 240ms of cumulative download. Read the complete test methodology. It gets worse.
In the time since it was first presented as an advanced Mesos framework, Titus has transparently evolved from being built on top of Mesos to Kubernetes, handling an ever-increasing volume of containers. This blog post presents how our current iteration of Titus deals with high API call volumes by scaling out horizontally.
by Elizabeth Carretto. Everyone loves Unsolved Mysteries. When a problem occurs, we put on our detective hats and start our mystery-solving process by gathering evidence. Edgar helps Netflix teams troubleshoot distributed systems efficiently with the help of a summarized presentation of request tracing, logs, analysis, and metadata.
Observability data presents executives with new opportunities to achieve this, by creating incremental value for cloud modernization, improved business analytics, and enhanced customer experience. With the latest advances from Dynatrace, this process is instantaneous.
This approach enhances key DORA metrics and enables early detection of failures in the release process, allowing SREs more time for innovation. These releases often assumed ideal conditions such as zero latency, infinite bandwidth, and no network loss, as highlighted in Peter Deutsch’s eight fallacies of distributed systems.
It supports both high throughput services that consume hundreds of thousands of CPUs at a time, and latency-sensitive workloads where humans are waiting for the results of a computation. The subsystems all communicate with each other asynchronously via Timestone, a high-scale, low-latency priority queuing system.
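As a rough, in-process illustration of the priority-queueing pattern (this is not Timestone's implementation, just the general idea in miniature):

```python
# A tie-breaking counter keeps equal-priority items FIFO, which matters
# when many tasks share a priority level.
import heapq
import itertools

class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._ids = itertools.count()

    def push(self, item, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._ids), item))

    def pop(self):
        _, _, item = heapq.heappop(self._heap)
        return item

q = PriorityQueue()
q.push("encode-episode", priority=5)
q.push("encode-trailer", priority=0)  # lower number = more urgent
assert q.pop() == "encode-trailer"
```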
Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. Shift-left using an SRE approach means that reliability is baked into each process, app and code change.
Our previous blog post presented replay traffic testing, a crucial instrument in our toolkit that allows us to implement these transformations with precision and reliability. It is a process that doesn't just minimize risk, but also facilitates a continuous evaluation of the rollout's impact.
Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. The CPU scheduler's goal is to assign running processes to time slices of the CPU in a "fair" way. So why mess with it?
Today we are excited to announce latency heatmaps and improved container support for our on-host monitoring solution, Vector, to the broader community. Remotely view real-time process scheduler latency and TCP throughput with Vector and eBPF. What is Vector? Vector is open source and in use by multiple companies.
Usually, data scientists and engineers write Extract-Transform-Load (ETL) jobs and pipelines using big data compute technologies, like Spark or Presto, to process this data and periodically compute key information for a member or a video. The processed data is typically stored as data warehouse tables in AWS S3.
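A minimal PySpark sketch of that pattern: read raw events, compute per-member aggregates, and write a warehouse table back to S3 (all paths and column names below are hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("member-etl").getOrCreate()

events = spark.read.parquet("s3://bucket/raw/playback_events/")
summary = events.groupBy("member_id").agg(
    F.count("*").alias("play_count"),
    F.sum("watch_seconds").alias("total_watch_seconds"),
)
# Periodically rewrite the warehouse table with fresh aggregates.
summary.write.mode("overwrite").parquet("s3://bucket/warehouse/member_summary/")
```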
The voice service then constructs a message for the device and places it on the message queue, which is then processed and sent to Pushy to deliver to the device. Since that presentation, Pushy has grown in both size and scope, and this article will be discussing the investments we’ve made to evolve Pushy for the next generation of features.
This presents a challenge for IT operations teams, specifically in identifying and addressing performance issues or planning how to prevent future issues. Therefore, they need to see how the application code functions and how the application's operations depend on the underlying hardware resources and the operating system managed by Hyper-V.
Observability can identify the baseline user experience and allow teams to improve it by optimizing page load times or reducing latency. Cloud environments present IT complexity challenges that don’t exist in on-premises data centers. Why full-stack observability matters. Improve business decisions with precision analytics.
Jamstack CMS: The Past, The Present and The Future. By Mike Neumegen, 2021-08-20. While developers are an essential part of the Jamstack, they're often heavily involved in the content publishing process. Less Reliance On Developers.
By Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, and Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.
While off-the-shelf models assist many organizations in initiating their journeys with generative AI (GenAI), scaling AI for enterprise use presents formidable challenges. It requires specialized talent, a new technology stack to manage and deploy models, an ample budget for rising compute costs, and end-to-end security.
Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. As data streams grow in complexity, processing efficiency can decline.
In the process, we changed end-to-end identity propagation within the network of services to use a cryptographically-verifiable token-agnostic identity object. We would need to process authentication tokens (and protocols) further upstream. EAS also covers the read-only processing of tokens to create Passports (more on that later).
Remote calls are never free; they impose extra latency, increase the probability of an error, and consume network bandwidth. When we process a request, it is often beneficial to know which fields the caller is interested in and which ones they ignore. FieldMask is a protobuf message.
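A minimal sketch of the idea using protobuf's standard FieldMask type (the record shape and handler below are hypothetical):

```python
# The caller lists only the field paths it will actually read; the
# server can then skip computing or serializing everything else.
from google.protobuf.field_mask_pb2 import FieldMask

def handle_request(mask: FieldMask) -> dict:
    full_record = {
        "title": "Example",
        "runtime_seconds": 5400,
        "artwork_url": "https://example.com/art.jpg",
    }
    return {path: full_record[path] for path in mask.paths}

mask = FieldMask(paths=["title", "runtime_seconds"])
print(handle_request(mask))  # {'title': 'Example', 'runtime_seconds': 5400}
```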
The challenge, then, is to be able to ingest and process these events in a scalable manner, i.e., scaling with the number of devices, which will be the focus of this blog post. In-Order Processing: The semantics of correct device information updates ingestion require that messages be consumed in the order that they are produced.
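One common way to keep per-device ordering while scaling out is to route each device's events to a fixed worker so they are handled sequentially; the sketch below illustrates the pattern under assumed names, not the system described in the post:

```python
import queue
import threading

NUM_WORKERS = 4
queues = [queue.Queue() for _ in range(NUM_WORKERS)]

def apply_update(event: dict) -> None:
    print(event["device_id"], event["update"])  # hypothetical handler

def worker(q: queue.Queue) -> None:
    while True:
        apply_update(q.get())  # one device's events run in arrival order

def route(event: dict) -> None:
    # Same device_id -> same queue -> same worker -> in-order processing.
    queues[hash(event["device_id"]) % NUM_WORKERS].put(event)

for q in queues:
    threading.Thread(target=worker, args=(q,), daemon=True).start()

route({"device_id": "d-1", "update": "wifi-strength"})
```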
Higher latency and cold start issues due to the initialization time of the functions. Data visualization: how to present, explore, and interpret observability data from serverless functions intuitively, clearly, and holistically? Enable faster development and deployment cycles by abstracting away the infrastructure complexity.
There are several benefits of such optimizations, like saving on storage, faster query time, cheaper downstream processing, and an increase in developer productivity by removing additional ETLs written only for query performance improvement. More processing resources. Increase in storage space.
Metrics are measures of critical system values, such as CPU utilization or average write latency to persistent storage. A platform approach, on the other hand, presents a more effective option for understanding observability as a whole. In this case, the best option may be to stop the process and execute it when system load is low.
For Inter-Process Communication (IPC) between services, we needed the rich feature set that a mid-tier load balancer typically provides. Eureka and Ribbon presented a simple but powerful interface, which made adopting them easy. There is a downside to fetching this data on-demand: this adds latency to the first request to a cluster.
Although this response has a 0B filesize, we will always take the latency hit on every single page view (and this response is basically 100% latency). …com, which introduces yet more latency for the connection setup. Remember, neither of these changes is solving any of the issues inherently present in Cloud.typography.
You can find more information and our call for presentations here. Focusing on tools over processes is a red flag, and the biggest mistake I see executives make when it comes to AI. Improvement Requires Process: Assuming that buying a tool will solve your AI problems is like joining a gym but not actually going.
We need to be able to easily determine what imagery is present for a given platform, region, and language. Server-generated assets, since client-side generation would require the retrieval of many individual images, which would increase latency and time-to-render. The imagery needs to be localized.
Key Takeaways: Critical performance indicators such as latency, CPU usage, memory utilization, hit rate, and the number of connected clients/slaves/evictions must be monitored to maintain Redis's high throughput and low latency capabilities. It can achieve impressive performance, handling up to 50 million operations per second.
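A minimal sketch of collecting those indicators with the redis-py client (the host and port are illustrative assumptions):

```python
# INFO returns a dict of server statistics, including the counters
# needed to derive the cache hit rate.
import redis

r = redis.Redis(host="localhost", port=6379)
info = r.info()

hits, misses = info["keyspace_hits"], info["keyspace_misses"]
hit_rate = hits / (hits + misses) if hits + misses else 0.0

print(f"hit rate:          {hit_rate:.2%}")
print(f"used memory:       {info['used_memory_human']}")
print(f"connected clients: {info['connected_clients']}")
```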
This entertaining romp through the tech stack serves as an introduction to how we think about and design systems, the Netflix approach to operational challenges, and how other organizations can apply our thought processes and technologies. We explore all the systems necessary to make and stream content from Netflix.
Identifying key Redis metrics such as latency, CPU usage, and memory metrics is crucial for effective Redis monitoring. To monitor Redis instances effectively, collect Redis metrics focusing on cache hit ratio, memory allocated, and latency threshold. Setting Up RedisInsight: Getting RedisInsight up and running is a simple process.
Because these IoT devices are powered by microprocessors or microcontrollers that have limited processing power and memory, they often rely heavily on AWS and the cloud for processing, analytics, storage, and machine learning. In other words, process the data closer to where it's created.
This doesn't mean relational databases lack utility in present-day development, or that they are not available, scalable, or high-performing. Use cases such as gaming, ad tech, and IoT lend themselves particularly well to the key-value data model, where the access patterns require low-latency Gets/Puts for known key values.
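A minimal sketch of that Get/Put access pattern against DynamoDB via boto3 (the table name and item schema are hypothetical):

```python
import boto3

table = boto3.resource("dynamodb").Table("player_sessions")

# Put: write an item under a known key.
table.put_item(Item={"player_id": "p-123", "score": 9001, "level": 7})

# Get: low-latency point read by that same key.
response = table.get_item(Key={"player_id": "p-123"})
print(response.get("Item"))
```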
Compared to the most recent master version of libaom (AV1 reference software), SVT-AV1 is similar in compression efficiency and at the same time achieves significantly lower encoding latency on multi-core platforms when using its inherent parallelization capabilities.