By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.
Here are five strategies executives can pursue to reduce tool sprawl, lower costs, and increase operational efficiency. Re-indexing data and rehydrating it from cold storage for incident investigation and forensics adds query latency as well as management overhead and cost.
What is RTT? Round-trip time (RTT) is basically a measure of latency: how long did it take to get from one endpoint to another and back again? That’s exactly what this article is about. It gives fascinating insights into the network topology of our visitors, and how much we might be impacted by high-latency regions.
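As a rough, hand-rolled illustration (not from the article), RTT can be approximated at the application layer by timing TCP handshakes; everything here, including the target host, is a placeholder, and a real measurement would more likely use ICMP ping or many more samples:

```python
import socket
import time

def tcp_rtt_ms(host: str, port: int = 443, samples: int = 5) -> float:
    """Approximate RTT by timing TCP handshakes (one connect per sample)."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            pass  # connection established; handshake time is roughly one RTT
        timings.append((time.perf_counter() - start) * 1000)
    return min(timings)  # the minimum filters out scheduling noise

print(f"~RTT to example.com: {tcp_rtt_ms('example.com'):.1f} ms")
```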
As an engineer, you probably know that server performance under heavy load is crucial for maintaining the availability and responsiveness of your services. In this post, we'll explore both strategies through a simple simulation in Colab, allowing you to see the impact of changing parameters on system performance.
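The Colab notebook itself isn’t reproduced here, but a minimal single-server queue simulation in the same spirit (arrival and service rates are made-up parameters) shows how latency balloons as load approaches capacity:

```python
import random

def simulate(arrival_rate: float, service_rate: float, n: int = 100_000) -> float:
    """Single-server queue: returns mean time in system (waiting + service)."""
    clock = server_free = total = 0.0
    for _ in range(n):
        clock += random.expovariate(arrival_rate)   # next arrival time
        start = max(clock, server_free)             # wait if the server is busy
        server_free = start + random.expovariate(service_rate)
        total += server_free - clock                # time spent in the system
    return total / n

# Latency explodes as utilization approaches 1.0.
for rho in (0.5, 0.8, 0.95):
    print(rho, round(simulate(arrival_rate=rho, service_rate=1.0), 2))
```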
You’ll also learn strategies for maintaining data safety and managing node failures so your RabbitMQ setup is always up to the task. Implementing clustering and quorum queues in RabbitMQ significantly improves load distribution and data redundancy, ensuring high availability and fault tolerance for messaging services.
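As a concrete starting point, a quorum queue is declared by passing a single extra argument at declaration time; this is a minimal sketch using the pika client, with the broker URL and queue name as placeholders:

```python
import pika

# Placeholder connection details; point this at one of your cluster nodes.
conn = pika.BlockingConnection(
    pika.URLParameters("amqp://guest:guest@localhost:5672/")
)
channel = conn.channel()

# x-queue-type=quorum makes this a Raft-replicated quorum queue;
# quorum queues must be declared durable.
channel.queue_declare(
    queue="orders",
    durable=True,
    arguments={"x-queue-type": "quorum"},
)

channel.basic_publish(exchange="", routing_key="orders", body=b"hello")
conn.close()
```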
We can experiment with different content placements or promotional strategies to boost visibility and engagement. Analyzing impression history, for example, might help determine how well a specific row on the home page is functioning or assess the effectiveness of a merchandising strategy.
Yet, many are confined to a brief temporal window due to constraints in serving latency or training costs. Key insights from this shift include: A Data-Centric Approach: shifting focus from model-centric strategies, which rely heavily on feature engineering, to a data-centric one.
What is the availability, configurability, and efficacy of each? Given that 66% of all websites (and 77% of all requests) are running HTTP/2, I will not discuss concatenation strategies for HTTP/1.1. 4,362ms of cumulative latency; 240ms of cumulative download. The former makes for a simpler build step, but is it faster?
Its design prioritizes high availability and efficient data transfer with minimal overhead, making it a practical choice for handling real-time data pipelines and distributed event processing. It follows a push-based approach, ensuring messages are distributed to consumers as soon as they become available.
Replication Strategy. Is my database cluster still highly available? All of our high availability options are offered in DigitalOcean, including 2 Replicas + 1 Arbiter, 3 Replicas and custom replica set setups. DigitalOcean does not have the concept of availability zones (AZ), so we distribute the nodes across different regions.
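On the application side, taking advantage of such a replica set mostly means connecting with a replica-set-aware driver so failover is handled automatically; a minimal pymongo sketch, with the hostnames and set name as placeholders:

```python
from pymongo import MongoClient

# Placeholder hosts: three replica-set members spread across regions.
client = MongoClient(
    "mongodb://node1.example:27017,node2.example:27017,node3.example:27017",
    replicaSet="rs0",                     # driver discovers and tracks the primary
    retryWrites=True,                     # retry a write once across a failover
    readPreference="secondaryPreferred",  # reads may be served by secondaries
)

# Writes always go to the primary; the driver re-routes after elections.
client.mydb.events.insert_one({"type": "ping"})
```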
The three strategies we will discuss today are AB Testing, Replay Testing, and Sticky Canaries. Let’s discuss the three testing strategies in further detail. To determine customer impact, we could compare various metrics such as error rates, latencies, and time to render. How does it work?
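To make “compare various metrics” concrete, here is a small sketch (with hypothetical generated data standing in for real request logs) that contrasts latency percentiles and error rates between a baseline and a canary:

```python
import random
import statistics

def summarize(latencies_ms, error_count, total):
    qs = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {"p50": qs[49], "p99": qs[98], "error_rate": error_count / total}

# Hypothetical samples standing in for real request logs.
random.seed(0)
baseline = [random.gauss(120, 20) for _ in range(10_000)]
canary = [random.gauss(125, 25) for _ in range(10_000)]

b = summarize(baseline, error_count=12, total=10_000)
c = summarize(canary, error_count=15, total=10_000)
for metric in ("p50", "p99", "error_rate"):
    print(f"{metric}: baseline={b[metric]:.3f} canary={c[metric]:.3f}")
```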
Identifying key Redis metrics such as latency, CPU usage, and memory metrics is crucial for effective Redis monitoring. To achieve optimal tracking results, it is important to choose wisely among available tools like Prometheus or Grafana, which offer deeper insights into your Redis instances for better performance optimization.
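Most of these metrics are exposed directly by Redis’s INFO command; a minimal redis-py sketch (connection details are placeholders, and the server-side latency log only fills in once latency-monitor-threshold has been configured):

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # placeholder connection
info = r.info()

hits, misses = info["keyspace_hits"], info["keyspace_misses"]
print("hit ratio :", hits / ((hits + misses) or 1))
print("used mem  :", info["used_memory_human"])

# Latency spike log; requires e.g. CONFIG SET latency-monitor-threshold 100
print("latency   :", r.execute_command("LATENCY", "LATEST"))
```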
This blog series will examine the tools, techniques, and strategies we have utilized to achieve this goal. In this testing strategy, we execute a copy (replay) of production traffic against a system’s existing and new versions to perform relevant validations. This approach has a handful of benefits.
To monitor Redis® instances effectively, collect Redis metrics focusing on cache hit ratio, memory allocated, and latency threshold.
Continuous Instrumentation of the Linux Scheduler To ensure the reliability of our workloads that depend on low latency responses, we instrumented the run queue latency for each container, which measures the time processes spend in the scheduling queue before being dispatched to the CPU.
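The original instrumentation uses eBPF; as a far simpler, Linux-only approximation, the kernel exposes per-process run-queue wait time in /proc/<pid>/schedstat, which this sketch samples:

```python
import time

def runqueue_wait_ns(pid: str = "self") -> int:
    """/proc/<pid>/schedstat fields: cpu_time_ns, runqueue_wait_ns, timeslices."""
    with open(f"/proc/{pid}/schedstat") as f:
        return int(f.read().split()[1])

before = runqueue_wait_ns()
time.sleep(1.0)
delta = runqueue_wait_ns() - before
print(f"time this process spent waiting on the run queue: {delta / 1e6:.3f} ms/s")
```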
Stream processing systems, designed for continuous, low-latency processing, demand swift recovery mechanisms to tolerate and mitigate failures effectively. After failures, Kafka Streams’ partition assignment strategy, triggered by rebalances, causes its executions to accumulate more lag. This significantly increases event latency.
Central to this infrastructure is our use of multiple online distributed databases such as Apache Cassandra , a NoSQL database known for its high availability and scalability. It also serves as central configuration of access patterns such as consistency or latency targets. Useful for keeping “n-newest” or prefix path deletion.
Compare Latency: ScaleGrid achieved lower latency compared to DigitalOcean for PostgreSQL. Now, let’s take a look at the throughput and latency performance in our comparison. Next, we test and compare the latency performance between ScaleGrid and DigitalOcean for PostgreSQL. PostgreSQL DigitalOcean Latency Averages (ms).
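A comparison like this ultimately comes down to timing round trips against each endpoint; a minimal psycopg2 sketch, with both DSNs as placeholders for the two deployments:

```python
import time
import psycopg2

def query_latency_ms(dsn: str, samples: int = 50) -> float:
    """Median round-trip time for a trivial query against one endpoint."""
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        cur.execute("SELECT 1")
        cur.fetchone()
        timings.append((time.perf_counter() - start) * 1000)
    conn.close()
    return sorted(timings)[len(timings) // 2]

# Placeholder DSNs for the two deployments being compared.
for name, dsn in [("scalegrid", "postgres://user:pw@sg-host:5432/db"),
                  ("digitalocean", "postgres://user:pw@do-host:5432/db")]:
    print(name, round(query_latency_ms(dsn), 2), "ms")
```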
Every company has its own strategy as to which technologies to use. That’s a large amount of data to handle. Here’s how it works. The Dynatrace registry v2 is available starting with version 1.8.0 of Micrometer, and auto-configuration is available for all OneAgent-monitored hosts.
Mastering Hybrid Cloud Strategy: Are you looking to leverage the best of both private and public clouds to propel your business forward? A hybrid cloud strategy could be your answer. Understanding Hybrid Cloud Strategy: A hybrid cloud merges the capabilities of public and private clouds into a singular, coherent system.
Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.
A well-planned multi-cloud strategy can seriously upgrade your business’s tech game, making you more agile. Key Takeaways: Multi-cloud strategies have become increasingly popular due to the need for flexibility, innovation, and the avoidance of vendor lock-in. They can also bolster uptime and limit latency issues or potential downtime.
Streamline development and delivery processes Nowadays, digital transformation strategies are executed by almost every organization across all industries. SREs use Service-Level Indicators (SLI) to see the complete picture of service availability, latency, performance, and capacity across various systems, especially revenue-critical systems.
The service should be able to serve real-time (i.e., UI) applications, so CRUD and search operations must be achieved with low latency. All data should also be available for offline analytics in Hive/Iceberg. Our service will be used by many internal UI applications, hence the latency for CRUD and search operations must be low.
Because Google offers its own Google Cloud Architecture Framework and Microsoft its Azure Well-Architected Framework , organizations that use a combination of these platforms triple the challenge of integrating their performance frameworks into a cohesive strategy. SRG validates the status of the resiliency SLOs for the experiment period.
By: Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data — often reaching petabytes — with millisecond access latency has become increasingly vital.
Model observability provides visibility into resource consumption and operation costs, aiding in optimization and ensuring the most efficient use of available resources. Finding a balance between complexity and impact must be a priority for organizations that adopt AI strategies.
Every new origin we need to visit needs a connection opening, and that can be very costly: DNS resolution, TCP handshakes, and TLS negotiation all add up, and the story gets worse the higher the latency of the connection is. On a slower, higher-latency connection, the story is much, much worse. All completely avoidable. to just 3.6s.
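To see how those setup costs stack up, a small sketch that times the DNS, TCP, and TLS phases of a fresh connection separately (the host is an arbitrary example; real numbers vary with distance and network conditions):

```python
import socket
import ssl
import time

host, port = "example.com", 443

t0 = time.perf_counter()
ip = socket.getaddrinfo(host, port)[0][4][0]        # DNS resolution
t1 = time.perf_counter()
sock = socket.create_connection((ip, port), timeout=5)  # TCP handshake
t2 = time.perf_counter()
tls = ssl.create_default_context().wrap_socket(
    sock, server_hostname=host                      # TLS negotiation (with SNI)
)
t3 = time.perf_counter()
tls.close()

for phase, dt in (("DNS", t1 - t0), ("TCP", t2 - t1), ("TLS", t3 - t2)):
    print(f"{phase}: {dt * 1000:.1f} ms")
```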
SRE is the transformation of traditional operations practices by using software engineering and DevOps principles to improve the availability, performance, and scalability of releases by building resiliency into apps and infrastructure. Designating and managing Service Level Objectives (SLOs) as availability targets for a service.
Cloud migration is the process of transferring some or all of your data, software, and operations to a cloud-based computing environment that offers unlimited scale and high availability. A cloud migration strategy, however, provides technical optimization that’s also firmly rooted in the business value chain.
In PACELC terms we choose PC/EC and have the same level of availability for writes of our previous system while improving our theoretical availability for reads. In that scenario, the system would need to deal with the data propagation latency directly, for example, by use of timeouts or client-originated update tracking mechanisms.
They were either running their own infrastructure, where installing and deploying Brotli everywhere proved non-trivial, or they were using a CDN that didn’t have readily available support for the new algorithm. Each new TCP connection has no way of knowing what bandwidth it currently has available to it, nor how reliable the connection is (i.e.
In this post, we’ll walk you through the best way to host MongoDB on DigitalOcean, including the best instance types to use, disk types, replication strategy, and managed service providers. MongoDB Replication Strategies. DigitalOcean Advantages for MongoDB. DigitalOcean Droplets. minutes of downtime in one year.
By collecting and analyzing key performance metrics of the service over time, we can assess the impact of the new changes and determine if they meet the availability, latency, and performance requirements. Migrating Persistent Stores Stateful APIs pose unique challenges that require different strategies.
Balancing Low Latency, High Availability and Cloud Choice: Cloud hosting is no longer just an option — it’s now, in many cases, the default choice. Let’s look at the top cloud computing use cases, the use cases for which cloud probably isn’t the best route available, and the use cases where a hybrid approach may be best.
This includes response time, accuracy, speed, throughput, uptime, CPU utilization, and latency. ITOps is also responsible for configuring, maintaining, and managing servers to provide consistent, high-availability network performance and overall security, including a disaster readiness plan. Performance. What does IT operations do?
You can eliminate the latency issues caused by cold starts — an increase in normal response time when a new instance receives its first request — by using edge-optimized functions that run code closer to users and other projects. AWS continues to improve how it handles latency issues. How do AWS Lambda functions impact monitoring?
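One quick way to observe the cold-start penalty yourself is to time back-to-back synchronous invocations; a hedged boto3 sketch, where the function name is a placeholder and the first call is only cold if no warm instance already exists:

```python
import time
import boto3

client = boto3.client("lambda")

def timed_invoke_ms(name: str) -> float:
    """Time one synchronous (RequestResponse) invocation end to end."""
    start = time.perf_counter()
    client.invoke(FunctionName=name, Payload=b"{}")
    return (time.perf_counter() - start) * 1000

fn = "my-function"  # placeholder function name
print(f"first call : {timed_invoke_ms(fn):.0f} ms (may include a cold start)")
print(f"second call: {timed_invoke_ms(fn):.0f} ms (warm)")
```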
The Clouds app provides a view of all available cloud-native services. Logs in context, along with other details, are instantly available after selecting a resource. The reasons are easy to find, looking at the latest improvements that went live along with the general availability of the Logs app.
For example, when monitoring a database, you’ll want to know about any latency when writing data to a disk or average query response time. At the same time, metrics on a dashboard are showing resource exhaustion issues, such as lack of available memory.
There was no appetite from them to do so, so I decided to make it all available for free anyway—a faster web benefits everyone. Although this response has a 0B filesize, we will always take the latency hit on every single page view (and this response is basically 100% latency). Next up, we get sent to fonts.[client].com
This methodology aims to improve software system reliability using several key categories such as availability, performance, latency, efficiency, capacity, and incident response. Organizations that are new to both practices will want to adopt a strategy that incorporates both. What are SLOs?
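Availability targets translate directly into an error budget; the arithmetic for a few hypothetical SLOs:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime per window for a given availability SLO."""
    return (1 - slo) * window_days * 24 * 60

for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} -> {error_budget_minutes(slo):.1f} min of budget per 30 days")
```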
Over the course of this post, we will talk about our approach to this migration, the strategies that we employed, and the tools we built to support this. Being able to canary a new route let us verify latency and error rates were within acceptable limits. ecosystem and the rich selection of npm packages available.
From the moment a Netflix film or series is pitched and long before it becomes available on Netflix, it goes through many phases. Operational Reporting is a reporting paradigm specialized in covering high-resolution, low-latency data sets, serving detailed day-to-day activities¹ and processes of a business domain.
In this post, we compare ScaleGrid’s Bring Your Own Cloud (BYOC) plan vs. the standard Dedicated Hosting model to help you determine the best strategy for your MySQL, PostgreSQL, Redis™ and MongoDB® database deployment. Both AWS EC2 instances and Azure VM instances are available as Reserved Instances, and can be used through the BYOC plan.