By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction. In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.
Given that 66% of all websites (and 77% of all requests) are running HTTP/2, I will not discuss concatenation strategies for HTTP/1.1 in this article. (In one test, I concatenated it all into one big file; the other had the library split into 12 files. Read the complete test methodology.) If you are still running HTTP/1.1,
The three strategies we will discuss today are AB Testing, Replay Testing, and Sticky Canaries. To launch Phase 1 safely, we used AB Testing. To launch Phase 2 safely, we used Replay Testing and Sticky Canaries. We knew we could test the same query with the same inputs and consistently expect the same results.
This blog series will examine the tools, techniques, and strategies we have utilized to achieve this goal. This blog post will provide a detailed analysis of replay traffic testing, a versatile technique we have applied in the preliminary validation phase for multiple migration initiatives. This approach has a handful of benefits.
Replication Strategy: does it affect latency? Yes, you can see an increase in latency. So, if you’re hosting your application in AWS or Azure and move your database to DigitalOcean, you will see an increase in latency. Here are the configurations for this comparison: Plan: Dedicated Hosting; Database: MongoDB®.
A lot of companies—even if they are aware that performance is key to their business—are often unsure of how, when, or where performance testing sits within their development lifecycle. Each kind of testing is listed chronologically—that is, you should do them in order—but all complement each other, and will ultimately feed into one another.
Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, and Colin McFarland. This is the fourth post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. Need to catch up? Have a look at Part 1 (Decision Making at Netflix) and Part 2 (What is an A/B Test?).
Compare Latency: ScaleGrid delivers lower latency compared to DigitalOcean for PostgreSQL. PostgreSQL DigitalOcean Performance Test. Now, let’s take a look at the throughput and latency performance of our comparison. Next, we are going to test and compare the latency performance between ScaleGrid and DigitalOcean for PostgreSQL.
Our previous blog post presented replay traffic testing — a crucial instrument in our toolkit that allows us to implement these transformations with precision and reliability. Compared to replay testing, canaries allow us to extend the validation scope beyond the service level.
Traces are used for performance analysis, latency optimization, and root cause analysis. This approach allows you to test and refine configurations, manage implementation complexity, and demonstrate value to stakeholders. Capture critical performance indicators such as request latency, error rates, and resource usage.
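For illustration, here is a minimal sketch (not from the article) of how such indicators can be attached to a trace span using OpenTelemetry’s Python API; the service, span, and attribute names are assumptions:

```python
# A no-op tracer is returned if no SDK is configured, so this runs as-is.
import time
from opentelemetry import trace

tracer = trace.get_tracer("example-service")  # hypothetical service name

with tracer.start_as_current_span("handle_request") as span:
    start = time.perf_counter()
    try:
        pass  # ... handle the request ...
    except Exception as exc:
        span.record_exception(exc)  # recorded errors feed error-rate analysis
        raise
    finally:
        # request latency captured as a span attribute, in milliseconds
        span.set_attribute("request.latency_ms",
                           (time.perf_counter() - start) * 1000)
```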
Every new origin we need to visit needs a connection opening, and that can be very costly: DNS resolution, TCP handshakes, and TLS negotiation all add up, and the story gets worse the higher the latency of the connection is. On a slower, higher-latency connection, the story is much, much worse. All completely avoidable. …to just 3.6s.
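To make those setup costs concrete, here is a rough, illustrative Python sketch (not from the article) that times the three phases of opening a single new HTTPS connection; the hostname is a placeholder:

```python
# Illustrative timing of the per-origin setup phases.
import socket, ssl, time

host, port = "example.com", 443

t0 = time.perf_counter()
ip, resolved_port = socket.getaddrinfo(host, port,
                                       proto=socket.IPPROTO_TCP)[0][4][:2]
t1 = time.perf_counter()                      # DNS resolution done

sock = socket.create_connection((ip, resolved_port))
t2 = time.perf_counter()                      # TCP handshake done

ctx = ssl.create_default_context()
tls = ctx.wrap_socket(sock, server_hostname=host)
t3 = time.perf_counter()                      # TLS negotiation done

print(f"DNS {t1 - t0:.3f}s  TCP {t2 - t1:.3f}s  TLS {t3 - t2:.3f}s")
tls.close()
```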
These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination. It also serves as a central configuration point for access patterns such as consistency or latency targets. Useful for keeping the “n-newest” items or for prefix-path deletion.
However, not all cloud strategies are the same. Application developers can spin up isolated test environments that pose no risk to current operations. Then, they can apply DevSecOps best practices to fully test new code and see what breaks without affecting current operations. Reduced latency. Difficult to test.
Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. CFS is widely used and therefore well tested, and Linux machines around the world run with reasonable performance.
In that scenario, the system would need to deal with the data propagation latency directly, for example, by use of timeouts or client-originated update tracking mechanisms. We started seeing increased response latencies and leader servers running at dangerously high utilization.
Streamline development and delivery processes. Nowadays, digital transformation strategies are executed by almost every organization across all industries. SREs use Service-Level Indicators (SLIs) to see the complete picture of service availability, latency, performance, and capacity across various systems, especially revenue-critical systems.
This is because file-size is only one aspect of web performance, and whatever the file-size is, the resource is still sat on top of a lot of other factors and constants—latency, packet loss, etc. This simple, elegant strategy manages to balance caution with optimism, and applies to every new TCP connection that your web application makes.
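The strategy referred to here is TCP slow start. As an illustrative sketch (assuming the common initial congestion window of 10 segments and a 1460-byte MSS, neither of which is universal), the bytes deliverable per round trip grow roughly like this:

```python
# Assumptions (not universal): initcwnd of 10 segments, 1460-byte MSS,
# no packet loss. The window roughly doubles each round trip.
MSS = 1460          # bytes per segment
cwnd = 10           # initial congestion window, in segments

for rtt in range(1, 5):
    print(f"round trip {rtt}: ~{cwnd * MSS / 1024:.1f} KB deliverable")
    cwnd *= 2
```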
Because Google offers its own Google Cloud Architecture Framework and Microsoft its Azure Well-Architected Framework , organizations that use a combination of these platforms triple the challenge of integrating their performance frameworks into a cohesive strategy. If both objectives pass, you have achieved your cost reduction on CPU size.
Over the course of this post, we will talk about our approach to this migration, the strategies that we employed, and the tools we built to support this. For the migration, testing was a first-class citizen. Replay Testing. Enter replay testing.
For production models, this provides observability of service-level agreement (SLA) performance metrics, such as token consumption, latency, availability, response time, and error count. For model explainability, they can implement custom regression tests, providing indicators of model reputation and behavior over time.
Encouraging a shift-left approach, testing earlier in the development lifecycle. Reduced latency. If you haven’t implemented either, a best practice to get started is to develop a strategy that incorporates both DevOps and SRE practices. Investing in automation and tooling to avoid toil. SRE vs DevOps? Efficiency.
These development and testing practices ensure the performance of critical applications and resources to deliver loyalty-building user experiences. Because pre-production environments are used for testing before an application is released to end users, teams have no access to real-user data. What is synthetic monitoring?
In this post, we’ll walk you through the best way to host MongoDB on DigitalOcean, including the best instance types to use, disk types, replication strategy, and managed service providers. MongoDB Replication Strategies. DigitalOcean Advantages for MongoDB. What’s most impressive is that you’re not compromising performance for cost.
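As a hedged sketch of what a replication strategy looks like from the client side (this is not the post’s own code; hostnames and the replica-set name are placeholders), PyMongo can connect to a replica set and route reads to secondaries:

```python
# Placeholder hosts and replica-set name; trade read consistency for latency.
from pymongo import MongoClient, ReadPreference

client = MongoClient(
    "mongodb://node1.example:27017,node2.example:27017,node3.example:27017",
    replicaSet="rs0",
    read_preference=ReadPreference.SECONDARY_PREFERRED,  # reads may hit secondaries
)
print(client.admin.command("ping"))  # verify the connection
```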
This methodology aims to improve software system reliability using several key categories such as availability, performance, latency, efficiency, capacity, and incident response. Organizations that are new to both practices will want to adopt a strategy that incorporates both. Site reliability engineers, or SREs, lead these efforts.
If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls. Using simple lookup indices in Cassandra gives us the ability to maintain acceptable read latencies while doing heavy writes.
I was doing some cursory research and running a few tests against a potential client’s site so as to get a good understanding of the shape of things before we were to work together. Although this response has a 0B filesize, we will always take the latency hit on every single page view (and this response is basically 100% latency).
A cloud migration strategy, however, provides technical optimization that’s also firmly rooted in the business value chain. With cloud-based resources, teams can spin up infrastructure in seconds, begin testing immediately, scale up or down as needed, and easily eliminate resources that are no longer needed. Read eBook now!
While there are plenty of well-documented benefits to using a connection pooler, there are some arguments to be made against using one: introducing a middleware in the communication inevitably introduces some latency. Our tests show that even a small number of clients can significantly benefit from using a connection pooler.
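A minimal client-side pooling sketch with psycopg2 (the DSN and pool bounds are placeholder assumptions) shows the trade: a little indirection per query in exchange for skipping connection setup every time:

```python
# DSN and pool sizes are placeholder assumptions.
from psycopg2 import pool

pg_pool = pool.SimpleConnectionPool(
    minconn=1, maxconn=10,
    dsn="dbname=app user=app host=127.0.0.1",
)

conn = pg_pool.getconn()            # borrow a warm connection
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
finally:
    pg_pool.putconn(conn)           # return it for reuse
```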
Automate stress-testing and regulatory reporting requirements. Maximize performance for high-frequency and low-latency trading strategies. Institutionalize processes for technical stress testing and technical audits of trading systems. Break down data silos.
By bringing AI to everything — from gathering telemetry to analyzing what’s happening across the full technology stack — your organization can have the reliable answers essential for automating application monitoring, testing, continuous delivery, application security, and incident response. Bring observability to everything.
So in addition to all the optimization work we did for Google Docs, I got to spend a lot of time and energy working on the measurement problem: how can we get end-to-end latency numbers? There were two case studies highlighting third party wins published on web.dev ( 1 , 2 ), and Google Publisher Tag launched a new yielding strategy.
Balancing Low Latency, High Availability and Cloud Choice. Cloud hosting is no longer just an option — it’s now, in many cases, the default choice. Prototypes, experiments, and tests. Development and testing historically involved end-of-life or ‘spare’ hardware. With the cloud, your testing capacity is only limited by your budget.
What breaks your app in production isn’t always what you tested for in dev! The way out? We’ve seen this across dozens of companies, and the teams that break out of this trap all adopt some version of Evaluation-Driven Development (EDD), where testing, monitoring, and evaluation drive every decision from the start.
Low-latency Queries To avoid downloading all of the fact data from s3 in a spark executor and then dropping it, we analyzed our query patterns and figured out that there is a way to only access the data that we are interested in. Corruption in data can significantly impact production model performance and A/B test results.
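A hedged sketch of the idea in PySpark (the S3 path, partition column, and field names are hypothetical): filtering on a partition column lets the planner prune partitions, so executors never fetch the data that would otherwise be dropped:

```python
# Hypothetical path, partition column (dateint), and selected fields.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pruned-read").getOrCreate()

facts = (
    spark.read.parquet("s3://bucket/facts/")              # partitioned by dateint
         .where("dateint BETWEEN 20240101 AND 20240107")  # pruned at planning time
         .select("row_key", "value")                      # read only needed columns
)
facts.show()
```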
This post describes how the Netflix TVUI team implemented a robust strategy to quickly and easily detect performance anomalies before they are released — and… Technically, “performance” metrics are those relating to the responsiveness or latency of the app, including start-up time. Why do we run Performance Tests on commits?
We are standing on the eve of the 5G era… 5G, as a monumental shift in cellular communication technology, holds tremendous potential for spurring innovations across many vertical industries, with its promised multi-Gbps speed, sub-10 ms low latency, and massive connectivity. Throughput and latency; energy consumption.
In the age of AI, data observability has become foundational and complementary to AI observability, data quality being essential for training and testing AI models. Data is the foundation upon which strategies are built, directions are chosen, and innovations are pursued.
During the interview, Jake made a statement about AI testing that was widely shared: “One of the things we learned is that after it passes 100 tests, the odds that it will pass a random distribution of 100k user inputs with 100% accuracy is very high.” If you’re not hands-on with AI, this advice might sound reasonable.
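A back-of-envelope check suggests why practitioners push back on that claim. By the “rule of three”, passing n tests with zero failures only bounds the true failure rate at roughly 3/n with 95% confidence, assuming the tests are drawn from the same distribution as real inputs:

```python
# Zero failures in n independent trials still allows a failure rate of ~3/n.
n_tests = 100
upper_bound = 3 / n_tests                     # ~3% failure rate still plausible
print(f"failure rate could be up to ~{upper_bound:.0%}")
print(f"i.e. up to ~{upper_bound * 100_000:,.0f} failures in 100k inputs")
```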
Hola Mexico! We’ve launched our new point of presence (POP) in Mexico City. The POP is strategically located within the country and lowers latency overall. In this case, the POP’s identifier is mxmc. KeyCDN is always on the lookout for ways to minimize latency and accelerate asset delivery worldwide.
In this post, we compare ScaleGrid’s Bring Your Own Cloud (BYOC) plan vs. the standard Dedicated Hosting model to help you determine the best strategy for your MySQL, PostgreSQL, Redis™ and MongoDB® database deployment. Deploying your application and database on the same VPC also provides the lowest possible latency path. Expert Tip.
As developers, we rightfully obsess about the customer experience, relentlessly working to squeeze every millisecond out of the critical rendering path, optimize input latency, and eliminate jank. At the limit, statically generated, edge delivered, and HTML-first pages look like the optimal strategy.
Perceptual quality measurements are used to drive video encoding optimizations, perform video codec comparisons, carry out A/B testing and optimize streaming QoE decisions, to name a few. This enables us to use our scale to increase throughput and reduce latencies. VQS is called using the measureQuality endpoint.
Dealing with ambiguities and missing data : Sometimes the entries in BDP are contaminated with testing entries and NULL values, along with ambiguous values that have no meaning or just simply contradictory values due to unreal test environments. Restricting Testing and Analysis to one day and device at a time.
Key Takeaways: Critical performance indicators such as latency, CPU usage, memory utilization, hit rate, and number of connected clients/slaves/evictions must be monitored to maintain Redis’s high throughput and low latency capabilities. It can achieve impressive performance, handling up to 50 million operations per second.
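As a sketch (not from the article), most of those indicators can be pulled from Redis’s INFO command via redis-py; the host and port are placeholders, while the field names below are standard INFO fields:

```python
# Pull the key indicators from INFO; hit rate is derived from keyspace counters.
import redis

r = redis.Redis(host="localhost", port=6379)
info = r.info()

for field in ("used_memory_human", "connected_clients", "connected_slaves",
              "evicted_keys", "instantaneous_ops_per_sec"):
    print(field, info.get(field))

hits, misses = info["keyspace_hits"], info["keyspace_misses"]
print("hit rate:", hits / max(hits + misses, 1))
```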