By Rajiv Shringi, Oleksii Tkachuk, and Kartik Sathyanarayanan. In our previous blog post, we introduced Netflix's TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we're excited to present the Distributed Counter Abstraction.
What is RTT? Round-trip time (RTT) is essentially a measure of latency: how long did it take to get from one endpoint to another and back again? RTT isn't a you-thing, it's a them-thing. It gives fascinating insight into the network topology of our visitors, and how much we might be impacted by high-latency regions.
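As a rough illustration (not the post's own tooling), RTT can be approximated from application code by timing a TCP handshake; the hostname and port here are placeholders.

```python
import socket
import time

def tcp_rtt(host: str, port: int = 443) -> float:
    """Approximate RTT by timing a TCP three-way handshake."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass  # connection established; we only care about handshake time
    return (time.perf_counter() - start) * 1000  # milliseconds

print(f"RTT ~ {tcp_rtt('example.com'):.1f} ms")
```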
We started seeing increased response latencies and leader servers running at dangerously high utilization. So we introduced a caching mechanism in the API gateway layer, allowing us to offload processing from singleton leader-elected controllers without giving up the strict data-consistency guarantees our clients observe.
“Latency” is the duration from the execution of a load instruction (to an address that misses in all the caches) to the completion of that load instruction, when the data is returned from memory. The example below is for a 2005-era processor with 60 ns memory latency and 6.4
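The excerpt is cut off after "6.4"; assuming the truncated figure is 6.4 GB/s of peak memory bandwidth (plausible for a 2005-era processor), the latency-bandwidth product shows how much concurrency is needed to keep memory busy:

```python
latency_s = 60e-9   # 60 ns memory latency
bandwidth = 6.4e9   # assumed: 6.4 GB/s peak memory bandwidth
line = 64           # bytes per cache line

bytes_in_flight = latency_s * bandwidth   # 384 bytes must be in flight
print(bytes_in_flight / line)             # = 6 cache lines outstanding
```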
Caches are very useful software components that all engineers must know. In this article, we are going to describe what a cache is and explain specific use cases, focusing on the frontend and client side.
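A cache in its simplest form is a small key-value store that answers repeat requests without redoing expensive work; a minimal TTL-based sketch (names are illustrative, not from the article):

```python
import time

class TTLCache:
    """Minimal in-memory cache: entries expire after ttl seconds."""
    def __init__(self, ttl: float = 60.0):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # miss or expired
        return entry[1]

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
```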
Your team celebrates a success story in which a trace identified a pesky latency issue in your application's authentication service. It turns out the fix improved performance in one spot but created a situation in which key information was never cached. But the celebrations are short-lived.
Cold starts can cause latency outliers and may lead to a poor end-user experience for latency-sensitive applications. The new Amazon capability enables customers to improve the startup latency of their functions from several seconds to as low as sub-second (up to 10 times faster) at P99 (the 99th latency percentile).
In this article, we'll discuss six ways to design websites for high-traffic events like product drops and sales: compress and optimize images, choose a scalable web host, use a CDN, leverage caching, stress-test websites, and refine the backend. You can also find optimization plugins or caching solutions that give you access to a CDN.
When a user requests their feed, two parallel threads are involved in fetching it, to optimize for latency. FUN FACT: In this talk, Dikang Gu, a software engineer on Instagram's core infra team, mentions how they use Cassandra to serve critical use cases with high scalability requirements, and some of the pain points.
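A hedged sketch of the parallel-fetch pattern the excerpt describes, using Python threads; fetch_recent and fetch_ranked are hypothetical stand-ins for the two feed sources:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_recent(user_id):
    # Hypothetical stand-in: fetch the user's most recent items.
    return [f"recent-{user_id}-{i}" for i in range(3)]

def fetch_ranked(user_id):
    # Hypothetical stand-in: fetch precomputed/ranked items from a cache.
    return [f"ranked-{user_id}-{i}" for i in range(3)]

def get_feed(user_id):
    # Run both fetches in parallel so total latency is roughly the max
    # of the two calls, not their sum.
    with ThreadPoolExecutor(max_workers=2) as pool:
        recent = pool.submit(fetch_recent, user_id)
        ranked = pool.submit(fetch_ranked, user_id)
    return recent.result() + ranked.result()

print(get_feed(42))
```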
Utilizing cloned real traffic, we can exercise the diversity of inputs from a wide range of devices and device application software versions in production. It provides a good read on the availability and latency ranges under different production conditions. Logging is limited to cases where the old and new responses do not match.
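In spirit, such a replay harness duplicates each request to the old and new paths and records only mismatches; a simplified sketch (function names are placeholders, not Netflix's actual tooling):

```python
import logging

def shadow_compare(request, old_handler, new_handler):
    """Send the same request to both implementations; log only mismatches."""
    old_resp = old_handler(request)
    new_resp = new_handler(request)  # shadow path: response is never served
    if old_resp != new_resp:
        logging.warning("mismatch for %r: old=%r new=%r",
                        request, old_resp, new_resp)
    return old_resp  # the user always gets the old (authoritative) response
```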
In this blog post, we’ll demonstrate how Dynatrace automation and the Dynatrace Site Reliability Guardian can help you implement your applications according to all six AWS Well-Architected pillars by integrating them into your software development lifecycle (SDLC).
Hollow, an OSS technology we released a few years ago, has been best described as a total high-density near cache. Total: the entire dataset is cached on each node; there is no eviction policy, and there are no cache misses. Near: the cache exists in RAM on any instance which requires access to the dataset.
Key Takeaways: Critical performance indicators such as latency, CPU usage, memory utilization, hit rate, and the number of connected clients/slaves/evictions must be monitored to maintain Redis's high-throughput, low-latency capabilities. Redis can achieve impressive performance, handling up to 50 million operations per second.
Deployment: Cache. To produce business value, all our Metaflow projects are deployed to work with other production systems. In other cases, it is more convenient to share the results via a low-latency API. A Streamlit app houses the visualization software and data aggregation logic.
For example, when monitoring a database, you'll want to know about any latency when writing data to disk, or the average query response time. Examples include a spike in memory utilization, a decrease in cache hit ratio, or an increase in CPU utilization. Luckily, there are tools and practices that address these challenges.
Identifying key Redis metrics such as latency, CPU usage, and memory metrics is crucial for effective Redis monitoring. To monitor Redis instances effectively, collect Redis metrics focusing on cache hit ratio, memory allocated, and latency threshold.
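With the redis-py client, most of these metrics can be pulled from the INFO command; a small sketch computing the cache hit ratio (connection details are placeholders):

```python
import redis

r = redis.Redis(host="localhost", port=6379)
info = r.info()  # server statistics, as exposed by the INFO command

hits = info["keyspace_hits"]
misses = info["keyspace_misses"]
hit_ratio = hits / (hits + misses) if hits + misses else 0.0

print(f"hit ratio: {hit_ratio:.2%}")
print(f"memory used: {info['used_memory_human']}")
print(f"connected clients: {info['connected_clients']}")
```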
Data lakehouses deliver query responses with minimal latency. By applying massively parallel processing and high-performance caches, all this contextualized data can be interrogated at high speed, for ad-hoc analytics or AI-powered precise answers.
Best of all, our page can load much faster since everything is cached in Elasticsearch. Listening to Kafka events adds little latency, our fan-out operations are really quick since we store foreign keys to identify the edges, and looking up data in an inverted index is fast as well. Our data changes constantly…
Amazon DynamoDB offers low, predictable latencies at any scale. This is not just predictability of median performance and latency, but also at the end of the distribution (the 99.9th percentile), so we could provide acceptable performance for virtually every customer, particularly as dataset sizes grow.
For us at Netflix, this is a combination of the device type, app session ID, and software development kit (SDK) version. Some features, for example, include Device Type ID, SDK Version, Buffer Sizes, Cache Capacities, UI Resolution, Chipset Manufacturer, and Brand. This is because the kill happens rarely (0.9%
Today, I'm excited to announce the general availability of Amazon DynamoDB Accelerator (DAX) , a fully managed, highly available, in-memory cache that can speed up DynamoDB response times from milliseconds to microseconds, even at millions of requests per second. Adding caching when your app is already experiencing load is not easy.
biolatency: disk I/O latency histogram heat map.
cachestat: file system cache statistics line charts.
runqlat: CPU scheduler latency heat map.
execsnoop: new processes (via exec(2)) table.
That's the fast way. This is thinking like a sysadmin who installs and maintains software, and not like a programmer who codes everything.
There are two main types of DNS servers: authoritative servers and caching resolvers. But the real robustness of the DNS system comes through the way lookups are handled, which is what caching resolvers do. Caching techniques ensure that the DNS system doesn't get overloaded with queries.
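To make the caching-resolver idea concrete, here is a toy resolver-side cache using only the standard library; real resolvers honor the TTL carried on each DNS record, while this sketch uses a fixed TTL as a simplification:

```python
import socket
import time

_cache = {}  # hostname -> (expires_at, address)
TTL = 300    # simplification: real resolvers use each record's own TTL

def resolve(hostname: str) -> str:
    entry = _cache.get(hostname)
    if entry and entry[0] > time.monotonic():
        return entry[1]                    # cache hit: no upstream query
    addr = socket.gethostbyname(hostname)  # miss: ask upstream
    _cache[hostname] = (time.monotonic() + TTL, addr)
    return addr

print(resolve("example.com"))
print(resolve("example.com"))  # second call is served from the cache
```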
DynamoDB Streams enables your application to get real-time notifications of your tables' item-level changes. Streams provide you with the underlying infrastructure to create new applications, such as continuously updated free-text search indexes, caches, or other creative extensions requiring up-to-date table changes.
Software always evolves, but not always in a good way. The most obvious and common way this happens is when companies try to evolve their caches into a data platform that can, for example, be used as a highly available enterprise key-value store for volatile data. Why can't you just use an abstract cache that loads data lazily?
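The "abstract cache that loads data lazily" the excerpt alludes to is a read-through cache: on a miss, the cache itself fetches from the backing store. A minimal sketch (the loader is a placeholder):

```python
class ReadThroughCache:
    """On a miss, fetch from the backing store and remember the result."""
    def __init__(self, loader):
        self._loader = loader  # e.g., a database query function
        self._data = {}

    def get(self, key):
        if key not in self._data:
            self._data[key] = self._loader(key)  # lazy load on first access
        return self._data[key]

# Usage: the cache front-ends a (here, simulated) slow backing store.
cache = ReadThroughCache(loader=lambda k: f"value-for-{k}")
print(cache.get("user:1"))  # miss: invokes the loader
print(cache.get("user:1"))  # hit: served from memory
```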
Three years ago, as part of our AWS Fast Data journey we introduced Amazon ElastiCache for Redis , a fully managed in-memory data store that operates at sub-millisecond latency. While caching continues to be a dominant use of ElastiCache for Redis, we see customers increasingly use it as an in-memory NoSQL database.
Tue-Thu Apr 25-27: High-Performance and Low-Latency C++ (Stockholm). On April 25-27, I’ll be in Stockholm (Kista) giving a three-day seminar on “High-Performance and Low-Latency C++.” If you’re interested in attending, please check out the links, and I look forward to meeting and re-meeting many of you there.
All of the web performance monitoring software we work with provides RUM clients that put additional layers of monkey-patching on the page to maintain direct access to a request being made or the response coming back. Here are a few questions that come to mind: Is this request served from the service worker cache?
We have spent a great deal of time at ScaleOut Software re-architecting our in-memory data grid (IMDG) code base to make the best use of many cores and large memory. Likewise, object access paths must be heavily multi-threaded and avoid lock contention to minimize access latency and maximize throughput.
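One common way to reduce lock contention on a shared map (not necessarily ScaleOut's implementation) is lock striping: guard disjoint key ranges with separate locks so threads rarely collide. A Python sketch:

```python
import threading

class StripedMap:
    """Hash map guarded by N locks instead of one global lock."""
    def __init__(self, stripes: int = 16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._maps = [{} for _ in range(stripes)]

    def _stripe(self, key):
        return hash(key) % len(self._locks)

    def put(self, key, value):
        i = self._stripe(key)
        with self._locks[i]:  # only contends with same-stripe keys
            self._maps[i][key] = value

    def get(self, key, default=None):
        i = self._stripe(key)
        with self._locks[i]:
            return self._maps[i].get(key, default)
```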
My personal opinion is that I don't see a widespread need for more capacity, given horizontal scaling and servers that can already exceed 1 Tbyte of DRAM; bandwidth is also helpful, but I'd be concerned about the increased latency of adding a hop to more memory.
We push as much data processing as possible onto warehouse-scale computers and systems software. It's limited by the laws of physics in terms of end-to-end latency. We saw earlier that there is end-user pressure to replace batch systems with much lower-latency online systems.
This includes metrics such as query execution time, the number of queries executed per second, and the utilization of the query cache and adaptive hash index. Query cache: disable it (query_cache_size = 0, query_cache_type = OFF). innodb_adaptive_hash_index: check adaptive hash index usage to determine its efficiency.
Redirects are often pretty light in terms of the latency they add to a website, but they are an easy first thing to check, and they can generally be removed with little effort. I'm going to update my referenced URL to the new site to help decrease the latency that adds drag to the initial page load.
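Checking for redirect chains is easily scripted; with the requests library, response.history lists every redirect that was followed (the URL is a placeholder):

```python
import requests

resp = requests.get("https://example.com/old-page", timeout=10)
for hop in resp.history:  # each redirect that was followed
    print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
print("final:", resp.status_code, resp.url)
```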
The mean and percentile measurements hide this structure, but the rest of this post will show how the structure can be measured and analyzed so that you can figure out a useful model of your system, understand what is driving the long tail of latencies and come up with better SLAs and measures of capacity.
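The point generalizes: a mean or a single percentile collapses a possibly multi-modal distribution. A quick way to see the structure is to bucket the samples yourself (the latencies here are synthetic):

```python
import random
from collections import Counter

# Synthetic bimodal latencies: fast cache hits plus slow cache misses.
samples = ([random.gauss(5, 1) for _ in range(900)] +
           [random.gauss(80, 10) for _ in range(100)])

samples.sort()
mean = sum(samples) / len(samples)
p50, p99 = samples[len(samples) // 2], samples[int(len(samples) * 0.99)]
print(f"mean={mean:.1f}ms p50={p50:.1f}ms p99={p99:.1f}ms")

# A coarse histogram exposes the two modes the summary stats hide.
buckets = Counter(int(s // 10) * 10 for s in samples)
for lo in sorted(buckets):
    print(f"{lo:3d}-{lo + 10:3d} ms: {'#' * (buckets[lo] // 10)}")
```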
Caching of query results, on the other hand, looks like a good business model (at large enough scale, these might amount to pretty much the same thing). How's that going to work, given what we know about the throughput and latency of blockchains, and the associated mining costs?
The benefits of modeling data as events as a mechanism to evolve our software systems. My own journey into microservices began with work I was doing to help organizations ship software more quickly. All too often, the software wasn’t designed in a way that made it easy to ship. And data was at the heart of the problem.
MongoDB Community Edition software might set the stage for achieving your high-volume database goals, but you quickly learn that its features fall short of your enterprise needs. So you look at MongoDB Enterprise software, but its costly and complex licensing structure does not meet your budget goals.
bpftrace is a new open-source tracer for Linux for analyzing production performance problems and troubleshooting software. For example, iostat(1), or a monitoring agent, may tell you your average disk latency, but not the distribution of this latency. Among bpftrace's probe types, software covers kernel software-based events.
For query executors that can be frequently started and stopped, the authors explore performance with cold and warm caches (where applicable), and also horizontal and vertical scaling performance. For cost calculations, the costs are a combination of compute costs, storage costs, data-scan costs, and software license costs.
Performant – DynamoDB consistently delivers single-digit millisecond latencies even as your traffic volume increases. DynamoDB automatically re-distributes your data to healthy servers to ensure there are always multiple replicas of your data without you needing to intervene.
Existing systems can be studied with measurement, while prospective systems are most often studied by extrapolating from measurements of prior systems or via simulation software that mimics target system function and provides performance metrics. Can one both minimize latency and maximize throughput for unscheduled work? Little’s Law.
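Little's Law ties that question together: the average number of requests in a system, L, equals the arrival rate λ times the average time each request spends in the system, W (L = λW). As a worked example, a service absorbing 1,000 requests/s at 50 ms average latency holds λW = 1000 × 0.05 = 50 requests in flight on average; pushing throughput higher on the same capacity raises queueing, which raises W, which is why latency and throughput trade off for unscheduled work.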
…using Compute Express Link (CXL), organizing memory components for optimal performance, adapting system software traditionally designed for homogeneous memory systems, and developing memory abstractions and programming constructs for HCM management. Figure 2: Latency characteristics of memory technologies (source: Maruf et al.).