Cache, Design and Latency - Technology Performance Pulse

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

The Three Cs: Concatenate, Compress, Cache

CSS Wizardry

OCTOBER 16, 2023

Caching them at the other end: How long should we cache files on a user’s device? Plotted on the same horizontal axis of 1.6s, the waterfalls speak for themselves: 201ms of cumulative latency; 109ms of cumulative download. 4,362ms of cumulative latency; 240ms of cumulative download. Cache: This is the easy one.

Cache

Cache Latency Strategy Speed

Optimising for High Latency Environments

CSS Wizardry

SEPTEMBER 16, 2024

This gives fascinating insights into the network topography of our visitors, and how much we might be impacted by high latency regions. Round-trip-time (RTT) is basically a measure of latency—how long did it take to get from one endpoint to another and back again? What is RTT? Where Does CrUX’s RTT Data Come From?

Latency

Latency Cache Transportation Mobile

Designing Instagram

High Scalability

JANUARY 11, 2022

Design a photo-sharing platform similar to Instagram where users can upload their photos and share them with their followers. High Level Design. Component Design. API Design. We have provided the API design of posting an image on Instagram below. Problem Statement. Architecture. Fetching User Feed.

Design

Design Media Storage Logistics

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

We introduce a caching mechanism in the API gateway layer, allowing us to offload processing from singleton leader elected controllers without giving up strict data consistency and guarantees clients observe. We started seeing increased response latencies and leader servers running at dangerously high utilization.

Cache

Cache Latency Traffic Systems

Architectural Insights: Designing Efficient Multi-Layered Caching With Instagram Example

DZone

FEBRUARY 27, 2024

Caching is a critical technique for optimizing application performance by temporarily storing frequently accessed data, allowing for faster retrieval during subsequent requests. Multi-layered caching involves using multiple levels of cache to store and retrieve data.

Cache

Cache Efficiency Architecture Design
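
The multi-layered caching idea in the excerpt above can be illustrated with a minimal sketch. It assumes a hypothetical `TwoLayerCache` class (not taken from the linked article): a small LRU layer sits in front of a larger second layer, and a caller-supplied `loader` function stands in for the slow backing store.

```python
from collections import OrderedDict


class TwoLayerCache:
    """Hypothetical read-through cache with two layers (illustration only)."""

    def __init__(self, loader, l1_capacity=2):
        self.loader = loader          # stand-in for the slow backing store
        self.l1 = OrderedDict()       # small, fast layer with LRU eviction
        self.l2 = {}                  # larger layer, unbounded in this sketch
        self.l1_capacity = l1_capacity
        self.loads = 0                # number of calls made to the loader

    def get(self, key):
        # Layer 1 hit: refresh recency and return.
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key]
        # Layer 2 hit: reuse the stored value.
        if key in self.l2:
            value = self.l2[key]
        else:
            # Miss in both layers: load from the backing store and fill layer 2.
            value = self.loader(key)
            self.loads += 1
            self.l2[key] = value
        self._promote(key, value)
        return value

    def _promote(self, key, value):
        # Copy the value into layer 1, evicting the least recently used entry.
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)


cache = TwoLayerCache(loader=lambda k: k.upper(), l1_capacity=2)
for k in ["a", "b", "c", "a"]:
    cache.get(k)
```

In this run the second request for `"a"` is served from the second layer rather than the loader, so the backing store is consulted only three times for four requests.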

How To Design For High-Traffic Events And Prevent Your Website From Crashing

Smashing Magazine

JANUARY 7, 2025

By Saad Khan. This article is sponsored by Cloudways. Product launches and sales typically attract large volumes of traffic.

Traffic

Traffic Website Design Cache

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

Yet, many are confined to a brief temporal window due to constraints in serving latency or training costs. These insights have shaped the design of our foundation model, enabling a transition from maintaining numerous small, specialized models to building a scalable, efficient system.

Tuning

Tuning Efficiency Latency Strategy

Benchmark (YCSB) numbers for Redis, MongoDB, Couchbase2, Yugabyte and BangDB

High Scalability

FEBRUARY 17, 2021

We note that for MongoDB update latency is really very low (low is better) compared to other dbs, however the read latency is on the higher side. The latency table shows that 99th percentile latency for Yugabyte is quite high compared to others (lower is better). Again Yugabyte latency is quite high. Conclusion.

Benchmarking

Benchmarking Latency C++ Database

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination. It also serves as central configuration of access patterns such as consistency or latency targets.

Latency

Latency Storage Cache Servers

Self-Host Your Static Assets

CSS Wizardry

MAY 31, 2019

Users might already have the file cached. If website-a.com links to [link], and a user goes from there to website-b.com, which also links to [link], then the user will already have that file in their cache. On a slower, higher-latency connection, the story is much, much worse. Penalty: Caching. All completely avoidable.

Cache

Cache Latency Infrastructure Website

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.

Cache

Cache Latency Airlines Logistics

Dynatrace accelerates business transformation with new AI observability solution

Dynatrace

JANUARY 31, 2024

The RAG process begins by summarizing and converting user prompts into queries that are sent to a search platform that uses semantic similarities to find relevant data in vector databases, semantic caches, or other online data sources.

Cache

Cache Azure Infrastructure Monitoring

Netflix Cloud Packaging in the Terabyte Era

The Netflix TechBlog

SEPTEMBER 24, 2021

Uploading and downloading data always come with a penalty, namely latency. Figure 2: Cloud Resource and Job Sizes This initial architecture was designed at a time when packaging from a list of chunks was not possible and terabyte-sized files were not considered. For write operations, those challenges do not apply.

Cloud

Cloud Media Storage Cache

Seamlessly Swapping the API backend of the Netflix Android app

The Netflix TechBlog

SEPTEMBER 8, 2020

This allows the app to query a list of “paths” in each HTTP request, and get specially formatted JSON (jsonGraph) that we use to cache the data and hydrate the UI. Being able to canary a new route let us verify latency and error rates were within acceptable limits. This meant that data that was static (e.g.

Latency

Latency Cache Java Traffic

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

To support this growth, we’ve revisited Pushy’s past assumptions and design decisions with an eye towards both Pushy’s future role and future stability. In our case, we value low latency — the faster we can read from KeyValue, the faster these messages can get delivered.

Latency

Latency Cache Tuning Efficiency

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

By: Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Tuning

Data ingestion pipeline with Operation Management

The Netflix TechBlog

MARCH 7, 2023

We designed a unique concept called Annotation Operations which allows teams to create data pipelines and easily write annotations without worrying about access patterns of their data from different applications. But we cannot search or present low latency retrievals from files, etc. This is obviously very expensive.

Media

Media Latency Architecture Database

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Since its inception, Metaflow has been designed to provide a human-friendly API for building data and ML (and today AI) applications and deploying them in our production infrastructure frictionlessly. Deployment: Cache. To produce business value, all our Metaflow projects are deployed to work with other production systems.

Systems

Systems Media Cache Open Source

Crucial Redis Monitoring Metrics You Must Watch

Scalegrid

JANUARY 25, 2024

Key Takeaways Critical performance indicators such as latency, CPU usage, memory utilization, hit rate, and number of connected clients/slaves/evictions must be monitored to maintain Redis’s high throughput and low latency capabilities. It can achieve impressive performance, handling up to 50 million operations per second.

Metrics

Metrics Monitoring Latency Cache
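
As a small worked example of the hit rate metric mentioned in the excerpt above: Redis reports `keyspace_hits` and `keyspace_misses` counters in the stats section of its `INFO` output, and the hit rate is hits divided by total lookups. The helper name `cache_hit_rate` and the hand-written sample dictionary below are assumptions for illustration; nothing here connects to a real server.

```python
def cache_hit_rate(stats):
    """Return hits / (hits + misses), or None when no lookups were recorded.

    `stats` is assumed to be a dict shaped like the stats section of
    Redis INFO, with `keyspace_hits` and `keyspace_misses` counters.
    """
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    if total == 0:
        return None  # avoid dividing by zero on a fresh instance
    return hits / total


# Hand-written sample values, not measurements from any real deployment.
sample = {"keyspace_hits": 90, "keyspace_misses": 10}
rate = cache_hit_rate(sample)
```

Returning `None` for an instance with no recorded lookups keeps a freshly started cache from being reported as having a 0% hit rate.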

Making Cloud.typography Fast(er)

CSS Wizardry

AUGUST 13, 2019

To further exacerbate the problem, the 302 response has a Cache-Control: must-revalidate, private header, meaning that we will always make an outgoing request for this resource regardless of whether or not we’re hitting the site from a cold or a warm cache. com, which introduces yet more latency for the connection setup.

Latency

Latency Cache Strategy Media

Implementing AWS well-architected pillars with automated workflows

Dynatrace

SEPTEMBER 13, 2023

This is a set of best practices and guidelines that help you design and operate reliable, secure, efficient, cost-effective, and sustainable systems in the cloud. Storing frequently accessed data in faster storage, usually in-memory caching, improves data retrieval speed and overall system performance. Beyond

AWS

AWS Efficiency Azure Cloud

How RevenueCat Manages Caching for Handling over 1.2 Billion Daily API Requests

InfoQ

JANUARY 29, 2024

RevenueCat extensively uses caching to improve the availability and performance of its product API while ensuring consistency. The company shared its techniques to deliver the platform, which can handle over 1.2 billion daily API requests. The team at RevenueCat created an open-source memcache client that provides several advanced features.

Cache

Cache Open Source Availability Performance

Expanding the Cloud: More memory, more caching and more performance for your data

All Things Distributed

SEPTEMBER 3, 2013

Amazon ElastiCache is a fully managed, in-memory caching service for customers to optimize the latency, performance and cost of their read workloads. Today, we are further expanding the choices available for designing and developing highly scalable and high performance apps.

Cache

Cache Cloud Performance Retail

MezzFS - Mounting object storage in Netflix’s media processing platform

The Netflix TechBlog

MARCH 6, 2019

Video encoding is what MezzFS was originally designed for and remains one of its canonical use cases, so we’ll focus on video encoding to describe the problem that MezzFS solves. Disk Caching - MezzFS can be configured to cache objects on the local disk. Regional caching - Netflix What problem are we solving?

Media

Media Storage Processing Cache

Amazon DynamoDB - a Fast and Scalable NoSQL Database.

All Things Distributed

JANUARY 18, 2012

Today is a very exciting day as we release Amazon DynamoDB, a fast, highly reliable and cost-effective NoSQL database service designed for internet scale applications. Amazon DynamoDB offers low, predictable latencies at any scale.

Scalability

Scalability Database Ecommerce Latency

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

Data lakehouses deliver the query response with minimal latency. Designed to provide a single source of truth for structured data, they offer a way for organizations to simplify data management by centralizing inputs. The performance of these queries needs to be at a level where they can support ad-hoc analytics use cases.

Artificial Intelligence

Artificial Intelligence Storage Analytics Government

Cloudburst: stateful functions-as-a-service

The Morning Paper

FEBRUARY 6, 2020

Stateless is fine until you need state, at which point the coarse-grained solutions offered by current platforms limit the kinds of application designs that work well. On the Cloudburst design team’s wish list: A running function’s ‘hot’ data should be kept physically nearby for low-latency access.

Serverless

Serverless Lambda Cache Latency

Taskbar Latency and Kernel Calls

Random ASCII

SEPTEMBER 8, 2019

The fact that this shows up as CPU time suggests that the reads were all hitting in the system cache and the CPU time was the kernel overhead (note ntoskrnl.exe on the first sampled call stack) of grabbing data from the cache. This means that there is no caching between RuntimeBroker.exe and this file.

Latency

Latency Cache Programming Operating System

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

Key Takeaways Redis offers complex data structures and additional features for versatile data handling, while Memcached excels in simplicity with a fast, multi-threaded architecture for basic caching needs. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios.

Cache

Cache Storage Scalability Architecture

Comparisons of Proxies for MySQL

Percona

MARCH 20, 2023

When designing an architecture, many components need to be considered before deciding on the best solution. In this context, features like filtering, firewalling, or caching are redundant and may consume resources that could be allocated to scaling. MySQL Router is the one that has the higher latency no matter what.

Games

Games Latency Traffic Cache

Meet Hydrogen: A React Framework For Dynamic, Contextual And Personalized E-Commerce

Smashing Magazine

NOVEMBER 8, 2021

As developers, we rightfully obsess about the customer experience, relentlessly working to squeeze every millisecond out of the critical rendering path, optimize input latency, and eliminate jank. On top of this foundation, we add layers of caching, prerendering and edge delivery optimizations — not the other way around.

Cache

Cache Best Practices Strategy Servers

Amazon DynamoDB Accelerator (DAX): Speed Up DynamoDB Response Times from Milliseconds to Microseconds without Application Rewrite.

All Things Distributed

JUNE 21, 2017

Today, I'm excited to announce the general availability of Amazon DynamoDB Accelerator (DAX), a fully managed, highly available, in-memory cache that can speed up DynamoDB response times from milliseconds to microseconds, even at millions of requests per second. Adding caching when your app is already experiencing load is not easy.

Speed

Speed Cache Latency AWS

Embrace event-driven computing: Amazon expands DynamoDB with streams, cross-region replication, and database triggers

All Things Distributed

JULY 14, 2015

Streams provide you with the underlying infrastructure to create new applications, such as continuously updated free-text search indexes, caches, or other creative extensions requiring up-to-date table changes. DynamoDB Streams simplifies and improves this design pattern with a distributed systems approach.

Database

Database Lambda AWS IoT

A one size fits all database doesn't fit anyone

All Things Distributed

JUNE 21, 2018

Further, with the growth and scale of Amazon.com, boundless horizontal scale needed to be a key design point--scaling up simply wasn't an option. Use cases such as gaming, ad tech, and IoT lend themselves particularly well to the key-value data model where the access patterns require low-latency Gets/Puts for known key values.

Database

Database AWS Games Latency

Expanding the Cloud with DNS - Introducing Amazon Route 53

All Things Distributed

DECEMBER 5, 2010

There are two main types of DNS servers: authoritative servers and caching resolvers. But the real robustness of the DNS system comes through the way lookups are handled, which is what caching resolvers do. Caching techniques ensure that the DNS system doesn't get overloaded with queries.

Cloud

Cloud Internet AWS

Fast key-value stores: an idea whose time has come and gone

The Morning Paper

JUNE 23, 2019

Why are developers using RInK systems as part of their design? Generally to cache data (including non-persistent data that never sees a backing store), to share non-persistent data across application services (e.g. The network latency of fetching data over the network, even considering fast data center networks. Who knew! ;).

Cache

Cache Latency Google Network

USENIX SREcon APAC 2022: Computing Performance: What's on the Horizon

Brendan Gregg

FEBRUARY 28, 2023

My personal opinion is that I don't see a widespread need for more capacity given horizontal scaling and servers that can already exceed 1 Tbyte of DRAM; bandwidth is also helpful, but I'd be concerned about the increased latency for adding a hop to more memory.

Performance

Performance Latency Cache Virtualization

How LinkedIn Serves Over 4.8 Million Member Profiles per Second

InfoQ

JULY 3, 2023

LinkedIn introduced Couchbase as a centralized caching tier for scaling member profile reads to handle increasing traffic that has outgrown their existing database cluster. The new solution achieved over 99% hit rate, helped reduce tail latencies by more than 60% and costs by 10% annually. By Rafal Gancarz

Cache

Cache Latency Traffic Database

Expanding the Cloud: Enabling Globally Distributed Applications and Disaster Recovery

All Things Distributed

NOVEMBER 26, 2013

About 5 years ago, I introduced you to AWS Availability Zones, which are distinct locations within a Region that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same region.

Cloud

Cloud AWS Traffic Latency

The evolution of single-core bandwidth in multicore processors

John McCalpin

APRIL 25, 2023

For most high-end processors these values have remained in the range of 75% to 85% of the peak DRAM bandwidth of the system over the past 15-20 years — an amazing accomplishment given the increase in core count (with its associated cache coherence issues), number of DRAM channels, and ever-increasing pipelining of the DRAMs themselves.

Benchmarking

Benchmarking Cache Latency Tuning

Helios: hyperscale indexing for the cloud & edge – part 1

The Morning Paper

OCTOBER 26, 2020

There are several pressures on such a design: The volume of data continues to grow – by another 2 orders of magnitude this decade according to IDC – as does the velocity of data arrival and the variance in arrival rates. It’s limited by the laws of physics in terms of end-to-end latency.

Cloud

Cloud Big Data Latency Architecture

150 successful machine learning models: 6 lessons learned at Booking.com

The Morning Paper

OCTOBER 6, 2019

Lesson 4: prediction serving latency matters. In an experiment introducing synthetic latency, Booking.com found that an increase of about 30% in latency cost about 0.5% Even mathematically simple models have the potential of introducing relevant latency.

Latency

Latency Metrics Cache Design

Rethinking Server-Timing As A Critical Monitoring Tool

Smashing Magazine

MAY 16, 2022

Due to the design of web browser APIs, there are currently no mechanisms for querying requests and their relative responses after the fact. Here are a few that come to mind: Is this request served from the service worker cache? How long has a resource been in service worker cache? No References Required.

Servers

Servers Monitoring Cache Network
