
Architectural Insights: Designing Efficient Multi-Layered Caching With Instagram Example

DZone

Caching is a critical technique for optimizing application performance by temporarily storing frequently accessed data, allowing for faster retrieval on subsequent requests. Multi-layered caching arranges several caches in levels, so a miss in a small, fast layer can be served by a larger, slower one before falling back to the source of truth.
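As a minimal sketch of the idea (not code from the article), the class below layers a small in-memory LRU cache over a larger, slower backing store; all names are illustrative:

```python
from collections import OrderedDict

class TwoLayerCache:
    """Illustrative two-layer cache: a small LRU memory layer in front of a
    larger, slower backing layer (a plain dict standing in for disk or a
    remote cache)."""

    def __init__(self, memory_capacity=2):
        self.memory = OrderedDict()           # L1: fast, small, LRU-evicted
        self.disk = {}                        # L2: slower, larger
        self.memory_capacity = memory_capacity

    def get(self, key):
        if key in self.memory:                # L1 hit: cheapest path
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.disk:                  # L1 miss, L2 hit: promote to L1
            value = self.disk[key]
            self._put_memory(key, value)
            return value
        return None                           # miss in every layer

    def put(self, key, value):
        self._put_memory(key, value)
        self.disk[key] = value                # write-through to the slower layer

    def _put_memory(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        if len(self.memory) > self.memory_capacity:
            self.memory.popitem(last=False)   # evict least recently used

cache = TwoLayerCache()
cache.put("feed:alice", ["post1", "post2"])
print(cache.get("feed:alice"))                # served from the memory layer
```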

The Power of Caching: Boosting API Performance and Scalability

DZone

Caching is the process of storing frequently accessed data or resources in a temporary storage location, such as memory or disk, to improve retrieval speed and reduce repetitive processing. It also optimizes bandwidth: serving responses from a cache reduces the amount of data transferred over the network.
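A minimal illustration of the principle, assuming a simple TTL-based memoization decorator rather than any particular caching product; `fetch_user_profile` is a hypothetical handler:

```python
import functools
import time

def ttl_cache(ttl_seconds=60):
    """Memoize a function's results for ttl_seconds, so repeated API calls
    with the same arguments skip recomputation (and, in a real service,
    the network round trip)."""
    def decorator(fn):
        store = {}
        @functools.wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            if args in store:
                value, expires_at = store[args]
                if now < expires_at:
                    return value              # fresh cache hit
            value = fn(*args)                 # miss or expired: recompute
            store[args] = (value, now + ttl_seconds)
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=30)
def fetch_user_profile(user_id):
    time.sleep(0.1)                           # stand-in for a slow upstream call
    return {"id": user_id, "name": f"user-{user_id}"}

fetch_user_profile(42)   # slow: computed
fetch_user_profile(42)   # fast: served from cache for the next 30 s
```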

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

The abstraction targets recurring storage challenges: tail latency and idempotency, “wide” partitions with many rows, single large “fat” columns, and slow response pagination. The resulting model supports both simple and complex data models, balancing flexibility and efficiency.
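The “model” in the excerpt can be pictured as a two-level map: a record id pointing to a sorted map of item keys to values, read back in pages to tame wide partitions and slow pagination. A toy sketch under that assumption (the names are illustrative, not Netflix's API):

```python
from bisect import bisect_right, insort

class KeyValueAbstraction:
    """Toy two-level key-value model: record id -> sorted map of
    item key -> value. Wide records are read back in pages."""

    def __init__(self):
        self.items = {}        # record_id -> {key: value}
        self.keys = {}         # record_id -> sorted list of keys

    def put(self, record_id, key, value):
        record = self.items.setdefault(record_id, {})
        if key not in record:
            insort(self.keys.setdefault(record_id, []), key)
        record[key] = value

    def get_items(self, record_id, after_key=None, limit=2):
        """Return up to `limit` items strictly after `after_key`,
        plus a cursor for fetching the next page."""
        keys = self.keys.get(record_id, [])
        start = bisect_right(keys, after_key) if after_key is not None else 0
        page = keys[start:start + limit]
        items = [(k, self.items[record_id][k]) for k in page]
        cursor = page[-1] if len(page) == limit else None
        return items, cursor

kv = KeyValueAbstraction()
for i in range(5):
    kv.put("user:1", f"item{i}", i)

page1, cursor = kv.get_items("user:1")                    # item0, item1
page2, cursor = kv.get_items("user:1", after_key=cursor)  # item2, item3
```

Bounding each response with a cursor is what keeps a single “wide” record from blowing up one read, at the cost of extra round trips.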

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

By Rajiv Shringi, Oleksii Tkachuk, and Kartik Sathyanarayanan. In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.
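Netflix's abstraction builds on its TimeSeries infrastructure; as a much smaller illustration of why counting gets distributed at all, here is a sharded in-process counter that spreads increments across stripes to cut contention (purely illustrative, not Netflix's design):

```python
import random
import threading

class ShardedCounter:
    """Illustrative sharded counter: each increment lands on one of N
    stripes, so concurrent writers rarely contend on the same lock.
    Reads sum all stripes, trading read cost for write throughput --
    the same trade-off a distributed counter service makes across nodes."""

    def __init__(self, num_shards=8):
        self.shards = [0] * num_shards
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self, delta=1):
        i = random.randrange(len(self.shards))   # pick a stripe
        with self.locks[i]:
            self.shards[i] += delta

    def value(self):
        # Approximate under concurrency: stripes are read one at a time,
        # mirroring the eventually consistent reads of a distributed count.
        return sum(self.shards)

counter = ShardedCounter()
threads = [threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.value())   # 4000
```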

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

Because microprocessors are so fast, computer architecture has evolved toward adding various levels of caching between compute units and main memory in order to hide the latency of bringing the bits to the brains. Placing co-scheduled containers on separate cores avoids thrashing the caches of a neighboring workload and evens out the pressure on the machine's L3 caches.
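The mechanism underneath such placement decisions is CPU affinity. A minimal Linux-only sketch, assuming hard-coded core sets rather than Netflix's predictive scheduler:

```python
import os

# Linux-only sketch: pin the current process (e.g., a container's main
# workload) to a dedicated set of cores so it does not share L1/L2 caches
# -- and contends less on L3 -- with a noisy neighbor pinned elsewhere.
# Netflix's system chooses these placements predictively; the sets here
# are hard-coded purely for illustration.

NOISY_CORES = {0, 1}        # reserved for the cache-thrashing workload
SENSITIVE_CORES = {2, 3}    # reserved for the latency-sensitive workload

def pin_to(cores):
    os.sched_setaffinity(0, cores)   # 0 = the calling process
    print(f"pid {os.getpid()} now restricted to cores {sorted(cores)}")

if __name__ == "__main__":
    pin_to(SENSITIVE_CORES)
```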

Dynatrace accelerates business transformation with new AI observability solution

Dynatrace

The RAG process begins by summarizing and converting user prompts into queries that are sent to a search platform, which uses semantic similarity to find relevant data in vector databases, semantic caches, or other online data sources. On observing AI models: running AI models at scale can be resource-intensive.
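A toy sketch of that retrieval step, assuming bag-of-words “embeddings” and an exact-match cache in place of a real embedding model, vector database, and semantic cache:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts. Real systems use a trained
    embedding model; this just makes cosine similarity runnable."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Vector "database": pre-embedded documents.
documents = [
    "dynatrace observability for ai models",
    "caching strategies for low latency apis",
    "vector databases store embeddings for semantic search",
]
index = [(doc, embed(doc)) for doc in documents]

semantic_cache = {}   # keyed on the query's word set in this toy version

def retrieve(prompt, top_k=1):
    """RAG retrieval step: convert the prompt to a query vector and
    return the most similar documents, consulting a cache first."""
    key = " ".join(sorted(embed(prompt)))
    if key in semantic_cache:
        return semantic_cache[key]            # cache hit: skip the search
    q = embed(prompt)
    ranked = sorted(index, key=lambda d: cosine(q, d[1]), reverse=True)
    result = [doc for doc, _ in ranked[:top_k]]
    semantic_cache[key] = result
    return result

print(retrieve("how do I observe ai models at scale"))
```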

Netflix Cloud Packaging in the Terabyte Era

The Netflix TechBlog

Figure 1: A Simplified Video Processing Pipeline

With this architecture, chunk encoding is very efficient and runs on distributed cloud computing instances. Uploading and downloading data always come with a penalty, namely latency. For write operations, those challenges do not apply.
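A small sketch of chunk-parallel processing, assuming hashing as a stand-in for encoding and an in-process thread pool in place of cloud instances:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MiB chunks, an arbitrary illustrative size

def encode_chunk(index, chunk):
    """Stand-in for encoding + uploading one chunk on a cloud instance.
    Hashing here just simulates per-chunk work."""
    return index, hashlib.sha256(chunk).hexdigest()

def chunked(data, size):
    for i in range(0, len(data), size):
        yield i // size, data[i:i + size]

def process_video(data):
    # Chunks are independent, so they can be encoded on many workers at
    # once; the per-transfer latency penalty is paid per chunk but hidden
    # by parallelism.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda c: encode_chunk(*c),
                                chunked(data, CHUNK_SIZE)))
    return sorted(results)     # reassemble in chunk order

print(process_video(b"\x00" * (10 * 1024 * 1024))[:3])
```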
