Latency, Scalability and Storage - Technology Performance Pulse

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

This decoupling simplifies system architecture and supports scalability in distributed environments. Message brokers handle validation, routing, storage, and delivery, ensuring efficient and reliable communication. Scalability and Redundancy Both Kafka and RabbitMQ are built for scalability and redundancy but take different approaches.

Latency

Latency Analytics Architecture Storage

Efficient Multimodal Data Processing: A Technical Deep Dive

DZone

FEBRUARY 27, 2025

In this article, I will walk through a comprehensive end-to-end architecture for efficient multimodal data processing while striking a balance in scalability, latency, and accuracy by leveraging GPU-accelerated pipelines, advanced neural networks, and hybrid storage platforms.

Efficiency

Efficiency Processing Latency Storage

The Power of Caching: Boosting API Performance and Scalability

DZone

AUGUST 16, 2023

Caching is the process of storing frequently accessed data or resources in a temporary storage location, such as memory or disk, to improve retrieval speed and reduce the need for repetitive processing. Bandwidth optimization: Caching reduces the amount of data transferred over the network, minimizing bandwidth usage and improving efficiency.

Cache

Cache Scalability Performance Latency

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

The Challenge of Title Launch Observability: As engineers, we’re wired to track system metrics like error rates, latencies, and CPU utilization, but what about metrics that matter to a title’s success? The complexity of these operational demands underscored the urgent need for a scalable solution.

Traffic

Traffic Scalability Strategy Monitoring

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Key Takeaways RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges. This decoupling is crucial in modern architectures where scalability and fault tolerance are paramount. Keeping queues short maintains a responsive and efficient RabbitMQ setup.

Best Practices

Best Practices Traffic Strategy Scalability

Designing Instagram

High Scalability

JANUARY 11, 2022

Firstly, the synchronous process is responsible for uploading image content to file storage, persisting the media metadata in graph data storage, returning the confirmation message to the user, and triggering the process to update the user activity. Fetching User Feed. Sample Queries supported by Graph Database. Optimization.

Design

Design Media Storage Logistics

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Central to this infrastructure is our use of multiple online distributed databases such as Apache Cassandra, a NoSQL database known for its high availability and scalability. It also serves as central configuration of access patterns such as consistency or latency targets.

Latency

Latency Storage Cache Servers

Amazon DynamoDB - a Fast and Scalable NoSQL Database.

All Things Distributed

JANUARY 18, 2012

Werner Vogels' weblog on building scalable and robust distributed systems. Amazon DynamoDB - a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications. The original Dynamo design was based on a core set of strong distributed systems principles resulting in an ultra-scalable and highly reliable database system.

Scalability

Scalability Database Ecommerce Latency

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

By: Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Infrastructure

Mastering Disk Space Management with MongoDB® Storage Engines

Scalegrid

MAY 11, 2024

MongoDB offers several storage engines that cater to various use cases. The default storage engine in earlier versions was MMAPv1, which used memory-mapped files and collection-level locking. The newer, pluggable storage engine, WiredTiger, addresses this by using prefix compression, document-level locking, and row-based storage.

Storage

Storage Engineering Cache Database

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

Nine ways technology executives can get significant business value with the right observability platform

Dynatrace

MAY 21, 2024

That’s because it does not require any pre-prepared schemas, and access to cold/hot storage is fully automatic and with zero latency. Insights are therefore dispersed in a multitude of data lakes, storage systems, and reporting platforms. Moreover, it is fast, powered by its massively parallel processing data lakehouse.

Technology

Technology Technology Analytics Storage

Improved Alerting with Atlas Streaming Eval

The Netflix TechBlog

APRIL 27, 2023

While we were able to put out the immediate fire by disabling the newly created alerts, this incident raised some critical concerns around the scalability of our alerting system. It became clear to us that we needed to solve the scalability problem with a fundamentally different approach. OK, Results?

Storage

Storage Cache Metrics Database

Optimize your environment: Unveiling Dynatrace Hyper-V extension for enhanced performance and efficient troubleshooting

Dynatrace

OCTOBER 23, 2023

Firstly, managing virtual networks can be complex as networking in a virtual environment differs significantly from traditional networking. Secondly, determining the correct allocation of resources (CPU, memory, storage) to each virtual machine to ensure optimal performance without over-provisioning can be difficult.

Efficiency

Efficiency Virtualization Hardware Performance

Stuff The Internet Says On Scalability For September 14th, 2018

High Scalability

SEPTEMBER 14, 2018

NSF: When the HL-LHC reaches full capability in 2026, it is expected to produce more than 1 billion particle collisions every second, marking a 10-fold increase that will require a similar 10-fold increase in data processing and storage, including tools to collect, analyze, and record the most relevant events. So many more quotes.

Internet

Internet Internet Scalability Education

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls. Our distributed tracing infrastructure is grouped into three sections: tracer library instrumentation, stream processing, and storage.

Infrastructure

Infrastructure Transportation Storage Open Source

Why growing AI adoption requires an AI observability strategy

Dynatrace

JANUARY 17, 2024

AI requires more compute and storage. Training AI data is resource-intensive and costly, again, because of increased computational and storage requirements. As a result, AI observability supports cloud FinOps efforts by identifying how AI adoption spikes costs because of increased usage of storage and compute resources.

Strategy

Strategy Artificial Intelligence Storage Cloud

Optimize Citrix platform performance and user experience with Dynatrace (GA)

Dynatrace

JANUARY 15, 2020

Citrix is a sophisticated, efficient, and highly scalable application delivery platform that is itself comprised of anywhere from hundreds to thousands of servers. Citrix platform performance: optimize your Citrix landscape with insights into user load and screen latency per server.

Latency

Latency Performance Virtualization Infrastructure

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

The first phase involves validating functional correctness, scalability, and performance concerns and ensuring the new systems’ resilience before the migration. It provides a good read on the availability and latency ranges under different production conditions.

Traffic

Traffic Latency Tuning Systems

Stuff The Internet Says On Scalability For December 21st, 2018

High Scalability

DECEMBER 21, 2018

It's HighScalability time: Have a very scalable Xmas everyone! Tim Bray: How to talk about [Serverless Latency]. To start with, don’t just say “I need 120ms.” See you in the New Year. Do you like this sort of Stuff? Please support me on Patreon. I'd really appreciate it. Explain the Cloud Like I'm 10.

Internet

Internet Internet Scalability Serverless

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

The data warehouse is not designed to serve point requests from microservices with low latency. Therefore, we must efficiently move data from the data warehouse to a global, low-latency and highly-reliable key-value store. As most key-value storage engines support efficiently deleting a namespace (e.g.

Latency

Latency Storage Big Data Tuning

The AWS Storage Gateway - All Things Distributed

All Things Distributed

JANUARY 23, 2012

Werner Vogels' weblog on building scalable and robust distributed systems. Expanding the Cloud - The AWS Storage Gateway. Today Amazon Web Services has launched the AWS Storage Gateway, making the power of secure and reliable cloud storage accessible from customers' storage infrastructure.

Storage

Storage AWS Virtualization Cloud

Scale up your Dynatrace Managed software-intelligence deployment with self-healing insights

Dynatrace

JUNE 8, 2020

Meeting the requirements of a tier-0 application demands the highest level of reliability and scalability, which Dynatrace enables through extensive self-monitoring and self-healing across the entire application stack down to the infrastructure level. It is more critical to our business than any other revenue-driving application.”

Software

Software Software Programming Metrics

How To Design For High-Traffic Events And Prevent Your Website From Crashing

Smashing Magazine

JANUARY 7, 2025

For example, you can switch to a scalable cloud-based web host, or compress/optimize images to save bandwidth. Choose A Scalable Web Host The most convenient way to design a high-traffic website without worrying about website crashes is to upgrade your web hosting solution. Caching can help your website combat this issue.

Traffic

Traffic Website Design Cache

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

When a new leader is elected it loads all data from external storage. In that scenario, the system would need to deal with the data propagation latency directly, for example, by use of timeouts or client-originated update tracking mechanisms. Active data includes jobs and tasks that are currently running.

Cache

Cache Latency Traffic Systems

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Data Overload and Storage Limitations As IoT and especially industrial IoT -based devices proliferate, the volume of data generated at the edge has skyrocketed.

IoT

IoT Energy Logistics Latency

Stuff The Internet Says On Scalability For July 20th, 2018

High Scalability

JULY 20, 2018

A typical example of a modern "microservices-inspired" Java application would function along these lines: Netflix: We observed during experimentation that RAM random read latencies were rarely higher than 1 microsecond whereas typical SSD random read latencies are between 100–500 microseconds. There are a few more quotes.

Internet

Internet Internet Scalability Automotive

How Edge and Industrial IoT Will Converge in 2025: A New Era for Smart Manufacturing

VoltDB

NOVEMBER 20, 2024

This proximity reduces latency and enables real-time decision-making. Edge computing will process and filter this data before sending only the most relevant insights to the cloud, making large-scale IIoT deployments more feasible and reducing cloud storage and bandwidth costs.

IoT

IoT Energy Latency Automotive

Distributed Algorithms in NoSQL Databases

Highly Scalable

SEPTEMBER 18, 2012

Scalability is one of the main drivers of the NoSQL movement. Historically, NoSQL paid a lot of attention to tradeoffs between consistency, fault-tolerance and performance to serve geographically distributed systems, low-latency or highly available applications. Read/Write latency. Read/Write scalability. Data Placement.

Database

Database Latency C++ Scalability

What’s New at ScaleGrid – September 2024

Scalegrid

SEPTEMBER 10, 2024

At ScaleGrid, we’re always pushing the boundaries to offer more flexibility and scalability to our customers. Additionally, we’ve added the Philadelphia AWS Local Zone, helping to reduce latency for customers operating in the eastern U.S.

Latency

Latency AWS Storage Tuning

Get up to 300 new metrics out of the box with AWS supporting services (GA)

Dynatrace

DECEMBER 22, 2019

AWS offers a broad set of global, cloud-based services including computing, storage, networking, Internet of Things (IoT), and many others. You can use these services in combinations that are tailored to help your business move faster, lower IT costs, and support scalability. Amazon Simple Storage Service (S3). Amazon Redshift.

AWS

AWS Metrics IoT Storage

The Anna Key-Value Store Now Has 355x the Performance of DynamoDB for the Dollar

High Scalability

SEPTEMBER 8, 2018

They've posted about Anna's new superpowers in Going Fast and Cheap: How We Made Anna Autoscale : Using Anna v0 as an in-memory storage engine, we set out to address the cloud storage problems described above. Each storage server collects statistics about the requests it serves, the data it stores, etc. Related Articles.

Storage

Storage Performance AWS Cloud

Evolution of ML Fact Store

The Netflix TechBlog

APRIL 26, 2022

The first version of our logger library optimized for storage by deduplicating facts and optimized for network i/o using different compression methods for each fact. Since we were optimizing at the logging level for storage and performance, we had less data and metadata to play with to optimize the query performance.

Storage

Storage Design Scalability Latency

Observability platform vs. observability tools

Dynatrace

DECEMBER 22, 2021

Metrics are measures of critical system values, such as CPU utilization or average write latency to persistent storage. A database could start executing a storage management process that consumes database server resources. Observability is made up of three key pillars: metrics, logs, and traces.

Artificial Intelligence

Artificial Intelligence Metrics Architecture DevOps

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm NFX 209: File system as a service at Netflix. Kishore Kasi, Senior Software Engineer. Abstract: As Netflix grows in original content creation, its need for storage is also increasing at a rapid pace. Technology advancements in content creation and consumption have also increased its data footprint.

AWS

AWS Entertainment Open Source Benchmarking

Accelerating Data: Faster and More Scalable ElastiCache for Redis

All Things Distributed

OCTOBER 12, 2016

Three years ago, as part of our AWS Fast Data journey we introduced Amazon ElastiCache for Redis, a fully managed in-memory data store that operates at sub-millisecond latency. This allows for faster failover times while minimizing latency. Amazon’s enhancements address many day-to-day challenges with running Redis.

Scalability

Scalability Cache Analytics AWS

A one size fits all database doesn't fit anyone

All Things Distributed

JUNE 21, 2018

As I have talked about before, one of the reasons why we built Amazon DynamoDB was that Amazon was pushing the limits of what was a leading commercial database at the time and we were unable to sustain the availability, scalability, and performance needs that our growing Amazon.com business demanded. The opposite is true.

Database

Database AWS Games Latency

Netflix Drive

The Netflix TechBlog

MAY 5, 2021

Netflix Drive aims to solve this problem of exposing different namespaces and attaching appropriate access control to help build a scalable, performant, globally distributed platform for storing and retrieving pertinent assets. It exposes a file/folder interface for applications to save their data and an API interface for control operations.

Media

Media Storage Architecture Cloud

Titan Graph Database Integration with DynamoDB: World-class Performance, Availability, and Scale for New Workloads

All Things Distributed

AUGUST 20, 2015

Today, we are releasing a plugin that allows customers to use the Titan graph engine with Amazon DynamoDB as the backend storage layer. It opens up the possibility to enjoy the value that graph databases bring to relationship-centric use cases, without worrying about managing the underlying storage. The importance of relationships.

Database

Database Logistics Availability Social Media

Redis® Monitoring Strategies for 2025

Scalegrid

JANUARY 21, 2025

Identifying key Redis metrics such as latency, CPU usage, and memory metrics is crucial for effective Redis monitoring. To monitor Redis instances effectively, collect Redis metrics focusing on cache hit ratio, memory allocated, and latency threshold.

Strategy

Strategy Monitoring Latency DevOps

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. can enhance Redis by handling management tasks, backups, and scalability, facilitating global reach and easy cloud integration for global businesses.

Cache

Cache Storage Architecture Scalability
