Design, Latency and Scalability - Technology Performance Pulse

Designing Instagram

High Scalability

JANUARY 11, 2022

Design a photo-sharing platform similar to Instagram where users can upload their photos and share them with their followers. High Level Design. Component Design. API Design. We have provided the API design of posting an image on Instagram below. Problem Statement. Architecture. Fetching User Feed.

Design

Design Media Storage Logistics

API Design Principles for Optimal Performance and Scalability

DZone

JUNE 22, 2023

The goal is to help developers, technical managers, and business owners understand the importance of API performance optimization and how they can improve the speed, scalability, and reliability of their APIs. API performance optimization is the process of improving the speed, scalability, and reliability of APIs.

Scalability

Scalability Design Best Practices Performance

Scalable Annotation Service — Marken

The Netflix TechBlog

JANUARY 25, 2023

By Varun Sekhri and Meenakshi Jindal. At Netflix, we have hundreds of microservices, each with its own data models or entities. The service should be able to serve real-time (UI) applications, so CRUD and search operations should be achieved with low latency.

Scalability

Scalability Latency Media Architecture

Performance and Scalability Analysis of Redis and Memcached

DZone

JULY 2, 2024

Speed and scalability are significant issues today, at least in the application landscape. We have run these benchmarks on the AWS EC2 instances and designed a custom dataset to make it as close as possible to real application use cases. However, the question arises of choosing the best one.

Scalability

Scalability Performance Benchmarking Games

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By Rajiv Shringi, Oleksii Tkachuk, and Kartik Sathyanarayanan. In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. This decoupling simplifies system architecture and supports scalability in distributed environments. Choosing between RabbitMQ and Kafka depends on your specific messaging needs.

Latency

Latency Analytics Architecture Storage

How To Design For High-Traffic Events And Prevent Your Website From Crashing

Smashing Magazine

JANUARY 7, 2025

By Saad Khan. This article is sponsored by Cloudways. Product launches and sales typically attract large volumes of traffic.

Traffic

Traffic Website Design Cache

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

Yet, many are confined to a brief temporal window due to constraints in serving latency or training costs. These insights have shaped the design of our foundation model, enabling a transition from maintaining numerous small, specialized models to building a scalable, efficient system.

Tuning

Tuning Efficiency Latency Strategy

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

The Challenge of Title Launch Observability. As engineers, we’re wired to track system metrics like error rates, latencies, and CPU utilization, but what about metrics that matter to a title’s success? How can we design systems that recognize these nuances and empower every title to shine and bring joy to our members?

Traffic

Traffic Scalability Strategy Monitoring

Spring WebFlux: publishOn vs subscribeOn for Improving Microservices Performance

DZone

SEPTEMBER 23, 2024

With the rise of microservices architecture, there has been a rapid acceleration in the modernization of legacy platforms, leveraging cloud infrastructure to deliver highly scalable, low-latency, and more responsive services. Why Use Spring WebFlux?

Performance

Performance Latency Architecture Programming

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Key Takeaways RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges. This decoupling is crucial in modern architectures where scalability and fault tolerance are paramount. This setup prioritizes data safety, with most replicas online at any given time.

Best Practices

Best Practices Traffic Strategy Scalability

Amazon DynamoDB – a Fast and Scalable NoSQL Database.

All Things Distributed

JANUARY 18, 2012

Werner Vogels' weblog on building scalable and robust distributed systems. Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications. Today is a very exciting day as we release Amazon DynamoDB, a fast, highly reliable and cost-effective NoSQL database service designed for internet scale applications.

Scalability

Scalability Database Ecommerce Latency

Benchmark (YCSB) numbers for Redis, MongoDB, Couchbase2, Yugabyte and BangDB

High Scalability

FEBRUARY 17, 2021

We note that MongoDB's update latency is very low (lower is better) compared to the other databases; however, its read latency is on the higher side. The latency table shows that the 99th percentile latency for Yugabyte is quite high compared to the others (lower is better). Again, Yugabyte latency is quite high. Conclusion.

Benchmarking

Benchmarking Latency C++ Database

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

Such frameworks support software engineers in building highly scalable and efficient applications that process continuous data streams of massive volume. Stream processing systems, designed for continuous, low-latency processing, demand swift recovery mechanisms to tolerate and mitigate failures effectively.

Engineering

Engineering Tuning Latency Open Source

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Central to this infrastructure is our use of multiple online distributed databases such as Apache Cassandra, a NoSQL database known for its high availability and scalability. It also serves as central configuration of access patterns such as consistency or latency targets.

Latency

Latency Storage Cache Servers

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

By Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, and Joey Lynch. As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Tuning

Stuff The Internet Says On Scalability For December 21st, 2018

High Scalability

DECEMBER 21, 2018

It's HighScalability time: Have a very scalable Xmas everyone! Tim Bray: How to talk about [Serverless Latency] · To start with, don’t just say “I need 120ms.” See you in the New Year. Do you like this sort of Stuff? Please support me on Patreon. I'd really appreciate it. Explain the Cloud Like I'm 10.

Internet

Internet Scalability Serverless

Self-Host Your Static Assets

CSS Wizardry

MAY 31, 2019

Every new origin we need to visit needs a connection opening, and that can be very costly: DNS resolution, TCP handshakes, and TLS negotiation all add up, and the story gets worse the higher the latency of the connection is. On a slower, higher-latency connection, the story is much, much worse. All completely avoidable. to just 3.6s.

Cache

Cache Latency Infrastructure Website

Allegro Reduces Kafka Producer Latency Outliers by 82% After Switching to XFS

InfoQ

APRIL 26, 2024

Allegro experimented with different performance optimization options to improve Apache Kafka producer tail latency and eventually switched all its clusters to the XFS filesystem. The company used Kafka protocol sniffing, JVM profiling, and eBPF, which proved instrumental in identifying and eliminating performance bottlenecks.

Latency

Latency Performance Tuning Scalability

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

To this end, we developed a Rapid Event Notification System (RENO) to support use cases that require server initiated communication with devices in a scalable and extensible manner. This is not an easy task, considering the wide variety of supported devices and the sheer volume of actions our members perform.

Systems

Systems Traffic Architecture Mobile

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

In that scenario, the system would need to deal with the data propagation latency directly, for example, by use of timeouts or client-originated update tracking mechanisms. We started seeing increased response latencies and leader servers running at dangerously high utilization.

Cache

Cache Latency Traffic Systems

LinkedIn Migrates Espresso to HTTP2 and Reduces Connections by 88% and Latency by 75%

InfoQ

DECEMBER 4, 2023

LinkedIn was able to dramatically improve the scalability and performance of its Espresso database by migrating it from HTTP1.1 to HTTP2, resulting in a reduction in the number of connections, latency, and garbage collection times. By Rafal Gancarz

Latency

Latency Scalability Database Performance

The Netflix Cosmos Platform

The Netflix TechBlog

MARCH 1, 2021

It supports both high-throughput services that consume hundreds of thousands of CPUs at a time, and latency-sensitive workloads where humans are waiting for the results of a computation. The third generation, called Reloaded, has been online for about seven years and has proven to be stable and massively scalable.

Serverless

Serverless Media Latency Social Media

SRE vs DevOps: What you need to know

Dynatrace

FEBRUARY 24, 2021

SRE is the transformation of traditional operations practices by using software engineering and DevOps principles to improve the availability, performance, and scalability of releases by building resiliency into apps and infrastructure. Designating and managing Service Level Objectives (SLOs) as availability targets for a service.

DevOps

DevOps Software Engineering Speed Google

What is real user monitoring (RUM)?

Dynatrace

JANUARY 13, 2022

However, only highly scalable real user monitoring solutions can collect data on all user actions, while less scalable tools have to sample user actions and make inferences from partial data. Providing insight into the service latency to help developers identify poorly performing code. How real user monitoring works.

Monitoring

Monitoring Mobile Latency Best Practices

Growth Engineering at Netflix - Creating a Scalable Offers Platform

The Netflix TechBlog

FEBRUARY 9, 2021

In particular, it’s our job to design and build the systems and protocols that enable customers from all over the world to sign up for Netflix with the plan features and incentives that best suit their needs. This was a perfectly sufficient design for many years. To name a few: Updating XML files is error-prone and manual in nature.

Engineering

Engineering Scalability Architecture Innovation

What is AWS Lambda?

Dynatrace

APRIL 5, 2021

Lambda’s toolbox of automated processes helps developers build fast, robust, and scalable applications on accelerated timelines. AWS continues to improve how it handles latency issues. As The New Stack reports, developers spend only 32% of their time at work actually coding. It helps SRE teams automate responses.

Lambda

Lambda AWS Serverless Hardware

What is observability? Not just logs, metrics and traces

Dynatrace

OCTOBER 1, 2021

When an observability solution also analyzes user experience data using synthetic and real-user monitoring, you can discover problems before your users do and design better user experiences based on real, immediate feedback. The architects and developers who create the software must design it to be observed. Benefits of observability.

Metrics

Metrics Open Source Monitoring Cloud

Rebuilding Netflix Video Processing Pipeline with Microservices

The Netflix TechBlog

JANUARY 10, 2024

This architecture shift greatly reduced the processing latency and increased system resiliency. We expanded pipeline support to serve our studio/content-development use cases, which had different latency and resiliency requirements as compared to the traditional streaming use case. divide the input video into small chunks 2.

Processing

Processing Media Latency Innovation

Why growing AI adoption requires an AI observability strategy

Dynatrace

JANUARY 17, 2024

By adopting a cloud- and edge-based AI approach, teams can benefit from the flexibility, scalability, and pay-per-use model of the cloud while also reducing the latency, bandwidth, and cost of sending AI data to cloud-based operations. Use containerization. Continuously monitor AI models’ performance.

Strategy

Strategy Artificial Intelligence Storage Cloud

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Increased latency during peak loads. Introduce scalable microservices architectures to distribute computational loads efficiently.

IoT

IoT Energy Logistics Latency

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

The data warehouse is not designed to serve point requests from microservices with low latency. Therefore, we must efficiently move data from the data warehouse to a global, low-latency and highly-reliable key-value store. Figure 1 shows how we use Bulldozer to move data at Netflix. Moving data with Bulldozer at Netflix.

Latency

Latency Storage Big Data Tuning

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

Now let’s look at how we designed the tracing infrastructure that powers Edgar. If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls. Storage: don’t break the bank!

Infrastructure

Infrastructure Transportation Storage Open Source

Evolution of ML Fact Store

The Netflix TechBlog

APRIL 26, 2022

We will share how its design has evolved over the years and the lessons learned while building it. To understand Axion’s design, we need to know the various components that interact with it. The motivation has not changed since then; the design has. Design evolution. Axion fact store has four components: fact

Storage

Storage Design Scalability Latency

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

This entertaining romp through the tech stack serves as an introduction to how we think about and design systems, the Netflix approach to operational challenges, and how other organizations can apply our thought processes and technologies. We explore all the systems necessary to make and stream content from Netflix.

AWS

AWS Entertainment Open Source Benchmarking

What is cloud migration?

Dynatrace

SEPTEMBER 30, 2021

The breadth of fully-featured services, the pay-as-you-go scalability, and the agility of cloud platforms enable organizations to expand their modern approaches to building and managing digital services in a way they can’t with on-premises apps and infrastructure. Increased scalability. Reduced cost.

Cloud

Cloud Traffic Best Practices Hardware

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Since its inception, Metaflow has been designed to provide a human-friendly API for building data and ML (and today AI) applications and deploying them in our production infrastructure frictionlessly. In other cases, it is more convenient to share the results via a low-latency API.

Systems

Systems Media Cache Open Source

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture. These principles reduce resource usage by being more efficient and effective while lowering the end-to-end latency in data processing. Transparency to end-users.

Storage

Storage Latency Efficiency Data Engineering

A one size fits all database doesn't fit anyone

All Things Distributed

JUNE 21, 2018

As I have talked about before, one of the reasons why we built Amazon DynamoDB was that Amazon was pushing the limits of what was a leading commercial database at the time and we were unable to sustain the availability, scalability, and performance needs that our growing Amazon.com business demanded. Purpose-built databases.

Database

Database AWS Games Latency

Netflix Video Quality at Scale with Cosmos Microservices

The Netflix TechBlog

NOVEMBER 2, 2021

As VMAF evolves and is integrated with more encoding and streaming workflows within Netflix, we need scalable ways of fostering video quality innovations. For example, when we design a new version of VMAF, we need to effectively roll it out throughout the entire Netflix catalog of movies and TV shows. The workflow is initiated.

Media

Media Innovation Metrics Latency

Towards a Reliable Device Management Platform

The Netflix TechBlog

AUGUST 30, 2021

The challenge, then, is to be able to ingest and process these events in a scalable manner, i.e., scaling with the number of devices, which will be the focus of this blog post. By the following morning, alerts were received regarding high memory consumption and GC latencies, to the point where the service was unresponsive to HTTP requests.

Latency

Latency Traffic Transportation Cloud

Get up to 300 new metrics out of the box with AWS supporting services (GA)

Dynatrace

DECEMBER 22, 2019

You can use these services in combinations that are tailored to help your business move faster, lower IT costs, and support scalability. The example below visualizes average latency by API name and stage for a specific AWS API Gateway. Metrics for each service instance are presented in detailed charts—see the example for ECS below.

AWS

AWS Metrics IoT Storage

Under the Hood of Amazon EC2 Container Service

All Things Distributed

JULY 20, 2015

To be robust and scalable, this key/value store needs to be distributed for durability and availability, to protect against network partitions or hardware failures. This architecture affords Amazon ECS high availability, low latency, and high throughput because the data store is never pessimistically locked.

Latency

Latency Architecture AWS Open Source

Streaming SQL in Data Mesh

The Netflix TechBlog

NOVEMBER 3, 2023

However, this design decision led to a different set of challenges. Additionally, instead of implementing business logic by composing multiple individual Processors together, users could express their logic in a single SQL query, avoiding the additional resource and latency overhead that came from multiple Flink jobs and Kafka topics.

Processing

Processing Engineering Infrastructure Latency
