Design, Latency and Storage - Technology Performance Pulse

Designing Instagram

High Scalability

JANUARY 11, 2022

Design a photo-sharing platform similar to Instagram where users can upload their photos and share them with their followers. The post covers the problem statement, high-level design, component design, and API design, including the API for posting an image on Instagram. Its scope also includes sending and receiving messages from other users.

Design

Design Media Storage Logistics
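Sharing a photo "with their followers" is often implemented as fan-out on write: when a post is created, its id is pushed into each follower's precomputed feed. The excerpt does not say which approach the post uses, so this is a swapped-in, commonly used technique. `FeedService`, `follow`, `post_photo`, `feed` and the bounded feed length are all hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Post:
    post_id: int
    author: str
    photo_url: str  # hypothetical: the image bytes would live in object storage


class FeedService:
    """Hypothetical toy fan-out-on-write feed: each post is pushed to followers' feeds."""

    def __init__(self, feed_limit: int = 100):
        self.followers = defaultdict(set)  # author -> set of follower ids
        # Each user's feed keeps only the newest `feed_limit` post ids.
        self.feeds = defaultdict(lambda: deque(maxlen=feed_limit))
        self.posts = {}
        self._ids = count(1)

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)

    def post_photo(self, author: str, photo_url: str) -> Post:
        post = Post(next(self._ids), author, photo_url)
        self.posts[post.post_id] = post
        # Fan-out: write the new post id at the front of every follower's feed;
        # with maxlen set, the oldest id falls off the other end.
        for follower in self.followers[author]:
            self.feeds[follower].appendleft(post.post_id)
        return post

    def feed(self, user: str) -> list:
        # Reads are cheap because the feed is already materialized, newest first.
        return [self.posts[pid] for pid in self.feeds[user]]
```

Fan-out on write makes reading a feed cheap, but the cost of one post grows with the author's follower count. For that reason, accounts with very many followers are often handled with fan-out on read instead.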

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By Rajiv Shringi, Oleksii Tkachuk, and Kartik Sathyanarayanan. In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. Message brokers handle validation, routing, storage, and delivery, ensuring efficient and reliable communication. What is RabbitMQ?

Latency

Latency Analytics Architecture Storage
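RabbitMQ's "flexible routing" is easiest to see with a topic exchange. Binding keys are dot-separated words, where `*` matches exactly one word and `#` matches zero or more words. The following is a minimal re-implementation of those matching rules for illustration, not RabbitMQ's own code. The `(binding_key, queue)` list format is an assumption.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Topic-exchange style match: words split on '.', '*' = exactly one word,
    '#' = zero or more words. Illustrative re-implementation, not RabbitMQ code."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(pattern):
            return j == len(words)
        if pattern[i] == "#":
            # '#' either absorbs nothing (skip it) or absorbs one more word.
            return match(i + 1, j) or (j < len(words) and match(i, j + 1))
        if j == len(words):
            return False
        if pattern[i] == "*" or pattern[i] == words[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def route(bindings: list, routing_key: str) -> list:
    """Return every queue with a matching binding; one message can fan out to many."""
    return sorted({queue for key, queue in bindings if topic_matches(key, routing_key)})
```

Kafka takes the opposite approach. Producers append to a partitioned log, and each consumer group tracks its own offset, so routing decisions move out of the broker and into the consumers. The recursive matcher above is fine for short keys but is not optimized for patterns with many `#` wildcards.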

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering
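One common layout optimization for data landing in a warehouse is merging many small files into fewer files near a target size. The following is a minimal sketch of planning such a merge with first-fit-decreasing bin packing. It is not AutoOptimize's actual algorithm. `plan_compaction`, the "small file" threshold and the target size are hypothetical parameters.

```python
def plan_compaction(file_sizes: dict, target: int, small_fraction: float = 0.75) -> list:
    """Hypothetical planner: group 'small' files (size < small_fraction * target)
    into merge groups whose combined size stays <= target (first-fit decreasing)."""
    small = sorted(
        ((name, size) for name, size in file_sizes.items() if size < small_fraction * target),
        key=lambda item: item[1],
        reverse=True,  # place the largest small files first
    )
    groups = []  # each entry is [total_size, [file names]]
    for name, size in small:
        for group in groups:
            if group[0] + size <= target:
                group[0] += size
                group[1].append(name)
                break
        else:  # no existing group has room: open a new one
            groups.append([size, [name]])
    # Rewriting a lone file would not reduce the file count, so skip such groups.
    return [sorted(names) for _, names in groups if len(names) > 1]
```

For example, with a target of 128 units, files of 60, 50, 10 and 5 are packed into one group (125), and files of 40 and 30 into another. A 100-unit file is left alone because it is already close to the target.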

How To Design For High-Traffic Events And Prevent Your Website From Crashing

Smashing Magazine

JANUARY 7, 2025

By Saad Khan. This article is sponsored by Cloudways. Product launches and sales typically attract large volumes of traffic.

Traffic

Traffic Website Design Cache

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination. It also serves as central configuration of access patterns such as consistency or latency targets.

Latency

Latency Storage Cache Efficiency
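Idempotency matters when a client retries a slow request, or sends a duplicate to cut tail latency, and the write is not naturally repeatable (for example, appending to a wide partition). The following is a minimal sketch of client-generated idempotency tokens. `IdempotentStore`, `append` and `new_token` are hypothetical names, not Netflix's Key-Value API.

```python
import uuid


def new_token() -> str:
    """Generate one token per logical write; the client reuses it on every retry."""
    return str(uuid.uuid4())


class IdempotentStore:
    """Hypothetical key -> list store where each write carries a client token.
    Replaying the same (key, token) pair is a no-op, so retries cannot duplicate rows."""

    def __init__(self):
        self.data = {}        # key -> list of values
        self.applied = set()  # (key, token) pairs already applied

    def append(self, key: str, value, token: str) -> bool:
        if (key, token) in self.applied:
            return False  # duplicate delivery of a write that already succeeded
        self.applied.add((key, token))
        self.data.setdefault(key, []).append(value)
        return True

    def get(self, key: str) -> list:
        return list(self.data.get(key, []))
```

A real service would expire remembered tokens after a bounded window rather than keep them forever, as the unbounded `applied` set does here.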

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

As engineers, we’re wired to track system metrics like error rates, latencies, and CPU utilization, but what about metrics that matter to a title’s success? How can we design systems that recognize these nuances and empower every title to shine and bring joy to our members?

Traffic

Traffic Scalability Strategy Monitoring

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

The architecture of RabbitMQ is meticulously designed for complex message routing, enabling dynamic and flexible interactions between producers and consumers. While clustering across wide-area networks (WANs) is discouraged due to latency issues, leased links can mitigate some connectivity challenges.

Best Practices

Best Practices Traffic Strategy Efficiency

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

By Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, and Joey Lynch. As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Infrastructure
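Storing petabytes of temporal data with millisecond reads usually means splitting each series into time buckets, so that no single partition grows without bound and a range query touches only the buckets it needs. The following is a minimal sketch under assumed parameters: fixed one-hour buckets, UTC-aware timestamps, and a `series#bucket` key format. None of these are taken from Netflix's design.

```python
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(hours=1)  # assumption: fixed one-hour buckets
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime) -> datetime:
    """Floor a UTC-aware timestamp to the start of its bucket."""
    n = (ts - EPOCH) // BUCKET  # timedelta // timedelta -> int
    return EPOCH + n * BUCKET


def partition_key(series_id: str, ts: datetime) -> str:
    """Hypothetical key format: one partition per (series, bucket) pair."""
    return f"{series_id}#{bucket_start(ts).isoformat()}"


def buckets_for_range(start: datetime, end: datetime) -> list:
    """Bucket starts that a query over [start, end) must read."""
    out, b = [], bucket_start(start)
    while b < end:
        out.append(b)
        b += BUCKET
    return out
```

Smaller buckets keep partitions narrow but make long range queries fan out to more partitions. The bucket width is therefore a trade-off tuned to the write rate and typical query span.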

Netflix Cloud Packaging in the Terabyte Era

The Netflix TechBlog

SEPTEMBER 24, 2021

From chunk encoding to assembly and packaging, the result of each previous processing step must be uploaded to cloud storage and then downloaded by the next processing step. Uploading and downloading data always come with a penalty, namely latency.

Cloud

Cloud Media Storage Cache

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure
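A standard way to spread data over multiple servers is consistent hashing with replication. Each server owns many virtual points on a hash ring, and a key is stored on the next few distinct servers clockwise from its hash. The following is a minimal sketch. `HashRing`, the 100 virtual nodes per server, and 3 replicas are illustrative assumptions, not taken from the article.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable across processes, unlike Python's randomized built-in hash().
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes and N-way replication."""

    def __init__(self, servers, vnodes: int = 100, replicas: int = 3):
        self.replicas = replicas
        self.ring = sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def nodes_for(self, key: str) -> list:
        """Walk clockwise from the key's hash, collecting distinct servers."""
        start = bisect.bisect(self.points, _hash(key))
        owners = []
        for k in range(len(self.ring)):
            server = self.ring[(start + k) % len(self.ring)][1]
            if server not in owners:
                owners.append(server)
                if len(owners) == self.replicas:
                    break
        return owners
```

Adding a server only takes over the ring segments its new points land on, so roughly 1/N of keys move, rather than nearly all of them as with `hash(key) % N`.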

Why growing AI adoption requires an AI observability strategy

Dynatrace

JANUARY 17, 2024

AI requires more compute and storage. Training AI models is resource-intensive and costly, again because of increased computational and storage requirements. As a result, AI observability supports cloud FinOps efforts by identifying how AI adoption drives up costs through increased use of storage and compute resources.

Strategy

Strategy Artificial Intelligence Storage Cloud

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

Data warehouses offer a single storage repository for structured data and provide a source of truth for organizations. Unlike data warehouses, however, data lakes do not transform data before it lands in storage. A data lakehouse provides a cost-effective storage layer for both structured and unstructured data.

Artificial Intelligence

Artificial Intelligence Storage Analytics Government

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

The data warehouse is not designed to serve point requests from microservices with low latency. Therefore, we must efficiently move data from the data warehouse to a global, low-latency and highly-reliable key-value store. As most key-value storage engines support efficiently deleting a namespace (e.g.

Latency

Latency Storage Big Data Tuning

Dynatrace Managed turnkey Premium High Availability for globally distributed data centers (Early Adopter)

Dynatrace

JUNE 25, 2020

The network latency between cluster nodes should be around 10 ms or less. For Premium HA, this has been extended from 10 ms latency (in the same network region) to around 100 ms network latency due to asynchronous data replication between regions. In the image below, three downed nodes make an entire cluster unavailable.

Availability

Availability Hardware Latency Traffic

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

Now let’s look at how we designed the tracing infrastructure that powers Edgar. If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls. Storage: don’t break the bank!

Infrastructure

Infrastructure Transportation Storage Open Source

The AWS Storage Gateway - All Things Distributed

All Things Distributed

JANUARY 23, 2012

Expanding the Cloud - The AWS Storage Gateway. Today Amazon Web Services has launched the AWS Storage Gateway, making the power of secure and reliable cloud storage accessible from customers’ With the launch of the AWS Storage Gateway our customers can now integrate their on-premises IT environment with AWS’s

Storage

Storage AWS Virtualization Cloud

The Anna Key-Value Store Now Has 355x the Performance of DynamoDB for the Dollar

High Scalability

SEPTEMBER 8, 2018

They've posted about Anna's new superpowers in "Going Fast and Cheap: How We Made Anna Autoscale": "Using Anna v0 as an in-memory storage engine, we set out to address the cloud storage problems described above. Each storage server collects statistics about the requests it serves, the data it stores, etc."

Storage

Storage Performance AWS Cloud

Narrowing the gap between serverless and its state with storage functions

The Morning Paper

JANUARY 28, 2020

Narrowing the gap between serverless and its state with storage functions, Zhang et al., SoCC’19. Shredder is "a low-latency multi-tenant cloud store that allows small units of computation to be performed directly within storage nodes." Shredder’s implementation is built on top of Seastar.

Serverless

Serverless Storage Latency Cloud

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

To support this growth, we’ve revisited Pushy’s past assumptions and design decisions with an eye towards both Pushy’s future role and future stability. KeyValue is an abstraction over the storage engine itself, which allows us to choose the best storage engine that meets our SLO needs.

Latency

Latency Cache Tuning Efficiency

Amazon DynamoDB - a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications

All Things Distributed

JANUARY 18, 2012

Today is a very exciting day as we release Amazon DynamoDB, a fast, highly reliable and cost-effective NoSQL database service designed for internet scale applications. Amazon DynamoDB offers low, predictable latencies at any scale.

Scalability

Scalability Database Ecommerce Latency

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Data overload and storage limitations: as IoT and especially industrial IoT-based devices proliferate, the volume of data generated at the edge has skyrocketed.

IoT

IoT Energy Logistics Latency

Evolution of ML Fact Store

The Netflix TechBlog

APRIL 26, 2022

We will share how its design has evolved over the years and the lessons learned while building it. To understand Axion’s design, we need to know the various components that interact with it. The motivation has not changed since then; the design has. The Axion fact store has four components: fact

Storage

Storage Design Scalability Latency

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

When a new leader is elected it loads all data from external storage. In that scenario, the system would need to deal with the data propagation latency directly, for example, by use of timeouts or client-originated update tracking mechanisms. Active data includes jobs and tasks that are currently running.

Cache

Cache Latency Traffic Systems

Data ingestion pipeline with Operation Management

The Netflix TechBlog

MARCH 7, 2023

We designed a unique concept called Annotation Operations, which allows teams to create data pipelines and easily write annotations without worrying about access patterns of their data from different applications. But we cannot search files or serve low-latency retrievals from them, etc. This is obviously very expensive.

Media

Media Latency Architecture Database

Get up to 300 new metrics out of the box with AWS supporting services (GA)

Dynatrace

DECEMBER 22, 2019

AWS offers a broad set of global, cloud-based services including computing, storage, networking, Internet of Things (IoT), and many others. Amazon Simple Storage Service (S3). The example below visualizes average latency by API name and stage for a specific AWS API Gateway. Amazon Kinesis Video Streams. Amazon Redshift.

AWS

AWS Metrics IoT Storage

Implementing AWS well-architected pillars with automated workflows

Dynatrace

SEPTEMBER 13, 2023

This is a set of best practices and guidelines that help you design and operate reliable, secure, efficient, cost-effective, and sustainable systems in the cloud. Storing frequently accessed data in faster storage, usually in-memory caching, improves data retrieval speed and overall system performance. Beyond

AWS

AWS Efficiency Azure Cloud
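"Storing frequently accessed data in faster storage, usually in-memory caching" is commonly done as a cache-aside read path with a bounded least-recently-used (LRU) cache. The following is a minimal sketch. `LRUCache`, `get_or_load` and the loader callback are hypothetical names used for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded in-memory cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # ordered from least to most recently used
        self.hits = self.misses = 0

    def get_or_load(self, key, load):
        """Cache-aside read: serve from memory, else call the slow `load(key)`."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = load(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
        return value
```

For caching pure function results inside a single process, Python's standard `functools.lru_cache` decorator provides the same eviction policy.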

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

This entertaining romp through the tech stack serves as an introduction to how we think about and design systems, the Netflix approach to operational challenges, and how other organizations can apply our thought processes and technologies. Technology advancements in content creation and consumption have also increased its data footprint.

AWS

AWS Entertainment Open Source Benchmarking

Netflix Drive

The Netflix TechBlog

MAY 5, 2021

Netflix Drive relies on a data store that will be the persistent storage layer for assets, and a metadata store which will provide a relevant mapping from the file system hierarchy to the data store entities. Finally, once the encoded copy is prepared, this copy can be persisted by Netflix Drive to a persistent storage tier in the cloud.

Media

Media Storage Architecture Cloud

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

ITOps refers to the process of acquiring, designing, deploying, configuring, and maintaining equipment and services that support an organization’s desired business outcomes. Besides the traditional system hardware, storage, routers, and software, ITOps also includes virtual components of the network and cloud infrastructure.

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

Growth Engineering at Netflix - Automated Imagery Generation

The Netflix TechBlog

FEBRUARY 9, 2021

Before designing a solution it’s important to understand the main product requirements for such a feature: The content needs to be new, relevant, and regional (not all countries have the same catalogue). To reduce latency, assets should be generated in an offline fashion and not in real time. This requires an asset storage solution.

Engineering

Engineering Storage Latency Entertainment

Compression Methods in MongoDB: Snappy vs. Zstd

Percona

MARCH 29, 2023

Compression in any database is necessary because it has many advantages, such as reduced storage and shorter data transmission time. Storage reduction alone results in significant cost savings, and we can store more data in the same space. By default, MongoDB uses the Snappy block compression method for storage and network communication.

Storage

Storage Network Open Source Latency
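The Snappy vs. Zstd choice trades CPU time for compression ratio. Python's standard library ships neither codec, so the following minimal sketch uses `zlib` compression levels as a stand-in to show how ratio and time can be measured on a sample document. It is not a benchmark of Snappy or Zstd. `compression_report` and the sample data are hypothetical.

```python
import time
import zlib


def compression_report(data: bytes, levels=(1, 6, 9)) -> list:
    """Hypothetical helper: compress `data` at several zlib levels and return
    (level, compression_ratio, seconds) tuples. zlib stands in for Snappy/Zstd."""
    rows = []
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        rows.append((level, len(data) / len(packed), elapsed))
    return rows
```

In MongoDB itself, the block compressor is chosen in server configuration (`storage.wiredTiger.collectionConfig.blockCompressor`) rather than in application code. Measurements like these are only useful when run on representative documents.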

Migrating Critical Traffic At Scale with No Downtime - Part 2

The Netflix TechBlog

JUNE 13, 2023

By collecting and analyzing key performance metrics of the service over time, we can assess the impact of the new changes and determine if they meet the availability, latency, and performance requirements. One can perform this comparison live on the request path or offline based on the latency requirements of the particular use case.

Traffic

Traffic Metrics Systems Strategy
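The "offline" comparison described above can be pictured as replaying the same requests against the old and new implementations, then recording mismatches and per-side latency. The following is a minimal sketch. `shadow_compare`, `ShadowReport` and the plain-callable service interface are hypothetical, and a real setup would compare asynchronously and tolerate benign differences.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ShadowReport:
    total: int = 0
    mismatches: list = field(default_factory=list)  # (request, old, new)
    latency: dict = field(default_factory=lambda: {"old": [], "new": []})

    @property
    def mismatch_rate(self) -> float:
        return len(self.mismatches) / self.total if self.total else 0.0


def shadow_compare(requests, old_service, new_service) -> ShadowReport:
    """Hypothetical offline replay: call both services per request, compare the
    responses and record latency. The old service's answer stays the source of truth."""
    report = ShadowReport()
    for req in requests:
        t0 = time.perf_counter()
        old = old_service(req)
        t1 = time.perf_counter()
        new = new_service(req)
        t2 = time.perf_counter()
        report.total += 1
        report.latency["old"].append(t1 - t0)
        report.latency["new"].append(t2 - t1)
        if old != new:
            report.mismatches.append((req, old, new))
    return report
```

A mismatch rate near zero, together with comparable latency distributions, is the kind of evidence that would justify shifting live traffic to the new service.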

Netflix Video Quality at Scale with Cosmos Microservices

The Netflix TechBlog

NOVEMBER 2, 2021

For example, when we design a new version of VMAF, we need to effectively roll it out throughout the entire Netflix catalog of movies and TV shows. This article explains how we designed microservices and workflows on top of the Cosmos platform to bolster such video quality innovations. VQS is called using the measureQuality endpoint.

Media

Media Innovation Metrics Latency

How digital experience monitoring helps deliver business observability

Dynatrace

APRIL 26, 2022

STM generates traffic that replicates the typical path or behavior of a user on a network to measure performance (for example, response times, availability, packet loss, latency, jitter, and other variables). One use case for STM is to model the behavior of a customer in the form of a flow of transactions along the buyer’s journey.

Monitoring

Monitoring Social Media IoT Metrics

USENIX LISA2021 Computing Performance: On the Horizon

Brendan Gregg

JULY 4, 2021

AWS Graviton2); for memory with the arrival of DDR5 and High Bandwidth Memory (HBM) on-processor; for storage including new uses for 3D XPoint as a 3D NAND accelerator; for networking with the rise of QUIC and eXpress Data Path (XDP); and so on. Ford, et al., “TCP

Performance

Performance Latency Hardware Storage

Taskbar Latency and Kernel Calls

Random ASCII

SEPTEMBER 8, 2019

Now that we suspect file I/O, it’s necessary to go to Graph Explorer -> Storage -> File I/O. I also don’t know why right-clicking on other programs’ icons on the taskbar is a bit slow – it’s apparently a different issue, or an odd design decision.

Latency

Latency Cache Programming Operating System

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

This article will explore how they handle data storage and scalability, perform in different scenarios, and, most importantly, how these factors influence your choice. The design of Redis’s data structures emphasizes versatility. Memcached’s primary strength lies in its simplicity.

Cache

Cache Storage Architecture Scalability

Introducing Dynatrace built-in data observability on Davis AI and Grail

Dynatrace

JANUARY 31, 2024

Data observability is a practice that helps organizations understand the full lifecycle of data, from ingestion to storage and usage, to ensure data health and reliability. “Every year, poor data quality costs organizations an average $12.9 million,” according to Gartner.

DevOps

DevOps Analytics Airlines Metrics

Crucial Redis Monitoring Metrics You Must Watch

Scalegrid

JANUARY 25, 2024

Critical performance indicators such as latency, CPU usage, memory utilization, hit rate, and the number of connected clients, replicas, and evictions must be monitored to maintain Redis’s high throughput and low latency. Similarly, increased throughput signals an intensive workload on a server and can bring higher latency.

Metrics

Metrics Monitoring Latency Cache
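Redis reports cache hit rate indirectly: the `INFO` command's stats section includes `keyspace_hits` and `keyspace_misses`, and hit rate is hits / (hits + misses). The following is a minimal sketch that parses raw `INFO` text and computes the rate. `parse_info`, `hit_rate` and the sample values in the test are illustrative.

```python
def parse_info(text: str) -> dict:
    """Parse 'field:value' lines of Redis INFO output, skipping '# Section' headers."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first ':' only
        stats[key] = value
    return stats


def hit_rate(stats: dict) -> float:
    """Fraction of key lookups served from the keyspace: hits / (hits + misses)."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0
```

The redis-py client's `Redis.info()` already returns a parsed dictionary, so in practice `hit_rate(client.info("stats"))` would skip the parsing step. A falling hit rate alongside rising `evicted_keys` usually points to memory pressure.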

Distributed Algorithms in NoSQL Databases

Highly Scalable

SEPTEMBER 18, 2012

Historically, NoSQL paid a lot of attention to tradeoffs between consistency, fault-tolerance and performance to serve geographically distributed systems, low-latency or highly available applications. Read/Write latency: read/write requests are processed with minimal latency. Consistency-latency tradeoff.

Database

Database Latency C++ Scalability
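The consistency-latency tradeoff is often stated with quorum parameters. With N replicas, a write waits for W acknowledgements and a read queries R replicas. Every read is guaranteed to see the latest write exactly when R + W > N, because any R-set and W-set must then share a replica. Smaller R or W means waiting on fewer replicas (lower latency) at the risk of stale reads. The following sketch checks the rule against brute force. Both function names are illustrative.

```python
import itertools


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Closed-form rule: every read quorum intersects every write quorum iff R + W > N."""
    return r + w > n


def read_sees_latest(n: int, r: int, w: int) -> bool:
    """Brute force: for every set of W replicas holding the new value and every
    set of R replicas read, check that the read includes at least one new copy."""
    replicas = range(n)
    for written in itertools.combinations(replicas, w):
        for read in itertools.combinations(replicas, r):
            if not set(written) & set(read):
                return False  # found a read quorum that misses the write
    return True
```

For N = 3, choosing R = W = 2 gives consistent reads, while R = W = 1 minimizes latency but can return stale data.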

Cloudburst: stateful functions-as-a-service

The Morning Paper

FEBRUARY 6, 2020

Stateless is fine until you need state, at which point the coarse-grained solutions offered by current platforms limit the kinds of application designs that work well. On the Cloudburst design team’s wish list: a running function’s ‘hot’ data should be kept physically nearby for low-latency access.

Serverless

Serverless Lambda Cache Latency

A one size fits all database doesn't fit anyone

All Things Distributed

JUNE 21, 2018

Further, with the growth and scale of Amazon.com, boundless horizontal scale needed to be a key design point--scaling up simply wasn't an option. Use cases such as gaming, ad tech, and IoT lend themselves particularly well to the key-value data model where the access patterns require low-latency Gets/Puts for known key values.

Database

Database AWS Games Latency
