Architecture, Storage and Traffic - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime — Part 1

The Netflix TechBlog

MAY 4, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.

Traffic

Traffic Latency Tuning Systems

Migrating Critical Traffic At Scale with No Downtime — Part 2

The Netflix TechBlog

JUNE 13, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic

Traffic Metrics Systems Strategy

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

Part 3: System Strategies and Architecture. By Varun Khaitan, with special thanks to my stunning colleagues Mallika Rao, Esmir Mesic, and Hugo Marques. This blog post is a continuation of Part 2, where we cleared the ambiguity around title launch observability at Netflix. The response schema for the observability endpoint.

Traffic

Traffic Strategy Entertainment Innovation

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

This article outlines the key differences in architecture, performance, and use cases to help determine the best fit for your workload. RabbitMQ follows a message broker model with advanced routing, while Kafka’s event streaming architecture uses partitioned logs for distributed processing. What is RabbitMQ? What is Apache Kafka?

Latency

Latency Analytics Architecture Storage

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Optimizing RabbitMQ performance through strategies such as keeping queues short, enabling lazy queues, and monitoring health checks is essential for maintaining system efficiency and effectively managing high traffic loads.

Best Practices

Best Practices Traffic Strategy Efficiency

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Architecture Overview: The first pivotal step in managing impressions begins with the creation of a Source-of-Truth (SOT) dataset. The enriched data is seamlessly accessible for both real-time applications via Kafka and historical analysis through storage in an Apache Iceberg table.

Tuning

Tuning Latency Efficiency Storage

Unlock end-to-end observability insights with Dynatrace PurePath 4 seamless integration of OpenTracing for Java

Dynatrace

DECEMBER 9, 2020

Cloud-native technologies and microservice architectures have shifted technical complexity from the source code of services to the interconnections between services. Heterogeneous cloud-native microservice architectures can lead to visibility gaps in distributed traces.

Java

Java Traffic Architecture Strategy

Geek Reading - Week of June 5, 2013

DZone

OCTOBER 11, 2022

Improving testing by using real traffic from production (Hacker News). Simpler UI Testing with CasperJS (Architects Zone – Architectural Design Patterns & Best Practices). Using MongoDB as a cache store (Architects Zone – Architectural Design Patterns & Best Practices). History of Lisp (Hacker News).

Java

Java Best Practices Google Analytics

Network performance monitoring top of mind for CloudOps teams

Dynatrace

MAY 19, 2023

Network traffic growth is the main reason for increasing spending, largely because of the adoption of hybrid and multi-cloud architectures. What are the issues with traffic losses and connectivity drops? “Without the network, nothing will happen,” Ziemianowicz said.

Network

Network Monitoring Performance Traffic

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

What is security analytics?

Dynatrace

JUNE 10, 2024

For example, an organization might use security analytics tools to monitor user behavior and network traffic. Security analytics must also contend with the multicomponent architecture of modern IT infrastructure. Dehydrated data has been compressed or otherwise altered for storage in a data warehouse.

Analytics

Analytics Network Open Source Hardware

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

In previous blog posts, we introduced the Key-Value Data Abstraction Layer and the Data Gateway Platform, both of which are integral to Netflix’s data architecture. Handling Bursty Traffic: Managing significant traffic spikes during high-demand events, such as new content launches or regional failovers.

Latency

Latency Storage Traffic Tuning

What is cloud monitoring? How to improve your full-stack visibility

Dynatrace

JANUARY 11, 2023

Website monitoring examines a cloud-hosted website’s processes, traffic, availability, and resource use. Cloud storage monitoring. Teams can keep track of storage resources and processes that are provisioned to virtual machines, services, databases, and applications. Cloud-server monitoring.

Cloud

Cloud Monitoring Best Practices Infrastructure

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

In this post, we dive deep into how Netflix’s KV abstraction works, the architectural principles guiding its design, the challenges we faced in scaling diverse use cases, and the technical innovations that have allowed us to achieve the performance and reliability required by Netflix’s global operations.

Latency

Latency Storage Cache Servers

Identify issues faster with enhanced visibility into your TIBCO EMS resources (Preview)

Dynatrace

JUNE 16, 2020

One key requirement of a microservices architecture is the ability to make information of all kinds available wherever and whenever it’s needed, without putting undue traffic on corporate and public networks. Synchronous storage size. Async storage size. Storage read size rate. Storage read count rate.

Storage

Storage Metrics Java Architecture

What Adrian Did Next — Part 4 — how I helped Netflix launch on iPad and iPhone — 2007 to 2010

Adrian Cockcroft

JANUARY 27, 2025

Terrible timing but without Stephane I was the only iOS developer anywhere at Netflix, and it wasn’t my job to be a developer, by then I was leading the cloud re-architecture team. We simply didn’t have enough capacity in our datacenter to run the traffic, so it had to work. I use mine most days to watch videos.

C++

C++ Mobile Hardware Java

Edgar: Solving Mysteries Faster with Observability

The Netflix TechBlog

SEPTEMBER 2, 2020

While this abundance of dashboards and information is by no means unique to Netflix, it certainly holds true within our microservices architecture. Edgar captures 100% of interesting traces, as opposed to sampling a small fixed percentage of traffic. As you can imagine, this comes with very real storage costs.

Latency

Latency Transportation Engineering Traffic

Data lakehouse innovations advance the three pillars of observability for more collaborative analytics

Dynatrace

FEBRUARY 16, 2023

Grail combines the big-data storage of a data warehouse with the analytical flexibility of a data lake. “You’re getting all the architectural benefits of Grail—the petabytes, the cardinality—with this implementation,” including the three pillars of observability: logs, metrics, and traces in context.

Analytics

Analytics Innovation Metrics Database

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

The original assumptions and architectural choices were no longer viable. Overview The figure below depicts a simplified high-level architecture of a single Titus cluster (a.k.a When a new leader is elected it loads all data from external storage. Active data includes jobs and tasks that are currently running. queries/sec.

Cache

Cache Latency Traffic Systems

Achieving observability in async workflows

The Netflix TechBlog

MAY 14, 2021

Managing and operating asynchronous workflows can be difficult without the proper tools and architecture that puts observability, debugging, and tracing at the forefront. Prodicle Distribution: Our service is required to be elastic and handle bursty traffic. Written by Colby Callahan, Megha Manohara, and Mike Azar.

Traffic

Traffic Java Latency Google

OneAgent for Windows—Enhancements to *.msi-based deployment

Dynatrace

MAY 9, 2019

Some time ago, we decided to take a stab at a number of architectural challenges present in the OneAgent installer for Windows. Consequently, each new version of OneAgent for Windows consumed double storage space: one for the *.exe And it added to the network traffic in terms of new version distribution.

Storage

Storage Tuning Traffic Architecture

DevOps monitoring tools: How to drive DevOps efficiency

Dynatrace

MAY 8, 2023

Infrastructure monitoring: Infrastructure monitoring reviews servers, storage, network connections, virtual machines, and other data center elements that support applications. Because every DevOps environment is unique, exactly how organizations implement these monitoring types will differ depending on architecture and tools.

DevOps

DevOps Efficiency Monitoring Infrastructure

Elasticsearch Indexing Strategy in Asset Management Platform (AMP)

The Netflix TechBlog

MARCH 10, 2023

are stored in secure storage layers. Amsterdam is built on top of three storage layers. These applications are built on a microservices architecture, and the Asset Management Platform provides asset management to those dozens of services for various asset types. The first layer, Cassandra, is the source of truth for us.

Strategy

Strategy Cache Storage Analytics

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

Key Takeaways: Redis offers complex data structures and additional features for versatile data handling, while Memcached excels in simplicity with a fast, multi-threaded architecture for basic caching needs. It uses a hash table to manage these pairs, divided into fixed-size buckets with linked lists for key-value storage.

Cache

Cache Storage Architecture Scalability

Amazon DynamoDB - a Fast and Scalable NoSQL Database.

All Things Distributed

JANUARY 18, 2012

Web-based applications often encounter database scaling challenges when faced with growth in users, traffic, and data. Behind the scenes, Amazon DynamoDB automatically spreads the data and traffic for a table over a sufficient number of servers to meet the request capacity specified by the customer. The growth of Amazon’s

Scalability

Scalability Database Ecommerce Latency

Web Development Trends in 2023

KeyCDN

FEBRUARY 22, 2023

With cloud-based infrastructure, organizations can easily scale their web applications to handle increased traffic or demand without the need for expensive hardware upgrades. Each of these platforms offers a wide range of services and tools for web application development and deployment, including storage, databases, and serverless computing.

Artificial Intelligence

Artificial Intelligence Development Serverless Website

Netflix Video Quality at Scale with Cosmos Microservices

The Netflix TechBlog

NOVEMBER 2, 2021

The Reloaded system is a well-matured and scalable system, but its monolithic architecture can slow down rapid innovation. A bridge between two worlds To live such a life, we developed several “bridging” workflows, which allow us to route video quality traffic from Reloaded into Cosmos. via bug fixes). We call this system Cosmos.

Media

Media Innovation Metrics Latency

DynamoDB One Year Later - All Things Distributed

All Things Distributed

MARCH 7, 2013

Shazam needed to handle an enormous increase in traffic for the duration of the Super Bowl and used DynamoDB as part of their architecture. This rapid adoption has allowed us to benefit from the scale economies inherent in our architecture. Indexed Storage costs: We are lowering the price of indexed storage by 75%.

Ecommerce

Ecommerce Storage Scalability Database

The Ultimate Guide to Database High Availability

Percona

JUNE 22, 2023

Load balancing: Traffic is distributed across multiple servers to prevent any one component from becoming overloaded. Load balancers can detect when a component is not responding and put traffic redirection in motion. When planning your database HA architecture, the size of your company is a great place to start to assess your needs.

Availability

Availability Database Open Source Hardware

AWS EKS Monitoring as a Self-Service with Dynatrace

Dynatrace

SEPTEMBER 17, 2019

PostgreSQL & Elastic for data storage. MaaSS for Cloud Architects: Deployment and Architecture Validations. Thanks to PurePath, architects can validate how transactions flow from service-to-service and how traffic gets routed through service meshes (AWS App Mesh, Istio, Linkerd) or proxies. NGINX as an API Gateway.

AWS

AWS Monitoring Ecommerce Lambda

Growth Engineering at Netflix — Automated Imagery Generation

The Netflix TechBlog

FEBRUARY 9, 2021

This requires an asset storage solution. Asset Storage We refer to asset storage and management simply as asset management. However, it would be cost-inefficient to leverage this same hardware for lightweight and more consistent traffic patterns that an asset management service requires.

Engineering

Engineering Storage Latency Entertainment

Weekend Reading: Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases.

All Things Distributed

MAY 19, 2017

It gave us the opportunity to invent a new database architecture that would address the needs of modern cloud-scale applications, departing from the traditional approaches that had their roots in the databases of the nineties. In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture.

Database

Database Design Cloud Storage

Datadog Creates Scalable Data Ingestion Architecture

InfoQ

JUNE 16, 2023

Datadog created a dedicated data ingestion architecture offering exactly-once semantics for their third-generation event store, Husky. The event-driven architecture (EDA) can accommodate bursts in traffic in the multi-tenant platform with reasonable ingestion latency and acceptable operational costs. By Rafal Gancarz

Architecture

Architecture Scalability Latency Traffic

Key Advantages of DBMS for Efficient Data Management

Scalegrid

JANUARY 5, 2024

The DBMS is key to maintaining these aspects by offering a storage system that allows users to perform operations such as data insertion, deletion, and selection, thereby promoting enhanced data integration across diverse applications and platforms. This is significant for modern business environments.

Efficiency

Efficiency Storage Database Scalability

Stuff The Internet Says On Scalability For July 20th, 2018

High Scalability

JULY 20, 2018

That’s mapping applications to the specific architectural choices. The third wing of the architecture piece is the “domain specific system-on-chip.” And you already see that in machine learning, where there’s a really hot field in terms of deep neural nets and other implementations.

Internet

Internet Internet Scalability Automotive

Evolution of Netflix Conductor:

The Netflix TechBlog

JULY 30, 2019

External Payload Storage External payload storage was implemented to prevent the usage of Conductor as a data persistence system and to reduce the pressure on its backend datastore. Push based task scheduling interface Current Conductor architecture is based on polling from a worker to get tasks that it will execute.

Lambda

Lambda Media Open Source Metrics

What Is RabbitMQ: Key Features and Uses

Scalegrid

JUNE 7, 2024

In this article, we will explore what RabbitMQ is, its mechanisms to facilitate message queueing, its role within software architectures, and the tangible benefits it delivers in real-world scenarios. This includes acknowledgments confirming both publishing actions and storage on disk.

IoT

IoT Software Architecture Architecture Scalability

Total Cost of Ownership and the Return on Agility - All Things.

All Things Distributed

AUGUST 16, 2012

An apples to apples comparison of the costs associated with running various usage patterns on-premises and with AWS requires more than a simple comparison of hardware expense versus always-on utility pricing for compute and storage. Total Cost of Ownership and the Return on Agility. By Werner Vogels on 16 August 2012 10:00 AM.

AWS

AWS Hardware Traffic Best Practices

What is RabbitMQ Used For

Scalegrid

JUNE 28, 2024

Integrating such a backend service system supported by RabbitMQ into a web application’s architecture can drastically alter its operational dynamics. This makes RabbitMQ an attractive option for developers and enterprises seeking to optimize their software architecture. Is RabbitMQ a good fit for a microservices architecture?

IoT

IoT Healthcare Programming Open Source

Understanding operational 5G: a first measurement study on its coverage, performance and energy consumption

The Morning Paper

OCTOBER 4, 2020

Three different 5G phones are used, including a ZTE Axon10 Pro with powerful communication (SDX 50 5G modem) and compute (Qualcomm Snapdragon TM855) capabilities together with 256GB of storage. This is a feature of the NSA architecture which requires dropping off of 5G onto 4G, doing a handover on 4G, and then upgrading to 5G again.

Energy

Energy Latency Performance Network

Setting Up and Deploying PostgreSQL for High Availability

Percona

JULY 7, 2023

Unfortunately, using certain open source database software as part of an HA architecture can present significant challenges. Downtime due to SPOFs can also be attributed to bottlenecks from architectures designed for applications instead of databases. Despite all its upside, PostgreSQL software presents such challenges.

Availability

Availability Open Source Architecture Database

10 Lessons from 10 Years of Amazon Web Services

All Things Distributed

MARCH 11, 2016

The expectation was that with each order or two of magnitude, we would need to revisit and revise the architecture to make sure we could address the issues of scale. We needed to build such an architecture that we could introduce new software components without taking the service down.

AWS

AWS Hardware Retail Virtualization

High Availability vs. Fault Tolerance: Is FT’s 00.001% Edge in Uptime Worth the Headache?

Percona

AUGUST 22, 2023

We’ll also look at the differences, as it’s important to know what architecture(s) will help you best meet your unique requirements for maximizing data assets and achieving continuous uptime. Load balancing: Traffic is distributed across multiple servers to prevent any one component from becoming overloaded.

Availability

Availability Hardware Open Source Database

Multi Cloud vs Hybrid Cloud Strategy

Scalegrid

JANUARY 8, 2024

Like ScaleGrid’s offerings for multi-cloud architecture compatibility, its solutions are well-suited for use within a single cloud provider or a hybrid cloud setup as well. Firstly, let’s take a look at Spotify’s implementation of the multi-cloud approach before exploring Netflix’s adoption of a hybrid cloud architecture.

Cloud

Cloud Strategy Scalability Artificial Intelligence
