Architecture, Availability and Traffic - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime — Part 1

The Netflix TechBlog

MAY 4, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.

Traffic

Traffic Latency Tuning Systems

Migrating Critical Traffic At Scale with No Downtime — Part 2

The Netflix TechBlog

JUNE 13, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic

Traffic Metrics Systems Strategy

Chaos Engineering With Litmus: A CNCF Incubating Project

DZone

FEBRUARY 6, 2025

We have developed a microservices architecture platform that encounters sporadic system failures when faced with heavy traffic events. System resilience stands as the key requirement for e-commerce platforms during scaling operations to keep services operational and deliver performance excellence to users.

Engineering

Engineering Traffic Architecture Network

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

Part 3: System Strategies and Architecture. By Varun Khaitan, with special thanks to my stunning colleagues Mallika Rao, Esmir Mesic, and Hugo Marques. This blog post is a continuation of Part 2, where we cleared the ambiguity around title launch observability at Netflix. The response schema for the observability endpoint.

Traffic

Traffic Strategy Entertainment Innovation

Five-nines availability: Always-on infrastructure delivers system availability during the holidays’ peak loads

Dynatrace

NOVEMBER 22, 2022

For retail organizations, peak traffic can be a mixed blessing. While high-volume traffic often boosts sales, it can also compromise uptimes. The nirvana state of system uptime at peak loads is known as “five-nines availability.” But is five nines availability attainable? Downtime per year. 90% (one nine).
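The "downtime per year" figures behind each count of nines follow from simple arithmetic. The sketch below is an illustration only, not taken from the Dynatrace article; the function name downtime_minutes_per_year and the 365-day year are assumptions made for the example.

```python
# Illustrative arithmetic: allowed downtime per year for N "nines" of availability.
# Assumes a 365-day year (leap years ignored); names here are hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime budget, in minutes, for `nines` nines.

    One nine is 90% availability, two nines is 99%, and so on, so the
    allowed unavailability is 10 ** -nines of the year.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(1, 6):
        availability_pct = 100 * (1 - 10 ** (-n))
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nine(s): {availability_pct:.3f}% -> {minutes:,.2f} min/year")
```

Under these assumptions, one nine (90%) allows roughly 36.5 days of downtime per year, while five nines (99.999%) allows only about 5.26 minutes.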

Infrastructure

Infrastructure Availability Systems Retail

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Implementing clustering and quorum queues in RabbitMQ significantly improves load distribution and data redundancy, ensuring high availability and fault tolerance for messaging services.

Best Practices

Best Practices Traffic Strategy Efficiency

Auth0 Architecture: Running In Multiple Cloud Providers And Regions

High Scalability

AUGUST 27, 2018

com and the strategies we use to keep it up and running with high availability. The number of services that compose our product in order to scale our organization and handle the increases in traffic went from under 10 to over 30 services. Core service architecture. A lot has changed since then in Auth0.

Architecture

Architecture Cloud Traffic Infrastructure

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

This article outlines the key differences in architecture, performance, and use cases to help determine the best fit for your workload. RabbitMQ follows a message broker model with advanced routing, while Kafka's event streaming architecture uses partitioned logs for distributed processing. What is RabbitMQ? What is Apache Kafka?

Latency

Latency Analytics Architecture Storage

Mastering Scalability and Performance: A Deep Dive Into Azure Load Balancing Options

DZone

JANUARY 8, 2024

As organizations increasingly migrate their applications to the cloud, efficient and scalable load balancing becomes pivotal for ensuring optimal performance and high availability. Load balancing is a critical component in cloud architectures for various reasons. What Is Load Balancing?

Azure

Azure Scalability Traffic Performance

OneAgent for Linux on IBM Z (General Availability)

Dynatrace

NOVEMBER 20, 2019

Having released this functionality in an Early Adopter Release with OneAgent version 1.173 and Dynatrace version 1.174 back in August 2019, we’re now happy to announce the General Availability of OneAgent full-stack monitoring for Linux on the IBM Z platform, sometimes informally referred to as Z/Linux. Release details.

Availability

Availability Hardware Java Tuning

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Architecture Overview: The first pivotal step in managing impressions begins with the creation of a Source-of-Truth (SOT) dataset. This dual availability ensures immediate processing capabilities alongside comprehensive long-term data retention. Thus, all data in one region is processed by the Flink job deployed within that region.

Tuning

Tuning Latency Efficiency Storage

Ready-to-Use High Availability Architectures for MySQL and PostgreSQL

Percona

JUNE 12, 2023

When it comes to access to their applications, users demand instant, reliable, and secure interactions — and that means databases must be highly available. With database high availability (HA), services are largely uninterrupted, and end users are largely satisfied. The obvious answer is this: To achieve high availability.

Architecture

Architecture Availability Open Source Healthcare

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

Motivation: With the rapid growth in Netflix member base and the increasing complexity of our systems, our architecture has evolved into an asynchronous one that enables both online and offline computation. This helps limit the outgoing traffic footprint considerably.

Systems

Systems Traffic Architecture Mobile

Architected for resiliency: How Dynatrace withstands data center outages

Dynatrace

JUNE 15, 2021

The fact is, Reliability and Resiliency must be rooted in the architecture of a distributed system. The subject line said: “Success Story: Major Issue in single AWS Frankfurt Availability Zone!” The problem started at 1:24PM PDT, with the services starting to become available again about 3 hours later.

AWS

AWS Traffic Architecture Azure

5 powerful use cases beyond debugging for Dynatrace Live Debugger

Dynatrace

MARCH 25, 2025

Load generators simulate traffic. It's a tool you can use in any environment or architecture, instantly showing you the innermost workings of your code wherever and whenever you need it. Performance benchmarking: Performance benchmarking is one of the unresolved mysteries of software engineering. Sometimes, you need heavyweight tools.

Benchmarking

Benchmarking Code Open Source Engineering

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

NOVEMBER 2, 2020

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure. By Manuel Correa, Arthur Gonigberg, and Daniel West. Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Logs and background requests are examples of this type of traffic.

Traffic

Traffic Metrics Infrastructure Architecture

Unlock end-to-end observability insights with Dynatrace PurePath 4 seamless integration of OpenTracing for Java

Dynatrace

DECEMBER 9, 2020

Cloud-native technologies and microservice architectures have shifted technical complexity from the source code of services to the interconnections between services. Heterogeneous cloud-native microservice architectures can lead to visibility gaps in distributed traces. Dynatrace news.

Java

Java Traffic Architecture Strategy

The Ultimate Guide to Database High Availability

Percona

JUNE 22, 2023

To make data count and to ensure cloud computing is unabated, companies and organizations must have highly available databases. This guide provides an overview of what high availability means, the components involved, how to measure high availability, and how to achieve it. How does high availability work?

Availability

Availability Database Open Source Hardware

Geek Reading - Week of June 5, 2013

DZone

OCTOBER 11, 2022

Making Google’s CalDAV and CardDAV APIs available for everyone (Google Developers Blog). Improving testing by using real traffic from production (Hacker News). Pandora launches new HTML5 site for TVs and gaming consoles, available now on PS3 and Xbox 360 (The Next Web). History of Lisp (Hacker News).

Java

Java Best Practices Google Analytics

7 Best Performance Testing Tools to Look Out for in 2021

DZone

DECEMBER 28, 2020

The system could work efficiently with a specific number of concurrent users; however, it may get dysfunctional with extra loads during peak traffic. It is almost a part of the wider performance engineering portrait, concentrating on performance glitches in the architecture and design of any software.

Performance Testing

Performance Testing Testing Tools Testing Performance

OneAgent for Linux on IBM Z now available in Early Adopter Release

Dynatrace

AUGUST 8, 2019

We’re happy to announce the Early Adopter Release of OneAgent full-stack monitoring for Linux on the IBM Z platform, sometimes informally referred to as Z/Linux (available with OneAgent version 1.173 and Dynatrace version 1.174). For details on available metrics, see our help page on host performance monitoring. Dynatrace news.

Availability

Availability Hardware Java Tuning

Network performance monitoring top of mind for CloudOps teams

Dynatrace

MAY 19, 2023

Network traffic growth is the main reason for increasing spending, largely because of the adoption of hybrid and multi-cloud architectures. What are the issues with traffic losses and connectivity drops? “Without the network, nothing will happen,” Ziemianowicz said.

Network

Network Monitoring Performance Traffic

What is a service mesh?

Dynatrace

MAY 21, 2021

This becomes even more challenging when the application receives heavy traffic, because a single microservice might become overwhelmed if it receives too many requests too quickly. A service mesh is a dedicated infrastructure layer built into an application that controls service-to-service communication in a microservices architecture.

Traffic

Traffic DevOps Infrastructure Network

General availability of OneAgent full-stack monitoring for AIX

Dynatrace

APRIL 16, 2019

We’re proud to announce the general availability of OneAgent full-stack monitoring for the AIX operating system. Monitoring IBM Power Systems isn’t a simple task; due to its specific architecture, there aren’t many tools available on the market. The ones that are available are old generation.

Availability

Availability Monitoring Metrics Operating System

What is cloud migration?

Dynatrace

SEPTEMBER 30, 2021

Cloud migration is the process of transferring some or all your data, software, and operations to a cloud-based computing environment that offers unlimited scale and high availability. In case of a spike in traffic, you can automatically spin up more resources, often in a matter of seconds. Improved performance and availability.

Cloud

Cloud Traffic Best Practices Strategy

MySQL High Availability Framework Explained – Part III: Failover Scenarios

High Scalability

APRIL 16, 2019

In this three-part blog series, we introduced a High Availability (HA) Framework for MySQL hosting in Part I, and discussed the details of MySQL semisynchronous replication in Part II. Now in Part III, we review how the framework handles some of the important MySQL failure scenarios and recovers to ensure high availability.

Availability

Availability Network Azure AWS

What is cloud monitoring? How to improve your full-stack visibility

Dynatrace

JANUARY 11, 2023

With more organizations taking the multicloud plunge, monitoring cloud infrastructure is critical to ensure all components of the cloud computing stack are available, high-performing, and secure. Website monitoring examines a cloud-hosted website’s processes, traffic, availability, and resource use. Database monitoring.

Cloud

Cloud Monitoring Best Practices Infrastructure

Observe syslog with Dynatrace ActiveGate, a secure, trusted edge component

Dynatrace

JULY 15, 2024

Finally, adding additional components on the edge to filter and transform syslog messages (for example, Dynatrace OpenTelemetry distribution) isn’t always possible due to architectural reasons or because it adds unnecessary complexity and cost of ownership when scaling your business. Setting up your first Environment ActiveGate?

Infrastructure

Infrastructure Network Azure Monitoring

Automated Deployment and Architectural Validation with Pitometer and keptn!

Dynatrace

APRIL 30, 2019

At its heart it uses Istio (for traffic control) and Knative (for event driven tool orchestration) and stores all configuration in Git – following the GitOps approach. If there is no Pitometer source implementation available for your tool no worries – check out the reference implementations of Dynatrace or Prometheus and see how easy it is.

Architecture

Architecture Open Source Azure Metrics

Lessons learned from enterprise service-level objective management

Dynatrace

MAY 19, 2022

Every organization’s goal is to keep its systems available and resilient to support business demands. Example 1: Architecture boundaries. This view shows the availability SLO for key application functions, like login and vehicle list, as well as a large set of timeframes, like last 30 minutes, last hour, today, and last six days.

Automotive

Automotive Latency Architecture Azure

Percona Monitoring and Management High Availability – A Proof of Concept

Percona

DECEMBER 21, 2023

Being software composed of different, multiple technologies can add complexity to a well-known concept: High Availability (HA). As you can see, stats are available via the port 8404. Failover: The failover will be handled by the HAProxy automatically when it detects that the current primary is no longer available.

Availability

Availability Monitoring Open Source Traffic

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

Keeping pace with modern digital transformation requires ensuring that applications are responsive, resilient, and always available amid increased complexity. Microservices-based architectures and software containers enable organizations to deploy and modify applications with unprecedented speed. availability.

Best Practices

Best Practices DevOps Latency Metrics

New Dynatrace Operator elevates cloud-native observability for Kubernetes

Dynatrace

MAY 5, 2021

Today we’re proud to announce the new Dynatrace Operator, designed from the ground up to handle the lifecycle of OneAgent, Kubernetes API monitoring, OneAgent traffic routing, and all future containerized componentry such as the forthcoming extension framework. Dynatrace Operator for OneAgent, API monitoring, routing, and more.

Cloud

Cloud Traffic Monitoring Open Source

Evolving Regional Evacuation

The Netflix TechBlog

SEPTEMBER 23, 2019

This means that our microservices constantly evolve and change, but what doesn’t change is our responsibility to provide a highly available service that delivers 100+ million hours of daily streaming to our subscribers. So, if we evacuate South American traffic to North America, demand for CE and Android DRM won’t grow uniformly.

Traffic

Traffic Metrics Mobile Government

Automatic intelligent observability into Envoy-proxied services of your Istio service mesh (GA)

Dynatrace

OCTOBER 13, 2021

Additionally, with OneAgent version 1.205, out-of-the-box service-level insights into your Istio Ingress/Egress Envoys are also generally available. Istio is one of the most popular service meshes. It allows you to manage complex microservice architectures based on configuration—there’s no need to change any application code.

Traffic

Traffic Monitoring Technology Technology

What is security analytics?

Dynatrace

JUNE 10, 2024

For example, an organization might use security analytics tools to monitor user behavior and network traffic. Security analytics must also contend with the multicomponent architecture of modern IT infrastructure. Additionally, with the Dynatrace Query Language, data is available in real time.

Analytics

Analytics Network Open Source Hardware

Efficient SLO event integration powers successful AIOps

Dynatrace

APRIL 5, 2024

When the SLO status converges to an optimal value of 100%, and there’s substantial traffic (calls/min), BurnRate becomes more relevant for anomaly detection. Let’s assume we created a service-availability SLO, monitoring the request failure count against the overall request counts. What characterizes a weak SLO?
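The relationship between a service-availability SLO and burn rate can be sketched in a few lines. This is a minimal illustration, assuming the commonly used definition of burn rate as the observed error rate divided by the error budget (1 minus the SLO target); the Dynatrace article may define or compute it differently, and the function names and sample numbers are hypothetical.

```python
# Sketch of a service-availability SLO and its burn rate.
# Assumption: burn rate = observed error rate / error budget, where
# error budget = 1 - SLO target. All names and numbers are illustrative.


def availability(failed_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded (1.0 when there is no traffic)."""
    if total_requests == 0:
        return 1.0
    return 1 - failed_requests / total_requests


def burn_rate(failed_requests: int, total_requests: int, slo_target: float) -> float:
    """How fast the error budget is being consumed in the observed window.

    A value of 1.0 means the budget is consumed exactly at the allowed
    pace; values above 1.0 mean it will run out before the SLO period ends.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 0.0  # no traffic, so no budget is consumed
    error_rate = failed_requests / total_requests
    error_budget = 1 - slo_target
    return error_rate / error_budget


if __name__ == "__main__":
    # Hypothetical window: 50 failures out of 10,000 requests against a 99% target.
    print(f"availability: {availability(50, 10_000):.4f}")
    print(f"burn rate:    {burn_rate(50, 10_000, 0.99):.2f}")
```

With low traffic, a single failure swings the error rate sharply, which is one reason burn rate is more meaningful as an anomaly signal when request volume is substantial.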

Efficiency

Efficiency Traffic Tuning Metrics

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

As more organizations embrace microservices-based architecture to deliver goods and services digitally, maintaining customer satisfaction has become exponentially more challenging. First, it helps to understand that applications and all the services and infrastructure that support them generate telemetry data based on traffic from real users.

Software

Software Software Benchmarking Latency

Zero Configuration Service Mesh with On-Demand Cluster Discovery

The Netflix TechBlog

AUGUST 29, 2023

Since there were no existing solutions available, we needed to build them ourselves. To improve availability, we designed systems where components could fail separately and avoid single points of failure. In this architecture, service to service communication no longer goes through the single point of failure of a load balancer.

Traffic

Traffic Latency Cloud C++

How to Optimize Digital Experience and Operations with Dynatrace

Dynatrace

AUGUST 30, 2019

We have several YouTube Tutorials and blog posts available that show how you can use Dynatrace RUM data for Web Performance & User Experience Optimization. Reducing performance and architectural issues in their backend system gave them a 99% performance improvement! Impressive results I have to say!

Cache

Cache Database Architecture Government

Setting Up and Deploying PostgreSQL for High Availability

Percona

JULY 7, 2023

With the average cost of unplanned downtime running from $300,000 to $500,000 per hour, businesses are increasingly using high availability (HA) technologies to maximize application uptime. Unfortunately, using certain open source database software as part of an HA architecture can present significant challenges.

Availability

Availability Open Source Architecture Database

High Availability vs. Fault Tolerance: Is FT’s 00.001% Edge in Uptime Worth the Headache?

Percona

AUGUST 22, 2023

With so much at stake, database high availability and fault tolerance have become must-have items, but many companies just aren’t certain which one they must have. This blog article will examine shared attributes of high availability (HA) and fault tolerance (FT). What does high availability mean?

Availability

Availability Hardware Open Source Database

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

Azure Traffic Manager. Get insights into various aspects of database performance, including SQL queries or procedures, SQL modifications, SQL transactions, any detected problems or availability issues, hotspots, and more—all the valuable information that a DevOps team could ask for to optimize database performance. Azure Batch.

Azure

Azure Cloud Big Data Virtualization

Telltale: Netflix Application Monitoring Simplified

The Netflix TechBlog

AUGUST 13, 2020

We also highlight interesting broader events such as regional traffic evacuations and nearby deployments, information that is vital to understanding health holistically. Regional traffic evacuations. A regional traffic shift means one region ends up with zero traffic while another region has double. But we’re not done.

Monitoring

Monitoring Tuning Traffic Metrics
