Availability, Strategy and Traffic - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.

Traffic, Latency, Tuning, Systems

Migrating Critical Traffic At Scale with No Downtime - Part 2

The Netflix TechBlog

JUNE 13, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic, Metrics, Systems, Strategy

Better dashboarding with Dynatrace Davis AI: Instant meaningful insights

Dynatrace

JANUARY 21, 2025

Activate Davis AI to analyze charts within seconds. Davis AI can help you expand your dashboards and dive deeper into your available data to extract additional information. For example, if you’re monitoring network traffic and the average over the past 7 days is 500 Mbps, the threshold will adapt to this baseline.

Traffic, Metrics, Analytics, Monitoring

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

Part 3: System Strategies and Architecture. By Varun Khaitan, with special thanks to my stunning colleagues Mallika Rao, Esmir Mesic, and Hugo Marques. This blog post is a continuation of Part 2, where we cleared the ambiguity around title launch observability at Netflix. We call this capability TimeTravel.

Traffic, Strategy, Entertainment, Innovation

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. You’ll also learn strategies for maintaining data safety and managing node failures so your RabbitMQ setup is always up to the task. This decoupling is crucial in modern architectures where scalability and fault tolerance are paramount.

Best Practices, Traffic, Strategy, Efficiency

Managing PostgreSQL® High Availability – Part I: PostgreSQL Automatic Failover

Scalegrid

SEPTEMBER 5, 2024

Managing High Availability (HA) in your PostgreSQL hosting is essential to ensuring your database deployment clusters maintain exceptional uptime and strong operational performance so your data is always available to your application. Automatic failover is a critical strategy to achieve this.

Availability, Servers, Database, Open Source

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

The three strategies we will discuss today are AB Testing, Replay Testing, and Sticky Canaries. Let’s discuss the three testing strategies in further detail. The control group’s traffic utilized the legacy Falcor stack, while the experiment population leveraged the new GraphQL client and was directed to the GraphQL Shim.

Traffic, Latency, Metrics, Cache

FIFO vs. LIFO: Which Queueing Strategy Is Better for Availability and Latency?

DZone

MARCH 14, 2023

As an engineer, you probably know that server performance under heavy load is crucial for maintaining the availability and responsiveness of your services. But what happens when traffic bursts overwhelm your system? Queueing requests is a common solution, but what's the best approach: FIFO or LIFO?

Strategy, Latency, Availability, Traffic

Ensuring the Successful Launch of Ads on Netflix

The Netflix TechBlog

JUNE 1, 2023

To do this, we devised a novel way to simulate the projected traffic weeks ahead of launch by building upon the traffic migration framework described here. New content or national events may drive brief spikes, but, by and large, traffic is usually smoothly increasing or decreasing.

Traffic, Best Practices, Systems, Testing

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

We can experiment with different content placements or promotional strategies to boost visibility and engagement. Analyzing impression history, for example, might help determine how well a specific row on the home page is functioning or assess the effectiveness of a merchandising strategy.

Tuning, Latency, Efficiency, Storage

COVID-19 and Digital Services: An Action Plan for the Unexpected

Dynatrace

APRIL 22, 2020

While most government agencies and commercial enterprises have digital services in place, the current volume of usage — including traffic to critical employment, health and retail/eCommerce services — has reached levels that many organizations have never seen before or tested against. There are proven strategies for handling this.

Traffic, Ecommerce, Retail, Government

What is cloud migration?

Dynatrace

SEPTEMBER 30, 2021

Cloud migration is the process of transferring some or all of your data, software, and operations to a cloud-based computing environment that offers unlimited scale and high availability. A cloud migration strategy, however, provides technical optimization that’s also firmly rooted in the business value chain.

Cloud, Traffic, Best Practices, Strategy

Automate CI/CD pipelines with Dynatrace: Part 2, Deploy stage

Dynatrace

NOVEMBER 28, 2023

Even when the staging environment closely mirrors the production environment, achieving a complete replication of all potential scenarios, such as simulating extremely high traffic volumes to assess software performance, remains challenging. This can lead to a lack of insight into how the code will behave when exposed to heavy traffic.

Traffic, Best Practices, Strategy, Engineering

A Dynatrace champion’s guide to get ahead of digital marketing campaigns

Dynatrace

JULY 1, 2020

In my last blog, I provided an example of this happening, whereby the traffic spiked and quadrupled the usual incoming traffic. These are all interesting metrics from a marketing point of view, and also highly interesting to you as they allow you to engage with the teams that are driving the traffic against your IT system.

Traffic, Analytics, Metrics, Servers

Unlock end-to-end observability insights with Dynatrace PurePath 4 seamless integration of OpenTracing for Java

Dynatrace

DECEMBER 9, 2020

Let’s consider the business challenges of an online shop that is powered by a microservice architecture where several instances of each microservice run, including the shopping cart service, to ensure the highest possible availability. With Dynatrace OneAgent you also benefit from support for traffic routing and traffic control.

Java, Traffic, Architecture, Strategy

What is cloud monitoring? How to improve your full-stack visibility

Dynatrace

JANUARY 11, 2023

With more organizations taking the multicloud plunge, monitoring cloud infrastructure is critical to ensure all components of the cloud computing stack are available, high-performing, and secure. Website monitoring examines a cloud-hosted website’s processes, traffic, availability, and resource use. Database monitoring.

Cloud, Monitoring, Best Practices, Infrastructure

Auth0 Architecture: Running In Multiple Cloud Providers And Regions

High Scalability

AUGUST 27, 2018

com and the strategies we use to keep it up and running with high availability. The number of services that compose our product in order to scale our organization and handle the increases in traffic went from under 10 to over 30 services. A lot has changed since then in Auth0.

Architecture, Cloud, Traffic, Infrastructure

Network performance monitoring top of mind for CloudOps teams

Dynatrace

MAY 19, 2023

Network traffic growth is the main reason for increasing spending, largely because of the adoption of hybrid and multi-cloud architectures. Unifying data, unifying observability. Merging siloed data streams in a unified observability strategy requires a different approach.

Network, Monitoring, Performance, Traffic

Multi Cloud vs Hybrid Cloud Strategy

Scalegrid

JANUARY 8, 2024

Confused about multi-cloud vs hybrid cloud and which is the right strategy for your organization? Real-world examples like Spotify’s multi-cloud strategy for cost reduction and performance, and Netflix’s hybrid cloud setup for efficient content streaming and creation, illustrate the practical applications of each model.

Cloud, Strategy, Scalability, Artificial Intelligence

APRA CPS 230 compliance, explained

Dynatrace

NOVEMBER 2, 2023

Enhanced customer confidence through excellent service availability. The good news: even for latecomers to the compliance party, compliance is perfectly doable within the timeframe given the right tools and strategies. Organisations typically waste valuable time discussing and deciding the right strategy for hunting down the problem.

Cloud, Infrastructure, Strategy, Open Source

The Ultimate Guide to Database High Availability

Percona

JUNE 22, 2023

To make data count and to ensure cloud computing is unabated, companies and organizations must have highly available databases. This guide provides an overview of what high availability means, the components involved, how to measure high availability, and how to achieve it. How does high availability work?

Availability, Database, Open Source, Hardware

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

Its design prioritizes high availability and efficient data transfer with minimal overhead, making it a practical choice for handling real-time data pipelines and distributed event processing. It follows a push-based approach, ensuring messages are distributed to consumers as soon as they become available.

Latency, Analytics, Architecture, Storage

Get quick alerts and avoid false positives with the new baseline setting

Dynatrace

MARCH 26, 2020

This means that Dynatrace alerts more quickly when an error spike occurs in a high-traffic service (compared to a low-traffic service where statistical confidence is lower). The configuration is available at the global level as well as the service level. Close issues sooner with shorter event timeouts.

Traffic, Monitoring, Efficiency, Strategy

Bring Your Own Cloud (BYOC) vs. Dedicated Hosting at ScaleGrid

Scalegrid

APRIL 16, 2020

In this post, we compare ScaleGrid’s Bring Your Own Cloud (BYOC) plan vs. the standard Dedicated Hosting model to help you determine the best strategy for your MySQL, PostgreSQL, Redis™ and MongoDB® database deployment. Both AWS EC2 instances and Azure VM instances are available as Reserved Instances, and can be used through the BYOC plan.

Cloud, Azure, AWS, Database

What is synthetic testing?

Dynatrace

OCTOBER 16, 2023

Also called continuous monitoring or synthetic monitoring, synthetic testing mimics actual users’ behaviors to help companies identify and remediate potential availability and performance issues. Types of synthetic testing. There are three broad types of synthetic testing: availability, web performance, and transaction.

Testing, Best Practices, Testing Tools, Monitoring

Automated Change Impact Analysis with Site Reliability Guardian

Dynatrace

FEBRUARY 15, 2023

Streamline development and delivery processes. Nowadays, digital transformation strategies are executed by almost every organization across all industries. SREs use Service-Level Indicators (SLI) to see the complete picture of service availability, latency, performance, and capacity across various systems, especially revenue-critical systems.

DevOps, Latency, Traffic, Best Practices

Evolving Regional Evacuation

The Netflix TechBlog

SEPTEMBER 23, 2019

This means that our microservices constantly evolve and change, but what doesn’t change is our responsibility to provide a highly available service that delivers 100+ million hours of daily streaming to our subscribers. So, if we evacuate South American traffic to North America, demand for CE and Android DRM won’t grow uniformly.

Traffic, Metrics, Mobile, Government

Efficient SLO event integration powers successful AIOps

Dynatrace

APRIL 5, 2024

When the SLO status converges to an optimal value of 100%, and there’s substantial traffic (calls/min), BurnRate becomes more relevant for anomaly detection. Let’s assume we created a service-availability SLO, monitoring the request failure count against the overall request counts. What characterizes a weak SLO?

Efficiency, Traffic, Tuning, Metrics

Five observability predictions for 2025

Dynatrace

DECEMBER 13, 2024

Observability becomes mandatory for any serious sustainability strategy in IT. During the holiday season, an e-commerce platform anticipating a traffic surge could use preventive observability to predict slowdowns or overloads, proactively scale resources, optimize performance, and balance cloud costs.

Energy, Logistics, Healthcare, Retail

Dynatrace observability for e-commerce helps SAP deliver a record holiday shopping season

Dynatrace

APRIL 30, 2021

Auer oversees the initiative to maintain the performance and availability of SAP Commerce Cloud, which underpins more than 3,500 e-commerce sites in more than 200 countries, processing transactions worth $500 billion annually. You can see the traffic reaching each milestone of an online shopping journey.

Retail, Cloud, Monitoring, Hardware

Transparent and confident software delivery with Dynatrace Release Analysis

Dynatrace

APRIL 28, 2021

Organizations that have transitioned to agile software development strategies (including the adoption of a DevOps culture and continuous delivery automation) enforce automated solutions for such decision making, or at the very least use automation in the gathering of release-quality metrics. How Release Analysis works. Kubernetes metadata.

Software

Software Software Strategy Metrics

CrowdStrike update crisis: How Dynatrace helped customers recover in hours

Dynatrace

JULY 31, 2024

The crisis has emphasized the importance of having a strategy for maintaining stability and performance. By simulating user interactions and running tests from various locations worldwide, synthetic monitoring provides a comprehensive view of application performance and availability. Before a crisis. During a crisis.

Airlines, Monitoring, Healthcare, Traffic

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

As adoption rates for Microsoft Azure continue to skyrocket, Dynatrace is developing a deeper integration with the platform to provide even more value to organizations that run their businesses on Azure or use it as a part of their multi-cloud strategy. Azure Traffic Manager. Azure Batch. Azure DB for MariaDB. Azure DB for MySQL.

Azure, Cloud, Big Data, Virtualization

5 Steps to Accelerate your Cloud Migration with Dynatrace

Dynatrace

AUGUST 5, 2019

Resource consumption & traffic analysis. If you want to read up on migration strategies check out my blog on 6-R Migration Strategies. What is the network traffic going to be between services we migrate and those that have to stay in the current data center? All available in Dynatrace in the UI or through the API!

Cloud, Traffic, Database, Network

The Best Way to Host MongoDB on DigitalOcean

Scalegrid

DECEMBER 16, 2019

In this post, we’ll walk you through the best way to host MongoDB on DigitalOcean, including the best instance types to use, disk types, replication strategy, and managed service providers. MongoDB Replication Strategies. DigitalOcean Advantages for MongoDB. minutes of downtime in one year.

Azure, AWS, Database, Latency

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

However, storing and querying such data presents a unique set of challenges: High Throughput: Managing up to 10 million writes per second while maintaining high availability. Handling Bursty Traffic: Managing significant traffic spikes during high-demand events, such as new content launches or regional failovers.

Latency, Storage, Traffic, Tuning

From syslog to AWS Firehose: Dynatrace log management innovations that enhance observability

Dynatrace

SEPTEMBER 5, 2024

Let’s delve deeper into how these capabilities can transform your observability strategy, starting with our new syslog support. It also enhances syslog messages with additional context and optimizes network traffic, improving overall system resilience and security.

Innovation, AWS, Analytics, Storage

Data Reprocessing Pipeline in Asset Management Platform @Netflix

The Netflix TechBlog

MARCH 10, 2023

Existing data got updated to be backward compatible without impacting the existing running production traffic. Data sharding strategy in Elasticsearch is updated to provide low search latency (as described in blog post). Design of new Cassandra reverse indices to support different sets of queries.

Media, Traffic, Processing, Design

Real user monitoring vs. synthetic monitoring: Understanding best practices

Dynatrace

JUNE 27, 2022

RUM, however, has some limitations, including the following: RUM requires traffic to be useful. Because RUM relies on user-generated traffic, it’s hard to indicate persistent issues across the board. This includes development, user acceptance testing, beta testing, and general availability. RUM generates a lot of data.

Best Practices, Monitoring, Wireless, Traffic

What is security analytics?

Dynatrace

JUNE 10, 2024

For example, an organization might use security analytics tools to monitor user behavior and network traffic. Additionally, with the Dynatrace Query Language, data is available in real time. Data retention. Data loaded into Grail is automatically retained for up to three years and is fully flexible based on business needs.

Analytics, Network, Open Source, Hardware

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Optimizing placements through combinatorial optimization. What the OS task scheduler is doing is essentially solving a resource allocation problem: I have X threads to run but only Y CPUs available, how do I allocate the threads to the CPUs to give the illusion of concurrency? It has 8 physical hyperthreaded cores, split on 2 NUMA sockets.

Cache, Latency, Airlines, Logistics

Delivering excellent digital experience for customers in a complex digital world

Dynatrace

MAY 24, 2021

In their case, this is specifically about the pensions element of their platform, which had seen 6-7x as much traffic during the pandemic. “Due to the uncertainty created during the pandemic, customers wanted instant, self-service access to know the value of their account,” the Lead for Service Reliability explained.

Games, Traffic, Virtualization, Google

DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

DECEMBER 17, 2019

Nonetheless, we found a number of limitations that could not satisfy our requirements, e.g., stalling the processing of log events until a dump is complete, missing ability to trigger dumps on demand, or implementations that block write traffic by using table locks. Designed with High Availability in mind. Writing events to any output.

Database, Traffic, Transportation, Open Source
