Strategy, Systems and Traffic - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime — Part 1

The Netflix TechBlog

MAY 4, 2023

Migrating Critical Traffic At Scale with No Downtime — Part 1. By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience.

Traffic

Traffic Latency Tuning Systems

Black Friday traffic exposes gaps in observability strategies

Dynatrace

SEPTEMBER 2, 2022

What’s the problem with Black Friday traffic? But that’s difficult when Black Friday traffic brings overwhelming and unpredictable peak loads to retailer websites and exposes the weakest points in a company’s infrastructure, threatening application performance and user experience. Why Black Friday traffic threatens customer experience.

Traffic

Traffic Strategy Retail Ecommerce

Kubernetes security essentials: Kubernetes misconfiguration attack paths and mitigation strategies

Dynatrace

APRIL 22, 2025

The following diagram shows a brief overview of some common security misconfigurations in Kubernetes and how these map to specific attacker tactics and techniques in the K8s Threat Matrix using a common attack strategy. The Kubernetes threat matrix illustrates how attackers can exploit Kubernetes misconfigurations.

Strategy

Strategy Azure Best Practices Network

Best Practices for Designing Resilient APIs for Scalability and Reliability

DZone

JANUARY 8, 2025

API resilience is about creating systems that can recover gracefully from disruptions, such as network outages or sudden traffic spikes, ensuring they remain reliable and secure. This has become critical since APIs serve as the backbone of today’s interconnected systems.

Best Practices

Best Practices Design Scalability Architecture

Migrating Critical Traffic At Scale with No Downtime — Part 2

The Netflix TechBlog

JUNE 13, 2023

Migrating Critical Traffic At Scale with No Downtime — Part 2. By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic

Traffic Metrics Systems Strategy

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

Part 3: System Strategies and Architecture. By Varun Khaitan, with special thanks to my stunning colleagues Mallika Rao, Esmir Mesic, and Hugo Marques. This blog post is a continuation of Part 2, where we cleared the ambiguity around title launch observability at Netflix. The request schema for the observability endpoint.

Traffic

Traffic Strategy Entertainment Innovation

A Comprehensive Guide to Database Sharding: Building Scalable Systems

DZone

OCTOBER 2, 2024

In this article, we’ll dive deep into the concept of database sharding, a critical technique for scaling databases to handle large volumes of data and high levels of traffic. This section will provide insights into the architecture and strategies to ensure efficient query processing in a sharded environment.

Database

Database Systems Scalability Traffic

Better dashboarding with Dynatrace Davis AI: Instant meaningful insights

Dynatrace

JANUARY 21, 2025

For example, if you’re monitoring network traffic and the average over the past 7 days is 500 Mbps, the threshold will adapt to this baseline. An anomaly will be identified if traffic suddenly drops below 200 Mbps or above 800 Mbps, helping you identify unusual spikes or drops.

Traffic

Traffic Metrics Analytics Monitoring

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

To achieve this, we are committed to building robust systems that deliver comprehensive observability, enabling us to take full accountability for every title on our service. Each title represents countless hours of effort and creativity, and our systems need to honor that uniqueness. Yet, these pages couldn’t be more different.

Traffic

Traffic Scalability Strategy Monitoring

The keys to selecting a platform for end-to-end observability

Dynatrace

DECEMBER 2, 2024

Clearly, continuing to depend on siloed systems, disjointed monitoring tools, and manual analytics is no longer sustainable. This enables proactive changes such as resource autoscaling, traffic shifting, or preventative rollbacks of bad code deployment ahead of time.

Artificial Intelligence

Artificial Intelligence DevOps Architecture Cloud

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. You’ll also learn strategies for maintaining data safety and managing node failures so your RabbitMQ setup is always up to the task. This decoupling is crucial in modern architectures where scalability and fault tolerance are paramount.

Best Practices

Best Practices Traffic Strategy Scalability

Digital transformation strategies: Success stories from three digital transformation journeys

Dynatrace

MAY 8, 2023

Digital transformation strategies are fundamentally changing how organizations operate and deliver value to customers. A comprehensive digital transformation strategy can help organizations better understand the market, reach customers more effectively, and respond to changing demand more quickly. Competitive advantage.

Strategy

Strategy Retail DevOps Traffic

Ensuring the Successful Launch of Ads on Netflix

The Netflix TechBlog

JUNE 1, 2023

To do this, we devised a novel way to simulate the projected traffic weeks ahead of launch by building upon the traffic migration framework described here. New content or national events may drive brief spikes, but, by and large, traffic is usually smoothly increasing or decreasing.

Traffic

Traffic Best Practices Systems Testing

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

The three strategies we will discuss today are AB Testing, Replay Testing, and Sticky Canaries. Let’s discuss the three testing strategies in further detail. The control group’s traffic utilized the legacy Falcor stack, while the experiment population leveraged the new GraphQL client and was directed to the GraphQL Shim.

Traffic

Traffic Latency Metrics Cache

Kubernetes security essentials: Understanding Kubernetes security misconfigurations

Dynatrace

APRIL 22, 2025

You might have state-of-the-art surveillance systems and guards at the main entrance, but if a side door is left unlocked, all the security becomes meaningless. What seems like an innocuous config file could contain the access credentials to your most sensitive systems. Security principle. Common misconfiguration. Real-world impact.

Network

Network Servers Strategy Best Practices

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile’s exposure. In this multi-part blog series, we take you behind the scenes of our system that processes billions of impressions daily.

Tuning

Tuning Latency Efficiency Storage

COVID-19 and Digital Services: An Action Plan for the Unexpected

Dynatrace

APRIL 22, 2020

All of this puts a lot of pressure on IT systems and applications. There are proven strategies for handling this. Step 1: Understand Traffic Patterns and Potential Spikes; Remove Team Silos. We refer to this as a BizDevOps strategy.

Traffic

Traffic Ecommerce Retail Government

Six causes of major software outages–And how to avoid them

Dynatrace

AUGUST 8, 2024

It’s also critical to have a strategy in place to address these outages, including both documented remediation processes and an observability platform to help you proactively identify and resolve issues to minimize customer and business impact. Outages can disrupt services, cause financial losses, and damage brand reputations.

Software

Software Software Infrastructure Network

Build automated self-healing systems with xMatters and Dynatrace (Part 3 of 3)

Dynatrace

SEPTEMBER 20, 2019

One of the several deployment strategies is the blue/green deployment approach: In this method, two identical production environments work in parallel. One is the currently-running production environment receiving all user traffic (let’s say the “blue” one), the other is a clone of it (“green”), but idle.

Systems

Systems Traffic DevOps Database

FIFO vs. LIFO: Which Queueing Strategy Is Better for Availability and Latency?

DZone

MARCH 14, 2023

But what happens when traffic bursts overwhelm your system? In this post, we'll explore both strategies through a simple simulation in Colab, allowing you to see the impact of changing parameters on system performance. Queueing requests is a common solution, but what's the best approach: FIFO or LIFO?

Strategy

Strategy Latency Availability Traffic

A Dynatrace champion’s guide to get ahead of digital marketing campaigns

Dynatrace

JULY 1, 2020

In my last blog, I provided an example of this happening, whereby the traffic spiked and quadrupled the usual incoming traffic. These are all interesting metrics from a marketing point of view, and also highly interesting to you as they allow you to engage with the teams that are driving the traffic against your IT system.

Traffic

Traffic Analytics Metrics Servers

What is cloud migration?

Dynatrace

SEPTEMBER 30, 2021

A cloud migration strategy, however, provides technical optimization that’s also firmly rooted in the business value chain. Migrating to the cloud is a strategy many organizations pursue to streamline and consolidate their security efforts. Likewise, you can scale down when your application experiences decreased traffic.

Cloud

Cloud Traffic Best Practices Hardware

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

Introduction to Message Brokers Message brokers enable applications, services, and systems to communicate by acting as intermediaries between senders and receivers. This decoupling simplifies system architecture and supports scalability in distributed environments.

Latency

Latency Analytics Architecture Storage

Protect your organization against zero-day vulnerabilities

Dynatrace

AUGUST 3, 2022

Malicious attackers have gotten increasingly better at identifying vulnerabilities and launching zero-day attacks to exploit these weak points in IT systems. To ensure the safety of their customers, employees, and business data, organizations must have a strategy to protect against zero-day vulnerabilities.

Java

Java Traffic Benchmarking Strategy

Network performance monitoring top of mind for CloudOps teams

Dynatrace

MAY 19, 2023

Network traffic growth is the main reason for increasing spending, largely because of the adoption of hybrid and multi-cloud architectures. Many organizations, somewhat erroneously, respond to cloud complexity by using multiple tools to monitor and manage system health. “Without the network, nothing will happen,” Ziemianowicz said.

Network

Network Monitoring Performance Traffic

Multi Cloud vs Hybrid Cloud Strategy

Scalegrid

JANUARY 8, 2024

Confused about multi-cloud vs hybrid cloud and which is the right strategy for your organization? Real-world examples like Spotify’s multi-cloud strategy for cost reduction and performance, and Netflix’s hybrid cloud setup for efficient content streaming and creation, illustrate the practical applications of each model.

Cloud

Cloud Strategy Scalability Artificial Intelligence

Responsible AI must-haves for unified observability and security

Dynatrace

JANUARY 4, 2024

However, as AI systems become more complex and sophisticated, organizations are learning that they need to ensure the AI they use is responsible and trustworthy. It can be difficult to understand the basis of AI systems’ decisions, particularly when they are trained on large and complex data sets. AI system bias.

Artificial Intelligence

Artificial Intelligence Strategy Virtualization Traffic

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. This guide delves into how these systems work, the challenges they solve, and their essential role in businesses and technology.

Storage

Storage Systems Big Data Azure

Transparent and confident software delivery with Dynatrace Release Analysis

Dynatrace

APRIL 28, 2021

Organizations that have transitioned to agile software development strategies (including the adoption of a DevOps culture and continuous delivery automation) enforce automated solutions for such decision making, or at the very least use automation in the gathering of release-quality metrics. Release information from issue tracking systems.

Software

Software Software Strategy Metrics

APRA CPS 230 compliance, explained

Dynatrace

NOVEMBER 2, 2023

If your organisation is involved in achieving APRA compliance, you are likely facing the daunting effort of de-risking critical system delivery. Moreover, for banking organisations, there is a good chance some of those systems are outdated. However, too often, the starting point is unclear.

Cloud

Cloud Infrastructure Strategy Open Source

Why business resiliency depends on unified observability and security

Dynatrace

SEPTEMBER 3, 2024

Implementing a robust monitoring and observability strategy has become the foundation of an organization’s ability to improve business resiliency and stay in control of their critical IT environments. Each of these factors can present unique challenges individually or in combination.

Infrastructure

Infrastructure Innovation Monitoring Software Performance

Automated Change Impact Analysis with Site Reliability Guardian

Dynatrace

FEBRUARY 15, 2023

Streamline development and delivery processes Nowadays, digital transformation strategies are executed by almost every organization across all industries. SREs use Service-Level Indicators (SLI) to see the complete picture of service availability, latency, performance, and capacity across various systems, especially revenue-critical systems.

DevOps

DevOps Latency Traffic Best Practices

CrowdStrike update crisis: How Dynatrace helped customers recover in hours

Dynatrace

JULY 31, 2024

The resulting outages wreaked havoc on customer experiences and left IT professionals scrambling to quickly find and repair affected systems. The crisis has emphasized the importance of having a strategy for maintaining stability and performance. The ripple effects on the global supply chain have been equally significant.

Airlines

Airlines Monitoring Healthcare Traffic

Why End User Experience Monitoring is critical for IT teams?

Dynatrace

AUGUST 13, 2020

With our IT systems, we’re crafting the digital experiences for our customers as they use the digital touch points we offer them, be it a mobile app, web application, Amazon Alexa skill, ATM, a check-in kiosk at the airport, or a TV app. Extend and automate your SRE strategy to Business Level Objective Monitoring.

Monitoring

Monitoring Airlines Analytics Strategy

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

Note: Contrary to what the name may suggest, this system is not built as a general-purpose time series database. Those use cases are well served by the Netflix Atlas telemetry system. Effectively managing this data at scale to extract valuable insights is crucial for ensuring optimal user experiences and system reliability.

Latency

Latency Storage Traffic Tuning

Efficient SLO event integration powers successful AIOps

Dynatrace

APRIL 5, 2024

When the SLO status converges to an optimal value of 100%, and there’s substantial traffic (calls/min), BurnRate becomes more relevant for anomaly detection. SLOs must be evaluated at 100%, even when there is currently no traffic. What characterizes a weak SLO? Use the default transformation.

Efficiency

Efficiency Traffic Tuning Metrics

What is security analytics?

Dynatrace

JUNE 10, 2024

For example, an organization might use security analytics tools to monitor user behavior and network traffic. Teams can then act before attackers have the chance to compromise key data or bring down critical systems. This data helps teams see where attacks began, which systems were targeted, and what techniques attackers used.

Analytics

Analytics Network Open Source Hardware

Evolving Regional Evacuation

The Netflix TechBlog

SEPTEMBER 23, 2019

In the event of an isolated failure we first pre-scale microservices in the healthy regions after which we can shift traffic away from the failing one. In 2013 we first developed our multi-regional availability strategy in response to a catalyst that led us to re-architect the way our service operates.

Traffic

Traffic Metrics Mobile Government

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

As the number of Titus users increased over the years, the load and pressure on the system increased substantially. cell): Titus Job Coordinator is a leader elected process managing the active state of the system. For example, a batch workflow orchestration system may create multiple jobs which are part of a single workflow execution.

Cache

Cache Latency Traffic Systems

Customer expectations for retail: Beyond digital experience

Dynatrace

AUGUST 28, 2023

IT teams spend months preparing for the peak traffic they anticipate will arrive with holiday shopping. Let’s shift our focus to the backend systems and business processes, the behind-the-scenes heroes of end-to-end customer experience. (Though the three-second rule for page load time is often misinterpreted). Technology to the rescue?

Retail

Retail Logistics Innovation Analytics

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

which is difficult when troubleshooting distributed systems. Troubleshooting a session in Edgar When we started building Edgar four years ago, there were very few open-source distributed tracing systems that satisfied our needs. Investigating a video streaming failure consists of inspecting all aspects of a member account.

Infrastructure

Infrastructure Transportation Storage Open Source

Bring Your Own Cloud (BYOC) vs. Dedicated Hosting at ScaleGrid

Scalegrid

APRIL 16, 2020

In this post, we compare ScaleGrid’s Bring Your Own Cloud (BYOC) plan vs. the standard Dedicated Hosting model to help you determine the best strategy for your MySQL, PostgreSQL, Redis™ and MongoDB® database deployment. This can result in significant cost savings for high traffic applications. Expert Tip. Security Groups.

Cloud

Cloud Azure AWS Database

5 Steps to Accelerate your Cloud Migration with Dynatrace

Dynatrace

AUGUST 5, 2019

Resource consumption & traffic analysis. If you want to read up on migration strategies check out my blog on 6-R Migration Strategies. What is the network traffic going to be between services we migrate and those that have to stay in the current data center? Step 3: Detailed Traffic Dependency Analysis.

Cloud

Cloud Traffic Database Network
