Traffic and Tuning - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.

Traffic

Traffic Latency Tuning Systems

Migrating Critical Traffic At Scale with No Downtime - Part 2

The Netflix TechBlog

JUNE 13, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic

Traffic Metrics Systems Strategy

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

To detect issues proactively, we need to simulate traffic and predict system behavior in advance. Once artificial traffic is generated, discarding the response object and relying solely on logs becomes inefficient. Stay tuned for a closer look at the innovation behind the scenes!

Traffic

Traffic Scalability Strategy Monitoring

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

Accurately Reflecting Production Behavior: A key part of our solution is insight into production behavior, which requires that our requests to the endpoint result in traffic to the real service functions, following the same pathways the traffic would take if it came from the usual callers. We call this capability TimeTravel.

Traffic

Traffic Strategy Entertainment Innovation

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

This approach ensures high availability by isolating regions, so if one becomes degraded, others remain unaffected, allowing traffic to be shifted between regions to maintain service continuity. Automating Performance Tuning with Autoscalers: Tuning the performance of our Apache Flink jobs is currently a manual process.

Tuning

Tuning Latency Efficiency Storage

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

The control group’s traffic utilized the legacy Falcor stack, while the experiment population leveraged the new GraphQL client and was directed to the GraphQL Shim. This helped us successfully migrate 100% of the traffic on the mobile homepage canvas to GraphQL in 6 months.

Traffic

Traffic Latency Metrics Cache

TCP: Out of Memory — Consider Tuning TCP_Mem

DZone

SEPTEMBER 25, 2019

All other application instances were handling the traffic properly. The application was running on a GNU/Linux OS, Java 8, Tomcat 8 application server. All of a sudden, one of the application instances became unresponsive. Proxy Error: The proxy server received an invalid response from an upstream server.

Tuning

Tuning Java Traffic AWS

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Optimizing RabbitMQ performance through strategies such as keeping queues short, enabling lazy queues, and monitoring health checks is essential for maintaining system efficiency and effectively managing high traffic loads.

Best Practices

Best Practices Traffic Strategy Scalability

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

Optimizing RabbitMQ requires clustering, queue management, and resource tuning to maintain stability and efficiency. However, performance can decline under high traffic conditions. Kafka powers real-time streaming pipelines, ensuring applications can handle massive data traffic while maintaining performance and fault tolerance.

Latency

Latency Analytics Architecture Storage

Large scale deployments are easy and cost-effective with network zones (Early Adopter)

Dynatrace

JULY 2, 2020

Unnecessary traffic between such data centers can result in wasted resources, unpredictable downtimes, and lost business. By minimizing bandwidth and preventing unrelated traffic between data centers, you can maintain healthy network infrastructure and save on costs.

Network

Network Traffic Infrastructure Tuning

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

We thus assigned a priority to each use case and sharded event traffic by routing to priority-specific queues and the corresponding event processing clusters. This separation allows us to tune system configuration and scaling policies independently for different event priorities and traffic patterns.

Systems

Systems Traffic Architecture Mobile
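
The priority-based sharding described in the Rapid Event Notification System teaser above can be illustrated with a minimal sketch. This is not Netflix's implementation: the priority levels, the use-case names, and the `PriorityRouter` class are all hypothetical, and in-memory deques stand in for the real priority-specific queues and processing clusters.

```python
from collections import deque
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical priority levels; the source does not name them.
    HIGH = 0
    MEDIUM = 1
    LOW = 2


class PriorityRouter:
    """Routes each event to a queue dedicated to its use case's priority."""

    def __init__(self, use_case_priority):
        # Mapping of use-case name -> Priority (assumed to be configured up front).
        self.use_case_priority = dict(use_case_priority)
        # One queue per priority, so each can be tuned and scaled independently.
        self.queues = {priority: deque() for priority in Priority}

    def route(self, use_case, payload):
        # Unknown use cases fall back to LOW (an assumption of this sketch).
        priority = self.use_case_priority.get(use_case, Priority.LOW)
        self.queues[priority].append((use_case, payload))
        return priority

    def depth(self, priority):
        return len(self.queues[priority])


# Hypothetical use cases, for illustration only.
router = PriorityRouter({
    "playback_update": Priority.HIGH,
    "recommendation_refresh": Priority.LOW,
})
router.route("playback_update", {"profile": "a"})
router.route("recommendation_refresh", {"profile": "a"})
router.route("unmapped_use_case", {"profile": "b"})
```

Because each priority has its own queue, a backlog of low-priority events in this sketch never delays high-priority ones, which is the isolation property the teaser attributes to the separation.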

Telltale: Netflix Application Monitoring Simplified

The Netflix TechBlog

AUGUST 13, 2020

You’re half awake and wondering, “Is there really a problem or is this just an alert that needs tuning?” Telltale learns what constitutes typical health for an application, no alert tuning required. Regional traffic evacuations: A regional traffic shift means one region ends up with zero traffic while another region has double.

Monitoring

Monitoring Tuning Traffic Metrics

Unlock end-to-end observability insights with Dynatrace PurePath 4 seamless integration of OpenTracing for Java

Dynatrace

DECEMBER 9, 2020

With Dynatrace OneAgent you also benefit from support for traffic routing and traffic control. OneAgent implements network zones to create traffic routing rules and limit cross data-center traffic. Stay tuned for upcoming announcements around OpenTracing and OpenTelemetry. What’s next?

Java

Java Traffic Serverless Architecture

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

NOVEMBER 2, 2020

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure. By Manuel Correa, Arthur Gonigberg, and Daniel West. Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Logs and background requests are examples of this type of traffic.

Traffic

Traffic Metrics Infrastructure Architecture

Kubernetes vs Docker: What’s the difference?

Dynatrace

SEPTEMBER 29, 2021

This opens the door to auto-scalable applications, which effortlessly match the demands of rapidly growing and varying user traffic. For a deeper look into how to gain end-to-end observability into Kubernetes environments, tune into the on-demand webinar Harness the Power of Kubernetes Observability. What is Docker? Networking.

Open Source

Open Source DevOps Traffic Cloud

Efficient SLO event integration powers successful AIOps

Dynatrace

APRIL 5, 2024

For instance, consider how fine-tuned failure rate detection can provide insights for comprehensive understanding. Please refer to How to fine-tune failure detection (dynatrace.com) for further information. SLOs must be evaluated at 100%, even when there is currently no traffic. What characterizes a weak SLO?

Efficiency

Efficiency Traffic Tuning Metrics

Using Dynatrace to master the 5 pillars of the AWS Well-Architected Framework (Part 1)

Dynatrace

NOVEMBER 24, 2020

Dynatrace as a managed AWS workload, and as an option, have the network traffic to Dynatrace run over PrivateLink so that traffic never leaves AWS. Stay tuned. Seamless monitoring of AWS Services running in AWS Cloud and AWS Outposts.

AWS

AWS Artificial Intelligence Best Practices Lambda

Dynatrace Application Security detects and blocks attacks automatically in real-time

Dynatrace

FEBRUARY 10, 2022

WAFs protect the network perimeter and monitor, filter, or block HTTP traffic. Compared to intrusion detection systems (IDS/IPS), WAFs are focused on the application traffic. RASP solutions sit in or near applications and analyze application behavior and traffic. How to get started.

Traffic

Traffic Benchmarking Innovation Java

How to get the most value out of Session Replay: Use cases and examples

Dynatrace

AUGUST 14, 2019

Fine-tune Session Replay for your business purposes: examples. Cost and traffic control. The following settings can be applied: Cost and traffic control: 100%. The following settings can be applied: Cost and traffic control: 25%.

Traffic

Traffic Tuning Website Strategy

How Dynatrace boosts production resilience with Site Reliability Guardian

Dynatrace

MAY 17, 2023

While the first guardian validates the traffic, the second guardian checks the business transactions generated during the observation period. In this case, the four golden signals (latency, traffic, errors, and saturation) are derived from span attributes and DQL metric queries via Dynatrace Grail™.

DevOps

DevOps Traffic Latency Best Practices

9 key DevOps metrics for success

Dynatrace

SEPTEMBER 28, 2021

Application usage and traffic: This metric tracks the number of users accessing your system and informs many other metrics, including system uptime. For example, if your application gets too much traffic and usage, it could fail under the pressure.

DevOps

DevOps Metrics Traffic Efficiency

Easy SLA and SLO reporting for all your API endpoints with public synthetic HTTP monitors

Dynatrace

JUNE 26, 2020

Dynatrace Synthetic Monitoring helps you quickly verify if your application is delivering the expected end user experience by offering an outside-in view of all your applications and services, independent of real traffic. So stay tuned! Automated SLA/SLO monitoring using the HTTP monitoring API.

Monitoring

Monitoring Azure AWS Traffic

What is web application security? Everything you need to know.

Dynatrace

JUNE 9, 2021

Web Application Firewall (WAF) helps protect a web application against malicious HTTP traffic. Positive filters are highly effective at blocking attacks but require constant tuning. Teams need to verify and potentially adjust this tuning every time the application changes. Of these, WAF is much more commonly used today.

Open Source

Open Source Entertainment Tuning Internet

Improving our video encodes for legacy devices

The Netflix TechBlog

AUGUST 10, 2020

The H.264/AVC Main profile family still represents a substantial portion of members' viewing hours and an even larger portion of the traffic. Further tuning of pre-defined encoding parameters. Since then, we have applied innovations such as shot-based encoding and newer codecs to deploy more efficient encode families.

Innovation

Innovation Traffic Network Efficiency

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

Handling Bursty Traffic: Managing significant traffic spikes during high-demand events, such as new content launches or regional failovers. Sharded Infrastructure: Leveraging the Data Gateway Platform, we can deploy single-tenant and/or multi-tenant infrastructure with the necessary access and traffic isolation.

Latency

Latency Storage Traffic Tuning

Build and operate multicloud FaaS with enhanced, intelligent end-to-end observability

Dynatrace

APRIL 25, 2023

For example, to handle traffic spikes and pay only for what they use. Scale automatically based on the demand and traffic patterns. Understanding cold-start behavior is essential to tune your cloud applications' cost or performance to meet your operational needs. Such anomalies can be caused by function cold-starts.

Serverless

Serverless Lambda Azure AWS

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

VPC Flow Logs: VPC Flow Logs is an AWS feature that captures information about the IP traffic going to and from network interfaces in a VPC. By default, each record captures a network internet protocol (IP) traffic flow (characterized by a 5-tuple on a per network interface basis) that occurs within an aggregation interval.

Network

Network Tuning AWS Big Data
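
To make the "5-tuple per network interface within an aggregation interval" idea from the VPC Flow Logs teaser above concrete, here is a small sketch that groups individual packets into flow records. It only illustrates the grouping concept and is not how AWS produces flow logs; the packet dictionary keys (`ts`, `iface`, `src`, `dst`, `sport`, `dport`, `proto`, `size`) and the 60-second default interval are assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    interface_id: str
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int
    window_start: int  # start of the aggregation interval, in epoch seconds


def aggregate_flows(packets, interval_seconds=60):
    """Sum packets and bytes per (interface, 5-tuple, interval) key."""
    totals = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        # Align the packet timestamp to the start of its aggregation interval.
        window_start = p["ts"] - (p["ts"] % interval_seconds)
        key = FlowKey(p["iface"], p["src"], p["dst"],
                      p["sport"], p["dport"], p["proto"], window_start)
        totals[key]["packets"] += 1
        totals[key]["bytes"] += p["size"]
    return dict(totals)


# Three hypothetical packets of one TCP flow (protocol number 6);
# the third falls into the next 60-second interval.
base = {"iface": "eni-1", "src": "10.0.0.1", "dst": "10.0.0.2",
        "sport": 443, "dport": 51000, "proto": 6}
packets = [
    {**base, "ts": 120, "size": 100},
    {**base, "ts": 150, "size": 300},
    {**base, "ts": 185, "size": 50},
]
flows = aggregate_flows(packets)
```

The first two packets collapse into a single record for the interval starting at 120, while the third starts a new record for the interval starting at 180.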

Dynatrace simplifies StatsD, Telegraf, and Prometheus observability with Davis AI

Dynatrace

OCTOBER 7, 2020

Stay tuned for an upcoming blog series where we’ll give you a more hands-on walkthrough of how to ingest any kind of data from StatsD, Telegraf, Prometheus, scripting languages, or our integrated REST API. Dynatrace understands dependencies, traffic, and transaction flows and how they change over time.

Open Source

Open Source Metrics Analytics Tuning

Bending pause times to your will with Generational ZGC

The Netflix TechBlog

MARCH 5, 2024

Each of these errors is a canceled request resulting in a retry, so this reduction further reduces overall service traffic by this rate (figure: error rates per second). Operational simplicity: Service owners often reach out to us with questions about excessive pause times and for help with tuning.

Latency

Latency Java Tuning Efficiency

OneAgent for Linux on IBM Z (General Availability)

Dynatrace

NOVEMBER 20, 2019

OneAgent for Z/Linux collects a number of network metrics: input and output traffic measured in bytes and packets, retransmissions, and connectivity. Stay tuned for more announcements on this topic. For details on available metrics, see our help page on host performance monitoring.

Availability

Availability Hardware Java Tuning

How digital experience monitoring helps deliver business observability

Dynatrace

APRIL 26, 2022

STM generates traffic that replicates the typical path or behavior of a user on a network to measure performance (for example, response times, availability, packet loss, latency, jitter, and other variables). Learn more about Dynatrace today with this Power Demo: Dynatrace and Business Observability: Tying IT Metrics to Business Outcomes.

Monitoring

Monitoring Social Media IoT Metrics

Simplify observability for all your custom metrics (Part 2: OneAgent metric API)

Dynatrace

DECEMBER 22, 2020

The interface rejects any traffic that doesn’t originate from localhost. Stay tuned for Part 3 where we’ll show you how. Once you have OneAgent installed, this interface is only reachable on localhost under its designated port. We’ll use this in the future to build integrations for Micrometer, Dropwizard, and others.

Metrics

Metrics Open Source Tuning Traffic

Monitor client certificate-secured APIs with Dynatrace Synthetic

Dynatrace

OCTOBER 13, 2019

They are, of course, not a complete solution, as they can be intercepted like any other network traffic. Stay tuned for more Dynatrace Synthetic news, including: Credential vault support for HTTP monitors. By the way, it’s a good thing to remember that these certificates are used for authentication as a part of API security.

Monitoring

Monitoring IoT Tuning Traffic

Automated observability, security, and reliability at scale

Dynatrace

JULY 18, 2023

Whether tracking internal, workload-centric indicators such as errors, duration, or saturation or focusing on the golden signals and other user-centric views such as availability, latency, traffic, or engagement, SLOs-as-code enables coherent and consistent monitoring throughout the environment at scale.

Best Practices

Best Practices Code Infrastructure Latency

Prevent potential problems quickly and efficiently with Davis exploratory analysis

Dynatrace

OCTOBER 25, 2022

With the distribution of Kubernetes, there is growing interest in using service mesh technology to add secure service-to-service communication and fine-grained management of ingress/egress traffic rules while keeping platform operations teams in the driver’s seat.

Efficiency

Efficiency Best Practices DevOps Open Source

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

We started seeing signs of scale issues, like slowness during peak traffic moments such as 12 AM UTC, leading to increased operational burden. At Netflix, the peak traffic load can be a few orders of magnitude higher than the average load. Hence, the system has to withstand bursts in traffic while still maintaining the SLO requirements.

Java

Java Scalability Traffic Architecture

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

To mitigate these issues, we implemented adaptive pagination which dynamically tunes the limits based on observed data. For subsequent pages, the server uses both the cached data and the information in the page token to fine-tune the limits. Dictionary Compression: Further reducing data size while maintaining performance.

Latency

Latency Storage Cache Servers
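
One way to picture the adaptive pagination mentioned in the Key-Value Data Abstraction Layer teaser above is a limit that is recomputed from the item sizes observed so far. The sketch below is an assumption-laden illustration, not the Netflix algorithm: the function name `next_limit`, the byte-budget target, the default initial limit, and the clamping bounds are all hypothetical, and the observed totals are assumed to be carried between requests (for example, in a page token).

```python
def next_limit(target_page_bytes, observed_bytes, observed_items,
               initial_limit=100, min_limit=1, max_limit=1000):
    """Pick how many items to request so a page lands near target_page_bytes."""
    if observed_items <= 0 or observed_bytes <= 0:
        # Nothing observed yet: fall back to a fixed starting limit.
        return initial_limit
    avg_item_bytes = observed_bytes / observed_items
    # Number of average-sized items that fit in the byte budget.
    limit = int(target_page_bytes // avg_item_bytes)
    # Clamp so one huge or tiny observation cannot produce an extreme limit.
    return max(min_limit, min(max_limit, limit))


# First page: no observations, so the default applies.
first = next_limit(1_000_000, 0, 0)
# Later page: 100 items totalling 500,000 bytes were observed (5,000 bytes each).
later = next_limit(1_000_000, 500_000, 100)
```

With a 1 MB budget and an observed average of 5,000 bytes per item, the sketch requests 200 items on the next page instead of the initial 100.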

DevOps automation: From event-driven automation to answer-driven automation [with causal AI]

Dynatrace

JULY 24, 2023

It enables them to adapt to user feedback swiftly, fine-tune feature releases, and deliver exceptional user experiences, all while maintaining control and minimizing disruption. They can also see how the change can affect critical objectives like SLOs and golden signals, such as traffic, latency, saturation, and error rate.

DevOps

DevOps Traffic Efficiency Servers

Zero Configuration Service Mesh with On-Demand Cluster Discovery

The Netflix TechBlog

AUGUST 29, 2023

In order for a service to talk to another, it needs to know two things: the name of the destination service, and whether or not the traffic should be secure. The ability to run in a degraded but available state during an outage is still a marked improvement over completely stopping traffic flow.

Traffic

Traffic Latency Cloud C++

Best practices for alerting

Dynatrace

JULY 22, 2019

For instance, when there isn’t enough traffic (late at night), the AI will not act to avoid alert spamming. If you want to understand how Dynatrace detects errors, read my other blog on how to fine-tune it! It’s important to know there is more going on than just detecting a threshold that has been reached.

Best Practices

Best Practices Artificial Intelligence Monitoring Tuning

Get out-of-the-box visibility into your ARM platform (Early Adopter)

Dynatrace

MAY 1, 2020

OneAgent for the ARM platform collects a number of network metrics: input and output traffic measured in bytes and packets, retransmissions, and connectivity. Stay tuned for more announcements on this topic. For details on available metrics, see host performance monitoring.

Java

Java Hardware Metrics Tuning

Achieving observability in async workflows

The Netflix TechBlog

MAY 14, 2021

Prodicle Distribution: Our service is required to be elastic and handle bursty traffic. Our team was responsible for Google integrations, watermarking, bursty traffic management, and on-call support for this application. We had to traverse multiple codebases and observability systems to debug errors and inefficiencies in the system.

Traffic

Traffic Java Latency Google

OneAgent for Linux on IBM Z now available in Early Adopter Release

Dynatrace

AUGUST 8, 2019

OneAgent for Z/Linux collects a number of network metrics: input and output traffic measured in bytes and packets, retransmissions, and connectivity. Stay tuned for more announcements on this topic. For details on available metrics, see our help page on host performance monitoring.

Availability

Availability Hardware Java Tuning

Data lakehouse innovations advance the three pillars of observability for more collaborative analytics

Dynatrace

FEBRUARY 16, 2023

For example, these include verifying app deployments, isolating faults coming from a single IP address, identifying root causes of traffic spikes, or investigating malicious user activity. Logs on Grail, included in the 2022 release, enables an endless variety of log-based use cases.

Analytics

Analytics Innovation Metrics Database
