Metrics, Traffic and Tuning - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime — Part 1

The Netflix TechBlog

MAY 4, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.

Traffic

Traffic Latency Tuning Systems

Migrating Critical Traffic At Scale with No Downtime — Part 2

The Netflix TechBlog

JUNE 13, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic

Traffic Metrics Systems Strategy

9 key DevOps metrics for success

Dynatrace

SEPTEMBER 28, 2021

The emerging concepts of working with DevOps metrics and DevOps KPIs have really come a long way. DevOps metrics to help you meet your DevOps goals. Like any IT or business project, you’ll need to track critical key metrics. Here are nine key DevOps metrics and DevOps KPIs that will help you be successful.

DevOps

DevOps Metrics Traffic Efficiency

Simplify observability for all your custom metrics (Part 2: OneAgent metric API)

Dynatrace

DECEMBER 22, 2020

Welcome back to the blog series where we provide you with deep dives into the latest observability awesomeness from Dynatrace, demonstrating how we bring scale, zero configuration, automatic AI-driven alerting, and root cause analysis to all your custom metrics, including open source observability frameworks like StatsD, Telegraf, and Prometheus.

Metrics

Metrics Open Source Tuning Traffic

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

The Challenge of Title Launch Observability: As engineers, we’re wired to track system metrics like error rates, latencies, and CPU utilization, but what about metrics that matter to a title’s success? To detect issues proactively, we need to simulate traffic and predict system behavior in advance.

Traffic

Traffic Scalability Strategy Monitoring

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

We accomplish this by gathering detailed column-level metrics that offer insights into the state and quality of each impression. These metrics include everything from validating identifiers to checking that essential columns are properly filled. Thus, all data in one region is processed by the Flink job deployed within that region.

Tuning

Tuning Latency Efficiency Storage

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

So, we relied on higher-level metrics-based testing: AB Testing and Sticky Canaries. The control group’s traffic utilized the legacy Falcor stack, while the experiment population leveraged the new GraphQL client and was directed to the GraphQL Shim. The Replay Tester tool samples raw traffic streams from Mantis.

Traffic

Traffic Latency Metrics Cache

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Optimizing RabbitMQ performance through strategies such as keeping queues short, enabling lazy queues, and monitoring health checks is essential for maintaining system efficiency and effectively managing high traffic loads.

Best Practices

Best Practices Traffic Strategy Efficiency

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

We thus assigned a priority to each use case and sharded event traffic by routing to priority-specific queues and the corresponding event processing clusters. This separation allows us to tune system configuration and scaling policies independently for different event priorities and traffic patterns.

Systems

Systems Traffic Architecture Mobile
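
The priority-based sharding described in the Rapid Event Notification System excerpt above can be illustrated with a minimal sketch. It is not Netflix's implementation: the use-case names, the three priority levels, and the `PriorityRouter` class are all assumptions made for illustration, and in-memory deques stand in for real queues and processing clusters.

```python
from collections import defaultdict, deque

# Hypothetical mapping from use case to priority (illustrative names only).
USE_CASE_PRIORITY = {
    "profile_change": "high",
    "watch_activity": "medium",
    "diagnostics": "low",
}


class PriorityRouter:
    """Routes each event to the queue that matches its use case's priority."""

    def __init__(self, priorities, default="low"):
        self.priorities = priorities
        self.default = default
        # One queue per priority; each could be scaled and tuned on its own.
        self.queues = defaultdict(deque)

    def route(self, event):
        """Append the event to its priority queue and return that priority."""
        priority = self.priorities.get(event["use_case"], self.default)
        self.queues[priority].append(event)
        return priority

    def drain(self, priority):
        """Yield queued events for one priority in arrival order, emptying it."""
        queue = self.queues[priority]
        while queue:
            yield queue.popleft()


router = PriorityRouter(USE_CASE_PRIORITY)
for use_case in ["diagnostics", "profile_change", "watch_activity", "profile_change"]:
    router.route({"use_case": use_case})
```

Because each priority has its own queue, a backlog of low-priority events in this sketch never delays the high-priority ones, which is the property the excerpt attributes to the separation.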

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

Optimizing RabbitMQ requires clustering, queue management, and resource tuning to maintain stability and efficiency. However, performance can decline under high traffic conditions. It also provides an HTTP API for retrieving performance metrics and a command-line tool for advanced management tasks.

Latency

Latency Analytics Architecture Storage

Efficient SLO event integration powers successful AIOps

Dynatrace

APRIL 5, 2024

For instance, consider how fine-tuned failure rate detection can provide insights for comprehensive understanding. Please refer to How to fine-tune failure detection (dynatrace.com) for further information. SLOs must be evaluated at 100%, even when there is currently no traffic. What characterizes a weak SLO?

Efficiency

Efficiency Traffic Tuning Metrics

Telltale: Netflix Application Monitoring Simplified

The Netflix TechBlog

AUGUST 13, 2020

A metric crossed a threshold. You’re half awake and wondering, “Is there really a problem or is this just an alert that needs tuning?” Telltale learns what constitutes typical health for an application, no alert tuning required. Metrics are a key part of understanding application health. Regional traffic evacuations.

Monitoring

Monitoring Tuning Traffic Metrics

Large scale deployments are easy and cost-effective with network zones (Early Adopter)

Dynatrace

JULY 2, 2020

Unnecessary traffic between such data centers can result in wasted resources, unpredictable downtimes, and lost business. By minimizing bandwidth and preventing unrelated traffic between data centers, you can maintain healthy network infrastructure and save on costs. optimizing traffic routing. What’s next.

Network

Network Traffic Infrastructure Tuning

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

NOVEMBER 2, 2020

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure. By Manuel Correa, Arthur Gonigberg, and Daniel West. Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Logs and background requests are examples of this type of traffic.

Traffic

Traffic Metrics Infrastructure Architecture

Dynatrace simplifies StatsD, Telegraf, and Prometheus observability with Davis AI

Dynatrace

OCTOBER 7, 2020

Open-source metric sources automatically map to our Smartscape model for AI analytics. We’ve just enhanced Dynatrace OneAgent with an open metric API. Here’s a quick overview of what you can achieve now that the Dynatrace Software Intelligence Platform has been extended to ingest third-party metrics. Dynatrace news.

Open Source

Open Source Metrics Analytics Tuning

Unlock end-to-end observability insights with Dynatrace PurePath 4 seamless integration of OpenTracing for Java

Dynatrace

DECEMBER 9, 2020

Dynatrace is fully committed to the OpenTelemetry community and to the seamless integration of OpenTelemetry data, including ingestion of custom metrics, into the Dynatrace open analytics platform. With Dynatrace OneAgent you also benefit from support for traffic routing and traffic control. What’s next?

Java

Java Traffic Architecture Strategy

Kubernetes vs Docker: What’s the difference?

Dynatrace

SEPTEMBER 29, 2021

This opens the door to auto-scalable applications, which effortlessly match the demands of rapidly growing and varying user traffic. For a deeper look into how to gain end-to-end observability into Kubernetes environments, tune into the on-demand webinar Harness the Power of Kubernetes Observability. What is Docker? Networking.

Open Source

Open Source DevOps Traffic Cloud

Using Dynatrace to master the 5 pillars of the AWS Well-Architected Framework (Part 1)

Dynatrace

NOVEMBER 24, 2020

Automatic collection of the entire set of services that publish metrics to Amazon CloudWatch (these metrics are also automatically analyzed by Dynatrace’s AI engine, Davis). Dynatrace as a managed AWS workload, and as an option, have the network traffic to Dynatrace run over PrivateLink so that traffic never leaves AWS.

AWS

AWS Artificial Intelligence Best Practices Lambda

Build and operate multicloud FaaS with enhanced, intelligent end-to-end observability

Dynatrace

APRIL 25, 2023

For example, to handle traffic spikes and pay only for what they use. Scale automatically based on the demand and traffic patterns. Observability is typically achieved by collecting three types of data from a system: metrics, logs, and traces. The elasticity of serverless services helps organizations scale as needed.

Serverless

Serverless Lambda Azure AWS

Improving our video encodes for legacy devices

The Netflix TechBlog

AUGUST 10, 2020

H.264/AVC Main profile family still represents a substantial portion of the members’ viewing hours and an even larger portion of the traffic. These are summarized below: Instead of relying on other objective metrics, such as PSNR, VMAF is employed to guide optimization decisions. Further tuning of pre-defined encoding parameters.

Innovation

Innovation Traffic Network Efficiency

How digital experience monitoring helps deliver business observability

Dynatrace

APRIL 26, 2022

Fast, consistent application delivery creates a positive user experience that can ultimately drive customer loyalty and improve business metrics like conversion rate and user retention. It is proactive monitoring that simulates traffic with established test variables, including location, browser, network, and device type.

Monitoring

Monitoring Social Media IoT Metrics

Gain fresh insights with key performance metrics for synthetic browser monitors

Dynatrace

APRIL 23, 2019

However, understanding the performance of different application types requires an emphasis on different performance metrics, that is, key performance metrics. For many traditional web applications , User action duration is considered the best metric available for web-performance optimization.

Metrics

Metrics Monitoring Performance Speed

OneAgent for Linux on IBM Z (General Availability)

Dynatrace

NOVEMBER 20, 2019

Host performance is tracked from high-level health metrics on the home dashboard down to details for each host. For details on available metrics, see our help page on host performance monitoring. Network metrics are also collected for detected processes. Disk metrics are collected for each discovered disk.

Availability

Availability Hardware Java Tuning

How Dynatrace boosts production resilience with Site Reliability Guardian

Dynatrace

MAY 17, 2023

While the first guardian validates the traffic, the second guardian checks the business transactions generated during the observation period. In this case, the four golden signals (latency, traffic, errors, and saturation) are derived from span attributes and DQL metric queries via Dynatrace Grail™.

DevOps

DevOps Traffic Latency Best Practices

Data lakehouse innovations advance the three pillars of observability for more collaborative analytics

Dynatrace

FEBRUARY 16, 2023

The short answer: The three pillars of observability—logs, metrics, and traces—converging on a data lakehouse. You’re getting all the architectural benefits of Grail—the petabytes, the cardinality—with this implementation,” including the three pillars of observability: logs, metrics, and traces in context.

Analytics

Analytics Innovation Metrics Database

Easy SLA and SLO reporting for all your API endpoints with public synthetic HTTP monitors

Dynatrace

JUNE 26, 2020

Dynatrace Synthetic Monitoring helps you quickly verify if your application is delivering the expected end user experience by offering an outside-in view of all your applications and services, independent of real traffic. So stay tuned! Automated SLA/SLO monitoring using the HTTP monitoring API.

Monitoring

Monitoring Azure AWS Traffic

Crucial Redis Monitoring Metrics You Must Watch

Scalegrid

JANUARY 25, 2024

You will need to know which monitoring metrics for Redis to watch and a tool to monitor these critical server metrics to ensure its health. Redis returns a big list of database metrics when you run the info command on the Redis shell. You can pick a smart selection of relevant metrics from these.

Metrics

Metrics Monitoring Latency Cache
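
The Redis excerpt above mentions picking a relevant subset from the long list that the `info` command returns. As a rough sketch of that selection step: `INFO` output is plain text made of `key:value` lines grouped under `# Section` headers, so it can be parsed without a client library. The sample text and its values below are invented for illustration, and the choice of watched fields is an assumption, not a recommendation from the article.

```python
def parse_info(text):
    """Parse INFO-style text ("key:value" lines, "# Section" headers) into a dict."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        metrics[key] = value
    return metrics


# Invented sample in the INFO text format; real output has many more fields.
SAMPLE = """# Clients
connected_clients:12
# Memory
used_memory:1048576
# Stats
keyspace_hits:900
keyspace_misses:100
"""

# Assumed selection of fields to watch.
WATCHED = ["connected_clients", "used_memory", "keyspace_hits", "keyspace_misses"]

info = parse_info(SAMPLE)
selected = {name: int(info[name]) for name in WATCHED}

# Cache hit ratio derived from two of the raw counters.
lookups = selected["keyspace_hits"] + selected["keyspace_misses"]
hit_ratio = selected["keyspace_hits"] / lookups
```

In practice the text would come from running `INFO` against a live server rather than from a hard-coded string.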

The road to observability demo part 3: Collect, instrument, and analyze telemetry data automatically with Dynatrace

Dynatrace

MAY 17, 2023

Making applications observable—relying on metrics, logs, and traces to understand what software is doing and how it’s performing—has become increasingly important as workloads are shifting to multicloud environments. We also introduced our demo app and explained how to define the metrics and traces it uses.

Metrics

Metrics Database Monitoring Network

Get out-of-the-box visibility into your ARM platform (Early Adopter)

Dynatrace

MAY 1, 2020

Host performance is tracked via high-level health metrics with details for each host (these appear on your home dashboard by default). For details on available metrics, see host performance monitoring. Network metrics are also collected for detected processes. Disk metrics are collected for each discovered disk.

Java

Java Hardware Metrics Tuning

OneAgent for Linux on IBM Z now available in Early Adopter Release

Dynatrace

AUGUST 8, 2019

Host performance is tracked from high-level health metrics on the home dashboard down to details for each host. For details on available metrics, see our help page on host performance monitoring. Network metrics are also collected for detected processes. Disk metrics are collected for each discovered disk.

Availability

Availability Hardware Java Tuning

Dynatrace Cloud Automation Module provides observability-driven automation across the full lifecycle

Dynatrace

FEBRUARY 10, 2021

This capability provides version information along with an additional insight into traffic and problems per version. This enables you to easily make use of the more than 2,000 out of the box metrics provided by Dynatrace as well as bringing in your custom metrics and data ingest. What’s next.

Cloud

Cloud DevOps Speed Metrics

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

We do not use it for metrics, histograms, timers, or any such near-real time analytics use case. Handling Bursty Traffic: Managing significant traffic spikes during high-demand events, such as new content launches or regional failovers. Those use cases are well served by the Netflix Atlas telemetry system.

Latency

Latency Storage Traffic Tuning

Python at Netflix

The Netflix TechBlog

APRIL 29, 2019

Demand Engineering: Demand Engineering is responsible for Regional Failovers, Traffic Distribution, Capacity Operations and Fleet Efficiency of the Netflix cloud. One example is the Spectator Python client library, a library for instrumenting code to record dimensional time series metrics.

Open Source

Open Source Network Infrastructure Big Data

DevOps automation: From event-driven automation to answer-driven automation [with causal AI]

Dynatrace

JULY 24, 2023

Observability data provides a treasure trove of performance, stability, and user experience metrics encompassing error rates, response times, and user engagement. With swift precision, an answer-driven automation solution that uses causal AI can transform these metrics into invaluable insights.

DevOps

DevOps Traffic Efficiency Servers

In-product guidance accelerates Service Level Objectives (SLO) setup for confident deployments

Dynatrace

DECEMBER 9, 2020

Which metrics are relevant for your business, anyway? Modern observability tools provide many metrics, but which ones are really important for your business? Dynatrace offers more than 2000 different metrics that are ready for use as dedicated SLIs. Read more about the basics of Site Reliability Engineering below.

Metrics

Metrics Engineering Google Monitoring

Dynatrace PurePath 4 integrates OpenTelemetry and the latest cloud-native technologies and provides analytics and AI at scale

Dynatrace

NOVEMBER 17, 2020

The seamless integration enables enrichment of your OpenTelemetry metrics and traces with insights from the Dynatrace Software Intelligence Platform. Well-defined metrics for your applications and microservices should also be at the heart of your technical analysis. So please stay tuned for updates.

Analytics

Analytics Technology Cloud

Best practices for alerting

Dynatrace

JULY 22, 2019

For instance, when there isn’t enough traffic (late at night), the AI will not act to avoid alert spamming. It doesn’t apply to infrastructure metrics such as CPU or memory. If you want to understand how Dynatrace detects errors, read my other blog on how to fine-tune it! This is called a frequent issue.

Best Practices

Best Practices Artificial Intelligence Monitoring Tuning

Towards a Reliable Device Management Platform

The Netflix TechBlog

AUGUST 30, 2021

Canary Test Workloads In addition to serving the regular message traffic between users and DUTs, the control plane itself is stress-tested at roughly 3-hour intervals, where nearly 3000 ephemeral MQTT clients are created to connect to and generate flash traffic on the MQTT brokers.

Latency

Latency Traffic Transportation Cloud

Netflix Video Quality at Scale with Cosmos Microservices

The Netflix TechBlog

NOVEMBER 2, 2021

In particular, the VMAF metric lies at the core of improving the Netflix member’s streaming video quality. The request provides the source and the derivative whose quality is to be computed and requests that the VQS provides quality scores using VMAF, PSNR and SSIM as quality metrics. Assembly for two of the metrics (e.g.

Media

Media Innovation Metrics Latency

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

We took a hybrid head-based sampling approach that allows for recording 100% of traces for a specific and configurable set of requests, while continuing to randomly sample traffic per the policy set at ingestion point.

Infrastructure

Infrastructure Transportation Storage Open Source
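
The hybrid head-based sampling described in the distributed tracing excerpt above combines two rules: always record requests that match a configured set, and randomly sample the rest. A minimal sketch of that decision follows. The `should_record` function, the attribute-pair rule format, and the `device_type` attribute are assumptions for illustration, not Netflix's actual policy format.

```python
import random

# Assumed rule format: (attribute, value) pairs that force a trace to be kept.
ALWAYS_RECORD = [("device_type", "tv")]


def should_record(request, always_record, sample_rate, rng=random):
    """Head-based decision, made once when the request enters the system."""
    # Requests matching any configured rule are recorded 100% of the time.
    if any(request.get(key) == value for key, value in always_record):
        return True
    # All other traffic is sampled at random; rng.random() is in [0.0, 1.0).
    return rng.random() < sample_rate


# Example: keep every TV request, and roughly 1% of everything else.
decision = should_record({"device_type": "tv"}, ALWAYS_RECORD, 0.01)
```

Making the decision at the head of the request means downstream services only need to honor it, rather than each deciding independently.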

A/B Testing Instant.Page With Netlify and Speedcurve

Tim Kadlec

MAY 21, 2020

By serving one version of the site with instant.page in place to some traffic, and a site without it to another, I could compare the performance of them both over the same timespan and see how it shakes out. It would have only taken moments in SpeedCurve to see live traffic come through, but I’m an impatient person.

Testing

Testing Traffic Tuning Code

MySQL Performance Tuning 101: Key Tips to Improve MySQL Database Performance

Percona

SEPTEMBER 1, 2023

While there is no magic bullet for MySQL performance tuning, there are a few areas that can be focused on upfront that can dramatically improve the performance of your MySQL installation. What are the Benefits of MySQL Performance Tuning? A finely tuned database processes queries more efficiently, leading to swifter results.

Tuning

Tuning Database Performance Hardware

Data Movement in Netflix Studio via Data Mesh

The Netflix TechBlog

JULY 26, 2021

Data Quality Data Mesh provides metrics and dashboards at both the processor and pipeline level for operational observability. It is generating heartbeat signals at a constant frequency with the objective of using them as a baseline to verify the health of the pipeline regardless of traffic patterns or occasional silences.

Big Data

Big Data Government Processing Analytics

ML Platform Meetup: Infra for Contextual Bandits and Reinforcement Learning

The Netflix TechBlog

OCTOBER 18, 2019

The key insight was that by assuming a latent Gaussian Process (GP) prior on the key business metric actions like viral engagement, job applications, etc., And finally each new observation needs to update the policy, compute offline policy evaluation metrics and then push the policy back to production so it can generate new intents to treat.

Infrastructure

Infrastructure Metrics Architecture Efficiency
