Metrics, Systems and Traffic - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime?—?Part 1

The Netflix TechBlog

MAY 4, 2023

Migrating Critical Traffic At Scale with No Downtime — Part 1 Shyam Gala , Javier Fernandez-Ivern , Anup Rokkam Pratap , Devang Shah Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience.

Traffic

Traffic Latency Tuning Systems

Migrating Critical Traffic At Scale with No Downtime?—?Part 2

The Netflix TechBlog

JUNE 13, 2023

Migrating Critical Traffic At Scale with No Downtime — Part 2 Shyam Gala , Javier Fernandez-Ivern , Anup Rokkam Pratap , Devang Shah Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic

Traffic Metrics Systems Strategy

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

To this end, we developed a Rapid Event Notification System (RENO) to support use cases that require server initiated communication with devices in a scalable and extensible manner. In this blog post, we will give an overview of the Rapid Event Notification System at Netflix and share some of the learnings we gained along the way.

Systems

Systems Traffic Architecture Mobile

Better dashboarding with Dynatrace Davis AI: Instant meaningful insights

Dynatrace

JANUARY 21, 2025

Chances are, youre a seasoned expert who visualizes meticulously identified key metrics across several sophisticated charts. For example, if you’re monitoring network traffic and the average over the past 7 days is 500 Mbps, the threshold will adapt to this baseline. This ensures optimal resource utilization and cost efficiency.

Traffic

Traffic Metrics Analytics Monitoring

Dynatrace Cost & Carbon Optimization certified for accuracy and transparency

Dynatrace

MARCH 5, 2025

Integration with existing systems and processes : Integration with existing IT infrastructure, observability solutions, and workflows often requires significant investment and customization. We implemented a wasted energy metric in the app to enhance practitioner actionability.

Energy

Energy Analytics Traffic Cloud

9 key DevOps metrics for success

Dynatrace

SEPTEMBER 28, 2021

As we look at today’s applications, microservices, and DevOps teams, we see leaders are tasked with supporting complex distributed applications using new technologies spread across systems in multiple locations. The emerging concepts of working with DevOps metrics and DevOps KPIs have really come a long way. Deployment frequency.

DevOps

DevOps Metrics Traffic Efficiency

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

To achieve this, we are committed to building robust systems that deliver comprehensive observability, enabling us to take full accountability for every title on ourservice. Each title represents countless hours of effort and creativity, and our systems need to honor that uniqueness. Yet, these pages couldnt be more different.

Traffic

Traffic Scalability Strategy Monitoring

The keys to selecting a platform for end-to-end observability

Dynatrace

DECEMBER 2, 2024

Clearly, continuing to depend on siloed systems, disjointed monitoring tools, and manual analytics is no longer sustainable. This enables proactive changes such as resource autoscaling, traffic shifting, or preventative rollbacks of bad code deployment ahead of time.

Artificial Intelligence

Artificial Intelligence DevOps Architecture Cloud

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Key Takeaways RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges.

Best Practices

Best Practices Traffic Strategy Efficiency

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

So, we relied on higher-level metrics-based testing: AB Testing and Sticky Canaries. The control group’s traffic utilized the legacy Falcor stack, while the experiment population leveraged the new GraphQL client and was directed to the GraphQL Shim. The Replay Tester tool samples raw traffic streams from Mantis.

Traffic

Traffic Latency Metrics Cache

Ensuring the Successful Launch of Ads on Netflix

The Netflix TechBlog

JUNE 1, 2023

To do this, we devised a novel way to simulate the projected traffic weeks ahead of launch by building upon the traffic migration framework described here. New content or national events may drive brief spikes, but, by and large, traffic is usually smoothly increasing or decreasing.

Traffic

Traffic Best Practices Systems Testing

What is log management? How to tame distributed cloud system complexities

Dynatrace

SEPTEMBER 8, 2022

Log management is an organization’s rules and policies for managing and enabling the creation, transmission, analysis, storage, and other tasks related to IT systems’ and applications’ log data. Metrics, logs , and traces make up three vital prongs of modern observability. How log management systems optimize performance and security.

Cloud

Cloud Systems Analytics DevOps

Transform log data into actionable metrics and have Davis AI do the work for you

Dynatrace

MARCH 16, 2022

Now, Dynatrace has the ability to turn numerical values from logs into metrics, which unlocks AI-powered answers, context, and automation for your apps and infrastructure, at scale. Key information about your system and applications comes from logs. Manual tracking of metrics from logs is too complex at scale.

Metrics

Metrics Lambda Infrastructure Monitoring

The road to observability with OpenTelemetry demo part 1: Identifying metrics and traces

Dynatrace

MAY 17, 2023

Anyone who’s concerned with developing, delivering, and operating software knows the importance of making software and the systems it runs on observable. That is, relying on metrics, logs, and traces to understand what software is doing and where it’s running into snags. OpenTelemetry is a free and open source take on observability.

Metrics

Metrics Open Source Traffic Cache

Simplify observability for all your custom metrics (Part 1: StatsD)

Dynatrace

NOVEMBER 3, 2020

Welcome to the blog series where we give you a deeper dive into the latest awesomeness around Dynatrace : how we bring scale, zero configuration, automatic AI driven alerting, and root cause analysis to all your custom metrics, including open source observability frameworks like StatsD, Telegraf, and Prometheus.

Metrics

Metrics Open Source Monitoring Traffic

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profiles exposure. In this multi-part blog series, we take you behind the scenes of our system that processes billions of impressions daily.

Tuning

Tuning Latency Efficiency Storage

Five-nines availability: Always-on infrastructure delivers system availability during the holidays’ peak loads

Dynatrace

NOVEMBER 22, 2022

For retail organizations, peak traffic can be a mixed blessing. While high-volume traffic often boosts sales, it can also compromise uptimes. The nirvana state of system uptime at peak loads is known as “five-nines availability.” How can IT teams deliver system availability under peak loads that will satisfy customers?

Infrastructure

Infrastructure Availability Systems Retail

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow , an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems. ETL workflows), as well as downstream (e.g.

Systems

Systems Media Cache Open Source

A Dynatrace champions guide to get ahead of digital marketing campaigns

Dynatrace

JULY 1, 2020

In my last blog , I’ve provided an example of this happening, whereby the traffic spiked and quadrupled the usual incoming traffic. These are all interesting metrics from marketing point of view, and also highly interesting to you as they allow you to engage with the teams that are driving the traffic against your IT-system.

Traffic

Traffic Analytics Metrics Servers

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

NOVEMBER 2, 2020

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure By Manuel Correa , Arthur Gonigberg , and Daniel West Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. CRITICAL : This traffic affects the ability to play.

Traffic

Traffic Metrics Infrastructure Architecture

Power dashboarding part 2: Dynatrace dashboard tutorial to gain better, faster answers using AI and formatting

Dynatrace

MARCH 31, 2025

You can either continue with the custom infrastructure metrics dashboard you created in Part I or use the dashboard we prepared here (Dynatrace login required). In our Dynatrace Dashboard tutorial, we want to add a chart that shows the bytes in and out per host over time to enhance visibility into network traffic.

Metrics

Metrics Infrastructure Network Best Practices

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

Introduction to Message Brokers Message brokers enable applications, services, and systems to communicate by acting as intermediaries between senders and receivers. This decoupling simplifies system architecture and supports scalability in distributed environments.

Latency

Latency Analytics Architecture Storage

The new normal of digital experience delivery – lessons learned from monitoring mission-critical websites during COVID-19

Dynatrace

MAY 6, 2020

Over the last two month s, w e’ve monito red key sites and applications across industries that have been receiving surges in traffic , including government, health insurance, retail, banking, and media. The following day, a normally mundane Wednesday , traffic soared to 128,000 sessions. seconds to 2.78

Website

Website Monitoring Retail Media

What are quality gates? How to use quality gates to deliver better software at speed and scale

Dynatrace

FEBRUARY 21, 2024

Automating quality gates is ideal, as it minimizes manually checking and validating key metrics throughout the SDLC. By actively monitoring metrics such as error rate, success rate, and CPU load, quality gates instill confidence in teams during software releases. Several tools can be used to collect metrics in load/performance testing.

Speed

Speed Software Software Latency

Kubernetes vs Docker: What’s the difference?

Dynatrace

SEPTEMBER 29, 2021

Think of containers as the packaging for microservices that separate the content from its environment – the underlying operating system and infrastructure. This opens the door to auto-scalable applications, which effortlessly matches the demands of rapidly growing and varying user traffic. What is Docker? Networking. Observability.

Open Source

Open Source Traffic DevOps Cloud

Maximize user experience with out-of-the-box service-performance SLOs

Dynatrace

AUGUST 25, 2023

These signals ( latency, traffic, errors, and saturation ) provide a solid means of proactively monitoring operative systems via SLOs and tracking business success. While this connection might sound simple, finding the right metrics to measure the needed SLIs takes time and effort.

Performance

Performance Latency Traffic Metrics

Advanced analytics: Leverage edge IoT data with OpenTelemetry and Dynatrace

Dynatrace

AUGUST 29, 2024

IoT is transforming how industries operate and make decisions, from agriculture to mining, energy utilities, and traffic management. Both methods allow you to ingest and process raw data and metrics. They enable real-time tracking and enhanced situational awareness for air traffic control and collision avoidance systems.

IoT

IoT Analytics Transportation Metrics

Dynatrace adds support for VPC Flow Logs to Kinesis Data Firehose

Dynatrace

SEPTEMBER 7, 2022

VPC Flow Logs is an Amazon service that enables IT pros to capture information about the IP traffic that traverses network interfaces in a virtual private cloud, or VPC. By default, each record captures a network internet protocol (IP), a destination, and the source of the traffic flow that occurs within your environment.

Traffic

Traffic AWS Network Cloud

Simplified observability for your SNMP devices

Dynatrace

MARCH 22, 2021

But manual configuration of observability for systems like this is nearly impossible. Each SNMP-enabled device provides access to its state and performance metrics in a simple and robust way that allows Dynatrace to fetch the metrics and run them through Davis®, our AI causation engine. Events and alerts.

Metrics

Metrics Network Infrastructure Traffic

Dynatrace simplifies StatsD, Telegraf, and Prometheus observability with Davis AI

Dynatrace

OCTOBER 7, 2020

Open-source metric sources automatically map to our Smartscape model for AI analytics. We’ve just enhanced Dynatrace OneAgent with an open metric API. Here’s a quick overview of what you can achieve now that the Dynatrace Software Intelligence Platform has been extended to ingest third-party metrics. Dynatrace news.

Open Source

Open Source Metrics Analytics Tuning

Detecting RegreSSHion with Dynatrace (CVE-2024-6387)

Dynatrace

JULY 2, 2024

The Qualys Threat Research Unit (TRU) has discovered a Remote Unauthenticated Code Execution (RCE) vulnerability in OpenSSH server (sshd) in glibc-based Linux systems. This can result in a complete system takeover, malware installation, data manipulation, and the creation of backdoors for persistent access.

AWS

AWS Network Traffic Servers

Noisy Neighbor Detection with eBPF

The Netflix TechBlog

SEPTEMBER 10, 2024

On Titus , our multi-tenant compute platform, a "noisy neighbor" refers to a container or system service that heavily utilizes the server's resources, causing performance degradation in adjacent containers. To emit a run queue latency metric, we leveraged three eBPF hooks: sched_wakeup, sched_wakeup_new, and sched_switch.

Latency

Latency Metrics Programming Monitoring

Dynatrace adds support for AWS Transit Gateway with VPC Flow Logs

Dynatrace

JULY 25, 2022

VPC Flow Logs is a feature that gives you the capability to capture more robust IP traffic data that traverses your VPCs. A full list of metrics can be found here and include dimensions such as the following: Packets. Log Metrics. What is VPC Flow Logs. The number of packets transferred during the flow. Resource type.

AWS

AWS Transportation Network Traffic

What is a service mesh?

Dynatrace

MAY 21, 2021

This becomes even more challenging when the application receives heavy traffic, because a single microservice might become overwhelmed if it receives too many requests too quickly. The Envoy proxies also collect and report telemetry on all traffic among the services in the mesh. Why do you need a service mesh?

Traffic

Traffic DevOps Infrastructure Network

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

As a result, site reliability has emerged as a critical success metric for many organizations. Uptime Institute’s 2022 Outage Analysis report found that over 60% of system outages resulted in at least $100,000 in total losses, up from 39% in 2019. More than one in seven outages cost more than $1 million. availability.

Best Practices

Best Practices DevOps Latency Metrics

From syslog to AWS Firehose: Dynatrace log management innovations that enhance observability

Dynatrace

SEPTEMBER 5, 2024

Native support for Syslog messages Syslog messages are generated by default in Linux and Unix operating systems, security devices, network devices, and applications such as web servers and databases. Native support for syslog messages extends our infrastructure log support to all Linux/Unix systems and network devices.

Innovation

Innovation AWS Analytics Storage

Service level objectives: 5 SLOs to get started

Dynatrace

JUNE 1, 2023

Certain SLOs can help organizations get started on measuring and delivering metrics that matter. It represents the percentage of time a system or service is expected to be accessible and functioning correctly. Response time Response time refers to the total time it takes for a system to process a request or complete an operation.

Latency

Latency Website Traffic Virtualization

Telltale: Netflix Application Monitoring Simplified

The Netflix TechBlog

AUGUST 13, 2020

A metric crossed a threshold. Our streaming teams need a monitoring system that enables them to quickly diagnose and remediate problems; seconds count! Our Node team needs a system that empowers a small group to operate a large fleet. Metrics are a key part of understanding application health. By Andrei U.,

Monitoring

Monitoring Tuning Traffic Metrics

Simplify complex cloud-native environments with AI-driven observability

Dynatrace

OCTOBER 3, 2024

In the latest enhancements of Dynatrace Log Management and Analytics , Dynatrace extends coverage for Native Syslog support: Use Dynatrace ActiveGate to automatically add context and optimize network traffic to your Syslog messages.

Cloud

Cloud Lambda AWS Analytics

Efficient SLO event integration powers successful AIOps

Dynatrace

APRIL 5, 2024

When the SLO status converges to an optimal value of 100%, and there’s substantial traffic (calls/min), BurnRate becomes more relevant for anomaly detection. SLOs must be evaluated at 100%, even when there is currently no traffic. Data Explorer “test your Metric Expression” for info result coming from the above metric.

Efficiency

Efficiency Traffic Tuning Metrics

Build and operate multicloud FaaS with enhanced, intelligent end-to-end observability

Dynatrace

APRIL 25, 2023

For example, to handle traffic spikes and pay only for what they use. Observability is essential to ensure the reliability, security and quality of any software system. Scale automatically based on the demand and traffic patterns. The elasticity of serverless services helps organizations scale as needed.

Serverless

Serverless Lambda Azure AWS

SLOs done right: how DevOps teams can build better service-level objectives

Dynatrace

MARCH 16, 2023

Enterprises now have access to myriad metrics they can track and measure, but an abundance of choice doesn’t equal actionable insight. Indeed, 54% of SREs say they handle too many metrics, making it increasingly difficult to find the most relevant ones for a particular service, according to the Dynatrace State of SRE Report.

DevOps

DevOps Latency Metrics Traffic

The road to observability demo part 3: Collect, instrument, and analyze telemetry data automatically with Dynatrace

Dynatrace

MAY 17, 2023

Making applications observable—relying on metrics, logs, and traces to understand what software is doing and how it’s performing—has become increasingly important as workloads are shifting to multicloud environments. We also introduced our demo app and explained how to define the metrics and traces it uses. How can we verify that?

Metrics

Metrics Database Monitoring Network

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

By implementing service-level objectives, teams can avoid collecting and checking a huge amount of metrics for each service. First, it helps to understand that applications and all the services and infrastructure that support them generate telemetry data based on traffic from real users. So how can teams start implementing SLOs?

Software

Software Software Benchmarking Latency

Migrating Critical Traffic At Scale with No Downtime?—?Part 1

Migrating Critical Traffic At Scale with No Downtime?—?Part 2

Trending Sources

Rapid Event Notification System at Netflix

Better dashboarding with Dynatrace Davis AI: Instant meaningful insights

Dynatrace Cost & Carbon Optimization certified for accuracy and transparency

9 key DevOps metrics for success

Title Launch Observability at Netflix Scale

The keys to selecting a platform for end-to-end observability

Best Practices for Scaling RabbitMQ

Migrating Netflix to GraphQL Safely

Ensuring the Successful Launch of Ads on Netflix

What is log management? How to tame distributed cloud system complexities

Transform log data into actionable metrics and have Davis AI do the work for you

The road to observability with OpenTelemetry demo part 1: Identifying metrics and traces

Simplify observability for all your custom metrics (Part 1: StatsD)

Introducing Impressions at Netflix

Five-nines availability: Always-on infrastructure delivers system availability during the holidays’ peak loads

Supporting Diverse ML Systems at Netflix

A Dynatrace champions guide to get ahead of digital marketing campaigns

Keeping Netflix Reliable Using Prioritized Load Shedding

Power dashboarding part 2: Dynatrace dashboard tutorial to gain better, faster answers using AI and formatting

RabbitMQ vs. Kafka: Key Differences

The new normal of digital experience delivery – lessons learned from monitoring mission-critical websites during COVID-19

What are quality gates? How to use quality gates to deliver better software at speed and scale

Kubernetes vs Docker: What’s the difference?

Maximize user experience with out-of-the-box service-performance SLOs

Advanced analytics: Leverage edge IoT data with OpenTelemetry and Dynatrace

Dynatrace adds support for VPC Flow Logs to Kinesis Data Firehose

Simplified observability for your SNMP devices

Dynatrace simplifies StatsD, Telegraf, and Prometheus observability with Davis AI

Detecting RegreSSHion with Dynatrace (CVE-2024-6387)

Noisy Neighbor Detection with eBPF

Dynatrace adds support for AWS Transit Gateway with VPC Flow Logs

What is a service mesh?

Site reliability done right: 5 SRE best practices that deliver on business objectives

From syslog to AWS Firehose: Dynatrace log management innovations that enhance observability

Service level objectives: 5 SLOs to get started

Telltale: Netflix Application Monitoring Simplified

Simplify complex cloud-native environments with AI-driven observability

Efficient SLO event integration powers successful AIOps

Build and operate multicloud FaaS with enhanced, intelligent end-to-end observability

SLOs done right: how DevOps teams can build better service-level objectives

The road to observability demo part 3: Collect, instrument, and analyze telemetry data automatically with Dynatrace

Implementing service-level objectives to improve software quality

Stay Connected