We developed a Rapid Event Notification System (RENO) to support use cases that require server-initiated communication with devices in a scalable and extensible manner. In this blog post, we will give an overview of the Rapid Event Notification System at Netflix and share some of the learnings we gained along the way.
As dynamic systems architectures increase in complexity and scale, IT teams face mounting pressure to track and respond to conditions and issues across their multi-cloud environments. What is observability? Why is it important, and what can it actually help organizations achieve? How do you make a system observable?
There's a goldmine of business data traversing your IT systems, yet most of it remains untapped. Other data sources, including APIs and log files, are used to expand access, often to external or proprietary systems. In fact, it's likely that some of your critical business systems already write business data to log files.
Recently, I encountered a task where a business was using AWS Elastic Beanstalk but was struggling to understand the system state due to the lack of comprehensive metrics in CloudWatch. By default, CloudWatch only provides a few basic metrics, such as CPU and network utilization.
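One common way to close that gap (not necessarily the approach taken in the article) is to publish application-level metrics yourself. Below is a minimal boto3 sketch; the region, namespace, metric name, and dimension values are hypothetical.

```python
# Hypothetical sketch: publish a custom metric to CloudWatch with boto3,
# supplementing the default CPU/network metrics of a Beanstalk environment.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_data(
    Namespace="MyApp/Beanstalk",            # hypothetical custom namespace
    MetricData=[
        {
            "MetricName": "ActiveSessions",  # hypothetical application metric
            "Dimensions": [
                {"Name": "EnvironmentName", "Value": "my-env"}
            ],
            "Value": 42.0,
            "Unit": "Count",
        }
    ],
)
```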
In the ever-evolving world of DevOps, the ability to gain deep insights into system behavior, diagnose issues, and improve overall performance is one of the top priorities. Monitoring and observability are two key concepts that facilitate this process, offering valuable visibility into the health and performance of systems.
This approach enhances key DORA metrics and enables early detection of failures in the release process, allowing SREs more time for innovation. This blog post explores the Reliability metric, which measures modern operational practices. Why reliability?
In fact, observability is essential for shaping how we design smarter, more resilient systems for the future. As an open-source project, OpenTelemetry sets standards for telemetry data and works with a wide range of systems and platforms to collect and export telemetry data to backend systems. OpenTelemetry Collector 1.0
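For readers unfamiliar with the SDK side, here is a minimal, hedged sketch of emitting and exporting a span with the OpenTelemetry Python SDK. The instrumentation scope and span names are placeholders, and a console exporter stands in for a real OTLP backend.

```python
# Minimal sketch: create a tracer, emit a span, and export it with the
# console exporter; swap in an OTLP exporter to send data to a backend.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("example.instrumentation")  # hypothetical scope name
with tracer.start_as_current_span("handle-request"):
    pass  # application work would happen here
```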
Even if infrastructure metrics aren't your thing, you're welcome to join us on this creative journey; simply swap out the suggested metrics for ones that interest you. For our example dashboard, we'll focus only on some selected key infrastructure metrics. Click on Select metric, then change it to sum.
Manage the complexity of authorization systems: most modern authorization systems provide access management using Attribute-Based Access Control (ABAC). The system demands significant effort to design, manage, and maintain, especially as an organization's needs evolve.
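To make the ABAC idea concrete, here is a toy sketch of an attribute-based decision in Python. The attributes and policy are invented for illustration and are not taken from any particular product.

```python
# Toy sketch of an attribute-based access check: the decision depends on
# attributes of the subject and resource rather than a fixed role.
from dataclasses import dataclass

@dataclass
class Subject:
    department: str
    clearance: int

@dataclass
class Resource:
    owner_department: str
    sensitivity: int

def is_allowed(subject: Subject, resource: Resource, action: str) -> bool:
    # Policy: reads require a matching department and sufficient clearance.
    if action == "read":
        return (subject.department == resource.owner_department
                and subject.clearance >= resource.sensitivity)
    return False

print(is_allowed(Subject("finance", 3), Resource("finance", 2), "read"))  # True
```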
I realized that our platform's unique ability to contextualize security events, metrics, logs, traces, and user behavior could revolutionize the security domain by converging observability and security. Collect observability and security data (user behavior, metrics, events, logs, and traces, or UMELT) once, store it together, and analyze it in context.
For busy site reliability engineers, ensuring system reliability, scalability, and overall health is an imperative that’s getting harder to achieve in ever-expanding, cloud-native, container-based environments. To get a more granular look into telemetry data, many analysts rely on custom metrics using Prometheus.
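As an illustration of the Prometheus custom-metrics pattern mentioned above, here is a minimal sketch using the prometheus_client library. The metric names and the simulated workload are hypothetical.

```python
# Sketch: expose a custom counter and histogram on /metrics for Prometheus to scrape.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled")
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        with LATENCY.time():
            time.sleep(random.random() / 10)  # simulated work
        REQUESTS.inc()
```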
This rising risk amplifies the need for reliable security solutions that integrate with existing systems. Using high-fidelity metrics, traces, logs, and user data mapped to a unified entity model, organizations enjoy enhanced automation and broader, deeper security insights into modern cloud environments.
This lets you build your SLOs around the indicators that matter to you and your customers—critical metrics related to availability, failure rates, request response times, or select logs and business events. Depending on the environment, the different information types provide indicators that reveal potential problems for your customers.
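To make the SLO idea concrete, here is a tiny sketch of computing an availability SLI from request counts and comparing it to a target; the request counts and the 99.5% target are invented for illustration.

```python
# Sketch: compute an availability SLI and check it against an SLO target.
def availability_sli(good_requests: int, total_requests: int) -> float:
    return 100.0 * good_requests / total_requests if total_requests else 100.0

SLO_TARGET = 99.5  # hypothetical availability target in percent

sli = availability_sli(good_requests=99_610, total_requests=100_000)
print(f"SLI: {sli:.2f}% -> {'meets' if sli >= SLO_TARGET else 'violates'} the {SLO_TARGET}% SLO")
```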
Log management is an organization's rules and policies for managing and enabling the creation, transmission, analysis, storage, and other tasks related to IT systems' and applications' log data. Metrics, logs, and traces make up three vital prongs of modern observability. How log management systems optimize performance and security.
Observability fault lines: the monitoring of complex and dynamic IT systems includes real-time analysis of baselines, trends, and anomalies. This is achieved, in part, by establishing actionable statistical accuracy, not necessarily precise accuracy, through practical levels of metric sampling, aggregation, and extrapolation.
But nowadays, with complex and dynamically changing modern IT systems, the details of the last result might not be enough in some cases. Select any execution you're interested in to display its details, for example, the content response body, its headers, and related metrics.
As a result, organizations need to monitor mobile app performance metrics that are meaningful and actionable by gaining adequate observability of mobile app performance. There are many common mobile app performance metrics that are used to measure key performance indicators (KPIs) related to user experience and satisfaction.
A Dynatrace API token with the following permissions: Ingest OpenTelemetry traces (openTelemetryTrace.ingest), Ingest metrics (metrics.ingest), and Ingest logs (logs.ingest). To set up the token, see Dynatrace API – Tokens and authentication in the Dynatrace documentation. If you don't have one, you can use a trial account.
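For context, a token with the metrics.ingest scope can be used to push data points directly. The sketch below follows the public Dynatrace metrics ingest API, but the environment URL, token value, and metric name are placeholders.

```python
# Hedged sketch: push a single metric data point using the token's
# metrics.ingest scope via the Dynatrace metrics ingest API.
import requests

DT_ENV = "https://abc12345.live.dynatrace.com"  # placeholder environment URL
DT_TOKEN = "dt0c01.XXXX"                         # placeholder API token

response = requests.post(
    f"{DT_ENV}/api/v2/metrics/ingest",
    headers={
        "Authorization": f"Api-Token {DT_TOKEN}",
        "Content-Type": "text/plain; charset=utf-8",
    },
    data="custom.app.queue_depth,queue=orders 17",  # metric line protocol
    timeout=10,
)
response.raise_for_status()
```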
Chances are, you're a seasoned expert who visualizes meticulously identified key metrics across several sophisticated charts. Seasonal Baseline: Ideal for metrics with predictable seasonal patterns, this option leverages Davis AI to create a confidence band based on historical data, accounting for expected variations.
Anyone who’s concerned with developing, delivering, and operating software knows the importance of making software and the systems it runs on observable. That is, relying on metrics, logs, and traces to understand what software is doing and where it’s running into snags. OpenTelemetry is a free and open source take on observability.
Enhanced observability and release validation: Dynatrace already excels at delivering full-stack, end-to-end observability of your systems and user journeys. This data covers all aspects of CI/CD activity, from workflow executions to runner performance and cost metrics.
My goal was to provide IT teams with insights to optimize customer experience by collaborating with business teams, using both business KPIs and IT metrics. Automate smarter using actual customer experience metrics, not just server-side data. Using causal AI, we identified and resolved performance issues automatically.
The Dynatrace platform automatically captures and maps metrics, logs, traces, events, user experience data, and security signals into a single datastore, performing contextual analytics through a “power of three AI”—combining causal, predictive, and generative AI. It’s about uncovering insights that move business forward. The result?
The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow , an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems. ETL workflows), as well as downstream (e.g.
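For readers new to Metaflow, here is a minimal sketch of what a flow looks like. The steps and "training" logic are stand-ins for illustration, not an excerpt from Netflix's actual pipelines.

```python
# Minimal sketch of a Metaflow flow: steps are chained with self.next(),
# and Metaflow handles orchestration, versioning, and data passing.
# Run with: python train_flow.py run
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):

    @step
    def start(self):
        self.data = [1, 2, 3]  # stand-in for a real dataset
        self.next(self.train)

    @step
    def train(self):
        self.model = sum(self.data)  # stand-in for real training
        self.next(self.end)

    @step
    def end(self):
        print("trained model:", self.model)

if __name__ == "__main__":
    TrainFlow()
```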
The five key metrics to improve customer satisfaction: to help turn this around, Dynatrace makes available its unified observability platform, which captures all CX interactions and transactions in an automated, intelligent manner – including user session replays. When combined, key metrics will generate an accurate CX index score.
Integration with existing systems and processes : Integration with existing IT infrastructure, observability solutions, and workflows often requires significant investment and customization. We implemented a wasted energy metric in the app to enhance practitioner actionability.
Observability has become a key component in software development as it enables the best customer experience by ensuring system health and performance and detecting systemic issues proactively. OpenSearch simplifies this by providing an open-source, scalable solution for logging, metrics, and visualization.
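As a small illustration of that logging use case, here is a hedged sketch of indexing a structured log document with the opensearch-py client. The cluster address, index name, and fields are placeholders.

```python
# Sketch: index a structured log document into OpenSearch using opensearch-py.
from datetime import datetime, timezone

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder cluster

doc = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "level": "ERROR",
    "service": "checkout",          # hypothetical service name
    "message": "payment gateway timeout",
}

client.index(index="app-logs-2024.06", body=doc)
```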
Clearly, continuing to depend on siloed systems, disjointed monitoring tools, and manual analytics is no longer sustainable. It also helps to have access to OpenTelemetry, a collection of tools for examining applications that export metrics, logs, and traces for analysis.
New: identify hotspots with the honeycomb visualization. Honeycombs are great for visualizing health in complex and distributed systems, enabling you to visualize countless entities effectively and at scale. That way, you can compare multiple charts more easily, regardless of the metric or time span.
Amazon Bedrock , equipped with Dynatrace Davis AI and LLM observability , gives you end-to-end insight into the Generative AI stack, from code-level visibility and performance metrics to GenAI-specific guardrails. Send unified data to Dynatrace for analysis alongside your logs, metrics, and traces.
The system is inconsistent, slow, hallucinating, and that amazing demo starts collecting digital dust. Two big things: They bring the messiness of the real world into your system through unstructured data. When your system is both ingesting messy real-world data AND producing nondeterministic outputs, you need a different approach.
To achieve this, we are committed to building robust systems that deliver comprehensive observability, enabling us to take full accountability for every title on our service. Each title represents countless hours of effort and creativity, and our systems need to honor that uniqueness. Yet, these pages couldn't be more different.
Many of these projects are under constant development by dedicated teams with their own business goals and development best practices, such as the system that supports our content decision makers, or the system that ranks which language subtitles are most valuable for a specific piece of content.
In this case, the main stakeholders are the Title Launch Operators, who are responsible for setting up a title and its metadata in our systems. In this context, we're focused on developing systems that ensure successful title launches, build trust between content creators and our brand, and reduce engineering operational overhead.
There will be red lamps flashing across your metrics dashboard. Monsters Beneath the Surface: In the troubleshooting guide, I already mentioned how important it was to create and closely watch the metrics of your project. System metrics like response time, memory consumption, etc.
Kubernetes is a widely used open source system for container orchestration. To calculate the service-level indicator for the Kubernetes namespace memory efficiency SLO, simply query the memory working set and memory request metrics that are provided out of the box.
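As a hedged sketch of that calculation, the SLI can be derived by querying Prometheus for the working-set and request metrics and taking their ratio. The Prometheus URL and namespace below are placeholders, and the metric names assume cAdvisor and kube-state-metrics are installed.

```python
# Hedged sketch: derive a namespace memory-efficiency SLI from Prometheus metrics.
import requests

PROM = "http://prometheus.example.internal:9090"  # placeholder Prometheus URL

def prom_value(query: str) -> float:
    # Query the Prometheus HTTP API and return the first scalar result.
    result = requests.get(f"{PROM}/api/v1/query", params={"query": query}, timeout=10).json()
    return float(result["data"]["result"][0]["value"][1])

namespace = "checkout"  # placeholder namespace
working_set = prom_value(
    f'sum(container_memory_working_set_bytes{{namespace="{namespace}"}})'
)
requested = prom_value(
    f'sum(kube_pod_container_resource_requests{{namespace="{namespace}", resource="memory"}})'
)
print(f"memory efficiency: {100 * working_set / requested:.1f}% of requested memory in use")
```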
The power of cloud observability: modernizing legacy systems can be challenging, and it's important to do so with purpose—not just to modernize for its own sake. "It's not the big that will eat the small, it's the fast that will conquer the slow." – Jay Snyder, SVP of Partners and Alliances at Dynatrace.
Traditional debugging methods, including manual inspection of logs, event streams, configurations, and system metrics, can be painstakingly slow and prone to human error, particularly under pressure.
The nirvana state of system uptime at peak loads is known as “five-nines availability.” In its pursuit, IT teams hover over system performance dashboards hoping their preparations will deliver five nines—or even four nines—availability. How can IT teams deliver system availability under peak loads that will satisfy customers?
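For reference, the arithmetic behind those targets is simple; the snippet below computes the yearly downtime budget implied by three, four, and five nines.

```python
# Quick arithmetic: downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for nines, target in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    budget = MINUTES_PER_YEAR * (1 - target / 100)
    print(f"{nines} ({target}%): ~{budget:.1f} minutes of downtime per year")
```

Five nines works out to roughly 5.3 minutes of allowable downtime per year, which is why it is treated as the nirvana state.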
Your teams want to iterate rapidly but face multiple hurdles: Increased complexity: Microservices and container-based apps generate massive logs and metrics. To orchestrate the different logging services, you use Fluent Bit to forward these logs to your centralized logging system, like Dynatrace.
It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile's exposure. In this multi-part blog series, we take you behind the scenes of our system that processes billions of impressions daily.
Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience. These backend systems are consistently being evolved and optimized to meet and exceed customer and product expectations. This technique facilitates validation on multiple fronts.
Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Key Takeaways RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges.
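To ground that in code, here is a minimal sketch of publishing a persistent message to a durable queue with the pika client; the host, queue name, and message body are placeholders.

```python
# Sketch: publish a persistent message to a durable RabbitMQ queue with pika,
# so messages survive broker restarts as consumers are scaled out.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

channel.queue_declare(queue="task_queue", durable=True)
channel.basic_publish(
    exchange="",
    routing_key="task_queue",
    body=b"process order 1234",
    properties=pika.BasicProperties(delivery_mode=2),  # persistent message
)
connection.close()
```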
So, we relied on higher-level metrics-based testing: AB Testing and Sticky Canaries. To determine customer impact, we could compare various metrics such as error rates, latencies, and time to render. The AB experiment results hinted that GraphQL’s correctness was not up to par with the legacy system.
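As an illustration of comparing such metrics, here is a hedged sketch of a two-proportion z-test on error rates between two experiment arms; the counts are invented and this is not the statistical machinery the post itself describes.

```python
# Sketch: compare error rates between a control arm and a treatment arm
# with a two-proportion z-test using only the standard library.
from math import sqrt
from statistics import NormalDist

def error_rate_z_test(errors_a, total_a, errors_b, total_b):
    p_a, p_b = errors_a / total_a, errors_b / total_b
    pooled = (errors_a + errors_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = error_rate_z_test(errors_a=120, total_a=50_000, errors_b=165, total_b=50_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```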