Latency, Metrics and Systems - Technology Performance Pulse

Optimising for High Latency Environments

CSS Wizardry

SEPTEMBER 16, 2024

This gives fascinating insights into the network topography of our visitors, and how much we might be impacted by high latency regions. Round-trip-time (RTT) is basically a measure of latency—how long did it take to get from one endpoint to another and back again? RTT data should be seen as an insight and not a metric.

Latency

Latency Cache Transportation Mobile

Next-level interaction and customization of data visualizations in Dynatrace Dashboards and Notebooks

Dynatrace

OCTOBER 10, 2024

New: identify hotspots with the honeycomb visualization Honeycombs are great for visualizing health in complex and distributed systems, enabling you to visualize countless entities effectively and at scale. That way, you can compare multiple charts more easily, regardless of the metric or time span.

Latency

Latency Infrastructure Monitoring Metrics

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

To this end, we developed a Rapid Event Notification System (RENO) to support use cases that require server initiated communication with devices in a scalable and extensible manner. In this blog post, we will give an overview of the Rapid Event Notification System at Netflix and share some of the learnings we gained along the way.

Systems

Systems Traffic Architecture Mobile

Build systems more reliably with Dynatrace: Chaos Engineering

Dynatrace

AUGUST 21, 2024

This approach enhances key DORA metrics and enables early detection of failures in the release process, allowing SREs more time for innovation. This blog post explores the Reliability metric , which measures modern operational practices. Why reliability?

Engineering

Engineering Systems Latency Metrics

What is observability? Not just logs, metrics and traces

Dynatrace

OCTOBER 1, 2021

As dynamic systems architectures increase in complexity and scale, IT teams face mounting pressure to track and respond to conditions and issues across their multi-cloud environments. How do you make a system observable? Dynatrace news. Why is it important, and what can it actually help organizations achieve? What is observability?

Metrics

Metrics Open Source Monitoring Cloud

Best practices and key metrics for improving mobile app performance

Dynatrace

DECEMBER 13, 2023

As a result, organizations need to monitor mobile app performance metrics that are meaningful and actionable by gaining adequate observability of mobile app performance. There are many common mobile app performance metrics that are used to measure key performance indicators (KPIs) related to user experience and satisfaction.

Best Practices

Best Practices Mobile Metrics Performance

AI-driven analysis of Spring Micrometer metrics in context, with typology at scale

Dynatrace

APRIL 7, 2022

Micrometer is used for instrumenting both out-of-the-box and custom metrics from Spring Boot applications. Davis topology-aware anomaly detection and alerting for your Micrometer metrics. Topology-related custom metrics for seamless reports and alerts. Micrometer uses a registry to export metrics to monitoring systems.

Metrics

Metrics Java Latency Cache

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

Introduction to Message Brokers Message brokers enable applications, services, and systems to communicate by acting as intermediaries between senders and receivers. This decoupling simplifies system architecture and supports scalability in distributed environments.

Latency

Latency Analytics Architecture Storage

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profiles exposure. In this multi-part blog series, we take you behind the scenes of our system that processes billions of impressions daily.

Tuning

Tuning Latency Efficiency Storage

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

To achieve this, we are committed to building robust systems that deliver comprehensive observability, enabling us to take full accountability for every title on ourservice. Each title represents countless hours of effort and creativity, and our systems need to honor that uniqueness. Yet, these pages couldnt be more different.

Traffic

Traffic Scalability Strategy Monitoring

Migrating Critical Traffic At Scale with No Downtime?—?Part 1

The Netflix TechBlog

MAY 4, 2023

Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience. These backend systems are consistently being evolved and optimized to meet and exceed customer and product expectations. It provides a good read on the availability and latency ranges under different production conditions.

Traffic

Traffic Latency Tuning Systems

Noisy Neighbor Detection with eBPF

The Netflix TechBlog

SEPTEMBER 10, 2024

On Titus , our multi-tenant compute platform, a "noisy neighbor" refers to a container or system service that heavily utilizes the server's resources, causing performance degradation in adjacent containers. To emit a run queue latency metric, we leveraged three eBPF hooks: sched_wakeup, sched_wakeup_new, and sched_switch.

Latency

Latency Metrics Programming Monitoring

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Key Takeaways RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges.

Best Practices

Best Practices Traffic Strategy Efficiency

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

So, we relied on higher-level metrics-based testing: AB Testing and Sticky Canaries. To determine customer impact, we could compare various metrics such as error rates, latencies, and time to render. The AB experiment results hinted that GraphQL’s correctness was not up to par with the legacy system.

Traffic

Traffic Latency Metrics Cache

Mastering Latency With P90, P99, and Mean Response Times

DZone

FEBRUARY 5, 2024

In the fast-paced digital world, where every millisecond counts, understanding the nuances of network latency becomes paramount for developers and system architects. Latency, the delay before a transfer of data begins following an instruction for its transfer, can significantly impact user experience and system performance.

Latency

Latency Metrics Network Systems

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow , an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems. ETL workflows), as well as downstream (e.g.

Systems

Systems Media Cache Open Source

OpenTelemetry 101: A nontechnical guide for IT leaders and enthusiasts

Dynatrace

JULY 22, 2024

Using OpenTelemetry, developers can collect and process telemetry data from applications, services, and systems. Observability Observability is the ability to determine a system’s health by analyzing the data it generates, such as logs, metrics, and traces. There are three main types of telemetry data: Metrics.

Latency

Latency Best Practices Metrics Open Source

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

By implementing service-level objectives, teams can avoid collecting and checking a huge amount of metrics for each service. According to Google’s SRE handbook , best practices, there are “ Four Golden Signals ” we can convert into four SLOs for services: reliability, latency, availability, and saturation.

Software

Software Software Benchmarking Latency

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly

MARCH 25, 2025

The system is inconsistent, slow, hallucinatingand that amazing demo starts collecting digital dust. Two big things: They bring the messiness of the real world into your system through unstructured data. When your system is both ingesting messy real-world data AND producing nondeterministic outputs, you need a different approach.

Systems

Systems Development Tuning Monitoring

Get up to 300 new metrics out of the box with AWS supporting services (GA)

Dynatrace

DECEMBER 22, 2019

To reduce your CloudWatch costs and throttling, you can now select from additional services and metrics to monitor. Get up to 300 new AWS metrics out of the box. Dynatrace ingests AWS CloudWatch metrics for multiple preselected services. Amazon Elastic File System (EFS). Select Add metric to save your settings.

AWS

AWS Metrics IoT Storage

Cut costs and complexity: 5 strategies for reducing tool sprawl with Dynatrace

Dynatrace

APRIL 10, 2025

Break data silos and add context for faster, more strategic decisions : Unifying metrics, logs, traces, and user behavior within a single platform enables real-time decisions rooted in full context, not guesswork. While network security remains relevant, the emphasis is now on application observability and threat detection.

Strategy

Strategy Storage Network Architecture

Who will watch the watchers? Extended infrastructure observability for WSO2 API Manager

Dynatrace

SEPTEMBER 18, 2020

High latency or lack of responses. You receive an alert message from Dynatrace (your infrastructure observability hub) letting you know that the average response latency of all deployed APIs has tripled. Looking at the key metrics of the deployment does not reveal anything out of the ordinary. But where does the fault lie?

Infrastructure

Infrastructure Latency Metrics Cloud

What are quality gates? How to use quality gates to deliver better software at speed and scale

Dynatrace

FEBRUARY 21, 2024

Automating quality gates is ideal, as it minimizes manually checking and validating key metrics throughout the SDLC. By actively monitoring metrics such as error rate, success rate, and CPU load, quality gates instill confidence in teams during software releases. Several tools can be used to collect metrics in load/performance testing.

Speed

Speed Software Software Latency

Migrating Critical Traffic At Scale with No Downtime?—?Part 2

The Netflix TechBlog

JUNE 13, 2023

This is where large-scale system migrations come into play. By collecting and analyzing key performance metrics of the service over time, we can assess the impact of the new changes and determine if they meet the availability, latency, and performance requirements. But what happens when this machinery needs a transformation?

Traffic

Traffic Metrics Systems Strategy

Optimize Citrix platform performance and user experience with Dynatrace (GA)

Dynatrace

JANUARY 15, 2020

Therefore, it requires multidimensional and multidisciplinary monitoring: Infrastructure health —automatically monitor the compute, storage, and network resources available to the Citrix system to ensure a stable platform. Citrix platform performance—optimize your Citrix landscape with insights into user load and screen latency per server.

Latency

Latency Performance Virtualization Infrastructure

Cloud infrastructure monitoring in action: Dynatrace on Dynatrace

Dynatrace

SEPTEMBER 29, 2020

Sydney, we have a disk write latency problem! It was on August 25 th at 14:00 when Davis initially alerted on a disk write latency issues to Elastic File System (EFS) on one of our EC2 instances in AWS’s Sydney Data Center. Modern hybrid-multicloud monitoring needs more than just metrics.

Infrastructure

Infrastructure Cloud Monitoring AWS

Maximize user experience with out-of-the-box service-performance SLOs

Dynatrace

AUGUST 25, 2023

These signals ( latency, traffic, errors, and saturation ) provide a solid means of proactively monitoring operative systems via SLOs and tracking business success. While this connection might sound simple, finding the right metrics to measure the needed SLIs takes time and effort.

Performance

Performance Latency Traffic Metrics

Get up to 300 new metrics out of the box with AWS supporting services (GA)

Dynatrace

DECEMBER 22, 2019

To reduce your CloudWatch costs and throttling, you can now select from additional services and metrics to monitor. Get up to 300 new AWS metrics out of the box. Dynatrace ingests AWS CloudWatch metrics for multiple preselected services. Amazon Elastic File System (EFS). Select Add metric to save your settings.

AWS

AWS Metrics IoT Storage

Observability vs. monitoring: What’s the difference?

Dynatrace

NOVEMBER 3, 2021

Monitoring focuses on watching specific metrics. Logging provides additional data but is typically viewed in isolation of a broader system context. Observability is the ability to understand a system’s internal state by analyzing the data it generates, such as logs, metrics, and traces.

Monitoring

Monitoring Metrics DevOps Scalability

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

Stream processing One approach to such a challenging scenario is stream processing, a computing paradigm and software architectural style for data-intensive software systems that emerged to cope with requirements for near real-time processing of massive amounts of data. This significantly increases event latency.

Engineering

Engineering Tuning Latency Open Source

Crucial Redis Monitoring Metrics You Must Watch

Scalegrid

JANUARY 25, 2024

You will need to know which monitoring metrics for Redis to watch and a tool to monitor these critical server metrics to ensure its health. Redis returns a big list of database metrics when you run the info command on the Redis shell. You can pick a smart selection of relevant metrics from these.

Metrics

Metrics Monitoring Latency Cache

How to Install Pixie for Kubernetes Monitoring: The Complete Guide

DZone

SEPTEMBER 4, 2022

It is important to highlight that most older monitoring systems were considered inefficient due to their operational overhead. Pixie offers monitoring, telemetry, metrics, and more with less than 5% CPU overhead and latency degradation during data collection.

Monitoring

Monitoring Latency Metrics Cloud

SLOs done right: how DevOps teams can build better service-level objectives

Dynatrace

MARCH 16, 2023

Enterprises now have access to myriad metrics they can track and measure, but an abundance of choice doesn’t equal actionable insight. Indeed, 54% of SREs say they handle too many metrics, making it increasingly difficult to find the most relevant ones for a particular service, according to the Dynatrace State of SRE Report.

DevOps

DevOps Latency Metrics Traffic

Lessons learned from enterprise service-level objective management

Dynatrace

MAY 19, 2022

Every organization’s goal is to keep its systems available and resilient to support business demands. Lastly, error budgets, as the difference between a current state and the target, represent the maximum amount of time a system can fail per the contractual agreement without repercussions. Dynatrace news. A world of misunderstandings.

Automotive

Automotive Latency Architecture Azure

Improved Alerting with Atlas Streaming Eval

The Netflix TechBlog

APRIL 27, 2023

Engineers want their alerting system to be realtime, reliable, and actionable. A few years ago, we were paged by our SRE team due to our Metrics Alerting System falling behind — critical application health alerts reached engineers 45 minutes late!

Storage

Storage Cache Metrics Database

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

As a result, site reliability has emerged as a critical success metric for many organizations. Uptime Institute’s 2022 Outage Analysis report found that over 60% of system outages resulted in at least $100,000 in total losses, up from 39% in 2019. More than one in seven outages cost more than $1 million. availability.

Best Practices

Best Practices DevOps Latency Metrics

Low Overhead Continuous Contextual Production Profiling

DZone

JUNE 15, 2023

In order to gain insight into these problems, we gather a range of metrics and logs to monitor the utilization of system resources such as CPU, memory, and application-specific latencies. It is worth noting that this data collection process does not impact the performance of the application.

Latency

Latency Storage Strategy Metrics

Implementing AWS well-architected pillars with automated workflows

Dynatrace

SEPTEMBER 13, 2023

This is a set of best practices and guidelines that help you design and operate reliable, secure, efficient, cost-effective, and sustainable systems in the cloud. Storing frequently accessed data in faster storage, usually in-memory caching, improves data retrieval speed and overall system performance. Beyond

AWS

AWS Efficiency Azure Cloud

Automated Change Impact Analysis with Site Reliability Guardian

Dynatrace

FEBRUARY 15, 2023

SREs use Service-Level Indicators (SLI) to see the complete picture of service availability, latency, performance, and capacity across various systems, especially revenue-critical systems. Siloed teams and multiple tools make it difficult to align on a single version of the truth for overall system health.

DevOps

DevOps Latency Traffic Best Practices

What is AWS Lambda?

Dynatrace

APRIL 5, 2021

AWS Lambda enables organizations to access many types of functions from AWS’ cloud-based services, such as: Data processing, to execute code based on triggers, system states, or user actions. Real-time stream processing to perform live activity tracking, data cleansing, metrics generation, and more.

Lambda

Lambda AWS Serverless Hardware

Service level objectives: 5 SLOs to get started

Dynatrace

JUNE 1, 2023

Certain SLOs can help organizations get started on measuring and delivering metrics that matter. It represents the percentage of time a system or service is expected to be accessible and functioning correctly. Response time Response time refers to the total time it takes for a system to process a request or complete an operation.

Latency

Latency Website Traffic Virtualization

AI-driven analysis of Spring Micrometer metrics in context, with topology at scale

Dynatrace

APRIL 7, 2022

Micrometer is used for instrumenting both out-of-the-box and custom metrics from Spring Boot applications. Davis topology-aware anomaly detection and alerting for your Micrometer metrics. Topology-related custom metrics for seamless reports and alerts. Micrometer uses a registry to export metrics to monitoring systems.

Metrics

Metrics Java Latency Cache

AI-driven analysis of Spring Micrometer metrics in context, with topology at scale

Dynatrace

APRIL 7, 2022

Micrometer is used for instrumenting both out-of-the-box and custom metrics from Spring Boot applications. Davis topology-aware anomaly detection and alerting for your Micrometer metrics. Topology-related custom metrics for seamless reports and alerts. Micrometer uses a registry to export metrics to monitoring systems.

Metrics

Metrics Java Latency Cache

Observability platform vs. observability tools

Dynatrace

DECEMBER 22, 2021

Complex information systems fail in unexpected ways. Observability gives developers and system operators real-time awareness of a highly distributed system’s current state based on the data it generates. With observability, teams can understand what part of a system is performing poorly and how to correct the problem.

Artificial Intelligence

Artificial Intelligence Metrics Architecture DevOps

Optimising for High Latency Environments

Next-level interaction and customization of data visualizations in Dynatrace Dashboards and Notebooks

Trending Sources

Rapid Event Notification System at Netflix

Build systems more reliably with Dynatrace: Chaos Engineering

What is observability? Not just logs, metrics and traces

Best practices and key metrics for improving mobile app performance

AI-driven analysis of Spring Micrometer metrics in context, with typology at scale

RabbitMQ vs. Kafka: Key Differences

Introducing Impressions at Netflix

Title Launch Observability at Netflix Scale

Migrating Critical Traffic At Scale with No Downtime?—?Part 1

Noisy Neighbor Detection with eBPF

Best Practices for Scaling RabbitMQ

Migrating Netflix to GraphQL Safely

Mastering Latency With P90, P99, and Mean Response Times

Supporting Diverse ML Systems at Netflix

OpenTelemetry 101: A nontechnical guide for IT leaders and enthusiasts

Implementing service-level objectives to improve software quality

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

Get up to 300 new metrics out of the box with AWS supporting services (GA)

Cut costs and complexity: 5 strategies for reducing tool sprawl with Dynatrace

Who will watch the watchers? Extended infrastructure observability for WSO2 API Manager

What are quality gates? How to use quality gates to deliver better software at speed and scale

Migrating Critical Traffic At Scale with No Downtime?—?Part 2

Optimize Citrix platform performance and user experience with Dynatrace (GA)

Cloud infrastructure monitoring in action: Dynatrace on Dynatrace

Maximize user experience with out-of-the-box service-performance SLOs

Get up to 300 new metrics out of the box with AWS supporting services (GA)

Observability vs. monitoring: What’s the difference?

Why applying chaos engineering to data-intensive applications matters

Crucial Redis Monitoring Metrics You Must Watch

How to Install Pixie for Kubernetes Monitoring: The Complete Guide

SLOs done right: how DevOps teams can build better service-level objectives

Lessons learned from enterprise service-level objective management

Improved Alerting with Atlas Streaming Eval

Site reliability done right: 5 SRE best practices that deliver on business objectives

Low Overhead Continuous Contextual Production Profiling

Implementing AWS well-architected pillars with automated workflows

Automated Change Impact Analysis with Site Reliability Guardian

What is AWS Lambda?

Service level objectives: 5 SLOs to get started

AI-driven analysis of Spring Micrometer metrics in context, with topology at scale

AI-driven analysis of Spring Micrometer metrics in context, with topology at scale

Observability platform vs. observability tools

Stay Connected