Here are five strategies executives can pursue to reduce tool sprawl, lower costs, and increase operational efficiency. Re-indexing data and rehydrating it from cold storage for incident investigation and forensics introduces query latency as well as additional management overhead and cost.
By Rajiv Shringi, Oleksii Tkachuk, and Kartik Sathyanarayanan. In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.
What is RTT? Round-trip time (RTT) is essentially a measure of latency: how long did it take to get from one endpoint to another and back again? RTT isn’t a you-thing, it’s a them-thing. It gives fascinating insights into the network topography of our visitors, and how much we might be impacted by high-latency regions.
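As a rough illustration of the idea (not the article's own measurement method), the sketch below estimates RTT by timing TCP handshakes from the client side; the host, port, and sample count are arbitrary choices.

```python
import socket
import time

def measure_rtt(host: str, port: int = 443, samples: int = 5) -> float:
    """Estimate round-trip time (ms) by timing TCP handshakes to a host."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            pass  # completing the handshake costs roughly one round trip
        timings.append((time.perf_counter() - start) * 1000)
    return sum(timings) / len(timings)

if __name__ == "__main__":
    print(f"approximate RTT to example.com: {measure_rtt('example.com'):.1f} ms")
```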
By Ko-Jen Hsiao, Yesu Feng, and Sudarshan Lamkhede. Netflix’s personalized recommender system is a complex system, boasting a variety of specialized machine-learned models, each catering to distinct needs including Continue Watching and Today’s Top Picks for You. (Refer to our recent overview for more details.)
CPU isolation and efficient system management are critical for any application that requires low-latency, high-performance computing. These measures are especially important for high-frequency trading systems, where split-second decisions on buying and selling stocks must be made.
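As a small, hedged illustration of one such measure, the Linux-only sketch below pins the current process to a couple of dedicated cores using only the standard library; the core numbers are hypothetical and would come from the host's isolation plan.

```python
import os

# Hypothetical cores reserved for the latency-critical hot path on this host.
latency_critical_cores = {2, 3}

# Pin the current process (pid 0 means "self") to those cores; Linux only.
os.sched_setaffinity(0, latency_critical_cores)
print("now restricted to CPUs:", os.sched_getaffinity(0))
```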
AI observability is the use of artificial intelligence to capture the performance and cost details generated by various systems in an IT environment. An AI observability strategy, which monitors IT system performance and costs, may help organizations achieve that balance, particularly when paired with a solid FinOps strategy.
It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile’s exposure. In this multi-part blog series, we take you behind the scenes of our system that processes billions of impressions daily.
To achieve this, we are committed to building robust systems that deliver comprehensive observability, enabling us to take full accountability for every title on our service. Each title represents countless hours of effort and creativity, and our systems need to honor that uniqueness. Yet, these pages couldn’t be more different.
Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. You’ll also learn strategies for maintaining data safety and managing node failures so your RabbitMQ setup is always up to the task. This decoupling is crucial in modern architectures where scalability and fault tolerance are paramount.
Introduction to Message Brokers Message brokers enable applications, services, and systems to communicate by acting as intermediaries between senders and receivers. This decoupling simplifies system architecture and supports scalability in distributed environments.
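To make the decoupling concrete, here is a minimal, hedged sketch using the pika client against a RabbitMQ broker assumed to be running on localhost; the queue name and payload are made up for illustration.

```python
import pika  # assumes a RabbitMQ broker reachable on localhost

# Producer and consumer never talk to each other directly; the broker sits in between.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="orders", durable=True)  # queue survives broker restarts

channel.basic_publish(
    exchange="",
    routing_key="orders",
    body=b'{"order_id": 42}',
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)

def handle(ch, method, properties, body):
    print("received:", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="orders", on_message_callback=handle)
# channel.start_consuming()  # blocks; in practice the consumer runs as its own process
connection.close()
```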
Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience. These backend systems are consistently being evolved and optimized to meet and exceed customer and product expectations. This blog series will examine the tools, techniques, and strategies we have utilized to achieve this goal.
But what happens when traffic bursts overwhelm your system? Queueing requests is a common solution, but what's the best approach: FIFO or LIFO? In this post, we'll explore both strategies through a simple simulation in Colab, allowing you to see the impact of changing parameters on system performance.
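The post's own Colab notebook is not reproduced here; the toy single-server simulation below is only an assumption-laden sketch of the idea, with the arrival rate, service rate, and timeout chosen arbitrarily to model an overloaded system.

```python
import random
from collections import deque

def simulate(discipline="FIFO", arrival_rate=1.2, service_rate=1.0,
             timeout=5.0, n_requests=100_000, seed=0):
    """Toy single-server queue: fraction of requests served before timing out."""
    rng = random.Random(seed)
    queue = deque()          # arrival timestamps of waiting requests
    clock = 0.0              # advances to each new arrival
    server_free_at = 0.0     # when the server finishes its current request
    served = 0
    for _ in range(n_requests):
        clock += rng.expovariate(arrival_rate)
        # Let the server drain waiting work up to the current time.
        while queue and server_free_at <= clock:
            arrived = queue.popleft() if discipline == "FIFO" else queue.pop()
            start = max(server_free_at, arrived)
            if start - arrived <= timeout:       # request still worth serving?
                served += 1
                server_free_at = start + rng.expovariate(service_rate)
            # timed-out requests are dropped without consuming the server
        queue.append(clock)
    return served / n_requests

for discipline in ("FIFO", "LIFO"):
    print(discipline, round(simulate(discipline), 3))
```

Under sustained overload, LIFO tends to serve the freshest requests before they time out, which is exactly the kind of trade-off the simulation lets you explore.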
The three strategies we will discuss today are A/B Testing, Replay Testing, and Sticky Canaries. To determine customer impact, we could compare various metrics such as error rates, latencies, and time to render. Let’s discuss the three testing strategies in further detail.
The system is inconsistent, slow, hallucinating, and that amazing demo starts collecting digital dust. Two big things: they bring the messiness of the real world into your system through unstructured data. When your system is both ingesting messy real-world data AND producing nondeterministic outputs, you need a different approach.
Using OpenTelemetry, developers can collect and process telemetry data from applications, services, and systems. Observability Observability is the ability to determine a system’s health by analyzing the data it generates, such as logs, metrics, and traces. There are three main types of telemetry data: Metrics.
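Below is a minimal, hedged OpenTelemetry tracing sketch using the Python SDK (the opentelemetry-sdk package is assumed installed); the service and span names are purely illustrative.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Wire the SDK so spans are exported (here, simply printed to the console).
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

with tracer.start_as_current_span("charge-card") as span:
    span.set_attribute("payment.amount", 42.50)
    # ... business logic; the span captures timing and attributes as trace data
```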
Identifying key Redis metrics such as latency, CPU usage, and memory metrics is crucial for effective Redis monitoring. With these essential support systems in place, you can effectively monitor your databases with up-to-date data about their health and functioning status at all times.
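One rough, non-authoritative way to sample those signals is sketched below with redis-py against a local Redis instance; the specific INFO fields picked out here are just examples.

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379)

# Latency: time a round trip of a cheap command.
start = time.perf_counter()
r.ping()
print(f"PING latency: {(time.perf_counter() - start) * 1000:.2f} ms")

# CPU and memory: INFO exposes counters the server maintains itself.
info = r.info()
print("used_memory_human:", info["used_memory_human"])
print("used_cpu_sys:", info["used_cpu_sys"])
print("connected_clients:", info["connected_clients"])
```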
On Titus, our multi-tenant compute platform, a "noisy neighbor" refers to a container or system service that heavily utilizes the server's resources, causing performance degradation in adjacent containers. To emit a run queue latency metric, we leveraged three eBPF hooks: sched_wakeup, sched_wakeup_new, and sched_switch.
These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination. It also serves as central configuration for access patterns such as consistency or latency targets, and is useful for keeping the “n-newest” records or for prefix path deletion.
This is where large-scale system migrations come into play. By collecting and analyzing key performance metrics of the service over time, we can assess the impact of the new changes and determine if they meet the availability, latency, and performance requirements. But what happens when this machinery needs a transformation?
In order to gain insight into these problems, we gather a range of metrics and logs to monitor the utilization of system resources such as CPU, memory, and application-specific latencies. It is worth noting that this data collection process does not impact the performance of the application.
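As a small, hedged sketch of that kind of lightweight collection, the snippet below samples CPU, memory, and load with psutil (assumed installed); the sampling cadence and field names are choices made here, not taken from the article.

```python
import time
import psutil

def sample():
    """Collect a single low-overhead resource snapshot."""
    return {
        "ts": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=None),  # non-blocking since last call
        "mem_used_mb": psutil.virtual_memory().used / 1e6,
        "load_1m": psutil.getloadavg()[0],
    }

for _ in range(3):
    print(sample())
    time.sleep(1)
```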
Stream processing One approach to such a challenging scenario is stream processing, a computing paradigm and software architectural style for data-intensive software systems that emerged to cope with requirements for near real-time processing of massive amounts of data. This significantly increases event latency.
Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.
Are you looking to leverage the best of the private and public cloud worlds to propel your business forward? A hybrid cloud strategy could be your answer. A hybrid cloud merges the capabilities of public and private clouds into a single, coherent system.
By Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, and Joey Lynch. As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.
As the number of Titus users increased over the years, the load and pressure on the system increased substantially. Titus Job Coordinator is a leader-elected process managing the active state of the system. For example, a batch workflow orchestration system may create multiple jobs which are part of a single workflow execution.
A well-planned multi-cloud strategy can seriously upgrade your business’s tech game, making you more agile. Multi-cloud strategies have become increasingly popular due to the need for flexibility, innovation, and the avoidance of vendor lock-in. They can also bolster uptime and limit latency issues or potential downtime.
Caches are very useful software components that all engineers must know. Caching is a transversal concern that applies to all tech areas and architecture layers, such as operating systems, data platforms, backend, frontend, and other components.
This is a set of best practices and guidelines that help you design and operate reliable, secure, efficient, cost-effective, and sustainable systems in the cloud. Storing frequently accessed data in faster storage, usually in-memory caching, improves data retrieval speed and overall system performance.
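A minimal look-aside caching sketch of that idea follows; `fetch_from_db` is a stand-in for any slow backend read, and the timings only illustrate the hot-path speedup.

```python
import time
from functools import lru_cache

def fetch_from_db(key: str) -> str:
    time.sleep(0.05)              # simulate a slow backend read
    return f"value-for-{key}"

@lru_cache(maxsize=1024)          # frequently accessed keys stay in memory
def get(key: str) -> str:
    return fetch_from_db(key)

start = time.perf_counter(); get("user:42"); cold = time.perf_counter() - start
start = time.perf_counter(); get("user:42"); warm = time.perf_counter() - start
print(f"cold read: {cold * 1000:.1f} ms, cached read: {warm * 1000:.3f} ms")
```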
Nowadays, digital transformation strategies are executed by almost every organization across all industries. SREs use Service-Level Indicators (SLIs) to see the complete picture of service availability, latency, performance, and capacity across various systems, especially revenue-critical systems.
In-app purchases can help to measure the overall effectiveness of your business strategy. User demographics, such as app version, operating system, location, and device type, can help tailor an app to better meet users’ needs and preferences. Load time and network latency metrics support performance optimization.
Every company has its own strategy as to which technologies to use. To remain flexible in observing all technologies used in their organization, some companies choose open-source solutions, which allow them to stay vendor-neutral. Micrometer uses a registry to export metrics to monitoring systems. Here’s how it works.
This transition to public, private, and hybrid cloud is driving organizations to automate and virtualize IT operations to lower costs and optimize cloud processes and systems. Besides the traditional system hardware, storage, routers, and software, ITOps also includes virtual components of the network and cloud infrastructure.
A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. This guide delves into how these systems work, the challenges they solve, and their essential role in businesses and technology.
GenAI is prone to erratic behavior due to unforeseen data scenarios or underlying system issues. Dynatrace provides end-to-end observability of AI applications. As AI systems grow in complexity, a holistic approach to the observability of AI-powered applications becomes even more crucial.
AWS Lambda enables organizations to access many types of functions from AWS’ cloud-based services, such as data processing, which executes code based on triggers, system states, or user actions. You will likely need to write code to integrate systems and handle complex tasks or incoming network requests.
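For orientation, here is a minimal Python Lambda handler sketch; the event shape (an S3-style record list) and the response fields are assumptions for illustration, not the article's code.

```python
import json

def lambda_handler(event, context):
    # Lambda invokes this entry point with the triggering event and runtime context.
    records = event.get("Records", [])
    processed = [r.get("eventName", "unknown") for r in records]
    return {
        "statusCode": 200,
        "body": json.dumps({"processed": len(processed), "events": processed}),
    }
```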
This proximity reduces latency and enables real-time decision-making. This shift will enable more autonomous and dynamic systems, reducing human intervention and enhancing efficiency. IIoT systems can use edge devices to ensure that sensitive operational data remains secure on-premises, thereby protecting critical infrastructure.
Logging provides additional data but is typically viewed in isolation from the broader system context. Observability is the ability to understand a system’s internal state by analyzing the data it generates, such as logs, metrics, and traces. Monitoring typically provides a limited view of system data focused on individual metrics.
If we had an ID for each streaming session, then distributed tracing could easily reconstruct session failure (which is otherwise difficult when troubleshooting distributed systems) by providing service topology, retry and error tags, and latency measurements for all service calls.
A cloud migration strategy, however, provides technical optimization that’s also firmly rooted in the business value chain. Migrating to the cloud is a strategy many organizations pursue to streamline and consolidate their security efforts. This can dramatically decrease network latency and its effect on the end-user experience.
This methodology aims to improve software system reliability using several key categories such as availability, performance, latency, efficiency, capacity, and incident response. Organizations that are new to both practices will want to adopt a strategy that incorporates both. What are SLOs?
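As a back-of-the-envelope illustration of how an availability-style SLO translates into an error budget, the sketch below uses invented numbers (a 99.9% target and the request counts) purely for the example.

```python
# Turn an availability SLO into an error budget for a reporting window.
slo_target = 0.999                 # 99.9% of requests should succeed this month
total_requests = 10_000_000
failed_requests = 7_200

error_budget = total_requests * (1 - slo_target)      # failures we can "afford"
budget_remaining = error_budget - failed_requests

print(f"error budget: {error_budget:,.0f} requests")
print(f"remaining:    {budget_remaining:,.0f} requests "
      f"({budget_remaining / error_budget:.0%} of budget left)")
```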
On modern Linux systems, the difference in overhead between forking a process and creating a thread is much smaller than it used to be. However, all of these problems are well discussed in the PostgreSQL community, and mitigation strategies ensure the pros of a connection pooler far outweigh the cons.
As dynamic systems architectures increase in complexity and scale, IT teams face mounting pressure to track and respond to conditions and issues across their multi-cloud environments. But what is observability? Why is it important, and what can it actually help organizations achieve? How do you make a system observable?
Data observability involves monitoring and managing the internal state of data systems to gain insight into the data pipeline, understand how data evolves, and identify any issues that could compromise data integrity or reliability. Data is the foundation upon which strategies are built, directions are chosen, and innovations are pursued.