Data, Latency and Systems - Technology Performance Pulse

Next-level interaction and customization of data visualizations in Dynatrace Dashboards and Notebooks

Dynatrace

OCTOBER 10, 2024

Take your monitoring, data exploration, and storytelling to the next level with outstanding data visualization. All your applications and underlying infrastructure produce vast volumes of data that you need to monitor or analyze for insights.

Latency

Latency Infrastructure Monitoring Metrics

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

Efficient Multimodal Data Processing: A Technical Deep Dive

DZone

FEBRUARY 27, 2025

Multimodal data processing is the evolving need of the latest data platforms powering applications like recommendation systems, autonomous vehicles, and medical diagnostics. Handling multimodal data spanning text, images, videos, and sensor inputs requires resilient architecture to manage the diversity of formats and scale.

Efficiency

Efficiency Processing Latency Storage

Comparing Approaches to Durability in Low Latency Messaging Queues

DZone

AUGUST 2, 2022

I have generally held the view that replicating data to a secondary system is faster than sync-ing to disk, assuming the round trip network delay wasn’t high due to quality networks and co-located redundant servers. Little’s Law and Why Latency Matters. This is the first time I have benchmarked it with a realistic example.

Latency

Latency Benchmarking Network Infrastructure

Optimizing Database Performance in Middleware Applications

DZone

FEBRUARY 14, 2025

In the realm of modern software architecture, middleware plays a pivotal role in connecting various components of distributed systems. This is crucial because middleware often serves as the bridge between client applications and backend databases, handling a high volume of requests and data processing tasks.

Database

Database Performance Software Architecture Latency

Optimising for High Latency Environments

CSS Wizardry

SEPTEMBER 16, 2024

Last week, I posted a short update on LinkedIn about CrUX’s new RTT data. Chrome have recently begun adding Round-Trip-Time (RTT) data to the Chrome User Experience Report (CrUX). This gives fascinating insights into the network topography of our visitors, and how much we might be impacted by high latency regions. What is RTT?

Latency

Latency Cache Transportation Mobile

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support…

The Netflix TechBlog

SEPTEMBER 29, 2022

By Kostas Christidis. Introduction: Timestone is a high-throughput, low-latency priority queueing system we built in-house to support the needs of Cosmos, our media encoding platform.

Latency

Latency Systems Media Serverless

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

By Ko-Jen Hsiao, Yesu Feng and Sudarshan Lamkhede. Motivation: Netflix's personalized recommender system is a complex system, boasting a variety of specialized machine-learned models, each catering to distinct needs, including Continue Watching and Today's Top Picks for You. (Refer to our recent overview for more details.)

Tuning

Tuning Efficiency Latency Strategy

Cut costs and complexity: 5 strategies for reducing tool sprawl with Dynatrace

Dynatrace

APRIL 10, 2025

Key insights for executives: Increase operational efficiency with automation and AI to foster seamless collaboration: With AI and automated workflows, teams work from shared data, automate repetitive tasks, and accelerate resolution, focusing more on business outcomes. No delays and overhead of reindexing and rehydration.

Strategy

Strategy Storage Network Architecture

Single-core memory bandwidth: Latency, Bandwidth, and Concurrency

John McCalpin

FEBRUARY 17, 2025

Understanding sustained memory bandwidth in these systems starts with assuming 100% utilization and then reviewing the factors that get in the way (e.g., …). In my previous post, I reviewed historical data on single-core/single-thread memory bandwidth in multicore processors from Intel and AMD from 2010 to the present.

Latency

Latency Hardware Cache Systems

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Second, developers had to constantly re-learn new data modeling practices and common yet critical data access patterns. These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination.

Latency

Latency Storage Cache Efficiency

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. Both serve distinct purposes, from managing message queues to ingesting large data volumes.

Latency

Latency Analytics Architecture Storage

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Every image you hover over isn't just a visual placeholder; it's a critical data point that fuels our sophisticated personalization engine. It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile's exposure.

Tuning

Tuning Latency Efficiency Storage

Build systems more reliably with Dynatrace: Chaos Engineering

Dynatrace

AUGUST 21, 2024

These releases often assumed ideal conditions such as zero latency, infinite bandwidth, and no network loss, as highlighted in Peter Deutsch’s eight fallacies of distributed systems. With Dynatrace, teams can seamlessly monitor the entire system, including network switches, database storage, and third-party dependencies.

Engineering

Engineering Systems Latency Metrics

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

The jobs executing such workloads are usually required to operate indefinitely on unbounded streams of continuous data and exhibit heterogeneous modes of failure as they run over long periods. Fault tolerance stands as a critical requirement for continuously operating production systems. This significantly increases event latency.

Engineering

Engineering Tuning Latency Open Source

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Tuning

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers that generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Berg, Romain Cledat, Kayla Seeley, Shashank Srikanth, Chaoying Wang, Darin Yu. Netflix uses data science and machine learning across all facets of the company, powering a wide range of business applications from our internal infrastructure and content demand modeling to media understanding.

Systems

Systems Media Cache Open Source

Data Movement in Netflix Studio via Data Mesh

The Netflix TechBlog

JULY 26, 2021

This happens at an unprecedented scale and introduces many interesting challenges; one of the challenges is how to provide visibility of Studio data across multiple phases and systems to facilitate operational excellence and empower decision making.

Big Data

Big Data Government Processing Analytics

Data ingestion pipeline with Operation Management

The Netflix TechBlog

MARCH 7, 2023

These media-focused machine learning algorithms, as well as other teams, generate a lot of data from the media files, which, as described in our previous blog, is stored as annotations in Marken. But we cannot search or present low latency retrievals from files, etc. We do that by excluding the following from all queries in our system.

Media

Media Latency Architecture Database

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

To achieve this, we are committed to building robust systems that deliver comprehensive observability, enabling us to take full accountability for every title on our service. Each title represents countless hours of effort and creativity, and our systems need to honor that uniqueness. Yet, these pages couldn't be more different.

Traffic

Traffic Scalability Strategy Monitoring

Real-time business analytics with Dynatrace: Unleashing the treasure trove of insights from your observability data

Dynatrace

AUGUST 20, 2024

Driven by that value, Dynatrace brings real-time observability, security, and business data into context and makes sense of it so our customers can get answers, automate, predict, and prevent. Executives are sitting on a goldmine of data, and they don’t know it. Common business analytics incur too much latency.

Analytics

Analytics Latency Processing Systems

Dynatrace Managed turnkey Premium High Availability for globally distributed data centers (Early Adopter)

Dynatrace

JUNE 25, 2020

The network latency between cluster nodes should be around 10 ms or less. With Dynatrace actively managing business-critical applications, some of our globally distributed enterprise customers require Dynatrace Managed to continue operating even when an entire data center goes down. Minimized cross-data center network traffic.

Availability

Availability Hardware Latency Traffic

Analyze OpenTelemetry traces and log data at scale: Accelerate troubleshooting and optimize application performance

Dynatrace

OCTOBER 3, 2024

Considering the latest State of Observability 2024 report, it’s evident that multicloud environments not only come with an explosion of data beyond humans’ ability to manage it. It’s increasingly difficult to ingest, manage, store, and sort through this amount of data. You can find the list of use cases here.

Performance

Performance Architecture Innovation Latency

Faster time to value with enhanced handling of OneAgent runtime data

Dynatrace

SEPTEMBER 23, 2020

Recent improvements in OneAgent runtime-data handling. Operating Systems are not always set up in the same way. Storage mount points in a system might be larger or smaller, local or remote, with high or low latency, and various speeds. Customizable location of large runtime files. See details below.

Storage

Storage Latency Operating System Network

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. You'll also learn strategies for maintaining data safety and managing node failures so your RabbitMQ setup is always up to the task. Monitoring the cluster nodes preemptively addresses potential issues, ensuring the system operates smoothly.

Best Practices

Best Practices Traffic Strategy Efficiency

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience. These backend systems are consistently being evolved and optimized to meet and exceed customer and product expectations. It provides a good read on the availability and latency ranges under different production conditions.

Traffic

Traffic Latency Tuning Systems

Introducing Dynatrace built-in data observability on Davis AI and Grail

Dynatrace

JANUARY 31, 2024

“I have ingested important custom data into Dynatrace, critical to running my applications and making accurate business decisions… but can I trust the accuracy and reliability?” Welcome to the world of data observability. At its core, data observability is about ensuring the availability, reliability, and quality of data.

DevOps

DevOps Analytics Airlines Metrics

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3, and each day we ingest and create additional Petabytes. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Mastering Latency With P90, P99, and Mean Response Times

DZone

FEBRUARY 5, 2024

In the fast-paced digital world, where every millisecond counts, understanding the nuances of network latency becomes paramount for developers and system architects. Latency, the delay before a transfer of data begins following an instruction for its transfer, can significantly impact user experience and system performance.
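
As an illustration of the metrics named in the title above, here is a minimal sketch of computing the mean, P90, and P99 from a set of response times. It uses the nearest-rank percentile definition, which is one of several common conventions and is an assumption here, not necessarily the method the linked article uses. The sample values and the `percentile` helper are hypothetical.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Rank is 1-based; multiply before dividing to limit rounding error.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 250]

summary = {
    "mean": mean(latencies_ms),
    "p90": percentile(latencies_ms, 90),
    "p99": percentile(latencies_ms, 99),
}
print(summary)
```

With these made-up samples, the single outlier pulls the mean well above the typical request and dominates P99, while P90 stays close to the bulk of the data. That gap is the usual reason for reporting percentiles alongside the mean.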

Latency

Latency Metrics Network Systems

Build automated self-healing systems with xMatters and Dynatrace (Part 2 of 3)

Dynatrace

AUGUST 27, 2019

Step 2 – xMatters passes Dynatrace data to alerts that provide actionable responses. In this alert, xMatters includes all the important incident information from Dynatrace, so there’s no need for you to visit additional system dashboards. Dynatrace data in an xMatters alert (left), with actionable responses (right).

Systems

Systems DevOps Latency Azure

Noisy Neighbor Detection with eBPF

The Netflix TechBlog

SEPTEMBER 10, 2024

On Titus, our multi-tenant compute platform, a "noisy neighbor" refers to a container or system service that heavily utilizes the server's resources, causing performance degradation in adjacent containers. To emit a run queue latency metric, we leveraged three eBPF hooks: sched_wakeup, sched_wakeup_new, and sched_switch.

Latency

Latency Metrics Programming Monitoring

Solve hybrid Kubernetes performance and reliability problems with unified observability

Dynatrace

APRIL 10, 2025

In modern containerized environments, teams often deploy Kubernetes across mixed operating systems, creating a situation where both Linux and Windows nodes reside in the same cluster. Integrating data at an OS-agnostic cluster level is another hurdle, often leading to data silos and incomplete visibility.

Performance

Performance Java Operating System Infrastructure

Bandwidth or Latency: When to Optimise for Which

CSS Wizardry

JANUARY 31, 2019

When it comes to network performance, there are two main limiting factors that will slow you down: bandwidth and latency. Bandwidth is defined as… the maximum rate of data transfer across a given path. Latency is defined as… how long it takes for a bit of data to travel across the network from one node or endpoint to another. The Time Column.
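
To make the two limiting factors concrete, the sketch below uses a deliberately simplified model in which transfer time is the sum of a latency term (round trips multiplied by round-trip time) and a bandwidth term (payload size divided by link rate). The model, the function name, and all numbers are assumptions for illustration only. It ignores TCP slow start, packet loss, and protocol overhead, and it is not taken from the linked article.

```python
def estimated_transfer_ms(size_bytes, bandwidth_mbps, rtt_ms, round_trips=1):
    """Rough transfer-time estimate in milliseconds.

    latency term   = round_trips * rtt_ms
    bandwidth term = time to push size_bytes through the link
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    bits = size_bytes * 8
    bandwidth_term_ms = bits / (bandwidth_mbps * 1_000_000) * 1000
    latency_term_ms = round_trips * rtt_ms
    return latency_term_ms + bandwidth_term_ms


# Hypothetical scenario: a 100 KB resource needing two round trips.
baseline = estimated_transfer_ms(100_000, bandwidth_mbps=10, rtt_ms=50, round_trips=2)
double_bandwidth = estimated_transfer_ms(100_000, bandwidth_mbps=20, rtt_ms=50, round_trips=2)
half_latency = estimated_transfer_ms(100_000, bandwidth_mbps=10, rtt_ms=25, round_trips=2)

print(f"baseline:         {baseline:.0f} ms")
print(f"double bandwidth: {double_bandwidth:.0f} ms")
print(f"half latency:     {half_latency:.0f} ms")
```

Under these assumed numbers, halving the round-trip time saves more than doubling the bandwidth, because the latency term is the larger share of the total. For large payloads with few round trips the balance shifts the other way.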

Latency

Latency Network Speed Servers

The road to observability demo part 3: Collect, instrument, and analyze telemetry data automatically with Dynatrace

Dynatrace

MAY 17, 2023

OpenTelemetry, the open source observability tool, has become the go-to standard for instrumenting custom applications to collect observability telemetry data. For this third and final part of our series, we saved the best for last: How you can enhance telemetry data even more and with less effort on your end with Dynatrace OneAgent.

Metrics

Metrics Database Monitoring Network

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

Testing Strategies: A Summary. Two key factors determined our testing strategies: functional vs. non-functional requirements, and idempotency. If we were testing functional requirements like data accuracy, and if the request was idempotent, we relied on Replay Testing. In such cases, we were not testing for response data but overall behavior.

Traffic

Traffic Latency Metrics Cache

OpenTelemetry 101: A nontechnical guide for IT leaders and enthusiasts

Dynatrace

JULY 22, 2024

Using OpenTelemetry, developers can collect and process telemetry data from applications, services, and systems. Observability is the ability to determine a system’s health by analyzing the data it generates, such as logs, metrics, and traces. There are three main types of telemetry data: Metrics.

Latency

Latency Best Practices Metrics Open Source

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly

MARCH 25, 2025

The system is inconsistent, slow, hallucinating, and that amazing demo starts collecting digital dust. Two big things: They bring the messiness of the real world into your system through unstructured data. People have been building data products and machine learning products for the past couple of decades. The way out?

Systems

Systems Development Tuning Monitoring

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

As the number of Titus users increased over the years, the load and pressure on the system increased substantially. We introduce a caching mechanism in the API gateway layer, allowing us to offload processing from singleton leader elected controllers without giving up strict data consistency and guarantees clients observe.

Cache

Cache Latency Traffic Systems

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

SLOs can be a great way for DevOps and infrastructure teams to use data and performance expectations to make decisions, such as whether to release and where engineers should focus their time. This telemetry data serves as the basis for establishing meaningful SLOs. SLOs aid decision making. SLOs promote automation. Reliability.

Software

Software Software Benchmarking Latency

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Data interception during transit. Redundancy and inefficiency in data aggregation.

IoT

IoT Energy Logistics Latency

DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

DECEMBER 17, 2019

Andreas Andreakis, Ioannis Papapanagiotou. Overview: Change-Data-Capture (CDC) allows capturing committed changes from a database in real-time and propagating those changes to downstream consumers [1][2]. Requirements: In a previous blog post, we discussed Delta, a data enrichment and synchronization platform.

Database

Database Traffic Transportation Open Source

Designing Instagram

High Scalability

JANUARY 11, 2022

from a client it performs two parallel operations: i) persisting the action in the data store, and ii) publishing the action in a streaming data store for a pub-sub model. User Feed Service, Media Counter Service) read the actions from the streaming data store and perform their specific tasks. System Components. Data Models.

Design

Design Media Storage Logistics

Cloud infrastructure monitoring in action: Dynatrace on Dynatrace

Dynatrace

SEPTEMBER 29, 2020

Sydney, we have a disk write latency problem! It was on August 25th at 14:00 when Davis initially alerted on a disk write latency issue to Elastic File System (EFS) on one of our EC2 instances in AWS’s Sydney Data Center. The problem didn’t last long or have any impact on our services.

Infrastructure

Infrastructure Cloud Monitoring AWS
