Availability, Data and Latency - Technology Performance Pulse

Next-level interaction and customization of data visualizations in Dynatrace Dashboards and Notebooks

Dynatrace

OCTOBER 10, 2024

Take your monitoring, data exploration, and storytelling to the next level with outstanding data visualization All your applications and underlying infrastructure produce vast volumes of data that you need to monitor or analyze for insights. Try different cell shapes. Use color coding to tell a story.

Latency

Latency Infrastructure Monitoring Metrics

Dynatrace on Microsoft Azure in Australia enables regional customers to leverage AI-powered observability

Dynatrace

OCTOBER 28, 2024

As modern multicloud environments become more distributed and complex, having real-time insights into applications and infrastructure while keeping data residency in local markets is crucial. As of October 2024, Dynatrace is available on Microsoft Azure Australia East region, enabling joint customers to maintain a local SaaS presence.

Azure

Azure Latency Infrastructure Cloud

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi , Oleksii Tkachuk , Kartik Sathyanarayanan Introduction In our previous blog post, we introduced Netflix’s TimeSeries Abstraction , a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

Optimising for High Latency Environments

CSS Wizardry

SEPTEMBER 16, 2024

Last week, I posted a short update on LinkedIn about CrUX’s new RTT data. Chrome have recently begun adding Round-Trip-Time (RTT) data to the Chrome User Experience Report (CrUX). This gives fascinating insights into the network topography of our visitors, and how much we might be impacted by high latency regions. What is RTT?

Latency

Latency Cache Transportation Mobile

Dynatrace Managed turnkey Premium High Availability for globally distributed data centers (Early Adopter)

Dynatrace

JUNE 25, 2020

Dynatrace Managed is intrinsically highly available as it stores three copies of all events, user sessions, and metrics across its cluster nodes. The network latency between cluster nodes should be around 10 ms or less. Near-zero RPO and RTO—monitoring continues seamlessly and without data loss in failover scenarios.

Availability

Availability Hardware Latency Traffic

Comparing Approaches to Durability in Low Latency Messaging Queues

DZone

AUGUST 2, 2022

A significant feature of Chronicle Queue Enterprise is support for TCP replication across multiple servers to ensure the high availability of application infrastructure. Little’s Law and Why Latency Matters. In many cases, the assumption is that as long as throughput is high enough, the latency won’t be a problem.

Latency

Latency Benchmarking Network Infrastructure

Single-core memory bandwidth: Latency, Bandwidth, and Concurrency

John McCalpin

FEBRUARY 17, 2025

In my previous post , I reviewed historical data on single-core/single-thread memory bandwidth in multicore processors from Intel and AMD from 2010 to the present. “Concurrency” is the amount of data that must be “in flight” between the core and the memory in order to maintain a steady-state system.

Latency

Latency Hardware Cache Systems

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Central to this infrastructure is our use of multiple online distributed databases such as Apache Cassandra , a NoSQL database known for its high availability and scalability. Second, developers had to constantly re-learn new data modeling practices and common yet critical data access patterns.

Latency

Latency Storage Cache Servers

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

The jobs executing such workloads are usually required to operate indefinitely on unbounded streams of continuous data and exhibit heterogeneous modes of failure as they run over long periods. This significantly increases event latency. Performance is usually a primary concern when using stream processing frameworks.

Engineering

Engineering Tuning Latency Open Source

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. Both serve distinct purposes, from managing message queues to ingesting large data volumes.

Latency

Latency Analytics Architecture Storage

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Every image you hover over isnt just a visual placeholder; its a critical data point that fuels our sophisticated personalization engine. This nuanced integration of data and technology empowers us to offer bespoke content recommendations.

Tuning

Tuning Latency Efficiency Storage

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

Rajiv Shringi Vinay Chella Kaidan Fullerton Oleksii Tkachuk Joey Lynch Introduction As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming , the ability to ingest and store vast amounts of temporal data — often reaching petabytes — with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Tuning

Data Movement in Netflix Studio via Data Mesh

The Netflix TechBlog

JULY 26, 2021

From the moment a Netflix film or series is pitched and long before it becomes available on Netflix, it goes through many phases. Operational Reporting is a reporting paradigm specialized in covering high-resolution, low-latency data sets, serving detailed day-to-day activities¹ and processes of a business domain.

Big Data

Big Data Government Processing Analytics

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou Netflix has more than 195 million subscribers that generate petabytes of data everyday. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

Data ingestion pipeline with Operation Management

The Netflix TechBlog

MARCH 7, 2023

These media focused machine learning algorithms as well as other teams generate a lot of data from the media files, which we described in our previous blog , are stored as annotations in Marken. But we cannot search or present low latency retrievals from files Etc. in a video file. This is obviously very expensive.

Media

Media Latency Architecture Database

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

Furthermore, it was difficult to transfer innovations from one model to another, given that most are independently trained despite using common data sources. Yet, many are confined to a brief temporal window due to constraints in serving latency or training costs.

Tuning

Tuning Efficiency Latency Strategy

Analyze OpenTelemetry traces and log data at scale: Accelerate troubleshooting and optimize application performance

Dynatrace

OCTOBER 3, 2024

Considering the latest State of Observability 2024 report, it’s evident that multicloud environments not only come with an explosion of data beyond humans’ ability to manage it. It’s increasingly difficult to ingest, manage, store, and sort through this amount of data. You can find the list of use cases here.

Performance

Performance Architecture Innovation Latency

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Youll also learn strategies for maintaining data safety and managing node failures so your RabbitMQ setup is always up to the task. Implementing clustering and quorum queues in RabbitMQ significantly improves load distribution and data redundancy, ensuring high availability and fault tolerance for messaging services.

Best Practices

Best Practices Traffic Strategy Scalability

Real-time business analytics with Dynatrace: Unleashing the treasure trove of insights from your observability data

Dynatrace

AUGUST 20, 2024

Driven by that value, Dynatrace brings real-time observability, security, and business data into context and makes sense of it so our customers can get answers, automate, predict, and prevent. Executives are sitting on a goldmine of data, and they don’t know it. Common business analytics incur too much latency.

Analytics

Analytics Latency Processing Systems

Faster time to value with enhanced handling of OneAgent runtime data

Dynatrace

SEPTEMBER 23, 2020

Recent improvements in OneAgent runtime-data handling. Storage mount points in a system might be larger or smaller, local or remote, with high or low latency, and various speeds. For example: All subfolders of the /opt directory are mounted as local, low latency, high-throughput drives, with relatively low storage capacity.

Storage

Storage Latency Operating System Network

Introducing Dynatrace built-in data observability on Davis AI and Grail

Dynatrace

JANUARY 31, 2024

I have ingested important custom data into Dynatrace, critical to running my applications and making accurate business decisions… but can I trust the accuracy and reliability?” ” Welcome to the world of data observability. At its core, data observability is about ensuring the availability, reliability, and quality of data.

DevOps

DevOps Analytics Airlines Metrics

Benchmark (YCSB) numbers for Redis, MongoDB, Couchbase2, Yugabyte and BangDB

High Scalability

FEBRUARY 17, 2021

This is guest post by Sachin Sinha who is passionate about data, analytics and machine learning at scale. Load stage is to load the data and then run stage we run the test. Load is consistent for all dbs for all tests as expected as this phase is to load the data. Again Yugabyte latency is quite high.

Benchmarking

Benchmarking Latency C++ Database

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam Background At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3 , and each day we ingest and create additional Petabytes. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Ensure Ideal Performance and Access for All Your Customers, Everywhere [Webinar Sign-up]

DZone

JULY 19, 2021

Many applications are deployed across multiple regions so they can meet customers where they are and/or ensure availability. However, these deployments add significant complexity to database operations and data consistency suffers.

Latency

Latency Database Performance Availability

Seeing through hardware counters: a journey to threefold performance increase

The Netflix TechBlog

NOVEMBER 9, 2022

At Netflix, we periodically reevaluate our workloads to optimize utilization of available capacity. A quick canary test was free of errors and showed lower latency, which is expected given that our standard canary setup routes an equal amount of traffic to both the baseline running on 4xl and the canary on 12xl. let’s call it GS2?—?to

Hardware

Hardware Cache Performance Latency

Reducing Your Database Hosting Costs: DigitalOcean vs. AWS vs. Azure

Scalegrid

APRIL 28, 2020

Is my database cluster still highly available? All of our high availability options are offered in DigitalOcean, including 2 Replicas + 1 Arbiter, 3 Replicas and custom replica set setups. DigitalOcean does not have the concept of availability zones (AZ), so we distribute the nodes across different regions. High performance.

Azure

Azure AWS Database Latency

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

SLOs can be a great way for DevOps and infrastructure teams to use data and performance expectations to make decisions, such as whether to release and where engineers should focus their time. This telemetry data serves as the basis for establishing meaningful SLOs. SLOs aid decision making. SLOs promote automation. Reliability.

Software

Software Software Benchmarking Latency

Data Reprocessing Pipeline in Asset Management Platform @Netflix

The Netflix TechBlog

MARCH 10, 2023

This platform has evolved from supporting studio applications to data science applications, machine-learning applications to discover the assets metadata, and build various data facts. Hence we built the data pipeline that can be used to extract the existing assets metadata and process it specifically to each new use case.

Media

Media Traffic Processing Design

The Three Cs: Concatenate, Compress, Cache

CSS Wizardry

OCTOBER 16, 2023

What is the availability, configurability, and efficacy of each? ?️ Plotted on the same horizontal axis of 1.6s, the waterfalls speak for themselves: 201ms of cumulative latency; 109ms of cumulative download. 4,362ms of cumulative latency; 240ms of cumulative download. And do any of our previous decisions dictate our options?

Cache

Cache Latency Strategy Speed

The road to observability demo part 3: Collect, instrument, and analyze telemetry data automatically with Dynatrace

Dynatrace

MAY 17, 2023

OpenTelemetry , the open source observability tool, has become the go-to standard for instrumenting custom applications to collect observability telemetry data. For this third and final part of our series, we saved the best for last: How you can enhance telemetry data even more and with less effort on your end with Dynatrace OneAgent.

Metrics

Metrics Database Monitoring Network

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Scalegrid

NOVEMBER 25, 2019

ScyllaDB is an open-source distributed NoSQL data store, reimplemented from the popular Apache Cassandra database. ScyllaDB offers significantly lower latency which allows you to process a high volume of data with minimal delay. percentile latency is up to 11X better than Cassandra on AWS EC2 bare metal. Google Cloud.

Big Data

Big Data Database Open Source Azure

Migrating Critical Traffic At Scale with No Downtime?—?Part 1

The Netflix TechBlog

MAY 4, 2023

It can happen on an edge API system servicing customer devices, between the edge and mid-tier services, or from mid-tiers to data stores. It provides a good read on the availability and latency ranges under different production conditions.

Traffic

Traffic Latency Tuning Systems

DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

DECEMBER 17, 2019

Andreas Andreakis , Ioannis Papapanagiotou Overview Change-Data-Capture (CDC) allows capturing committed changes from a database in real-time and propagating those changes to downstream consumers [1][2]. Designed with High Availability in mind. This is crucial for repairs downstream when data has been lost or corrupted.

Database

Database Traffic Transportation Open Source

Noisy Neighbor Detection with eBPF

The Netflix TechBlog

SEPTEMBER 10, 2024

Continuous Instrumentation of the Linux Scheduler To ensure the reliability of our workloads that depend on low latency responses, we instrumented the run queue latency for each container, which measures the time processes spend in the scheduling queue before being dispatched to the CPU. For this purpose, we chose the eBPF ring buffer.

Latency

Latency Metrics Programming Monitoring

Scalable Annotation Service?—?Marken

The Netflix TechBlog

JANUARY 25, 2023

Scalable Annotation Service — Marken by Varun Sekhri , Meenakshi Jindal Introduction At Netflix, we have hundreds of micro services each with its own data models or entities. But there are more interesting cases where users want to store temporal (time-based) data or spatial data. Movie Entity with id 1234 has violence.

Scalability

Scalability Latency Media Architecture

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

Testing Strategies: A Summary Two key factors determined our testing strategies: Functional vs. non-functional requirements Idempotency If we were testing functional requirements like data accuracy, and if the request was idempotent , we relied on Replay Testing. In such cases, we were not testing for response data but overall behavior.

Traffic

Traffic Latency Metrics Cache

DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

DECEMBER 17, 2019

Andreas Andreakis , Ioannis Papapanagiotou Overview Change-Data-Capture (CDC) allows capturing committed changes from a database in real-time and propagating those changes to downstream consumers [1][2]. Designed with High Availability in mind. This is crucial for repairs downstream when data has been lost or corrupted.

Database

Database Traffic Transportation Open Source

Understanding gRPC Concepts, Use Cases, and Best Practices

DZone

JANUARY 19, 2023

Because with the advent of cloud providers, we are less worried about managing data centers. Everything is available within seconds on-demand. This leads to an increase in the size of data as well. Big data is generated and transported using various mediums in single requests. We need to cut down on transportation.

Best Practices

Best Practices Transportation Big Data Latency

Time to First Byte: What It Is and Why It Matters

CSS Wizardry

AUGUST 7, 2019

The first—and often most surprising for people to learn—thing that I want to draw your attention to is that TTFB counts one whole round trip of latency. TTFB isn’t just time spent on the server, it is also the time spent getting from our device to the sever and back again (carrying, that’s right, the first byte of data!).

Latency

Latency Ecommerce Servers Mobile

Comparing PostgreSQL DigitalOcean Performance & Pricing – ScaleGrid vs. DigitalOcean Managed Databases

Scalegrid

JUNE 4, 2020

Compare Latency. lower latency compared to DigitalOcean for PostgreSQL. Now, let’s take a look at the throughput and latency performance of our comparison. Next, we are going to test and compare the latency performance between ScaleGrid and DigitalOcean for PostgreSQL. PostgreSQL DigitalOcean Latency Averages (ms).

Database

Database Latency Benchmarking Performance

Optimize Citrix platform performance and user experience with Dynatrace (GA)

Dynatrace

JANUARY 15, 2020

Having released this functionality in an Preview Release back in September 2019, we’re now happy to announce the General Availability of our Citrix monitoring extension. Synthetic monitoring: Citrix login availability and performance. Citrix latency represents the end-to-end “screen lag” experienced by a server’s users.

Latency

Latency Performance Virtualization Infrastructure

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Data interception during transit. Redundancy and inefficiency in data aggregation.

IoT

IoT Energy Logistics Latency

The Power of Caching: Boosting API Performance and Scalability

DZone

AUGUST 16, 2023

Caching is the process of storing frequently accessed data or resources in a temporary storage location, such as memory or disk, to improve retrieval speed and reduce the need for repetitive processing. Bandwidth optimization: Caching reduces the amount of data transferred over the network, minimizing bandwidth usage and improving efficiency.

Cache

Cache Scalability Performance Latency

Cloud infrastructure monitoring in action: Dynatrace on Dynatrace

Dynatrace

SEPTEMBER 29, 2020

Since we moved to AWS in May 2014 we have had an availability of 99.95%! Sydney, we have a disk write latency problem! It was on August 25 th at 14:00 when Davis initially alerted on a disk write latency issues to Elastic File System (EFS) on one of our EC2 instances in AWS’s Sydney Data Center.

Infrastructure

Infrastructure Cloud Monitoring AWS

Next-level interaction and customization of data visualizations in Dynatrace Dashboards and Notebooks

Dynatrace on Microsoft Azure in Australia enables regional customers to leverage AI-powered observability

Trending Sources

Netflix’s Distributed Counter Abstraction

Optimising for High Latency Environments

Dynatrace Managed turnkey Premium High Availability for globally distributed data centers (Early Adopter)

Comparing Approaches to Durability in Low Latency Messaging Queues

Single-core memory bandwidth: Latency, Bandwidth, and Concurrency

Introducing Netflix’s Key-Value Data Abstraction Layer

Why applying chaos engineering to data-intensive applications matters

RabbitMQ vs. Kafka: Key Differences

Introducing Impressions at Netflix

Introducing Netflix TimeSeries Data Abstraction Layer

Data Movement in Netflix Studio via Data Mesh

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

Data ingestion pipeline with Operation Management

Foundation Model for Personalized Recommendation

Analyze OpenTelemetry traces and log data at scale: Accelerate troubleshooting and optimize application performance

Best Practices for Scaling RabbitMQ

Real-time business analytics with Dynatrace: Unleashing the treasure trove of insights from your observability data

Faster time to value with enhanced handling of OneAgent runtime data

Introducing Dynatrace built-in data observability on Davis AI and Grail

Benchmark (YCSB) numbers for Redis, MongoDB, Couchbase2, Yugabyte and BangDB

Optimizing data warehouse storage

Ensure Ideal Performance and Access for All Your Customers, Everywhere [Webinar Sign-up]

Seeing through hardware counters: a journey to threefold performance increase

Reducing Your Database Hosting Costs: DigitalOcean vs. AWS vs. Azure

Implementing service-level objectives to improve software quality

Data Reprocessing Pipeline in Asset Management Platform @Netflix

The Three Cs: Concatenate, Compress, Cache

The road to observability demo part 3: Collect, instrument, and analyze telemetry data automatically with Dynatrace

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Migrating Critical Traffic At Scale with No Downtime?—?Part 1

DBLog: A Generic Change-Data-Capture Framework

Noisy Neighbor Detection with eBPF

Scalable Annotation Service?—?Marken

Migrating Netflix to GraphQL Safely

DBLog: A Generic Change-Data-Capture Framework

Understanding gRPC Concepts, Use Cases, and Best Practices

Time to First Byte: What It Is and Why It Matters

Comparing PostgreSQL DigitalOcean Performance & Pricing – ScaleGrid vs. DigitalOcean Managed Databases

Optimize Citrix platform performance and user experience with Dynatrace (GA)

These 7 Edge Data Challenges Will Test Companies the Most in 2025

The Power of Caching: Boosting API Performance and Scalability

Cloud infrastructure monitoring in action: Dynatrace on Dynatrace

Stay Connected