Latency, Performance and Systems - Technology Performance Pulse

Optimizing Database Performance in Middleware Applications

DZone

FEBRUARY 14, 2025

In the realm of modern software architecture, middleware plays a pivotal role in connecting various components of distributed systems. Efficient database operations in middleware can dramatically improve overall system performance, reduce latency, and enhance user experience.

Database

Database Performance Software Architecture Latency

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

Optimising for High Latency Environments

CSS Wizardry

SEPTEMBER 16, 2024

This gives fascinating insights into the network topography of our visitors, and how much we might be impacted by high latency regions. Round-trip-time (RTT) is basically a measure of latency—how long did it take to get from one endpoint to another and back again? What is RTT? RTT isn’t a you-thing, it’s a them-thing. Go and sign up.

Latency

Latency Cache Transportation Mobile

How to Optimize CPU Performance Through Isolation and System Tuning

DZone

MAY 1, 2023

CPU isolation and efficient system management are critical for any application which requires low-latency and high-performance computing. These measures are especially important for high-frequency trading systems, where split-second decisions on buying and selling stocks must be made.

Tuning

Tuning Systems Latency Performance

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

By Ankush Gulati, David Gevorkyan. Additional credits: Michael Clark, Gokhan Ozer. Intro: Netflix has more than 220 million active members who perform a variety of actions throughout each session, ranging from renaming a profile to watching a title.

Systems

Systems Traffic Architecture Mobile

Seeing through hardware counters: a journey to threefold performance increase

The Netflix TechBlog

NOVEMBER 9, 2022

A quick canary test was free of errors and showed lower latency, which is expected given that our standard canary setup routes an equal amount of traffic to both the baseline running on 4xl and the canary on 12xl. What’s worse, average latency degraded by more than 50%, with both CPU and latency patterns becoming more “choppy.”

Hardware

Hardware Cache Performance Latency

Single-core memory bandwidth: Latency, Bandwidth, and Concurrency

John McCalpin

FEBRUARY 17, 2025

Understanding sustained memory bandwidth in these systems starts with assuming 100% utilization and then reviewing the factors that get in the way (e.g., What about single-core performance? This requires a completely different approach to modeling the memory system — one based on Little’s Law from queueing theory.

Latency

Latency Hardware Cache Systems
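The Little's Law relationship mentioned in the excerpt above can be sketched numerically. This is only an illustration, not the article's own model: the 64-byte cache line, the 80 ns latency, the 20 GB/s target, and the limit of 10 outstanding misses are all assumed figures, and the function names are hypothetical.

```python
# Little's Law for a memory system: concurrency = latency x bandwidth.
# All numbers used below are illustrative assumptions, not measurements.

CACHE_LINE_BYTES = 64  # assumed cache-line size


def lines_in_flight(latency_s, bandwidth_bytes_per_s, line_bytes=CACHE_LINE_BYTES):
    """Cache lines that must be outstanding to sustain a given bandwidth."""
    return latency_s * bandwidth_bytes_per_s / line_bytes


def max_bandwidth(outstanding_lines, latency_s, line_bytes=CACHE_LINE_BYTES):
    """Bandwidth ceiling when only a fixed number of misses can be in flight."""
    return outstanding_lines * line_bytes / latency_s


if __name__ == "__main__":
    # Sustaining an assumed 20 GB/s at an assumed 80 ns latency
    # needs roughly 25 cache lines in flight.
    print(lines_in_flight(80e-9, 20e9))
    # A core limited to an assumed 10 outstanding misses tops out near 8 GB/s.
    print(max_bandwidth(10, 80e-9) / 1e9, "GB/s")
```

Under these assumptions, a single core's bandwidth is capped by how many misses it can keep in flight rather than by the peak bandwidth of the memory channels.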

Next-level interaction and customization of data visualizations in Dynatrace Dashboards and Notebooks

Dynatrace

OCTOBER 10, 2024

New: identify hotspots with the honeycomb visualization. Honeycombs are great for visualizing health in complex and distributed systems, enabling you to visualize countless entities effectively and at scale. This is useful for identifying performance bottlenecks and understanding the overall user experience.

Latency

Latency Infrastructure Monitoring Metrics

Build systems more reliably with Dynatrace: Chaos Engineering

Dynatrace

AUGUST 21, 2024

These releases often assumed ideal conditions such as zero latency, infinite bandwidth, and no network loss, as highlighted in Peter Deutsch’s eight fallacies of distributed systems. With Dynatrace, teams can seamlessly monitor the entire system, including network switches, database storage, and third-party dependencies.

Engineering

Engineering Systems Latency Metrics

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

This article outlines the key differences in architecture, performance, and use cases to help determine the best fit for your workload. Introduction to Message Brokers: Message brokers enable applications, services, and systems to communicate by acting as intermediaries between senders and receivers.

Latency

Latency Analytics Architecture Storage

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

By Ko-Jen Hsiao, Yesu Feng and Sudarshan Lamkhede. Motivation: Netflix’s personalized recommender system is a complex system, boasting a variety of specialized machine-learned models, each catering to distinct needs including Continue Watching and Today’s Top Picks for You. (Refer to our recent overview for more details.)

Tuning

Tuning Efficiency Latency Strategy

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile’s exposure. In this multi-part blog series, we take you behind the scenes of our system that processes billions of impressions daily.

Tuning

Tuning Latency Efficiency Storage

Optimize Citrix platform performance and user experience with Dynatrace (GA)

Dynatrace

JANUARY 15, 2020

This extends Dynatrace visibility into Citrix user experience and Citrix platform performance. Therefore, it requires multidimensional and multidisciplinary monitoring: Infrastructure health: automatically monitor the compute, storage, and network resources available to the Citrix system to ensure a stable platform. Citrix VDA.

Latency

Latency Performance Virtualization Infrastructure

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow, an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems.

Systems

Systems Media Cache Open Source

API Design Principles for Optimal Performance and Scalability

DZone

JUNE 22, 2023

The post will provide a comprehensive guide to understanding the key principles and best practices for optimizing the performance of APIs. What Is API Performance Optimization? API performance optimization is the process of improving the speed, scalability, and reliability of APIs.

Scalability

Scalability Design Best Practices Performance

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

To achieve this, we are committed to building robust systems that deliver comprehensive observability, enabling us to take full accountability for every title on our service. Each title represents countless hours of effort and creativity, and our systems need to honor that uniqueness. Yet, these pages couldn’t be more different.

Traffic

Traffic Scalability Strategy Monitoring

Best practices and key metrics for improving mobile app performance

Dynatrace

DECEMBER 13, 2023

Mobile applications (apps) are an increasingly important channel for reaching customers, but the distributed nature of mobile app platforms and delivery networks can cause performance problems that leave users frustrated, or worse, turning to competitors. What is mobile app performance?

Best Practices

Best Practices Mobile Metrics Performance

Maximize user experience with out-of-the-box service-performance SLOs

Dynatrace

AUGUST 25, 2023

This article explores SLOs for service performance. According to the Google Site Reliability Engineering (SRE) handbook, monitoring the four golden signals is crucial in delivering high-performing software solutions. SLOs, as a measure of service quality, can track the related availability, reliability, and performance.

Performance

Performance Latency Traffic Metrics

Migrating Critical Traffic At Scale with No Downtime — Part 1

The Netflix TechBlog

MAY 4, 2023

Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience. These backend systems are consistently being evolved and optimized to meet and exceed customer and product expectations. This approach has a handful of benefits. This technique facilitates validation on multiple fronts.

Traffic

Traffic Latency Tuning Systems

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Key Takeaways: RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges.

Best Practices

Best Practices Traffic Strategy Efficiency

Mastering Latency With P90, P99, and Mean Response Times

DZone

FEBRUARY 5, 2024

In the fast-paced digital world, where every millisecond counts, understanding the nuances of network latency becomes paramount for developers and system architects. Latency, the delay before a transfer of data begins following an instruction for its transfer, can significantly impact user experience and system performance.

Latency

Latency Metrics Network Systems
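The P90, P99, and mean response times named in the entry above can be compared on a small sample. The sketch below uses the nearest-rank percentile definition, which is one common choice and not necessarily the one the article uses. The `percentile` helper and the sample latencies are hypothetical.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: mostly fast, with two slow outliers.
latencies_ms = [12, 14, 15, 15, 16, 18, 20, 22, 250, 900]

if __name__ == "__main__":
    print("mean:", mean(latencies_ms))
    print("P50: ", percentile(latencies_ms, 50))
    print("P90: ", percentile(latencies_ms, 90))
    print("P99: ", percentile(latencies_ms, 99))
```

On this sample the mean is pulled far above the median by two outliers, while P90 and P99 point directly at the slow requests. That gap is the usual reason to report tail percentiles alongside the mean.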

Build automated self-healing systems with xMatters and Dynatrace (Part 2 of 3)

Dynatrace

AUGUST 27, 2019

As soon as Dynatrace detects a disk health related issue—in this case Low disk space—the Dynatrace AI causation engine provides automated root cause analysis that shows you all related performance errors, as well as which applications and services have been affected by the issue. xMatters creates and updates Jira issues.

Systems

Systems DevOps Latency Azure

Noisy Neighbor Detection with eBPF

The Netflix TechBlog

SEPTEMBER 10, 2024

By Jose Fernandez, Sebastien Dabdoub, Jason Koch, Artem Tkachuk. The Compute and Performance Engineering teams at Netflix regularly investigate performance issues in our multi-tenant environment. Traditional performance analysis tools such as perf can introduce significant overhead, risking further performance degradation.

Latency

Latency Metrics Programming Monitoring

Optimize your environment: Unveiling Dynatrace Hyper-V extension for enhanced performance and efficient troubleshooting

Dynatrace

OCTOBER 23, 2023

Microsoft Hyper-V is a virtualization platform that manages virtual machines (VMs) on Windows-based systems. It enables multiple operating systems to run simultaneously on the same physical hardware and integrates closely with Windows-hosted services. This leads to a more efficient and streamlined experience for users.

Efficiency

Efficiency Virtualization Hardware Performance

Analyze OpenTelemetry traces and log data at scale: Accelerate troubleshooting and optimize application performance

Dynatrace

OCTOBER 3, 2024

Dynatrace OTel Collector. Understand your applications with ease: Due to a lack of contextual insights and actionable intelligence, application teams often find themselves overwhelmed by data, unable to quickly identify the root causes of performance issues.

Performance

Performance Architecture Innovation Latency

Bandwidth or Latency: When to Optimise for Which

CSS Wizardry

JANUARY 31, 2019

When it comes to network performance, there are two main limiting factors that will slow you down: bandwidth and latency. Latency is defined as…. Where bandwidth deals with capacity, latency is more about speed of transfer. and reduction in latency. Bandwidth is defined as….

Latency

Latency Network Speed Servers

Extending Vector with eBPF to inspect host and container performance

The Netflix TechBlog

FEBRUARY 20, 2019

By Jason Koch, with Martin Spier, Brendan Gregg, Ed Hunter. Improving the tools available to our engineers to help them diagnose, triage, and work through software performance challenges in the cloud is a key goal for the cloud performance engineering team at Netflix. to the broader community.

Performance

Performance Latency Open Source Metrics

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly

MARCH 25, 2025

The system is inconsistent, slow, hallucinating, and that amazing demo starts collecting digital dust. Two big things: They bring the messiness of the real world into your system through unstructured data. When your system is both ingesting messy real-world data AND producing nondeterministic outputs, you need a different approach.

Systems

Systems Development Tuning Monitoring

How to Improve MySQL AWS Performance 2X Over Amazon RDS at The Same Cost

Scalegrid

OCTOBER 24, 2019

As organizations continue to migrate to the cloud, it’s important to get in front of performance issues, such as high latency, low throughput, and replication lag with higher distances between your users and cloud infrastructure. MySQL on AWS Performance Test. AWS High Performance XLarge (see system details below).

AWS

AWS Latency Performance Performance Testing

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

This blog post will share broadly-applicable techniques (beyond GraphQL) we used to perform this migration. To determine customer impact, we could compare various metrics such as error rates, latencies, and time to render. The AB experiment results hinted that GraphQL’s correctness was not up to par with the legacy system.

Traffic

Traffic Latency Metrics Cache

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Firstly, developers struggled to reason about consistency, durability and performance in this complex global deployment across multiple stores. These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination.

Latency

Latency Storage Cache Efficiency

OpenTelemetry 101: A nontechnical guide for IT leaders and enthusiasts

Dynatrace

JULY 22, 2024

Using OpenTelemetry, developers can collect and process telemetry data from applications, services, and systems. Observability: Observability is the ability to determine a system’s health by analyzing the data it generates, such as logs, metrics, and traces. There are three main types of telemetry data: Metrics.

Latency

Latency Best Practices Metrics Open Source

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

Stream processing: One approach to such a challenging scenario is stream processing, a computing paradigm and software architectural style for data-intensive software systems that emerged to cope with requirements for near real-time processing of massive amounts of data. This significantly increases event latency.

Engineering

Engineering Tuning Latency Open Source

Best Practice for Creating Indexes on your MySQL Tables

Scalegrid

NOVEMBER 20, 2019

By having appropriate indexes on your MySQL tables, you can greatly enhance the performance of SELECT queries. During this time, you are also likely to experience a degraded performance of queries as your system resources are busy in index-creation work as well. Performance Benefits of Rolling Index Creation.

Best Practices

Best Practices Latency Tuning Database

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

When organizations implement SLOs, they can improve software development processes and application performance. SLOs can be a great way for DevOps and infrastructure teams to use data and performance expectations to make decisions, such as whether to release and where engineers should focus their time. SLOs improve software quality.

Software

Software Software Benchmarking Latency

Resilience Pattern: Circuit Breaker

DZone

NOVEMBER 16, 2023

In this article, we will explore one of the most common and useful resilience patterns in distributed systems: the circuit breaker. The circuit breaker is a design pattern that prevents cascading failures and improves the overall availability and performance of a system. What Is a Circuit Breaker?

Latency

Latency Network Database Monitoring
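The circuit breaker pattern described in the entry above can be sketched as a small state machine. This is a minimal illustration and not the article's implementation: the `CircuitBreaker` and `CircuitOpenError` names, the default thresholds, and the injectable `clock` parameter are all assumptions chosen for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker with closed, open, and half-open states."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock                         # injectable so tests can control time
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast instead of piling more load onto a struggling dependency.
            raise CircuitOpenError("circuit open; call skipped")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed half-open trial, or too many failures while closed, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a placeholder for any function that may raise, and treat `CircuitOpenError` as a signal to fall back or retry later.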

Who will watch the watchers? Extended infrastructure observability for WSO2 API Manager

Dynatrace

SEPTEMBER 18, 2020

Sure, cloud infrastructure requires comprehensive performance visibility, as Dynatrace provides, but the services that leverage cloud infrastructures also require close attention. Well-defined APIs are required for managing such microservices and tracking changes in their performance. High latency or lack of responses.

Infrastructure

Infrastructure Latency Metrics Cloud

Optimizing your Kubernetes clusters without breaking the bank

Dynatrace

JANUARY 14, 2022

That is because Kubernetes provides several benefits from a performance perspective. However, setting the right parameters for Kubernetes clusters to ensure application availability, performance, and resilience while avoiding overspending isn’t a walk in the park. Dynatrace news. below 500ms) and error rates (e.g. lower than 2%.).

Latency

Latency Tuning Efficiency AWS

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. CFS is widely used and therefore well tested and Linux machines around the world run with reasonable performance.

Cache

Cache Latency Airlines Logistics

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

As the number of Titus users increased over the years, the load and pressure on the system increased substantially. cell): Titus Job Coordinator is a leader elected process managing the active state of the system. For example, a batch workflow orchestration system may create multiple jobs which are part of a single workflow execution.

Cache

Cache Latency Traffic Systems

What are quality gates? How to use quality gates to deliver better software at speed and scale

Dynatrace

FEBRUARY 21, 2024

Benefits of quality gates: Quality gates provide several advantages to organizations, including the following: Optimized software performance: Quality gates assess code at different SDLC stages and ensure that only high-quality code progresses. Several tools can be used to collect metrics in load/performance testing.

Speed

Speed Software Software Latency

Migrating Critical Traffic At Scale with No Downtime — Part 2

The Netflix TechBlog

JUNE 13, 2023

This is where large-scale system migrations come into play. A small percentage of production traffic is redirected to the two new clusters, allowing us to monitor the new version’s performance and compare it against the current version. Canaries and sticky canaries are valuable tools in the system migration process.

Traffic

Traffic Metrics Systems Strategy

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

By Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data — often reaching petabytes — with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Tuning

The Anna Key-Value Store Now Has 355x the Performance of DynamoDB for the Dollar

High Scalability

SEPTEMBER 8, 2018

You get all the multicore Anna performance you want, but you don’t pay for what you don’t need. Just to throw out some numbers, we measured Anna providing 355x the performance of DynamoDB for the dollar. No, I don’t think that is because AWS is earning a 355x margin on DynamoDB!

Storage

Storage Performance AWS Cloud
