Architecture, Infrastructure and Latency - Technology Performance Pulse

Cut costs and complexity: 5 strategies for reducing tool sprawl with Dynatrace

Dynatrace

APRIL 10, 2025

As an executive, I am always seeking simplicity and efficiency to make sure the architecture of the business is as streamlined as possible. Re-indexing data and rehydrating it from cold storage for incident investigation and forensics causes query latency and additional management overhead and cost.

Strategy

Strategy Storage Network Architecture

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

This article outlines the key differences in architecture, performance, and use cases to help determine the best fit for your workload. RabbitMQ follows a message broker model with advanced routing, while Kafka’s event streaming architecture uses partitioned logs for distributed processing. What is RabbitMQ? What is Apache Kafka?

Latency

Latency Analytics Architecture Storage

Who will watch the watchers? Extended infrastructure observability for WSO2 API Manager

Dynatrace

SEPTEMBER 18, 2020

Sure, cloud infrastructure requires comprehensive performance visibility, as Dynatrace provides, but the services that leverage cloud infrastructures also require close attention. Cloud-based application architectures commonly leverage microservices. Extend infrastructure observability to WSO2 API Manager.

Infrastructure

Infrastructure Latency Metrics Cloud

Spring WebFlux: publishOn vs subscribeOn for Improving Microservices Performance

DZone

SEPTEMBER 23, 2024

With the rise of microservices architecture, there has been a rapid acceleration in the modernization of legacy platforms, leveraging cloud infrastructure to deliver highly scalable, low-latency, and more responsive services. Why Use Spring WebFlux?

Performance

Performance Latency Architecture Programming

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

This scenario underscored the need for a new recommender system architecture where member preference learning is centralized, enhancing accessibility and utility across different models. Yet, many are confined to a brief temporal window due to constraints in serving latency or training costs.

Tuning

Tuning Efficiency Latency Strategy

Why Replace External Database Caches?

DZone

AUGUST 28, 2024

Putting an external cache in front of the database is commonly used to compensate for subpar latency stemming from various factors, such as inefficient database internals, driver usage, infrastructure choices, traffic spikes, and so on. This is a clear performance-oriented decision.

Cache

Cache Database Latency Traffic

Solve hybrid Kubernetes performance and reliability problems with unified observability

Dynatrace

APRIL 10, 2025

While this hybrid architectural approach offers flexibility, it also introduces the need for unified observability. Step 3: Deploy OneAgent on the Windows nodes for infrastructure observability To ensure infrastructure observability on Windows nodes, start by configuring and running a Dynatrace Workflow.

Performance

Performance Java Operating System Infrastructure

Dynatrace supports SnapStart for Lambda as an AWS launch partner

Dynatrace

NOVEMBER 28, 2022

The new Amazon capability enables customers to improve the startup latency of their functions from several seconds to as low as sub-second (up to 10 times faster) at P99 (the 99th latency percentile). This can cause latency outliers and may lead to a poor end-user experience for latency-sensitive applications.

Lambda

Lambda AWS Serverless Latency

How to maximize serverless benefits and overcome its challenges

Dynatrace

OCTOBER 10, 2022

Reduced latency. Serverless architecture makes it possible to host code anywhere, rather than relying on an origin server. By using cloud providers with multiple server sites, organizations can reduce function latency for end users. No infrastructure to maintain. Architectural complexity. Optimizes resources.

Serverless

Serverless Infrastructure Lambda Latency

What is observability? Not just logs, metrics and traces

Dynatrace

OCTOBER 1, 2021

As dynamic systems architectures increase in complexity and scale, IT teams face mounting pressure to track and respond to conditions and issues across their multi-cloud environments. As teams begin collecting and working with observability data, they are also realizing its benefits to the business, not just IT.

Metrics

Metrics Open Source Monitoring Cloud

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Vidhya Arvind, Rajasekhar Ummadisetty, Joey Lynch, Vinay Chella. Introduction: At Netflix, our ability to deliver seamless, high-quality streaming experiences to millions of users hinges on robust, global backend infrastructure. Data Model: At its core, the KV abstraction is built around a two-level map architecture.

Latency

Latency Storage Cache Efficiency

Comparing PostgreSQL DigitalOcean Performance & Pricing – ScaleGrid vs. DigitalOcean Managed Databases

Scalegrid

JUNE 4, 2020

As an open source database, it’s a highly popular choice for enterprise applications looking to modernize their infrastructure and reduce their total cost of ownership, along with startup and developer applications looking for a powerful, flexible and cost-effective database to work with. Compare Latency. At a glance – TLDR.

Database

Database Latency Benchmarking Performance

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

As more organizations embrace microservices-based architecture to deliver goods and services digitally, maintaining customer satisfaction has become exponentially more challenging. First, it helps to understand that applications and all the services and infrastructure that support them generate telemetry data based on traffic from real users.

Software

Software Benchmarking Latency

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

This decoupling is crucial in modern architectures where scalability and fault tolerance are paramount. The architecture of RabbitMQ is meticulously designed for complex message routing, enabling dynamic and flexible interactions between producers and consumers. Keeping queues short maintains a responsive and efficient RabbitMQ setup.

Best Practices

Best Practices Traffic Strategy Efficiency

Optimizing your Kubernetes clusters without breaking the bank

Dynatrace

JANUARY 14, 2022

Its ability to densely schedule containers into the underlying machines translates to low infrastructure costs. The following figure shows the high-level architecture where any load testing solution (e.g. That is because Kubernetes provides several benefits from a performance perspective. below 500ms) and error rates (e.g.

Latency

Latency Tuning Efficiency AWS

Dynatrace supports Azure Managed Instance for Apache Cassandra

Dynatrace

MAY 13, 2022

Because of its scalability and distributed architecture, thousands of companies trust it to run their cloud and hybrid-based workloads at high availability without compromising performance. It also removes the need for developers and database administrators to manage infrastructure or update database versions.

Azure

Azure Latency Metrics Infrastructure

Site reliability engineering: 5 things you need to know

Dynatrace

FEBRUARY 4, 2021

Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. According to Google, “SRE is what you get when you treat operations as a software problem.” Solving for SR.

Engineering

Engineering DevOps Government Latency

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Tuning

What is serverless computing? Driving efficiency without sacrificing observability

Dynatrace

JANUARY 26, 2021

Within this paradigm, it is possible to run entire architectures without touching a traditional virtual server, either locally or in the cloud. In a serverless architecture, applications are distributed to meet demand and scale requirements efficiently. When an application is triggered, it can cause latency as the application starts.

Serverless

Serverless Efficiency Lambda AWS

Analyze OpenTelemetry traces and log data at scale: Accelerate troubleshooting and optimize application performance

Dynatrace

OCTOBER 3, 2024

Trace your application Imagine a microservices architecture with hundreds of dependencies. Without distributed tracing, pinpointing the cause of increased latency could take hours or even days. Interact with data intuitively and easily and benefit from immediate, AI-supported insights.

Performance

Performance Architecture Innovation Latency

How Park ‘N Fly eliminated silos and improved customer experience with Dynatrace cloud monitoring

Dynatrace

APRIL 7, 2021

But your infrastructure teams don’t see any issue on their AWS or Azure monitoring tools, your platform team doesn’t see anything too concerning in Kubernetes logging, and your apps team says there are green lights across the board. This scenario has become all too common as digital infrastructure has grown increasingly complex.

Cloud

Cloud Monitoring Latency Games

Dynatrace accelerates business transformation with new AI observability solution

Dynatrace

JANUARY 31, 2024

Retrieval-augmented generation emerges as the standard architecture for LLM-based applications Given that LLMs can generate factually incorrect or nonsensical responses, retrieval-augmented generation (RAG) has emerged as an industry standard for building GenAI applications. million AI server units annually by 2027, consuming 75.4+

Cache

Cache Azure Infrastructure Monitoring

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

Stream processing One approach to such a challenging scenario is stream processing, a computing paradigm and software architectural style for data-intensive software systems that emerged to cope with requirements for near real-time processing of massive amounts of data. This significantly increases event latency.

Engineering

Engineering Tuning Latency Open Source

Dynatrace supports the newly released AWS Lambda Response Streaming

Dynatrace

APRIL 7, 2023

Customers can use AWS Lambda Response Streaming to improve performance for latency-sensitive applications and return larger payload sizes. Despite being serverless, the function still requires infrastructure on which to run. What is a Lambda serverless function? Return larger payload sizes.

Lambda

Lambda AWS Serverless Latency

The Netflix Cosmos Platform

The Netflix TechBlog

MARCH 1, 2021

It supports both high throughput services that consume hundreds of thousands of CPUs at a time, and latency-sensitive workloads where humans are waiting for the results of a computation. The subsystems all communicate with each other asynchronously via Timestone, a high-scale, low-latency priority queuing system.

Serverless

Serverless Media Latency Social Media

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

While data lakes and data warehousing architectures are commonly used modes for storing and analyzing data, a data lakehouse is an efficient third way to store and analyze data that unifies the two architectures while preserving the benefits of both. Data lakehouses deliver the query response with minimal latency.

Artificial Intelligence

Artificial Intelligence Storage Analytics Government

Rebuilding Netflix Video Processing Pipeline with Microservices

The Netflix TechBlog

JANUARY 10, 2024

This architecture shift greatly reduced the processing latency and increased system resiliency. We expanded pipeline support to serve our studio/content-development use cases, which had different latency and resiliency requirements as compared to the traditional streaming use case.

Processing

Processing Media Latency Innovation

Seamlessly Swapping the API backend of the Netflix Android app

The Netflix TechBlog

SEPTEMBER 8, 2020

We tried a few iterations of what this new service should look like, and eventually settled on a modern architecture that aimed to give more control of the API experience to the client teams. For us, it means that we now need to have ~15 MDN tabs open when writing routes :) Let’s briefly discuss the architecture of this microservice.

Latency

Latency Cache Java Traffic

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

How site reliability engineering affects organizations’ bottom line SRE applies the disciplines of software engineering to infrastructure management, both on-premises and in the cloud. Microservices-based architectures and software containers enable organizations to deploy and modify applications with unprecedented speed.

Best Practices

Best Practices DevOps Latency Metrics

For your eyes only: improving Netflix video quality with neural networks

The Netflix TechBlog

NOVEMBER 17, 2022

Our approach to NN-based video downscaling The deep downscaler is a neural network architecture designed to improve the end-to-end video quality by learning a higher-quality video downscaler. Architecture of the deep downscaler model, consisting of a preprocessing block followed by a resizing block.

Network

Network Media Innovation Efficiency

Designing Instagram

High Scalability

JANUARY 11, 2022

Architecture. FUN FACT: In this talk, Rodrigo Schmidt, director of engineering at Instagram, talks about the different challenges they have faced in scaling the data infrastructure at Instagram. When a user requests a feed, two parallel threads are involved in fetching the user's feeds to optimize for latency.

Design

Design Media Storage Logistics

What is AWS Lambda?

Dynatrace

APRIL 5, 2021

Organizations can offload much of the burden of managing app infrastructure and transition many functions to the cloud by going serverless with the help of Lambda. AWS continues to improve how it handles latency issues. An application could rely on dozens or even hundreds of Lambdas and other infrastructure.

Lambda

Lambda AWS Serverless Hardware

Managing risk for financial services: The secret to visibility and control during times of volatility

Dynatrace

APRIL 8, 2024

Optimize the IT infrastructure supporting risk management processes and controls for maximum performance and resilience. The IT infrastructure, services, and applications that enable processes for risk management must perform optimally. Once teams solidify infrastructure and application performance, security is the subsequent priority.

Analytics

Analytics Infrastructure Efficiency Technology

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

ITOps is an IT discipline involving actions and decisions made by the operations team responsible for an organization’s IT infrastructure. Besides the traditional system hardware, storage, routers, and software, ITOps also includes virtual components of the network and cloud infrastructure. What is ITOps?

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.

Cache

Cache Latency Airlines Logistics

Automated observability, security, and reliability at scale

Dynatrace

JULY 18, 2023

This is especially crucial in microservice architectures, where the number of components can be overwhelming. Configuration as Code in Git repos, automatically applied by Dynatrace Analogous to infrastructure as code, Configuration as Code, or “everything as code” is now essential for tackling software development challenges.

Best Practices

Best Practices Code Infrastructure Latency

Under the Hood of Amazon EC2 Container Service

All Things Distributed

JULY 20, 2015

Today, I want to explore the Amazon ECS architecture and what this architecture enables. This architecture affords Amazon ECS high availability, low latency, and high throughput because the data store is never pessimistically locked. Below is a diagram of the basic components of Amazon ECS: How we coordinate the cluster.

Latency

Latency Architecture AWS Open Source

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

NOVEMBER 2, 2020

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure. By Manuel Correa, Arthur Gonigberg, and Daniel West. Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Those two metrics are approximate indicators of failures and latency.

Traffic

Traffic Metrics Infrastructure Architecture

Observability platform vs. observability tools

Dynatrace

DECEMBER 22, 2021

Metrics are measures of critical system values, such as CPU utilization or average write latency to persistent storage. They are particularly important in distributed systems, such as microservices architectures. Observability platforms are becoming essential as the complexity of cloud-native architectures increases.

Artificial Intelligence

Artificial Intelligence Metrics Architecture DevOps

Unlock the power of contextual log analytics

Dynatrace

OCTOBER 2, 2024

For instance, in a Kubernetes environment, if an application fails, logs in context not only highlight the error alongside corresponding log entries but also provide correlated logs from surrounding services and infrastructure components. Keep in mind that Dynatrace Grail is schema-on-read and indexless, built with scaling in mind.

Analytics

Analytics AWS DevOps Cloud

Enhancing Kubernetes cluster management key to platform engineering success

Dynatrace

MARCH 29, 2024

“We use AI to optimize the configuration of the software stack,” Doni said, highlighting how Akamas works by taking into account infrastructure and application metrics at the same time to achieve its optimization goals. You can ask for the best configuration to reduce latency or improve the user experience.”

Engineering

Engineering DevOps Operating System Open Source

What are SLOs? How service-level objectives work with SLIs to deliver on SLAs

Dynatrace

DECEMBER 2, 2021

As organizations adopt microservices-based architecture, service-level objectives (SLOs) have become a vital way for teams to set specific, measurable targets that ensure users are receiving agreed-upon service levels. You can set SLOs based on individual indicators, such as batch throughput, request latency, and failures-per-second.

Metrics

Metrics Best Practices DevOps Infrastructure
