Architecture, Latency and Systems - Technology Performance Pulse

Optimizing Database Performance in Middleware Applications

DZone

FEBRUARY 14, 2025

In the realm of modern software architecture, middleware plays a pivotal role in connecting various components of distributed systems. Efficient database operations in middleware can dramatically improve overall system performance, reduce latency, and enhance user experience.

Database

Database Performance Software Architecture Latency

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

To this end, we developed a Rapid Event Notification System (RENO) to support use cases that require server initiated communication with devices in a scalable and extensible manner. In this blog post, we will give an overview of the Rapid Event Notification System at Netflix and share some of the learnings we gained along the way.

Systems

Systems Traffic Architecture Mobile

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

This article outlines the key differences in architecture, performance, and use cases to help determine the best fit for your workload. RabbitMQ follows a message broker model with advanced routing, while Kafka's event streaming architecture uses partitioned logs for distributed processing.

Latency

Latency Analytics Architecture Storage

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support…

The Netflix TechBlog

SEPTEMBER 29, 2022

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support for Non-Parallelizable Workloads, by Kostas Christidis. Introduction: Timestone is a high-throughput, low-latency priority queueing system we built in-house to support the needs of Cosmos, our media encoding platform.

Latency

Latency Systems Media Serverless

Efficient Multimodal Data Processing: A Technical Deep Dive

DZone

FEBRUARY 27, 2025

Multimodal data processing is the evolving need of the latest data platforms powering applications like recommendation systems, autonomous vehicles, and medical diagnostics. Handling multimodal data spanning text, images, videos, and sensor inputs requires resilient architecture to manage the diversity of formats and scale.

Efficiency

Efficiency Processing Latency Storage

Single-core memory bandwidth: Latency, Bandwidth, and Concurrency

John McCalpin

FEBRUARY 17, 2025

Understanding sustained memory bandwidth in these systems starts with assuming 100% utilization and then reviewing the factors that get in the way (e.g., This requires a completely different approach to modeling the memory system — one based on Little’s Law from queueing theory.

Latency

Latency Hardware Cache Systems

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile's exposure. In this multi-part blog series, we take you behind the scenes of our system that processes billions of impressions daily.

Tuning

Tuning Latency Efficiency Storage

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Key Takeaways: RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges.

Best Practices

Best Practices Traffic Strategy Efficiency

Migrating Critical Traffic At Scale with No Downtime – Part 1

The Netflix TechBlog

MAY 4, 2023

Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience. These backend systems are consistently being evolved and optimized to meet and exceed customer and product expectations. This blog series will examine the tools, techniques, and strategies we have utilized to achieve this goal.

Traffic

Traffic Latency Tuning Systems

Architectural Insights: Designing Efficient Multi-Layered Caching With Instagram Example

DZone

FEBRUARY 27, 2024

Leveraging this hierarchical structure can significantly reduce latency and improve overall performance.

Cache

Cache Efficiency Architecture Design

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination. Data Model At its core, the KV abstraction is built around a two-level map architecture.

Latency

Latency Storage Cache Servers

Lessons learned from enterprise service-level objective management

Dynatrace

MAY 19, 2022

Every organization’s goal is to keep its systems available and resilient to support business demands. Lastly, error budgets, as the difference between a current state and the target, represent the maximum amount of time a system can fail per the contractual agreement without repercussions. Example 1: Architecture boundaries.

Automotive

Automotive Latency Architecture Mobile

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

As the number of Titus users increased over the years, the load and pressure on the system increased substantially. The original assumptions and architectural choices were no longer viable. Overview The figure below depicts a simplified high-level architecture of a single Titus cluster (a.k.a

Cache

Cache Latency Traffic Systems

Optimizing your Kubernetes clusters without breaking the bank

Dynatrace

JANUARY 14, 2022

The following figure shows the high-level architecture where any load testing solution (e.g. The optimization goal was to improve the application efficiency, that is, to improve the ratio between service throughput and cloud costs while not increasing the application latency (e.g. below 500ms) and error rates (e.g. lower than 2%).

Latency

Latency Tuning Efficiency AWS

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

Stream processing One approach to such a challenging scenario is stream processing, a computing paradigm and software architectural style for data-intensive software systems that emerged to cope with requirements for near real-time processing of massive amounts of data. This significantly increases event latency.

Engineering

Engineering Tuning Latency Open Source

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

As more organizations embrace microservices-based architecture to deliver goods and services digitally, maintaining customer satisfaction has become exponentially more challenging. Latency is the time that it takes a request to be served. Define SLOs for each service. Reliability. This is what Dynatrace captures as response time.

Software

Software Software Benchmarking Latency

Netflix Cloud Packaging in the Terabyte Era

The Netflix TechBlog

SEPTEMBER 24, 2021

Table 1: Movie and File Size Examples Initial Architecture A simplified view of our initial cloud video processing pipeline is illustrated in the following diagram. Lastly, the packager kicks in, adding a system layer to the asset, making it ready to be consumed by the clients.

Cloud

Cloud Media Storage Cache

Site reliability engineering: 5 things you need to know

Dynatrace

FEBRUARY 4, 2021

Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. SRE applies DevOps principles to developing systems and software that help increase site reliability and performance.

Engineering

Engineering DevOps Government Latency

What is observability? Not just logs, metrics and traces

Dynatrace

OCTOBER 1, 2021

As dynamic systems architectures increase in complexity and scale, IT teams face mounting pressure to track and respond to conditions and issues across their multi-cloud environments. But what is observability? Why is it important, and what can it actually help organizations achieve?

Metrics

Metrics Open Source Monitoring Cloud

Who will watch the watchers? Extended infrastructure observability for WSO2 API Manager

Dynatrace

SEPTEMBER 18, 2020

Cloud-based application architectures commonly leverage microservices. High latency or lack of responses. You receive an alert message from Dynatrace (your infrastructure observability hub) letting you know that the average response latency of all deployed APIs has tripled. Soaring number of active connections.

Infrastructure

Infrastructure Latency Metrics Cloud

What is serverless computing? Driving efficiency without sacrificing observability

Dynatrace

JANUARY 26, 2021

Traditional computing models rely on virtual or physical machines, where each instance includes a complete operating system, CPU cycles, and memory. Within this paradigm, it is possible to run entire architectures without touching a traditional virtual server, either locally or in the cloud. What is serverless computing?

Serverless

Serverless Efficiency Lambda AWS

The Netflix Cosmos Platform

The Netflix TechBlog

MARCH 1, 2021

It supports both high-throughput services that consume hundreds of thousands of CPUs at a time, and latency-sensitive workloads where humans are waiting for the results of a computation. The first generation of this system went live with the streaming launch in 2007. Delivery – A

Serverless

Serverless Media Latency Social Media

Analyze OpenTelemetry traces and log data at scale: Accelerate troubleshooting and optimize application performance

Dynatrace

OCTOBER 3, 2024

Trace your application Imagine a microservices architecture with hundreds of dependencies. Without distributed tracing, pinpointing the cause of increased latency could take hours or even days. Interact with data intuitively and easily and benefit from immediate, AI-supported insights.

Performance

Performance Architecture Innovation Latency

Rebuilding Netflix Video Processing Pipeline with Microservices

The Netflix TechBlog

JANUARY 10, 2024

This architecture shift greatly reduced the processing latency and increased system resiliency. By integrating with studio content systems, we enabled the pipeline to leverage rich metadata from the creative side and create more engaging member experiences like interactive storytelling.

Processing

Processing Media Latency Innovation

PostgreSQL Connection Pooling: Part 1 – Pros & Cons

Scalegrid

OCTOBER 17, 2019

On modern Linux systems, the difference in overhead between forking a process and creating a thread is much smaller than it used to be. Moving to a multithreaded architecture will require extensive rewrites. The PostgreSQL Architecture | Source. The Connection Pool Architecture.

Architecture

Architecture Database Latency Servers

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

By Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Infrastructure

Designing Instagram

High Scalability

JANUARY 11, 2022

Architecture. The streaming data store makes the system extensible to support other use-cases (e.g. System Components. The system will comprise several microservices, each performing a separate task. Sending and receiving messages from other users. High Level Design. Fetching User Feed. Optimization.

Design

Design Media Storage Logistics

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.

Cache

Cache Latency Airlines Logistics

Improved Alerting with Atlas Streaming Eval

The Netflix TechBlog

APRIL 27, 2023

Engineers want their alerting system to be realtime, reliable, and actionable. A few years ago, we were paged by our SRE team due to our Metrics Alerting System falling behind — critical application health alerts reached engineers 45 minutes late! In other words, false positives are bad but false negatives are the absolute worst!

Storage

Storage Cache Metrics Database

Observability vs. monitoring: What’s the difference?

Dynatrace

NOVEMBER 3, 2021

Organizations are depending more and more on distributed architectures to provide application services. Logging provides additional data but is typically viewed in isolation of a broader system context. Monitoring is capturing and displaying data, whereas observability can discern system health by analyzing its inputs and outputs.

Monitoring

Monitoring Metrics DevOps Scalability

Dynatrace accelerates business transformation with new AI observability solution

Dynatrace

JANUARY 31, 2024

GenAI is prone to erratic behavior due to unforeseen data scenarios or underlying system issues. Figure 1: Sample RAG architecture While this approach significantly improves the response quality of GenAI applications, it also introduces new challenges.

Cache

Cache Azure Infrastructure Monitoring

Observability platform vs. observability tools

Dynatrace

DECEMBER 22, 2021

Complex information systems fail in unexpected ways. Observability gives developers and system operators real-time awareness of a highly distributed system’s current state based on the data it generates. With observability, teams can understand what part of a system is performing poorly and how to correct the problem.

Artificial Intelligence

Artificial Intelligence Metrics Architecture DevOps

Edgar: Solving Mysteries Faster with Observability

The Netflix TechBlog

SEPTEMBER 2, 2020

Edgar helps Netflix teams troubleshoot distributed systems efficiently with the help of a summarized presentation of request tracing, logs, analysis, and metadata. The more complex a system, the more places to look for clues. In an earlier blog post, we discussed Telltale, our health monitoring system. What is Edgar?

Latency

Latency Transportation Engineering Traffic

Migrating Critical Traffic At Scale with No Downtime – Part 2

The Netflix TechBlog

JUNE 13, 2023

This is where large-scale system migrations come into play. By collecting and analyzing key performance metrics of the service over time, we can assess the impact of the new changes and determine if they meet the availability, latency, and performance requirements. But what happens when this machinery needs a transformation?

Traffic

Traffic Metrics Systems Strategy

What is AWS Lambda?

Dynatrace

APRIL 5, 2021

AWS Lambda enables organizations to access many types of functions from AWS’ cloud-based services, such as: Data processing, to execute code based on triggers, system states, or user actions. You will likely need to write code to integrate systems and handle complex tasks or incoming network requests.

Lambda

Lambda AWS Serverless Hardware

Edge Authentication and Token-Agnostic Identity Propagation

The Netflix TechBlog

FEBRUARY 9, 2021

The whole system was quite complex, and starting to become brittle. Plus, the architecture of the Edge tier was evolving to a PaaS (platform as a service) model, and we had some tough decisions to make about how, and where, to handle identity token handling. The API server orchestrates backend systems to authenticate the user.

Architecture

Architecture Latency Servers Website

Orbital edge computing: nano satellite constellations as a new class of computer system

The Morning Paper

OCTOBER 11, 2020

Orbital edge computing: nanosatellite constellations as a new class of computer system, Denby & Lucia, ASPLOS’20. Only space system architects don’t call it request-response, they call it a ‘bent-pipe architecture’. Nanosatellite systems have a GSD of around 3.0m/px. Satellites are changing! Physical constraints.

Systems

Systems Latency Architecture Energy

How Park ‘N Fly eliminated silos and improved customer experience with Dynatrace cloud monitoring

Dynatrace

APRIL 7, 2021

Organizations are rapidly adopting multicloud architectures to achieve the agility needed to drive customer success through new digital service channels. Park ‘N Fly’s business relies on successfully integrating its booking system with its custom-built kiosks located at its off-airport parking lots. “As

Cloud

Cloud Monitoring Latency Games

Data ingestion pipeline with Operation Management

The Netflix TechBlog

MARCH 7, 2023

But we cannot search or present low latency retrievals from files, etc. Marken Architecture: Marken’s architecture diagram is as follows. We do that by excluding the following from all queries in our system.

Media

Media Latency Architecture Database

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. This guide delves into how these systems work, the challenges they solve, and their essential role in businesses and technology.

Storage

Storage Systems Big Data Azure

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

Uptime Institute’s 2022 Outage Analysis report found that over 60% of system outages resulted in at least $100,000 in total losses, up from 39% in 2019. Microservices-based architectures and software containers enable organizations to deploy and modify applications with unprecedented speed. Make SLOs realistic.

Best Practices

Best Practices DevOps Latency Metrics

Engineering dependability and fault tolerance in a distributed system

High Scalability

FEBRUARY 19, 2021

This means a system that is not merely available but is also engineered with extensive redundant measures to continue to work as its users expect. Fault tolerance: The ability of a system to continue to be dependable (both available and reliable) in the presence of certain component or subsystem failures.

Engineering

Engineering Systems Availability Scalability
