A quick canary test was free of errors and showed lower latency, which is expected given that our standard canary setup routes an equal amount of traffic to both the baseline running on 4xl and the canary on 12xl. After the full rollout, however, average latency degraded by more than 50%, with both CPU and latency patterns becoming more “choppy.”
RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. Kafka’s design prioritizes high availability and efficient data transfer with minimal overhead, making it a practical choice for handling real-time data pipelines and distributed event processing.
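As a loose illustration of the “flexible routing” half of that comparison, here is a minimal sketch using RabbitMQ’s topic exchanges via the pika Python client; the exchange, queue, and routing-key names are hypothetical:

```python
# Sketch: RabbitMQ topic-exchange routing with pika (names are hypothetical).
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# A topic exchange routes on pattern-matched routing keys.
channel.exchange_declare(exchange="events", exchange_type="topic")

# Bind a queue that only receives error-level messages from any service.
channel.queue_declare(queue="errors")
channel.queue_bind(queue="errors", exchange="events", routing_key="*.error")

# Publish: only consumers bound to matching patterns will see this message.
channel.basic_publish(exchange="events",
                      routing_key="billing.error",
                      body=b"charge failed")
connection.close()
```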
The architecture of RabbitMQ is designed for complex message routing, enabling dynamic and flexible interactions between producers and consumers. Clustering across wide-area networks (WANs) is discouraged due to latency issues, though dedicated leased links can mitigate some of the connectivity challenges.
The network latency between cluster nodes should be around 10 ms or less. Our Premium High Availability comes with the following features: an active-active deployment model for optimum hardware utilization, savings on hardware and network bandwidth that optimize total cost of ownership, and a self-contained turnkey solution.
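Since the 10 ms bound above is a hard requirement, a quick sanity check before forming a cluster is to time a TCP round trip to each prospective peer. A rough sketch, with placeholder hostnames and port:

```python
# Rough RTT probe for prospective cluster peers (hostnames/port are placeholders).
import socket
import time

PEERS = [("node-2.internal", 5672), ("node-3.internal", 5672)]

for host, port in PEERS:
    start = time.perf_counter()
    # A TCP connect costs roughly one round trip, good enough as a coarse probe.
    with socket.create_connection((host, port), timeout=1.0):
        rtt_ms = (time.perf_counter() - start) * 1000
    status = "OK" if rtt_ms <= 10 else "TOO SLOW for clustering"
    print(f"{host}: {rtt_ms:.1f} ms  {status}")
```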
This is where Lambda comes in: Developers can deploy programs with no concern for the underlying hardware, connecting to services in the broader ecosystem, creating APIs, preparing data, or sending push notifications directly in the cloud, to list just a few examples. AWS continues to improve how it handles latency issues.
Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing the caches too much for workload B and evens out the pressure on the machine’s L3 caches.
ITOps refers to the process of acquiring, designing, deploying, configuring, and maintaining equipment and services that support an organization’s desired business outcomes. Besides the traditional system hardware, storage, routers, and software, ITOps also includes virtual components of the network and cloud infrastructure.
By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Use hardware-based encryption and ensure regular over-the-air updates to maintain device security. The risks include increased latency during peak loads and data interception during transit.
Complementing the hardware is the software on the RAE and in the cloud, and bridging the software on both ends is a bi-directional control plane. When a new hardware device is connected, the Local Registry detects and collects a set of information about it, such as networking information and the ESN.
It requires purchasing, powering, and configuring physical hardware, training and retaining the staff capable of servicing and securing the machines, operating a data center, and so on. Organizations need enough hardware to serve their anticipated volume and keep things running smoothly without buying too much or too little.
Amazon DynamoDB: a fast and scalable NoSQL database service designed for internet-scale applications. Today is a very exciting day as we release Amazon DynamoDB, a fast, highly reliable and cost-effective NoSQL database service designed for internet-scale applications. Amazon DynamoDB offers low, predictable latencies at any scale.
With more nodes and more coordination comes more complexity, both in design and operation. This makes the whole system latency sensitive. One of the joys of this paper is that you don’t just get to see the final design, you also get insight into the forces and trade-offs that led to it.
That meant I started having regular meetings with the hardware engineers who were working with IBM on the CPU. That gave me even more expertise on this CPU, which was critical in helping me discover a design flaw in one of its instructions, and in helping game developers master this finicky beast. So, anyway. Standard stuff.
In these modern environments, every hardware, software, and cloud infrastructure component, and every container, open-source tool, and microservice, generates records of every activity. The architects and developers who create the software must design it to be observed.
This is a given, whether you are using the highest quality hardware or the lowest cost components. Many of those failure scenarios can be anticipated beforehand, but many more are unknown at design and build time. We knew that designing APIs was a very important task as we’d only have one chance to get it right.
This was a chance to talk about other things I've been working on, such as the present and future of hardware performance. The video is on [youtube] and the slides are on [slideshare] or as a [PDF]. I work on many areas of performance, but recently I've had a lot of demand to talk about BPF.
Before designing a solution it’s important to understand the main product requirements for such a feature: The content needs to be new, relevant, and regional (not all countries have the same catalogue). To reduce latency, assets should be generated in an offline fashion and not in real time.
An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems, Gan et al. Microservices fundamentally change a lot of assumptions current cloud systems are designed with, and present both opportunities and challenges when optimizing for quality of service (QoS) and utilization.
Key Takeaways: Critical performance indicators such as latency, CPU usage, memory utilization, hit rate, and the number of connected clients/slaves/evictions must be monitored to maintain Redis’s high throughput and low latency capabilities. Similarly, increased throughput signifies a more intensive workload on the server, and typically a larger latency.
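As a concrete illustration (not tooling from the article itself), most of those indicators can be read from Redis’s INFO command, here via the redis-py client:

```python
# Sketch: pulling the metrics named above from Redis INFO (redis-py client).
import redis

r = redis.Redis(host="localhost", port=6379)
info = r.info()  # a single round trip returns all INFO sections as a dict

hits, misses = info["keyspace_hits"], info["keyspace_misses"]
hit_rate = hits / (hits + misses) if (hits + misses) else 0.0

print("used_memory_human :", info["used_memory_human"])
print("connected_clients :", info["connected_clients"])
print("evicted_keys      :", info["evicted_keys"])
print("hit rate          : %.2f%%" % (100 * hit_rate))
```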
To be robust and scalable, this key/value store needs to be distributed for durability and availability, to protect against network partitions or hardware failures. This architecture affords Amazon ECS high availability, low latency, and high throughput because the data store is never pessimistically locked.
For example, the most fundamental abstraction trade-off has always been latency versus throughput. These trade-offs have even impacted the way the lowest level building blocks in our computer architectures have been designed. The throughput of this pipeline is more important than the latency of the individual operations.
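A small worked example of that trade-off, with assumed numbers: a five-stage pipeline leaves each operation’s latency untouched but multiplies throughput once the pipeline is full.

```python
# Worked example (assumed numbers): pipelining trades per-op latency for throughput.
STAGES = 5          # each operation takes 5 cycles end to end
OPS = 1_000_000

serial_cycles = STAGES * OPS            # no overlap: latency dominates
pipelined_cycles = STAGES + (OPS - 1)   # fill the pipeline once, then 1 op/cycle

print("serial   :", serial_cycles, "cycles")
print("pipelined:", pipelined_cycles, "cycles")
print("speedup  : %.2fx" % (serial_cycles / pipelined_cycles))
# Per-operation latency is unchanged (5 cycles); only throughput improves.
```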
It’s been clear for a while that software designed explicitly for the data center environment will increasingly want/need to make different design trade-offs than, e.g., general-purpose systems software that you might install on your own machines. The desire for CPU efficiency and lower latencies is easy to understand.
PostgreSQL cluster: one coordinator node (citus-coord-01) and three worker nodes (citus1, citus2, citus3). Hardware: AWS c5.xlarge instances, Ubuntu Server 20.04, SSD volume type, 64-bit (x86). And now, execute the benchmark:

-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench

The results are not pretty.
The innodb_io_capacity_max parameter was set to 2000, so the hardware should be able to deliver that many IOPS without major issues. Yes, it is true that io2 volumes are expensive, and honestly, I think they should be used only where really high IO capacity at expected latencies is required, and this didn’t seem to be the case.
Historically, NoSQL paid a lot of attention to trade-offs between consistency, fault tolerance, and performance in order to serve geographically distributed, low-latency, or highly available applications. A database should accommodate itself to different data distributions, cluster topologies, and hardware configurations.
Last week we learned about the increased tail-latency sensitivity of microservices-based applications with high RPC fan-outs. Seer uses estimates of queue depths to mitigate latency spikes on the order of 10-100ms, in conjunction with a cluster manager. So what we have here is a glimpse of the limits for low-latency RPCs under load.
In traditional database architectures, database engines often run small search or data warehouse engines on the same hardware as the database. DynamoDB Streams simplifies and improves this design pattern with a distributed systems approach.
Edge servers are the middle ground – more compute power than a mobile device, but with latency of just a few ms. These use their regression models to estimate processing time (which will depend on the hardware available, current load, etc.). Why would we want to live migrate web workers?
Cluster Compute Instances for Amazon EC2 are a new instance type specifically designed for High Performance Computing applications. Other industries using Amazon EC2 for HPC-style workloads include pharmaceuticals, oil exploration, industrial and automotive design, media and entertainment, and more.
Key Takeaways: Distributed storage systems benefit organizations by enhancing data availability, fault tolerance, and system scalability, leading to cost savings from reduced hardware needs, energy consumption, and personnel. One common variation of these storage systems is the distributed file system.
Shredder is "a low-latency multi-tenant cloud store that allows small units of computation to be performed directly within storage nodes." From an operator perspective it makes it harder to follow the classic cloud-native design in which a global storage layer is separated from compute.
My personal opinion is that I don't see a widespread need for more capacity given horizontal scaling and servers that can already exceed 1 Tbyte of DRAM; bandwidth is also helpful, but I'd be concerned about the increased latency of adding a hop to more memory.
It takes you through the thinking processes and engineering practices behind the design of a key part of the control plane for AWS Elastic Block Storage (EBS): the Physalia database that stores configuration information (NSDI’20). This work is latency critical, because volume IO is blocked until it is complete.
On Saturday, the ISO C++ committee completed the second-to-last design meeting of C++26, held in Hagenberg, Austria. This can create variable latency during iteration. Note that this feature has already been approved for inclusion in the next revision of C as well.
VM Import allows our customers to move virtual machine images from their datacenters to the Cloud, and Amazon Direct Connect makes the network latencies and bandwidth between on-premises and AWS more predictable. AWS Identity and Access Management brings together on-premises and cloud identity management.
As we saw with the SOAP paper last time out, even with a fixed model variant and hardware there are a lot of different ways to map a training workload over the available hardware. The following figure highlights how just one of these variables, batch size, impacts throughput and latency on ResNet50.
Hardware Past As Performance Prologue. Regardless, the overall story for hardware progress remains grim, particularly when we recall how long device replacement cycles are. Chip design choices and silicon economics are the defining feature of the still-growing Performance Inequality Gap.
However, it’s important to note that the optimal value may vary depending on your specific workload and hardware configuration. The answer does not consider the queue or latency of the sample, which could indicate a disk with issues. The answer could be better.
This is a fascinating paper from members of Netflix’s Resilience Engineering team describing their chaos engineering initiatives: automated controlled experiments designed to verify hypotheses about how the system should behave under gray failure conditions, and to probe for and flush out any weaknesses.
“Hardware Optimizers” want to get the maximum utilization out of hardware. These systems were designed to have a lifetime of half a decade or more, and rapidly changing hardware meant that the initial deployment had to be sized for 5-7 years out. “Latency Optimizers” need support for very large federated deployments.
Breaking that assumption allowed Ceph to introduce a new storage backend called BlueStore with much better performance and predictability, and the ability to support the changing storage hardware landscape. But let’s take a quick look at the changing hardware landscape before we go on.
This boils down to a single-digit µs latency tolerance in the tail for far memory, which, in addition to security and privacy concerns, rules out remote memory solutions. Thus we’re fundamentally trading (de)compression latency at access time for the ability to pack more data in memory.
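The shape of that trade can be sketched with a stand-in compressor; zlib here is purely illustrative and not the algorithm used by the system described above:

```python
# Illustrative only: measure the (de)compression latency traded for density.
# zlib is a stand-in compressor, not the one used in the system described above.
import time
import zlib

page = bytes(range(256)) * 16          # 4 KiB of moderately compressible data

t0 = time.perf_counter()
packed = zlib.compress(page, level=1)  # fast level, since latency is the constraint
t1 = time.perf_counter()
zlib.decompress(packed)
t2 = time.perf_counter()

print("ratio        : %.2fx" % (len(page) / len(packed)))
print("compress us  : %.1f" % ((t1 - t0) * 1e6))
print("decompress us: %.1f" % ((t2 - t1) * 1e6))
```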
The CFQ scheduler works well for many general use cases but lacks latency guarantees. Two other schedulers are deadline and noop: deadline excels at latency-sensitive use cases (like databases), and noop is closer to no scheduling at all. MongoDB schema design, on the other hand, takes a document-oriented approach.
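On a Linux host, the active scheduler for a block device can be inspected through sysfs. A small sketch; “sda” is a placeholder, and newer multi-queue kernels list schedulers such as mq-deadline and none instead:

```python
# Inspect the active I/O scheduler for a block device via sysfs.
# "sda" is a placeholder; on blk-mq kernels the names are e.g. mq-deadline/none.
from pathlib import Path

sched = Path("/sys/block/sda/queue/scheduler")
options = sched.read_text().split()     # e.g. "noop deadline [cfq]"
active = next(o.strip("[]") for o in options if o.startswith("["))
print("available:", [o.strip("[]") for o in options])
print("active   :", active)
# To switch schedulers (as root): sched.write_text("deadline")
```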
To understand what is happening here, we need to understand the way memory bandwidth interacts with memory latency and the concurrency (parallelism) of memory accesses. Can this problem be fixed? That is the topic of the next blog entry. (“Real Soon Now”)
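The interaction in question is Little’s law: sustainable bandwidth equals the bytes outstanding in flight divided by latency. A back-of-the-envelope sketch with assumed numbers:

```python
# Little's law for memory: bandwidth = concurrency / latency (numbers assumed).
LINE_BYTES = 64          # one cache line per outstanding miss
LATENCY_NS = 80          # assumed idle memory latency
MAX_MISSES = 10          # assumed outstanding-miss limit per core

bw = MAX_MISSES * LINE_BYTES / (LATENCY_NS * 1e-9)  # bytes/second
print("per-core bandwidth ceiling: %.1f GB/s" % (bw / 1e9))
# With only 10 misses in flight at 80 ns each, one core cannot exceed ~8 GB/s,
# no matter how much DRAM bandwidth the socket nominally provides.
```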