Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.
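To make that hierarchy concrete, here is a minimal Java sketch (the matrix size is an arbitrary illustration) comparing a cache-friendly row-major traversal with a cache-hostile column-major one; on typical hardware the second loop runs several times slower purely because of cache misses:

```java
// A minimal sketch of the cache hierarchy's effect on performance: traversing
// a matrix row-by-row (sequential memory access) versus column-by-column
// (strided access that wastes most of each cache line).
public class CacheTraversal {
    static final int N = 4096;

    public static void main(String[] args) {
        int[][] matrix = new int[N][N];
        long sum = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++)        // row-major: consecutive addresses,
            for (int j = 0; j < N; j++)    // each fetched cache line is fully used
                sum += matrix[i][j];
        long rowMajor = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int j = 0; j < N; j++)        // column-major: touches one element per
            for (int i = 0; i < N; i++)    // cache line before moving far away
                sum += matrix[i][j];
        long colMajor = System.nanoTime() - t0;

        System.out.printf("row-major %d ms, column-major %d ms (sum=%d)%n",
                rowMajor / 1_000_000, colMajor / 1_000_000, sum);
    }
}
```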
On-premises data centers invest in higher capacity servers since they provide more flexibility in the long run, while the procurement price of hardware is only one of many cost factors. Of the organizations in the Kubernetes survey, 71% run databases and caches in Kubernetes, representing a +48% year-over-year increase.
To get a better understanding of AWS serverless, we’ll first explore the basics of serverless architectures, review AWS serverless offerings, and explore common use cases. Serverless architecture: A primer. Serverless architecture shifts application hosting functions away from local servers onto those managed by providers.
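As a concrete taste of the serverless model, here is a minimal AWS Lambda handler in Java; the class name and the plain String-map event shape are illustrative assumptions (real functions usually bind typed events such as API Gateway requests):

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;

// A minimal Lambda handler: the provider manages the servers; you supply
// only this function. Event shape and greeting logic are hypothetical.
public class HelloHandler implements RequestHandler<Map<String, String>, String> {
    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        String name = event.getOrDefault("name", "world");
        context.getLogger().log("Handling request for " + name);
        return "Hello, " + name;
    }
}
```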
Rendering is the final step in the VFX creation process, and processing on a render farm often can take several hours to complete just a single frame of a show, even when this process runs on the latest high-end hardware. Rendering on AWS provides the flexibility to control how quickly a project is completed.
Reducing CPU utilization so that it now consumes only 15% of the initially provisioned hardware. Missing cache settings: make sure you cache resources that don't change often in the browser or on a CDN. Resolving performance and architectural issues in their backend system gave them a 99% performance improvement!
Even if my application runs in the cloud on the JVM, despite all of those software layers abstracting away the underlying hardware, the volatile keyword is still needed due to the cache of the processor that my software runs on. The Volatile Keyword and the Cache of Modern Processors.
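Here is the classic illustration of that point, as a minimal sketch: without volatile, the worker thread may keep reading a stale cached value of the flag and never stop.

```java
// Without volatile, the reader may cache `running` in a register or see a
// stale value from its core's cache and spin forever. volatile forces each
// read to observe the latest write (a happens-before edge in the Java
// Memory Model).
public class VolatileFlag {
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {
                // busy work; without volatile this loop might never terminate
            }
            System.out.println("worker observed the update and stopped");
        });
        worker.start();

        Thread.sleep(100);
        running = false;   // visible to the worker because the field is volatile
        worker.join();
    }
}
```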
This blog post gives a glimpse of the computer systems research papers presented at the USENIX Annual Technical Conference (ATC) 2019, with an emphasis on systems that use new hardware architectures. As a consequence, the vast majority of the papers in the past have focused on conventional X86 or GPU-accelerated architectures.
The technical program, put together by program chairs Tor Aamodt and Reetuparna Das, showcased key innovations across a wide range of computer architecture topics, from domain-specific accelerators to in/near-memory computing and from security to quantum computing. This year’s MICRO had three inspiring keynote talks.
In this blog post, I will explain how these three new capabilities empower you to build applications with distributed systems architecture and create responsive, reliable, and high-performance applications using DynamoDB that work at any scale. The point-of-sales system records changes from all the purchases and stores them in DynamoDB.
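As a rough sketch of that recording step, here is a single-item put using the AWS SDK for Java v2; the table name ("Purchases") and attribute names are hypothetical:

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;
import java.util.Map;

// A minimal sketch of storing one purchase record in DynamoDB.
public class RecordPurchase {
    public static void main(String[] args) {
        try (DynamoDbClient dynamo = DynamoDbClient.create()) {
            PutItemRequest request = PutItemRequest.builder()
                    .tableName("Purchases")                 // hypothetical table
                    .item(Map.of(
                            "storeId", AttributeValue.builder().s("store-42").build(),
                            "timestamp", AttributeValue.builder().s("2024-01-01T12:00:00Z").build(),
                            "amount", AttributeValue.builder().n("19.99").build()))
                    .build();
            dynamo.putItem(request);                        // one write, any scale
        }
    }
}
```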
When I think about cloud-native architectures, I think about disaggregation (enabling each resource type to scale independently), fine-grained units of resource allocation (enabling rapid response to changing workload demands, i.e. elasticity), and isolation (keeping tenants apart). From shared-nothing to disaggregation.
The focus of most published research in architecture is on applications implemented in high-performance, “close-to-the-metal” languages essentially developed before computers got fast. Are caches large enough for this code? There’s some work on hardware proposals for these systems, like Zhu et al.,
Key Takeaways Distributed storage systems benefit organizations by enhancing data availability, fault tolerance, and system scalability, leading to cost savings from reduced hardware needs, energy consumption, and personnel costs. This strategy reduces the volume of data needed during retrieval operations.
I had a professor in grad school who used to joke that all architecture is reinvented every 5 years. Both virtualization and power burst onto the architecture community seemingly out of nowhere even though there was a clear historical basis and trend for both. We believed existing hardware and OS protocols protected the processor.
An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems, Gan et al. A typical architecture diagram for one of these services looks like this: Suitably armed with a set of benchmark microservices applications, the investigation can begin! Hardware implications.
From the business logic point of view, this was a pretty typical eCommerce service for hierarchical and faceted navigation, not without its peculiarities, but high performance requirements led us to a quite advanced architecture and technical design. So the only way was to cache all necessary data to minimize interaction with the RDBMS.
The Solution: Distributed Caching. A widely used technology called distributed caching meets this need by storing frequently accessed data in memory on a server farm instead of within a database. It’s not enough simply to lash together a set of servers hosting a collection of in-memory caches.
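To make the pattern concrete, here is a minimal cache-aside sketch in Java; a real distributed cache would replace the in-process map with a networked client (Redis, Memcached, Hazelcast, ...) and add eviction, TTLs, and consistent hashing across the server farm:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A minimal cache-aside read path: serve hits from memory, load misses from
// the slow backing store and populate the cache. Illustrative only.
public class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> database;   // the slow backing store

    public CacheAside(Function<K, V> database) {
        this.database = database;
    }

    public V get(K key) {
        // Hit: return cached value. Miss: fetch from the database and cache it.
        return cache.computeIfAbsent(key, database);
    }

    public void invalidate(K key) {
        cache.remove(key);   // call on writes so readers don't see stale data
    }
}
```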
Defining high availability In general terms, high availability refers to the continuous operation of a system with little to no interruption to end users in the event of hardware or software failures, power outages, or other disruptions. If a primary server fails, a backup server can take over and continue to serve requests.
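A minimal sketch of that primary/backup idea, assuming a caller-supplied list of endpoints with the primary first; real HA systems add health checks, timeouts, and leader election:

```java
import java.util.List;
import java.util.function.Supplier;

// Try the primary, fall back to backups on failure; the first healthy
// endpoint serves the request.
public class FailoverClient<T> {
    private final List<Supplier<T>> endpoints;   // primary first, then backups

    public FailoverClient(List<Supplier<T>> endpoints) {
        this.endpoints = endpoints;
    }

    public T call() {
        RuntimeException last = null;
        for (Supplier<T> endpoint : endpoints) {
            try {
                return endpoint.get();           // success: stop here
            } catch (RuntimeException e) {
                last = e;                        // remember and try the next one
            }
        }
        throw new IllegalStateException("all endpoints failed", last);
    }
}
```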
Our analysis suggests that the wireline paths, upper-layer protocols, computing and radio hardware architecture need to co-evolve with 5G to form an ecosystem, in order to fully unleash its potential. This is a feature of the NSA architecture which requires dropping off of 5G onto 4G, doing a handover on 4G, and then upgrading to 5G again.
This paper is all about the design of efficient data structures for far-memory, which turns out to have consequences reaching all the way down to the hardware. A far memory data structure has:
- far data in far memory, containing the core content of the data structure
- data caches at clients
- algorithms for operations
Refreshable vectors.
Cache Merril. Companies can use technology roadmaps to review their internal IT, DevOps, infrastructure, architecture, software, internal system, and hardware procurement policies and procedures with innovation and efficiency in mind. How To Develop Your Business’ Technology Roadmap.
This architectural pattern was a response to the scaling challenges that confronted Amazon.com through its first 5 years, when direct database access was one of the major bottlenecks in scaling and operating the business. Most importantly, direct database access to a service’s data from outside that service is not allowed.
With its widespread use in modern application architectures, understanding the ins and outs of Redis® monitoring is essential for any tech professional. To monitor Redis® instances effectively, collect Redis metrics focusing on cache hit ratio, memory allocated, and latency threshold.
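For instance, the cache hit ratio can be derived from the keyspace_hits and keyspace_misses fields of Redis's INFO stats section; this minimal sketch uses the Jedis client, with host and port as assumptions:

```java
import redis.clients.jedis.Jedis;

// Compute the cache hit ratio from Redis INFO stats.
public class RedisHitRatio {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {   // assumed endpoint
            String stats = jedis.info("stats");
            long hits = parse(stats, "keyspace_hits");
            long misses = parse(stats, "keyspace_misses");
            double ratio = (hits + misses) == 0 ? 0 : (double) hits / (hits + misses);
            System.out.printf("cache hit ratio: %.2f%%%n", ratio * 100);
        }
    }

    // INFO returns "field:value" lines separated by CRLF.
    private static long parse(String info, String field) {
        for (String line : info.split("\r\n")) {
            if (line.startsWith(field + ":")) {
                return Long.parseLong(line.substring(field.length() + 1).trim());
            }
        }
        return 0;
    }
}
```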
A then-representative $200USD device had 4-8 slow (in-order, low-cache) cores, ~2GiB of RAM, and relatively slow MLC NAND flash storage. Hardware Past As Performance Prologue. Regardless, the overall story for hardware progress remains grim, particularly when we recall how long device replacement cycles are.
Introduction Memory systems are evolving into heterogeneous and composable architectures. There are three common mechanisms to access remote memory: modifying applications, modifying virtual memory, and hardware-level cache coherence support. About CXL hardware availability with academia. Using emulation (e.g.
These can be mitigated through the implementation of:
- efficient query optimization
- caching of database queries
- utilization of database indexes
- session storage
- database read replication and sharding
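As a minimal sketch of the read-replication item above, a router that sends writes to the primary and round-robins reads across replicas (the DataSource wiring is assumed; production routers also handle replication lag and failover):

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Read/write splitting: writes hit the primary, reads rotate over replicas.
public class ReadWriteRouter {
    private final DataSource primary;
    private final List<DataSource> replicas;
    private final AtomicInteger next = new AtomicInteger();

    public ReadWriteRouter(DataSource primary, List<DataSource> replicas) {
        this.primary = primary;
        this.replicas = replicas;
    }

    public Connection forWrite() throws SQLException {
        return primary.getConnection();
    }

    public Connection forRead() throws SQLException {
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i).getConnection();   // round-robin replica choice
    }
}
```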
Cache-Headers missing? If you’re interested in a high-level overview of Lighthouse architecture, read this guide from the official repository. Service workers that will cache the bytecode result of a parsed and compiled script. After that, it’ll be mitigated by cache. What changed in PageSpeed 5.0?
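For the missing cache headers point, a minimal servlet-based sketch (any web framework exposes an equivalent header API; older containers use javax.servlet instead of jakarta.servlet):

```java
import jakarta.servlet.http.HttpServletResponse;

// Mark a fingerprinted static asset as cacheable for a year by browsers
// and CDNs; "immutable" suits assets whose URL changes on every redeploy.
public class CacheHeaders {
    static void markAsLongLived(HttpServletResponse response) {
        response.setHeader("Cache-Control", "public, max-age=31536000, immutable");
    }
}
```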
The paper sets out what we can do in software given today’s hardware, and along the way also highlights areas where cooperation from hardware will be needed in the future. cache) can be partitioned across domains; for those that are instead time-multiplexed, we have to flush them during domain switches. Threat scenarios.
In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture. Apache Arrow's in-memory columnar layout is specifically optimized for data locality for better performance on modern hardware like CPUs and GPUs.
This contains updated and new material that reflects the latest C++ standards and compilers, with a focus on using modern C++11/14/17 effectively on modern hardware and memory architectures. Note that the class size is limited to about 100, so that I’ll be able to interact with most attendees directly.
Gen 5 is now the primary hardware option for most regions, since Gen 4 is aging out. Hyperscale achieves high performance because each compute node has SSD-based caches, which help minimize the network round trips needed to fetch data. New Hardware Configuration for Provisioned Compute Tier.
For most high-end processors these values have remained in the range of 75% to 85% of the peak DRAM bandwidth of the system over the past 15-20 years — an amazing accomplishment given the increase in core count (with its associated cache coherence issues), number of DRAM channels, and ever-increasing pipelining of the DRAMs themselves.
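To put numbers on that, a worked example under an assumed configuration of 8 DDR4-3200 channels with 8-byte transfers:

```java
// Peak DRAM bandwidth = channels x transfer rate x bytes per transfer;
// sustained bandwidth lands in the 75-85% band described above.
public class DramBandwidth {
    public static void main(String[] args) {
        int channels = 8;                      // assumed channel count
        double transfersPerSec = 3.2e9;        // DDR4-3200: 3200 MT/s
        int bytesPerTransfer = 8;              // 64-bit channel width

        double peak = channels * transfersPerSec * bytesPerTransfer; // bytes/s
        System.out.printf("peak: %.1f GB/s%n", peak / 1e9);          // 204.8 GB/s
        System.out.printf("sustained at 75-85%%: %.1f - %.1f GB/s%n",
                0.75 * peak / 1e9, 0.85 * peak / 1e9);               // ~153.6-174.1
    }
}
```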
According to Dr. Bandwidth, performance analysis has two recurring themes: How fast should this code (or “simple” variations on this code) run on this hardware? The user environment and the MPI runtime library define the mapping of MPI ranks to hardware resources (cores, sockets, nodes) in ways that are seldom transparent.
The whole point of this section is that all the algorithms above can be naturally implemented using a message-passing architectural style, i.e. the query execution engine can be considered as a distributed network of nodes connected by messaging queues. Marz, “Big Data Lambda Architecture”. Jacobsen and R. Lyons, “The Sliding DFT”.
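A minimal sketch of that style: each operator is a node that takes messages from an input queue and emits to an output queue; the filter/aggregate pipeline and the sentinel end-of-stream value are illustrative choices.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Two operator nodes connected by queues: a filter stage feeding an
// aggregation stage, with -1 as the end-of-stream sentinel.
public class PipelineNodes {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> source = new ArrayBlockingQueue<>(16);
        BlockingQueue<Integer> filtered = new ArrayBlockingQueue<>(16);

        Thread filter = new Thread(() -> {
            try {
                while (true) {
                    int v = source.take();
                    if (v < 0) { filtered.put(v); break; }  // propagate sentinel
                    if (v % 2 == 0) filtered.put(v);        // keep even values
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread aggregate = new Thread(() -> {
            try {
                long sum = 0;
                while (true) {
                    int v = filtered.take();
                    if (v < 0) break;                       // end of stream
                    sum += v;
                }
                System.out.println("sum of even values: " + sum);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        filter.start();
        aggregate.start();
        for (int i = 1; i <= 10; i++) source.put(i);
        source.put(-1);                                     // signal completion
        filter.join();
        aggregate.join();
    }
}
```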
Close monitoring of the hardware enthusiast community, including many of the most respected hardware analysts and reviewers, paints an even more dire picture about Intel in the server processor space. This made it easier for database professionals to make the case for a hardware upgrade, and made the typical upgrade more worthwhile.
A wide range of users with different operating systems, browsers, hardware configurations and other variables provides a wide sample size that helps developers discover as many issues as possible. Teams can measure the performance of all application dependencies, including databases, web services, caching, and more. Usage performance.
Last time around we looked at the DeathStarBench suite of microservices-based benchmark applications and learned that microservices systems can be especially latency sensitive, and that hotspots can propagate through a microservices architecture in interesting ways. When available, it can use hardware level performance counters.
An example of a specification is the correct operation of the hardware of a microprocessor. An SDC (silent data corruption) is the worst possible outcome of a fault, as it can have an arbitrary impact on the correctness of software running on the hardware. Vilas Sridharan is an AMD Senior Fellow and leads the RAS Architecture team.
One distinct trend is a belief that a JavaScript framework and Single-Page Architecture (SPA) is a must for PWA development. Add onto that the yawning chasm between low-end and high-end device performance thanks to chip design factors like cache sizes, and it can be difficult to know where to set a device baseline.
Stable media is commonly physical disk storage, but other devices and certain caching facilities qualify as well. Many high-end disk subsystems provide high-speed cache facilities to reduce the latency of read and write operations. This cache is often supported by a battery-powered backup facility.
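In Java, writing through to stable media looks roughly like this minimal sketch: force(true) asks the OS to flush both data and metadata to the device, so durability still depends on the disk or battery-backed controller cache honoring the flush. The file name is an assumption.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Write a record and force it to stable media before reporting success.
public class StableWrite {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Path.of("journal.log"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap("commit-record".getBytes()));
            channel.force(true);   // flush data and metadata to the device
        }
    }
}
```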
The art and science of microprocessor architecture is a never-ending struggle to balance complexity, verifiability, usability, expressiveness, compactness, ease of encoding/decoding, energy consumption, backwards compatibility, forwards compatibility, and other factors.
The benchmarks are documented in the Blackwell Architecture Technical Brief and some screenshots of the GTC keynote, and I’ll break those out and try to explain what’s really going on from a benchmarketing approach. Up to 256 GH200 modules can be connected using a shared memory architecture rather than Infiniband.