Cache and Hardware - Technology Performance Pulse

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.

Cache

Cache Latency Airlines Logistics

Distance-Based ISA for Efficient Register Management

ACM Sigarch

APRIL 2, 2025

To create a CPU core that can execute a large number of instructions in parallel, it is necessary to improve both the architecture, which includes the overall CPU design and the instruction set architecture (ISA) design, and the microarchitecture, which refers to the hardware design that optimizes instruction execution.

Efficiency

Efficiency Hardware Architecture Design

WiredTiger Logging and Checkpoint Mechanism

Percona

MARCH 28, 2023

This acts as a step to ensure durability by recovering lost data from the same journal files in case of crashes, power failures, and hardware failures between the checkpoints (see below). Here’s what the process looks like. The same data, in the form of pages inside the WiredTiger cache, are also marked dirty. wt and index-*.wt).

Hardware

Hardware C++ Storage Cache

Compress objects, not cache lines: an object-based compressed memory hierarchy

The Morning Paper

MAY 23, 2019

Compress objects, not cache lines: an object-based compressed memory hierarchy Tsai & Sanchez, ASPLOS’19. Existing cache and main memory compression techniques compress data in small fixed-size blocks, typically cache lines. Hotpads is a hardware-managed hierarchy of scratchpad-like memories called pads.

Cache

Cache Benchmarking Hardware Java

This spring: High-Performance and Low-Latency C++ (Stockholm) and ACCU (Bristol)

Sutter's Mill

FEBRUARY 13, 2017

I don’t get to Europe very often apart from ISO C++ standards meetings, but this spring I’ve been able to accept invitations for two English-language European events in the last week of April. Tue-Thu Apr 25-27: High-Performance and Low-Latency C++ (Stockholm).

Latency

Latency C++ Hardware Performance

Back-to-Basics Weekend Reading - A Decomposition Storage Model

All Things Distributed

SEPTEMBER 20, 2013

Not everybody agreed that the "N-ary Storage Model" (NSM) was the best approach for all workloads but it stayed dominant until hardware constraints, especially on caches, forced the community to revisit some of the alternatives. The first practical modern implementation is probably C-Store by Stonebraker, et al.

Storage

Storage Hardware Cache C++

HammerDB for Managers

HammerDB

JUNE 27, 2022

It enables the user to measure database performance and make comparative judgements about database hardware and software. These factors meant that often when looking for database performance information, the results for a particular combination of software and hardware were not available. Cached vs Scaled Workloads.

Benchmarking

Benchmarking Open Source C++ Cache

Using Parallel Query with Amazon Aurora for MySQL

Percona

JANUARY 17, 2019

On multi-core machines (which is the majority of the hardware nowadays) and in the cloud, we have multiple cores available for use. Aurora Parallel Query response time (for queries which cannot use indexes) can be 5x-10x better compared to the non-parallel fully cached operations. The second and third run used the cached data.

Cache

Cache C++ AWS Airlines

File systems unfit as distributed storage backends: lessons from ten years of Ceph evolution

The Morning Paper

NOVEMBER 5, 2019

Breaking that assumption allowed Ceph to introduce a new storage backend called BlueStore with much better performance and predictability, and the ability to support the changing storage hardware landscape. But let’s take a quick look at the changing hardware landscape before we go on… The changing hardware landscape.

Storage

Storage Systems Hardware Efficiency

A thorough introduction to bpftrace

Brendan Gregg

AUGUST 18, 2019

C @ns:
[256, 512)     10900 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
[512, 1k)      18291 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1k, 2k)        4998 |@@@@@@@@@@@@@@                                      |
[2k, 4k)          57 |                                                    |
[4k, 8k)         117 |                                                    |
[8k, 16k)         48 |                                                    |
[16k, 32k)       109 |                                                    |
[32k, 64k)         3 |                                                    |

hardware: Hardware counter-based instrumentation.

Hit Ctrl-C to end.
^C

Latency

Latency C++ Cache Programming

Solving Common Cross-Platform Issues When Working With Flutter

Smashing Magazine

JUNE 18, 2020

Flutter isn’t that, though: it runs natively on each platform, and it means each app runs just like it would run if it were written in Java/Kotlin or Objective-C/Swift on Android and iOS, pretty much. You need to know that because this implies that you need to take care of the many differences between these very diverse platforms.

Storage

Storage Mobile Website Java

Compiler bug? Linker bug? Windows Kernel bug.

Random ASCII

FEBRUARY 25, 2018

In this particular investigation, which spanned twenty months, we suspected hardware failure, compiler bugs, linker bugs, and other possibilities. Jumping too quickly to blaming hardware or build tools is a classic mistake, but in this case the mistake was that we weren’t thinking big enough. failure rate.

Programming

Programming Hardware Cache Code

PostgreSQL Performance Tuning: Optimizing Database Parameters for Maximum Efficiency

Percona

MAY 1, 2023

Key areas include: Configuration parameter tuning: This tuning involves altering variables such as memory allocation, disk I/O settings, and concurrent connections based on specific hardware and requirements. This not only results in cost savings by minimizing hardware requirements but also has the potential to decrease cloud expenses.

Tuning

Tuning Database Efficiency Performance

A persistent problem: managing pointers in NVM

The Morning Paper

DECEMBER 8, 2019

Byte-addressable non-volatile memory (NVM) will fundamentally change the way hardware interacts, the way operating systems are designed, and the way applications operate on data. The beauty of persistent memory is that we can use memory layouts for persistent data (with some considerations for volatile caches etc.). What about security?

Hardware

Hardware Programming Media Storage

How to Assess MySQL Performance

HammerDB

APRIL 19, 2023

This all sounded very similar to HammerDB TPROC-C workload (we will look at HammerDB TPROC-H (OLAP) another time), so it is easy for any reader to think both workloads are about the same. By default, HammerDB is designed to take advantage of database system caching mechanisms such as buffer caches, query caches, or statement caches.

Performance

Performance Benchmarking Cache Storage

Intel discloses “vector+SIMD” instructions for future processors

John McCalpin

NOVEMBER 5, 2016

So consider the dense matrix multiplication operation C += A*B, where A, B, and C are dense square matrices of order N, and the matrix multiplication operation is equivalent to the pseudo-code:

    for (i=0; i<N; i++) {
        for (j=0; j<N; j++) {
            for (k=0; k<N; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
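As a runnable counterpart to the pseudo-code above, here is a minimal Python sketch of the same triple loop. The function name `matmul_accumulate` and the list-of-lists representation are assumptions made for illustration; they do not come from the original post, and the sketch makes no attempt at the vectorization the post discusses.

```python
def matmul_accumulate(C, A, B):
    """Compute C += A*B in place for square matrices stored as lists of lists.

    Hypothetical helper for illustration only; mirrors the i/j/k loop order
    of the pseudo-code.
    """
    n = len(A)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Each (i, j) entry accumulates the dot product of
                # row i of A with column j of B.
                C[i][j] += A[i][k] * B[k][j]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [[0, 0], [0, 0]]
    print(matmul_accumulate(C, A, B))  # [[19, 22], [43, 50]]
```

The innermost loop walks B down a column (`B[k][j]` with k varying), which is the access pattern that tends to be unfriendly to caches in row-major layouts.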

Cache

Cache C++ Latency Hardware

Use Parallel Analysis – Not Parallel Query – for Fast Data Access and Scalable Computing Power

ScaleOut Software

JULY 27, 2018

Looking beyond distributed caching, it’s their ability to perform data-parallel analysis that gives IMDGs such exciting capabilities. Application developers often deploy IMDGs as a distributed cache that sits between an application and its database; the IMDG offloads ephemeral data from the database.

Scalability

Scalability Cache Airlines Ecommerce

Efficient lock-free durable sets

The Morning Paper

DECEMBER 1, 2019

Memory might be durable, but… …it is expected that caches and registers will remain volatile. Plus ça change, plus c’est la même chose. So, we’re going to need to take care that everything we say is committed is truly durable, and that we can recover to a consistent state following a crash.

Efficiency

Efficiency Cache C++ Performance

Seeing through hardware counters: a journey to threefold performance increase

The Netflix TechBlog

NOVEMBER 9, 2022

We also see much higher L1 cache activity combined with 4x higher count of MACHINE_CLEARS. a usage pattern occurring when 2 cores are reading from / writing to unrelated variables that happen to share the same L1 cache line. Cache line is a concept similar to memory page ... Thread 0’s cache in this example.
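The pattern described in this excerpt is commonly called false sharing. The sketch below only illustrates the address arithmetic behind it: two variables can interfere when their addresses map to the same cache line. The 64-byte line size, the function names, and the example addresses are all assumptions for illustration; the excerpt does not state them.

```python
CACHE_LINE_BYTES = 64  # assumption: a common line size, not taken from the excerpt


def cache_line_index(address, line_bytes=CACHE_LINE_BYTES):
    """Return the index of the cache line that contains the given byte address."""
    return address // line_bytes


def may_false_share(addr_a, addr_b, line_bytes=CACHE_LINE_BYTES):
    """True if two addresses fall in the same cache line and so could
    interfere when written from different cores."""
    return cache_line_index(addr_a, line_bytes) == cache_line_index(addr_b, line_bytes)


def pad_to_line(offset, line_bytes=CACHE_LINE_BYTES):
    """Round an offset up to the next cache-line boundary, the usual way
    to place a second variable on its own line."""
    return -(-offset // line_bytes) * line_bytes


if __name__ == "__main__":
    # Hypothetical addresses: two 8-byte fields laid out back to back.
    print(may_false_share(0x1000, 0x1008))                 # True
    # Moving the second field to the next line boundary separates them.
    print(may_false_share(0x1000, 0x1000 + pad_to_line(8)))  # False
```

Padding trades memory for isolation: each hot variable occupies a full line, so writes from one core no longer invalidate the line another core is using.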

Hardware

Hardware Cache Performance Latency

Establishing software root of trust unconditionally

The Morning Paper

APRIL 2, 2019

In this context it means there are no external dependencies on e.g. secrets, trusted hardware modules, or special instructions (e.g. A system comprises c connected devices, where device i has random access memory and processor registers. The word “unconditionally” from the paper title is also highly significant here.

Software

Software Programming Hardware

Declarative recursive computation on an RDBMS

The Morning Paper

SEPTEMBER 12, 2019

SQL provides a declarative programming interface, below which the system itself can figure out the most effective execution plans based on data size and statistics, layout, compute hardware etc. The implementations require both SQL and also some UDFs written in C++. Be careful what you ask for (materialize).

Network

Network Database Programming Hardware

Distributed Algorithms in NoSQL Databases

Highly Scalable

SEPTEMBER 18, 2012

A database should accommodate itself to different data distributions, cluster topologies and hardware configurations. C) In the previous schema, failures can be handled better using the hinted handoff technique [8]. Consider an example: there are 3 nodes A, B, and C and increment operation was applied 3 times, once per node.

Database

Database Latency C++ Scalability

From Heavy Metal to Irrational Exuberance

ACM Sigarch

OCTOBER 12, 2020

These, let’s call them metal languages, include FORTRAN (introduced in 1957), C (1972), and C++ (1985). Programmers continue to write applications in them, and they continue to evolve: the just-approved C++20 standard is the latest example. Are caches large enough for this code? As Leiserson et al.

C++

C++ Benchmarking Hardware Architecture

SQL Server On Linux: Forced Unit Access (Fua) Internals

SQL Server According to Bob

DECEMBER 18, 2018

Device level flushing may have an impact on your I/O caching, read ahead or other behaviors of the storage system. FILE_FLAG_NO_BUFFERING is the Win32, CreateFile API flags and attributes setting to bypass file system cache.

Servers

Servers Media Cache Storage

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

The Morning Paper

MAY 12, 2019

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems Gan et al., ASPLOS’19. The paper examines the implications of microservices at the hardware, OS and networking stack, cluster management, and application framework levels, as well as the impact of tail latency.

Open Source

Open Source Hardware Benchmarking Systems

HTTP/3: Performance Improvements (Part 2)

Smashing Magazine

AUGUST 22, 2021

Some examples of the latter are heavily cached websites, as well as single-page apps that periodically fetch small updates via APIs and other protocols such as DNS-over-QUIC. Let’s contemplate what would happen if A, B, and C were all render-blocking resources. This multiplexing can happen in many different ways.

Performance

Performance Network Latency Servers

SQL Server I/O Basics Chapter #2

SQL Server According to Bob

JANUARY 11, 2020

Character   POS   ASCII Value   Formula Value
A           1     65            67
C           2     67            69
Checksum                        136

Comparing the checksum values indicates that the values do not match and damage has occurred to the data.

Servers

Servers Cache Database Media

Front-End Performance Checklist 2021

Smashing Magazine

JANUARY 11, 2021

Build Optimizations JavaScript modules, module/nomodule pattern, tree-shaking, code-splitting, scope-hoisting, Webpack, differential serving, web worker, WebAssembly, JavaScript bundles, React, SPA, partial hydration, import on interaction, 3rd-parties, cache. This improves page load time and caching during navigations.

Performance

Performance Cache Media Metrics

Front-End Performance Checklist 2020 [PDF, Apple Pages, MS Word]

Smashing Magazine

JANUARY 6, 2020

On the other hand, we have hardware constraints on memory and CPU due to JavaScript parsing times (we’ll talk about them in detail later). Gatsby (React), Vuepress (Vue), Preact CLI, and PWA Starter Kit provide reasonable defaults for fast loading out of the box on average mobile hardware.

Performance

Performance Cache Servers Network

Front-End Performance Checklist 2019 [PDF, Apple Pages, MS Word]

Smashing Magazine

JANUARY 7, 2019

On the other hand, we have hardware constraints on memory and CPU due to JavaScript parsing times (we’ll talk about them in detail later). Gatsby.js (React), Preact CLI, and PWA Starter Kit provide reasonable defaults for fast loading out of the box on average mobile hardware.

Performance

Performance Cache Network Metrics

Egnyte Architecture: Lessons learned in building and scaling a multi petabyte content platform

High Scalability

NOVEMBER 25, 2019

Edge caching. In general, Egnyte connect architecture shards and caches data at different levels based on: Amount of data. Nginx for disk based caching. We use different types of caching techniques depending on the problem statements. Disk based caching. Hybrid Sync. On prem data processing. Offline access.

Architecture

Architecture Cache Azure Storage
