2018 and Cache - Technology Performance Pulse

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

At inference time, when multi-step decoding is needed, we can deploy KV caching to efficiently reuse past computations and maintain low latency. Kang and J. McAuley, Self-Attentive Sequential Recommendation, 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 2018, pp. 197-206, doi: 10.1109/ICDM.2018.00035.
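The excerpt does not show how the KV cache is built, so the following is only a minimal sketch of the general idea, assuming a single attention head and plain Python lists. The names `CachedDecoder`, `step_without_cache`, `matvec`, and `attend` are hypothetical and do not come from the Netflix post. Each decoding step projects only the newest token and reuses the keys and values stored for earlier positions.

```python
import math

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

class CachedDecoder:
    """Hypothetical single-head decoder that keeps a key/value cache."""

    def __init__(self, w_q, w_k, w_v):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v
        self.keys, self.values = [], []

    def step(self, x):
        # Project only the new token; earlier keys/values are reused as-is.
        self.keys.append(matvec(self.w_k, x))
        self.values.append(matvec(self.w_v, x))
        return attend(matvec(self.w_q, x), self.keys, self.values)

def step_without_cache(w_q, w_k, w_v, history):
    """Baseline: re-project the whole history at every decoding step."""
    keys = [matvec(w_k, x) for x in history]
    values = [matvec(w_v, x) for x in history]
    return attend(matvec(w_q, history[-1]), keys, values)
```

Both paths produce the same attention output at every step. The cached path does a constant number of projections per new token, while the baseline re-projects the entire history each time.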

Tuning

Tuning Efficiency Latency Strategy

Fundamentals of Table Expressions, Part 12 – Inline Table-Valued Functions

SQL Performance

OCTOBER 13, 2021

I also compare them with stored procedures, mainly focusing on differences in terms of default optimization strategy, and plan caching and reuse behavior. The main minus of parameter embedding optimization is you don’t get efficient plan caching and reuse behavior like you do for parameterized plans. plan_handle , Q. SELECT TOP (@n).WHERE

Cache

Cache Strategy C++ Code

Bringing Rich Experiences to Memory-constrained TV Devices

The Netflix TechBlog

JULY 1, 2019

In a previous post, we described how our TV application consists of a C++ SDK installed natively on the device, an updatable JavaScript user interface (UI) layer, and a custom rendering layer known as Gibbon. Our UI runs on top of a custom rendering engine which uses what we call a “surface cache” to optimize our use of graphics memory.

Cache

Cache Design Testing Development

Analyzing a High Rate of Paging

Brendan Gregg

AUGUST 29, 2021

1072-aws (xxx) 12/18/2018 _x86_64_ (16 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 5.03 Reads usually have apps waiting on them; writes may not (write-back caching). Hit Ctrl-C to end. ^C

Cache

Cache C++ AWS Java

Using Parallel Query with Amazon Aurora for MySQL

Percona

JANUARY 17, 2019

Aurora Parallel Query response time (for queries which cannot use indexes) can be 5x-10x better compared to the non-parallel fully cached operations. The second and third run used the cached data. It does not use any cache (i.e., the InnoDB buffer pool) either. This query is 100% cached. Test data and versions.

Cache

Cache C++ AWS Airlines

What bugs cause cloud production incidents?

The Morning Paper

JUNE 20, 2019

In total, there were 112 such incidents over the period March – September 2018 (not all of them affecting external customers). Most Azure code is written in .NET managed languages such as C#, reducing memory leak bugs. Tools like CHESS and PCT are used to expose shared-memory concurrency bugs.

Cloud

Cloud Azure Cache C++

Speeding up Linux kernel builds with ccache

Nick Desaulniers

JUNE 2, 2018

ccache, the compiler cache, is a fantastic way to speed up build times for C and C++ code that I previously recommended. Usually when this happens with ccache, there’s something non-deterministic about the builds that prevents cache hits. $ Sat Mar 17 03:04:59 UTC 2018. Cold Cache. Hot Cache.

Speed

Speed Cache C++ Systems

Compiler bug? Linker bug? Windows Kernel bug.

Random ASCII

FEBRUARY 25, 2018

See the end of the post for an October 2018 bug fix update, or read the whole story: Flaky failures are the worst. This was starting to look like a Windows file cache bug. Maybe something to do with multi-socket coherency of the disk and cache or ??? Update, October 2018. failure rate.

Programming

Programming Hardware Cache Code

A thorough introduction to bpftrace

Brendan Gregg

AUGUST 18, 2019

@ns:
[256, 512)         10900 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
[512, 1k)          18291 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1k, 2k)            4998 |@@@@@@@@@@@@@@                                      |
[2k, 4k)              57 |                                                    |
[4k, 8k)             117 |                                                    |
[8k, 16k)             48 |                                                    |
[16k, 32k)           109 |                                                    |
[32k, 64k)             3 |                                                    |
Source: Here's the code to biolatency.bt: tools# cat -n biolatency.bt. Attaching 3 probes.

Latency

Latency C++ Cache Programming

GotW #98 Solution: Assertion levels (Difficulty: 5/10)

Sutter's Mill

JANUARY 25, 2021

Not only is the integer comparison operation cheap, but min and max are already being accessed by this function and so they’re already “hot” in registers or cache. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). b) arbitrarily expensive “A condition that’s expensive? Dos Reis, J. Meredith, N.

C++

C++ Programming Cache Testing

Analyzing a High Rate of Paging

Brendan Gregg

AUGUST 29, 2021

1072-aws (xxx) 12/18/2018 _x86_64_ (16 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 5.03 100.00 ^C I'm looking at the r_await column in particular: the average wait time for reads in milliseconds. 1072-aws (xxx) 12/18/2018 _x86_64_ (16 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 5.03 17 48011 [.]

Cache

Cache C++ AWS Systems

The State Of Web Workers In 2021

Smashing Magazine

JUNE 30, 2021

First and foremost, this allows you to implement arbitrarily complex caching behavior, but it has also been extended to let you tap into long-running background fetches, push notifications, and other functionality that requires code to run without an associated page. The DOM actor now updates the DOM according to the new state object.

Games

Games Architecture Code C++

High Memory Usage on ProxySQL Server

Percona

DECEMBER 5, 2022

ProxySQL is a very useful tool for gaining high availability, load balancing, query routing, query caching, query rewriting, multiplexing, and data masking. Your MySQL connection id is 1 Server version: 5.5.30 (ProxySQL Admin Module) Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Commands end with ; or \g.

Servers

Servers Cache C++ Metrics

Efficient lock-free durable sets

The Morning Paper

DECEMBER 1, 2019

Memory might be durable, but… …it is expected that caches and registers will remain volatile. Plus ça change, plus c’est la même chose. So, we’re going to need to take care that everything we say is committed is truly durable, and that we can recover to a consistent state following a crash.

Efficiency

Efficiency Cache C++ Performance

KPTI/KAISER Meltdown Initial Performance Regressions

Brendan Gregg

FEBRUARY 9, 2018

This overhead can be reduced by A) pcid, fully available in Linux 4.14, and B) Huge pages. Cache access pattern: the overheads are exacerbated by certain access patterns that switch from caching well to caching a little less well. virtual (bgregg-c5.9xl-i-xxx) 02/09/2018 _x86_64_ (36 CPU) 05:24:51 PM proc/s cswch/s.

Performance

Performance Benchmarking Cache Tuning

HTTP/3: Practical Deployment Options (Part 3)

Smashing Magazine

SEPTEMBER 6, 2021

This approach was touted to be better for fine-grained caching because each subresource could be cached individually and the full bundle didn’t need to be redownloaded if one of them changed. Upon receipt of a valid Alt-Svc header indicating HTTP/3 support, the browser will cache this and try to set up a QUIC connection from then on.

Network

Network Servers Cache Traffic

Three Other Models of Computer System Performance: Part 2

ACM Sigarch

MARCH 25, 2019

The M/M/1 queue will show us a required trade-off among (a) allowing unscheduled task arrivals, (b) minimizing latency, and (c) maximizing throughput. For the previous cache miss buffer example, the 32-buffer answer is minimal for 100-ns average miss latency. While Little’s Law provides a black-box result, it does not expose tradeoffs.
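As a rough illustration of the two models mentioned here, the sketch below computes the standard M/M/1 mean time in system, 1 / (mu - lambda), and applies Little's Law to a miss-buffer sizing problem. Only the 100-ns miss latency and the 32-buffer result come from the excerpt. The clock rate and miss ratio in the code are assumed values chosen for illustration, and the function names are hypothetical.

```python
import math

def mm1_mean_latency(arrival_rate, service_rate):
    """Mean time in an M/M/1 system (waiting plus service): 1 / (mu - lambda)."""
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("M/M/1 is stable only when arrival_rate < service_rate")
    return 1.0 / (service_rate - arrival_rate)

def littles_law_occupancy(arrival_rate, mean_latency):
    """Little's Law: mean number in system = arrival rate * mean time in system."""
    return arrival_rate * mean_latency

# Miss-buffer sizing with Little's Law.
refs_per_cycle = 2          # assumption for illustration
clock_hz = 2.5e9            # assumption: clock rate is not given in the excerpt
miss_ratio = 0.0625         # assumption: miss ratio is not given in the excerpt
miss_latency_s = 100e-9     # 100-ns average miss latency, from the excerpt

misses_per_second = refs_per_cycle * clock_hz * miss_ratio
buffers = math.ceil(littles_law_occupancy(misses_per_second, miss_latency_s))
print(buffers)  # 32 under these assumed parameters

# M/M/1 trade-off: pushing utilization (throughput) up inflates latency.
for utilization in (0.5, 0.9, 0.99):
    print(utilization, mm1_mean_latency(utilization, 1.0))
```

Little's Law only gives the average occupancy. The M/M/1 formula is what exposes the trade-off: as the arrival rate approaches the service rate, mean latency grows without bound.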

Systems

Systems Latency Performance C++

Three Other Models of Computer System Performance: Part 1

ACM Sigarch

MARCH 18, 2019

For example, how many buffers must a cache have to record outstanding misses if it receives 2 memory references per cycle at 2.5 In our second blog post, we will present the M/M/1 queue that confronts us with a stark, required trade-off among (a) allowing unscheduled task arrivals, (b) minimizing latency, and (c) maximizing throughput.

Systems

Systems Latency Performance Analytics

PostgreSQL Performance Tuning: Optimizing Database Parameters for Maximum Efficiency

Percona

MAY 1, 2023

This blog was originally published in August 2018 and was updated in May 2023. Connection pooling: Minimizing connection overhead and improving response times for frequently accessed data by implementing mechanisms for connection pooling and caching strategies. It is just a guideline, not the exact allocated memory or cache size.

Tuning

Tuning Database Efficiency Performance

How to improve Redo, Transaction Log and WAL throughput for HammerDB benchmarks

HammerDB

NOVEMBER 5, 2018

If you are new to running Oracle, SQL Server, MySQL and PostgreSQL TPC-C workloads with HammerDB and have needed to investigate I/O performance the chances are that you have experienced waits on writing to the Redo, Transaction Log or WAL depending on the database you are testing. SQL> alter system flush buffer_cache; System altered.

Benchmarking

Benchmarking Database C++ Virtualization

KPTI/KAISER Meltdown Initial Performance Regressions

Brendan Gregg

FEBRUARY 8, 2018

This overhead can be reduced by A) pcid, fully available in Linux 4.14, and B) Huge pages. Cache access pattern: the overheads are exacerbated by certain access patterns that switch from caching well to caching a little less well. virtual (bgregg-c5.9xl-i-xxx) 02/09/2018 _x86_64_ (36 CPU) 05:24:51 PM proc/s cswch/s.

Performance

Performance Benchmarking Cache Tuning

SQL Server On Linux: Forced Unit Access (Fua) Internals

SQL Server According to Bob

DECEMBER 18, 2018

Device level flushing may have an impact on your I/O caching, read ahead or other behaviors of the storage system. FILE_FLAG_NO_BUFFERING is the Win32, CreateFile API flags and attributes setting to bypass file system cache. FILE_FLAG_NO_BUFFERING is the Win32, CreateFile API flags and attributes setting to bypass file system cache.

Servers

Servers Media Cache Storage

Headless WordPress: The Ups And Downs Of Creating A Decoupled WordPress

Smashing Magazine

OCTOBER 26, 2018

When it comes down to how WordPress is programmed, one thing is certain: it doesn’t follow the Model-View-Controller (MVC) design pattern that many developers are familiar with. Denis Žoljom.

Cache

Cache Website Speed Testing

Front-End Performance Checklist 2021

Smashing Magazine

JANUARY 11, 2021

Build Optimizations JavaScript modules, module/nomodule pattern, tree-shaking, code-splitting, scope-hoisting, Webpack, differential serving, web worker, WebAssembly, JavaScript bundles, React, SPA, partial hydration, import on interaction, 3rd-parties, cache. This improves page load time and caching during navigations.

Performance

Performance Cache Metrics Media

Front-End Performance Checklist 2020 [PDF, Apple Pages, MS Word]

Smashing Magazine

JANUARY 6, 2020

Globally in 2018–2019, according to the IDC, 87% of all shipped mobile phones are Android devices. PRPL stands for Pushing critical resource, Rendering initial route, Pre-caching remaining routes and Lazy-loading remaining routes on demand. In 2018, the Alliance for Open Media released a promising new video format called AV1.

Performance

Performance Cache Servers Network

Front-End Performance Checklist 2019 [PDF, Apple Pages, MS Word]

Smashing Magazine

JANUARY 7, 2019

The idea is quite straightforward: Push the minimal code needed to get interactive for the initial route to render quickly, then use service worker for caching and pre-caching resources and then lazy-load routes that you need, asynchronously. In 2018, the Alliance for Open Media released a promising new video format called AV1.

Performance

Performance Cache Network Metrics
