Cache and Storage - Technology Performance Pulse

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

After selecting a mode, users can interact with APIs without needing to worry about the underlying storage mechanisms and counting methods. Best Effort Regional Counter: This type of counter is powered by EVCache, Netflix’s distributed caching solution built on the widely popular Memcached.

Latency

Latency Cache Infrastructure Strategy

The Power of Caching: Boosting API Performance and Scalability

DZone

AUGUST 16, 2023

Caching is the process of storing frequently accessed data or resources in a temporary storage location, such as memory or disk, to improve retrieval speed and reduce the need for repetitive processing.
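The idea in the excerpt above can be illustrated with a minimal sketch. It assumes a single-process, in-memory cache with a fixed time-to-live; the names `TTLCache` and `get_or_load` are hypothetical and do not come from the linked article.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, or call loader(key) and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: do the expensive work once and remember it.
        self.misses += 1
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

A caller would wrap a slow lookup, for example `cache.get_or_load(user_id, fetch_user)`, where `fetch_user` stands in for whatever backend call is being avoided.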

Cache

Cache Scalability Performance Latency

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

We introduce a caching mechanism in the API gateway layer, allowing us to offload processing from singleton leader elected controllers without giving up strict data consistency and guarantees clients observe. When a new leader is elected it loads all data from external storage. The cache is kept in sync with the current leader process.

Cache

Cache Latency Traffic Systems

MezzFS - Mounting object storage in Netflix’s media processing platform

The Netflix TechBlog

MARCH 6, 2019

By Barak Alon (on behalf of Netflix’s Media Cloud Engineering team). MezzFS (short for “Mezzanine File System”) is a tool we’ve developed at Netflix that mounts cloud objects as local files via FUSE. Our object storage service splits objects into many parts and stores them in S3.

Media

Media Storage Processing Cache

How We Optimized Read Performance: Readahead, Prefetch, and Cache

DZone

SEPTEMBER 3, 2024

However, the increasing sizes of both data volumes and distributed system clusters raise significant cost challenges for all-flash storage and vast operational challenges for kernel clients. It improves I/O throughput substantially through the distributed cache and uses cost-effective object storage for data storage.

Cache

Cache Performance Storage Architecture

The Challenges of Ajax CDN

DZone

AUGUST 4, 2022

The host offered browser caching advantages, better stability, and storage on fast edge servers across strategic geolocations. The idea has been that a CDN has fast edge servers that cache content and deliver it based on the user’s geolocation. Not only did it have performance benefits, but it was also convenient for developers.

Cache

Cache Tuning Storage Website

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Our goal was to build a versatile and efficient data storage solution that could handle a wide variety of use cases, ranging from the simplest hashmaps to more complex data structures, all while ensuring high availability, tunable consistency, and low latency. Developers just provide their data problem rather than a database solution!

Latency

Latency Storage Cache Servers

Designing Instagram

High Scalability

JANUARY 11, 2022

Firstly, the synchronous process which is responsible for uploading image content on file storage, persisting the media metadata in graph data-storage, returning the confirmation message to the user and triggering the process to update the user activity. Fetching User Feed. Sample Queries supported by Graph Database. Optimization.

Design

Design Media Storage Logistics

Mastering Disk Space Management with MongoDB® Storage Engines

Scalegrid

MAY 11, 2024

MongoDB offers several storage engines that cater to various use cases. The default storage engine in earlier versions was MMAPv1, which utilized memory-mapped files and collection-level locking. The newer, pluggable storage engine, WiredTiger, addresses this by using prefix compression, document-level locking, and row-based storage.

Storage

Storage Engineering Cache Database

Improved Alerting with Atlas Streaming Eval

The Netflix TechBlog

APRIL 27, 2023

While Atlas is architected around compute & storage separation, and we could theoretically just scale the query layer to meet the increased query demand, every query, regardless of its type, has a data component that needs to be pushed down to the storage layer.

Storage

Storage Cache Metrics Database

Netflix Cloud Packaging in the Terabyte Era

The Netflix TechBlog

SEPTEMBER 24, 2021

From chunk encoding to assembly and packaging, the result of each previous processing step must be uploaded to cloud storage and then downloaded by the next processing step. Since not all projects are terabytes projects, allocating the largest cloud storage to all packager instances is not an efficient use of cloud resources.

Cloud

Cloud Media Storage Cache

AWS serverless services: Exploring your options

Dynatrace

OCTOBER 7, 2021

This means you no longer have to provision, scale, and maintain servers to run your applications, databases, and storage systems. Speed is next; serverless solutions are quick to spin up or down as needed, and there are no delays due to limited storage or resource access. AWS offers four serverless offerings for storage.

Serverless

Serverless AWS Lambda Storage

How To Design For High-Traffic Events And Prevent Your Website From Crashing

Smashing Magazine

JANUARY 7, 2025

In this article, we’ll discuss six ways to design websites for high-traffic events like product drops and sales: compress and optimize images, choose a scalable web host, use a CDN, leverage caching, stress test websites, and refine the backend. You can also find optimization plugins or caching solutions that give you access to a CDN.

Traffic

Traffic Website Design Cache

Building an elastic query engine on disaggregated storage

The Morning Paper

MARCH 8, 2020

Building an elastic query engine on disaggregated storage, Vuppalapati, NSDI’20. Snowflake is a data warehouse designed to overcome these limitations, and the fundamental mechanism by which it achieves this is the decoupling (disaggregation) of compute and storage. joins) during query processing. Disaggregation (or not).

Storage

Storage Engineering Cache Serverless

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

Dynatrace Kubernetes Observability for Persistent Volume Claims

Dynatrace

AUGUST 1, 2022

Interestingly, our partner RedHat reported in 2021 that around 80% of deployed workloads are databases or data caches, storing data in persistent volume claims (PVCs). You quickly realize that it will take ages to fill up the overprovisioned database storage. Two days later, your database runs out of storage in the middle of the night.

Storage

Storage Database Network Metrics

Remote Workstations for the Discerning Artists

The Netflix TechBlog

MARCH 8, 2021

They could need a GPU when doing graphics-intensive work or extra large storage to handle file management. Instead, we created a service to take the most popular configurations and cache them. We rely on our internal partner teams to support components installed on the workstation, such as storage and artist tools.

Entertainment

Entertainment Storage Open Source Hardware

MySQL Data Caching Efficiency

Percona

APRIL 14, 2023

A shared characteristic in most (if not all) databases, be they traditional relational databases like Oracle, MySQL, and PostgreSQL or some kind of NoSQL-style database like MongoDB, is the use of a caching mechanism to keep (a copy of) part of the data in memory. How do you know if your MySQL database caching is operating efficiently?

Cache

Cache Efficiency Database Monitoring

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Of the organizations in the Kubernetes survey, 71% run databases and caches in Kubernetes, representing a +48% year-over-year increase. Together with messaging systems (+36% growth), organizations are increasingly using databases and caches to persist application workload states.

Open Source

Open Source Java Operating System Programming

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

Flexible Storage: The service is designed to integrate with various storage backends, including Apache Cassandra and Elasticsearch, allowing Netflix to customize storage solutions based on specific use case requirements. Note: With Cassandra 4.x

Latency

Latency Storage Traffic Tuning

Geek Reading - Week of June 5, 2013

DZone

OCTOBER 11, 2022

Using MongoDB as a cache store (Architects Zone – Architectural Design Patterns & Best Practices). Email Reveals Google App Engine Search API About Ready For Preview Release, Charges Planned For Storage, Operations (TechCrunch). Why haven’t cash-strapped American schools embraced open source? (Hacker News). Java EE 7 is Final.

Java

Java Best Practices Google Analytics

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

Data warehouses offer a single storage repository for structured data and provide a source of truth for organizations. Unlike data warehouses, however, data is not transformed before landing in storage. A data lakehouse provides a cost-effective storage layer for both structured and unstructured data. Data management.

Artificial Intelligence

Artificial Intelligence Storage Analytics Government

Elasticsearch Indexing Strategy in Asset Management Platform (AMP)

The Netflix TechBlog

MARCH 10, 2023

are stored in secure storage layers. Amsterdam is built on top of three storage layers. To avoid the ES query for the list of indices for every indexing request, we keep the list of indices in a distributed cache. It is also responsible for asset discovery, validation, sharing, and for triggering workflows.

Strategy

Strategy Cache Storage Analytics

How Uber Serves Over 40 Million Reads Per Second from Online Storage Using an Integrated Cache

Uber Engineering

MAY 2, 2024

Learn how Uber serves over 40 million reads per second from its in-house, distributed database built on top of MySQL using an integrated caching solution: CacheFront.

Cache

Cache Storage Database

Helping VFX studios pave a path to the cloud

The Netflix TechBlog

NOVEMBER 15, 2022

But it’s not easy: to pull this off, VFX studios need to build and operate serious technical infrastructure (compute, storage, networking, and software licensing), otherwise known as a “render farm.” Netflix production teams work with a global roster of VFX studios (both large and small) and their artists to create this amazing imagery.

Cloud

Cloud Entertainment AWS Infrastructure

PostgreSQL Indexes Can Hurt You: Negative Effects and the Costs Involved

Percona

APRIL 24, 2023

The more indexes, the more memory is required for effective caching. Indexes need more cache than tables: due to random writes and reads, indexes need more pages to be in the cache. Cache requirements for indexes are generally much higher than associated tables.

Tuning

Tuning Cache Storage Database

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

Key Takeaways: Redis offers complex data structures and additional features for versatile data handling, while Memcached excels in simplicity with a fast, multi-threaded architecture for basic caching needs. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios.

Cache

Cache Storage Scalability Architecture

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

KeyValue is an abstraction over the storage engine itself, which allows us to choose the best storage engine that meets our SLO needs. The DeviceToDeviceManager is also responsible for observability, with metrics around cache hits, calls to the data store, message delivery rates, and latency percentile measurements.

Latency

Latency Cache Tuning Efficiency

What is session replay? Discover user pain points with session recordings

Dynatrace

DECEMBER 20, 2021

Streamlined asset caching: Asset caching is critical for creating accurate replays. Tools that feature client-side compression can help reduce total data transfer volumes and storage footprints. To maximize ROI, make sure your provider is up-front about the costs of recording, transfer, storage, and use before making the move.

Mobile

Mobile Website Analytics Cache

How Bloom Filters Work in MyRocks

Percona

FEBRUARY 15, 2023

For good performance, the filter blocks are cached in the RocksDB block cache and normally stay there since they are accessed frequently. LSM storage engines like MyRocks are very different from the more common B-Tree-based storage engines like InnoDB.
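As a rough illustration of why such a filter is worth keeping in the cache, here is a minimal Bloom filter sketch. It is not the RocksDB or MyRocks implementation; the class name, the sizing parameters, and the SHA-256 double-hashing scheme are all assumptions made for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the stride non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A negative answer lets a reader skip a data file entirely, which is the saving a storage engine is after when it consults the filter before reading from disk.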

Storage

Storage Tuning Cache Engineering

Back-to-Basics Weekend Reading - A Decomposition Storage Model

All Things Distributed

SEPTEMBER 20, 2013

Not everybody agreed that the "N-ary Storage Model" (NSM) was the best approach for all workloads but it stayed dominant until hardware constraints, especially on caches, forced the community to revisit some of the alternatives. A Decomposition Storage Model, George P. Copeland and Setrag N.

Storage

Storage Hardware Cache C++

View from Nutanix storage during Postgres DB benchmark

n0derunner

JUNE 28, 2019

Since the DB is small (50% the size of the Linux RAM) – the database is mostly cached on the read side – so we only see writes going to the DB files. The post View from Nutanix storage during Postgres DB benchmark appeared first on n0derunner. The other is doing reads and writes from the main datafiles.

Benchmarking

Benchmarking Storage Cache Database

Best practices and key metrics for improving mobile app performance

Dynatrace

DECEMBER 13, 2023

This includes how quickly the application loads, how much load it is putting on the device, how much storage is being used, and how frequently it crashes. This can be achieved by reducing the size of files or images, using caching, and compressing data. Optimize images and videos.

Best Practices

Best Practices Mobile Metrics Performance

Expanding the Cloud - Introducing Amazon ElastiCache

All Things Distributed

AUGUST 22, 2011

Today AWS has launched Amazon ElastiCache, a new service that makes it easy to add distributed in-memory caching to any application. Amazon ElastiCache handles the complexity of creating, scaling and managing an in-memory cache to free up brainpower for more differentiating activities. Driving Storage Costs Down for AWS Customers.

Cloud

Cloud Cache AWS Storage

Data ingestion pipeline with Operation Management

The Netflix TechBlog

MARCH 7, 2023

We store all OperationIDs which are in STARTED state in a distributed cache (EVCache) for fast access during searches. For example, they can store the annotations in a blob storage like S3 and give us a link to the file as part of the single API. This new operation is marked to be in STARTED state.

Media

Media Latency Architecture Database

Improving Spark Memory Resource With Off-Heap In-Memory Storage

DZone

NOVEMBER 13, 2019

Improve your Spark memory. In the previous tutorial, we demonstrated how to get started with Spark and Alluxio. To share more thoughts and experiments on how Alluxio enhances Spark workloads, this article focuses on how Alluxio helps to optimize the memory utilization of Spark applications.

Storage

Storage Cache Performance

Use Distributed Caching to Accelerate Online Web Sites

ScaleOut Software

APRIL 22, 2020

The Solution: Distributed Caching. The solution to this challenge is to use scalable, memory-based data storage for fast-changing data so that web sites can keep up with exploding workloads. It’s not enough simply to lash together a set of servers hosting a collection of in-memory caches.

Cache

Cache Storage Servers Database

InnoDB Performance Optimization Basics

Percona

MARCH 23, 2023

By caching hot datasets, indexes, and ongoing changes, InnoDB can provide faster response times and utilize disk IO in a much more optimal way. Storage: The type of storage and disk used for database servers can have a significant impact on performance and reliability. Setting oom_score_adj to -800. References: How MySQL 8.0.21

Performance

Performance Hardware Tuning Storage

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

Given the scale of the data being generated using replay traffic, we record the responses from the two sides to a cost-effective cold storage facility using technology like Apache Iceberg. It helps expose memory leaks, deadlocks, caching issues, and other system issues.

Traffic

Traffic Latency Tuning Systems

Implementing AWS well-architected pillars with automated workflows

Dynatrace

SEPTEMBER 13, 2023

Storing frequently accessed data in faster storage, usually in-memory caching, improves data retrieval speed and overall system performance. A study by Amazon found that increasing page load time by just 100 milliseconds costs 1% in sales. Beyond efficiency, validating performance thresholds is also crucial for revenues.

AWS

AWS Efficiency Azure Cloud

File systems unfit as distributed storage backends: lessons from ten years of Ceph evolution

The Morning Paper

NOVEMBER 5, 2019

File systems unfit as distributed storage backends: lessons from 10 years of Ceph evolution, Aghayev et al., SOSP’19. In this case, the assumption that a distributed storage backend should clearly be layered on top of a local file system. What is a distributed storage backend? This is not surprising in hindsight.

Storage

Storage Systems Hardware Efficiency

The Most Important MySQL Setting

Percona

APRIL 7, 2023

But since retrieving data from disk is slow, databases tend to work with a caching mechanism to keep as much hot data, the bits and pieces that are most often accessed, in memory. In MySQL, considering the standard storage engine, InnoDB, the data cache is called Buffer Pool. In PostgreSQL, it is called shared buffers.
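One common way to gauge how well the Buffer Pool is serving reads is a hit ratio computed from two InnoDB status counters, `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that had to go to disk). The sketch below is an illustration, not the linked article's method. It assumes the two counters have already been fetched, for instance from `SHOW GLOBAL STATUS`, and the function name is hypothetical.

```python
def buffer_pool_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """Fraction of logical reads served from memory rather than disk.

    read_requests: value of Innodb_buffer_pool_read_requests
    disk_reads:    value of Innodb_buffer_pool_reads
    """
    if read_requests <= 0:
        # No reads recorded yet, so no meaningful ratio; report 0.0.
        return 0.0
    return 1.0 - (disk_reads / read_requests)
```

Because both counters accumulate since server start, comparing two samples taken some minutes apart usually says more about current behavior than the lifetime totals do.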

Tuning

Tuning Cache Servers Benchmarking

Why you need Dynatrace on Azure Workloads

Dynatrace

NOVEMBER 5, 2019

In addition to the OneAgent collecting all these metrics, Dynatrace has an integration with Azure Monitor to capture additional metrics for platform services such as Storage Accounts, Redis Cache, API Management Services, Load Balancers among others. Dynatrace does this by querying Azure monitor APIs to collect platform metrics.

Azure

Azure Artificial Intelligence Metrics Innovation
