One key factor that significantly affects the performance of data processing is the storage format of the data. This article explores the impact of different storage formats, specifically Parquet, Avro, and ORC, on query performance and costs in big data environments on Google Cloud Platform (GCP).
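As a rough illustration of how such a format comparison might be set up (a sketch only; the table name events_raw and the package version are assumptions, not the article's setup), one could materialize the same source data in all three formats with Spark SQL:

    # Hypothetical sketch: write one source table out in Parquet, ORC, and Avro.
    # Assumes Spark is installed; Avro support comes from the spark-avro package.
    cat > /tmp/formats.sql <<'SQL'
    CREATE TABLE events_parquet USING PARQUET AS SELECT * FROM events_raw;
    CREATE TABLE events_orc     USING ORC     AS SELECT * FROM events_raw;
    CREATE TABLE events_avro    USING AVRO    AS SELECT * FROM events_raw;
    SQL
    spark-sql --packages org.apache.spark:spark-avro_2.12:3.5.0 -f /tmp/formats.sql

The same queries can then be timed against each table to compare scan cost per format.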
To do this, I needed to drive Postgres to perform real transactions while keeping jitter/noise from the filesystem and storage to a minimum. After reading a lot of blogs I came … The post Notes on tuning Postgres for CPU and memory benchmarking appeared first on n0derunner.
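One plausible way to take storage out of the picture for a CPU/memory-focused run (a common approach sketched here, not necessarily the post's exact method, and unsafe for any real data) is to disable durability-related writes before benchmarking:

    # Benchmarking only: these settings trade away crash safety to remove storage flushes.
    psql -c "ALTER SYSTEM SET fsync = off;"
    psql -c "ALTER SYSTEM SET synchronous_commit = off;"
    psql -c "ALTER SYSTEM SET full_page_writes = off;"
    pg_ctl -D "$PGDATA" restart
    pgbench -c 8 -j 8 -T 300 pgbench   # drive real transactions against the 'pgbench' database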
This article analyzes the correlation between block sizes and their impact on storage performance. It covers definitions of structured versus unstructured data, how various storage segments react to block-size changes, and the differences between I/O-driven and throughput-driven workloads.
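A minimal fio sketch of such a block-size comparison (the file path, sizes, and queue depth are assumptions, not the article's test setup): small blocks tend to be IOPS-bound, large blocks throughput-bound.

    # Sweep block sizes against the same file to contrast IOPS- vs throughput-bound behavior.
    for bs in 4k 64k 1m; do
      fio --name=bs-$bs --filename=/data/fio.test --size=4g \
          --rw=randread --bs=$bs --iodepth=32 --direct=1 \
          --ioengine=libaio --runtime=60 --time_based
    done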
What is RabbitMQ? RabbitMQ functions as a message broker, managing message confirmation, routing, storage, and delivery within a queue. Message Broker vs. Distributed Event Streaming Platform: message brokers handle validation, routing, storage, and delivery, ensuring efficient and reliable communication.
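A small sketch of that broker flow using the rabbitmqadmin CLI from RabbitMQ's management plugin (the queue name and payload are made up for illustration):

    # Declare a durable queue, publish through the default exchange, then consume one message.
    rabbitmqadmin declare queue name=demo durable=true
    rabbitmqadmin publish exchange=amq.default routing_key=demo payload="hello"
    rabbitmqadmin get queue=demo ackmode=ack_requeue_false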
Performance Benchmarking of PostgreSQL on ScaleGrid vs. AWS RDS Using Sysbench. This study benchmarks PostgreSQL performance across two leading managed database platforms, ScaleGrid and AWS RDS, using versions 13, 14, and 15.
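A representative sysbench invocation for this kind of test (host, credentials, and sizing are placeholders; the study's exact parameters are not shown here):

    # Load 16 tables of 1M rows, then run a mixed read/write workload for 5 minutes.
    sysbench oltp_read_write --db-driver=pgsql \
      --pgsql-host=PGHOST --pgsql-user=sbtest --pgsql-password=SECRET --pgsql-db=sbtest \
      --tables=16 --table-size=1000000 prepare
    sysbench oltp_read_write --db-driver=pgsql \
      --pgsql-host=PGHOST --pgsql-user=sbtest --pgsql-password=SECRET --pgsql-db=sbtest \
      --tables=16 --table-size=1000000 --threads=64 --time=300 run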
MySQL DigitalOcean Performance Benchmark. In this benchmark, we compare equivalent plan sizes between ScaleGrid MySQL on DigitalOcean and DigitalOcean Managed Databases for MySQL, including a read-intensive throughput benchmark. ScaleGrid provides 30% more storage on average vs. DigitalOcean for MySQL at the same affordable price.
On average, ScaleGrid provides over 30% more storage vs. DigitalOcean for PostgreSQL at the same affordable price. PostgreSQL Benchmark Setup. Here is the configuration we used for the ScaleGrid and DigitalOcean benchmark performance tests highlighted above.
Even though the log writes are sequential, they are low-concurrency and small (mostly 16K-32K). This write pattern is a good candidate for the oplog, and these low-concurrency log writes do indeed hit the oplog. The post View from Nutanix storage during Postgres DB benchmark appeared first on n0derunner.
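To reproduce that write pattern in isolation, a hedged fio sketch (file path and size are assumptions) of low-concurrency, small sequential writes with a flush per write:

    # Sequential 16K writes, queue depth 1, fsync after every write: a database-log-like pattern.
    fio --name=dblog --filename=/data/log.test --size=1g \
        --rw=write --bs=16k --iodepth=1 --numjobs=1 \
        --ioengine=libaio --fsync=1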
IT infrastructure is the heart of your digital business, connecting every area: physical and virtual servers, storage, databases, networks, and cloud services. This shift requires infrastructure monitoring to ensure all your components work together across applications, operating systems, storage, servers, virtualization, and more.
ScaleGrid’s MySQL, PostgreSQL and Redis™ solutions on DigitalOcean are competitively priced starting at just $15/GB, the same as DigitalOcean’s Managed Database solution, but offer on average 30% more storage for the same price.
Some technical notes on our submission to the benchmark committee. Background: for the past few months, engineers from Nutanix have been participating in the MLPerf™ Storage benchmark, which is designed to measure the storage performance required for ML training workloads. The post appeared first on n0derunner.
It starts with implementing data governance practices, which set standards and policies for data use and management in areas such as quality, security, compliance, storage, stewardship, and integration. Fragmented and siloed data storage can create inconsistencies and redundancies.
Dynatrace OneAgent deployment and life-cycle management are already widely considered to be industry benchmarks for reliability and efficiency. Advanced customization of OneAgent deployments made easy, with easier rollout thanks to log storage best practices.
Our distributed tracing infrastructure is grouped into three sections: tracer library instrumentation, stream processing, and storage. An additional implication of a lenient sampling policy is the need for scalable stream processing and storage infrastructure fleets to handle increased data volume. Storage: don’t break the bank!
4:45pm-5:45pm NFX 209: File system as a service at Netflix. Kishore Kasi, Senior Software Engineer. Abstract: As Netflix grows in original content creation, its need for storage is also increasing at a rapid pace. In order to maintain performance, benchmarking is a vital part of our system's lifecycle.
A Dedicated Log Volume (DLV) is a specialized storage volume designed to house database transaction logs separately from the volume containing the database tables. DLVs are particularly advantageous for databases with large allocated storage, high I/O per second (IOPS) requirements, or latency-sensitive workloads.
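For instance, on Amazon RDS a DLV can be toggled on an existing instance via the AWS CLI (the instance identifier is a placeholder; verify the flag against your CLI version):

    # Enable a Dedicated Log Volume on an existing RDS instance.
    aws rds modify-db-instance \
      --db-instance-identifier mydb \
      --dedicated-log-volume \
      --apply-immediately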
Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook, Cao et al. Or, in the case of key-value stores, what you benchmark. So if you want to design a system that will offer good real-world performance, it's really useful to have benchmarks that accurately represent real-world workloads.
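RocksDB ships a db_bench tool for exactly this; a minimal sketch (the key/value sizes and counts here are illustrative, not the paper's measured distributions):

    # Fill 10M keys, then measure random reads against the existing database.
    # Newer db_bench builds also offer a 'mixgraph' benchmark modeling the mixed
    # workloads described in the paper.
    db_bench --benchmarks=fillrandom --num=10000000 --key_size=48 --value_size=400
    db_bench --benchmarks=readrandom --use_existing_db=1 --num=10000000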
This difference has substantial technological implications, from the classification of what’s interesting to transport to cost-effective storage (keep an eye out for later Netflix Tech Blog posts addressing these topics). As you can imagine, this comes with very real storage costs. Is this an anomaly or are we dealing with a pattern?
Querying the data. While it is reasonable to create panels showing real-time load in order to better explore the types of queries that can be run against pg_stat_monitor, it is more practical to copy the data into tables and query it after the benchmarking run has completed. A script executing a benchmarking run: #!/bin/bash
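The script itself is truncated in this excerpt; a hedged reconstruction of what such a run-and-snapshot script could look like (database and table names are invented):

    #!/bin/bash
    # Hypothetical sketch: reset counters, run the benchmark, snapshot pg_stat_monitor.
    psql -d pgbench -c "SELECT pg_stat_monitor_reset();"
    pgbench -c 20 -j 4 -T 300 pgbench
    psql -d pgbench -c "CREATE TABLE pgsm_run AS SELECT * FROM pg_stat_monitor;"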
With help from the Nutanix X-Ray team, I have created an IO “benchmark” which simulates a “General Server Virtualization” workload. The post A Generalized workload generator for storage IO appeared first on n0derunner.
This service leverages Cassandra and Elasticsearch for data storage and retrieval. When onboarding embedding vector data, we performed extensive benchmarking to evaluate the available datastores. It can store and retrieve temporal (timestamp) as well as spatial (coordinates) data.
These have inspired me to summarize another performance activity: evaluating benchmark accuracy. Accurate benchmarking rewards engineering investment that actually improves performance, but, unfortunately, inaccurate benchmarking is more common. If the benchmark reported 20k ops/sec, you should ask: why not 40k ops/sec?
Storage: the type of storage and disk used for database servers can have a significant impact on performance and reliability, so benchmark before you decide. Cloud: different cloud providers offer a range of instance types and sizes, each with varying amounts of CPU, memory, and storage. Also keep transparent huge pages (THP) disabled.
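Checking and disabling THP on Linux is straightforward (a sketch; the setting resets at boot unless made persistent in your init configuration):

    # Inspect the current THP state, then disable it for the running system.
    cat /sys/kernel/mm/transparent_hugepage/enabled
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag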
xlarge 4vCPU 8GB-RAM; Storage: EBS volume (root) 80GB gp2 (IOPS 240/3000). As well, high availability will be integrated, guaranteeing cluster viability in the case that one worker node goes down. And now, execute the benchmark on the coordinator node: pgbench -c 20 -j 3 -T 60 -P 3 pgbench. The results are not pretty.
IT professionals are familiar with scoping the size of VMs with regard to vCPU, memory, and storage capacity. Storage optimized: high disk throughput and IO; the overall size will determine the amount of temporary storage available, along with premium storage support.
In this video I migrate a Postgres DB running the pgbench benchmark. As the DB continues to run on the new host, the Nutanix storage detects the access patterns and “localizes” the data that the DB is accessing. Many different queries execute in parallel, some hitting RAM cache, some hitting storage.
This article will explore how they handle data storage and scalability, how they perform in different scenarios, and, most importantly, how these factors influence your choice. It uses a hash table to manage these pairs, divided into fixed-size buckets with linked lists for key-value storage.
Indexing efficiency. Monitoring indexing efficiency in MySQL involves analyzing query performance, using EXPLAIN statements, utilizing performance monitoring tools, reviewing error logs, performing regular index maintenance, and benchmarking/testing. This KPI is also directly related to Query Performance and helps improve it.
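For example, a quick index-usage check from the shell (the database, table, and column names are invented for illustration):

    # Show whether the optimizer picks an index, and list the table's indexes.
    mysql mydb -e "EXPLAIN SELECT * FROM orders WHERE customer_id = 42\G"
    mysql mydb -e "SHOW INDEX FROM orders\G"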
Some opinions claim that “benchmarks are meaningless,” “benchmarks are irrelevant,” or “benchmarks are nothing like your real applications.” For others, however, “benchmarks matter,” as they “account for the processing architecture and speed, memory, storage subsystems and the database engine.”
Index Fill Factor. The SQL Server index fill factor is a percentage value that determines how full each data page is filled with data. This option is available in index properties to manage data storage in the data pages. It […].
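A sketch of setting it with T-SQL via sqlcmd (the server, index, and table names are placeholders):

    # Rebuild an index leaving 20% free space per page (fill factor 80).
    sqlcmd -S localhost -Q "ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD WITH (FILLFACTOR = 80);"

Lower fill factors reduce page splits on write-heavy indexes at the cost of more pages to read.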
PMM2 uses VictoriaMetrics (VM) as its metrics storage engine. Please note that the focus of these tests was standard metrics gathering and display; we'll use a future blog post to benchmark some of the more intensive Query Analytics (QAN) performance numbers.
Why RPC is “faster”. It's tempting to simply write a micro-benchmark test where we issue 1000 requests to a server over HTTP and then repeat the same test with asynchronous messages. If you did such a benchmark, here's an incomplete picture you might end up with: [graph of the microbenchmark showing RPC appearing faster than messaging].
I was recently asked to investigate why Nutanix storage was not as fast as a competing solution in a PoC environment. One thing that seemed really odd was that the working-set size for the tests was on the order of … The post Beware of tiny working-set-sizes when testing storage performance appeared first on n0derunner.
use the TPC-H benchmark to assess Redshift, Redshift Spectrum, Athena, Presto, Hive, and Vertica to find out what works best and the trade-offs involved. For cost calculations, the costs are a combination of compute costs, storage costs, data scan costs, and software license costs. Key findings. System initialisation time.
To illustrate this, I ran the Sysbench-TPCC synthetic benchmark against two different GCP instances running a freshly installed Percona Server for MySQL version 8.0.31. In MySQL, with the standard storage engine, InnoDB, the data cache is called the Buffer Pool. In PostgreSQL, it is called shared buffers.
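A sketch of sizing each cache from the shell (the values are illustrative; tune them to your instance's RAM):

    # MySQL: the InnoDB buffer pool can be resized online in 8.0.
    mysql -e "SET GLOBAL innodb_buffer_pool_size = 6 * 1024 * 1024 * 1024;"
    # PostgreSQL: shared_buffers requires a restart to take effect.
    psql -c "ALTER SYSTEM SET shared_buffers = '2GB';"
    pg_ctl -D "$PGDATA" restart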
Nutanix X-Ray is well known for being able to model IO/storage workloads, but what about workloads that are CPU bound? For our purposes we are going to use a Postgres DB and its built-in benchmarking tool, pgbench. This time, though, the metric is database transactions per second, not IOPS or storage throughput.
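A hedged sketch of driving such a CPU-bound run directly with pgbench (the scale factor and thread counts are assumptions; X-Ray's own wrapper is not shown here):

    # Initialize at scale factor 100, then run a select-only test that stays
    # CPU/memory bound once the dataset is cached.
    pgbench -i -s 100 pgbench
    pgbench -c 32 -j 8 -T 300 -S pgbench   # -S: built-in select-only script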
Storage is a critical aspect to consider when working with cloud workloads. High-availability storage options in cloud computing are highly adaptable solutions designed for storing vast amounts of data while providing easy access to it. What is an example of a workload?
replacing Paxos with Raft), or they could be shims over external storage systems. A minimal Loglet needs to provide totally ordered, durable storage via the shared log API. The evaluation section has lots of good information on experiences running Delos in production, as well as some synthetic benchmarks. The NativeLoglet.
Self-managed databases come with their own set of expenses that must be factored in; managing a database requires time and effort, which often includes backup storage, patching, software upgrades, and other typical administration tasks. Advantages of DBaaS: database management with DBaaS is like being on a luxury cruise.
Netflix engineers run a series of tests and benchmarks to validate the device across multiple dimensions including compatibility of the device with the Netflix SDK, device performance, audio-video playback quality, license handling, encryption and security. It could help us design and implement more targeted reward functions.
Key metrics like throughput, request latency, and memory utilization are essential for assessing Redis health. Tools like the MONITOR command and redis-benchmark help with latency and throughput analysis, while the MEMORY USAGE and MEMORY STATS commands help evaluate memory. Which metrics matter most depends on your application workload and its business logic.
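A quick sketch of those checks from the shell (the key name is a placeholder):

    # Throughput for basic commands, round-trip latency, and memory accounting.
    redis-benchmark -t set,get -n 100000 -q
    redis-cli --latency
    redis-cli MEMORY STATS
    redis-cli MEMORY USAGE mykey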