Twilio is a call management system with excellent call recording capabilities, but organizations often need to download these recordings automatically and store them locally or in their preferred cloud storage. Downloading large numbers of recordings from Twilio, however, can be challenging.
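As a rough illustration of the automation, the sketch below pages through recordings with the Twilio Python helper library and saves each one as an MP3. It is a minimal sketch assuming credentials in environment variables; the output directory and page size are hypothetical choices, not the article's code.

```python
# Minimal sketch: bulk-download Twilio call recordings as MP3 files.
# Assumes TWILIO_ACCOUNT_SID / TWILIO_AUTH_TOKEN are set; the
# "recordings/" directory is a hypothetical destination.
import os
import requests
from twilio.rest import Client

account_sid = os.environ["TWILIO_ACCOUNT_SID"]
auth_token = os.environ["TWILIO_AUTH_TOKEN"]
client = Client(account_sid, auth_token)

os.makedirs("recordings", exist_ok=True)

# The helper library transparently pages through the recording list.
for rec in client.recordings.list(limit=500):
    # rec.uri ends in ".json"; swapping the extension requests the media.
    media_url = f"https://api.twilio.com{rec.uri.replace('.json', '.mp3')}"
    resp = requests.get(media_url, auth=(account_sid, auth_token), timeout=30)
    resp.raise_for_status()
    with open(os.path.join("recordings", f"{rec.sid}.mp3"), "wb") as f:
        f.write(resp.content)
```

For large archives, running such downloads concurrently and retrying on rate limits is usually where the real difficulty lies.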
One key factor that significantly affects data processing performance is the storage format of the data. This article explores the impact of different storage formats, specifically Parquet, Avro, and ORC, on query performance and costs in big data environments on Google Cloud Platform (GCP).
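To make the comparison concrete, here is a minimal sketch that writes the same small table in all three formats with pyarrow and fastavro; the sample data and file names are hypothetical.

```python
# Minimal sketch: write one dataset as Parquet, ORC, and Avro.
# Requires pyarrow and fastavro; data is illustrative only.
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.orc as orc
import fastavro

table = pa.table({"user_id": [1, 2, 3], "spend": [9.5, 3.2, 7.8]})

pq.write_table(table, "events.parquet")  # columnar, strong scan/pruning support
orc.write_table(table, "events.orc")     # columnar, common in Hive ecosystems

avro_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "spend", "type": "double"},
    ],
}
with open("events.avro", "wb") as f:     # row-oriented, suited to ingestion
    fastavro.writer(f, avro_schema, table.to_pylist())
```

On GCP, such files can be loaded into or queried externally from BigQuery, where columnar formats typically scan fewer bytes per query and therefore cost less.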
At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.
This article analyzes the correlation between block sizes and their impact on storage performance. It covers definitions of structured vs. unstructured data, how various storage segments react to block size changes, and the differences between I/O-driven and throughput-driven workloads.
In this article, Rogerio Robetti discusses the challenges of auto-scaling stateful storage systems and proposes an opinionated design to automatically scale up (vertically) and scale out (horizontally) from a single node to several nodes in a cluster, with minimal configuration and operator intervention.
A horizontally scalable exabyte-scale blob storage system which operates out of multiple regions, Magic Pocket is used to store all of Dropbox’s data. Adopting SMR technology and erasure codes, the system has extremely high durability guarantees but is cheaper than operating in the cloud. By Facundo Agriel
In this article, I will walk through a comprehensive end-to-end architecture for efficient multimodal data processing while striking a balance in scalability, latency, and accuracy by leveraging GPU-accelerated pipelines, advanced neural networks, and hybrid storage platforms.
It is the second of a series of articles that is built on top of that project, representing experiments with various statistical and machine learning models, data pipelines implemented using existing DAG tools, and storage services, both cloud-based and alternative on-premises solutions.
Data migration involves transferring data from on-premises storage to the cloud. This article discusses the challenges and best practices of data migration when moving on-premises data to the cloud.
To resolve the problem, it was suggested that we find more suitable data storage; this is the key problem we will try to solve in this article. For internal reasons, the well-known Amazon S3 was chosen for this purpose. That choice affected the project's unit test base.
Message brokers handle validation, routing, storage, and delivery, ensuring efficient and reliable communication. What is RabbitMQ? This article outlines the key differences in architecture, performance, and use cases to help determine the best fit for your workload.
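As a point of reference for how a broker-centric flow looks in code, below is a minimal RabbitMQ publish/consume sketch using the pika client; the localhost broker and the "tasks" queue are assumptions.

```python
# Minimal sketch: durable queue, persistent message, explicit acks.
# Assumes RabbitMQ on localhost; queue/message names are hypothetical.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)  # broker stores the queue

# Publish: the broker takes over routing, storage, and delivery.
channel.basic_publish(
    exchange="",
    routing_key="tasks",
    body=b"process-order-42",
    properties=pika.BasicProperties(delivery_mode=2),  # persist to disk
)

# Consume with explicit acknowledgements for reliable delivery.
def handle(ch, method, properties, body):
    print("received:", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="tasks", on_message_callback=handle)
channel.start_consuming()
```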
High performance, query optimization, open source, and polymorphic data storage are the major Greenplum advantages. Greenplum’s polymorphic data storage allows you to control the configuration for your table and partition storage, with the freedom to execute and compress files within it at any time.
Metadata synchronization (sync) is a core feature in Alluxio that keeps files and directories consistent with their source of truth in under-storage systems, making it simple for users to reason about the data retrieved from Alluxio. This article describes the design and implementation of metadata sync in Alluxio.
It is the first of a series of articles that will be built on top of that project, representing experiments with various statistical and machine learning models, data pipelines implemented using existing DAG tools, and storage services, both cloud-based and alternative on-premises solutions.
We often dwell on the technical aspects of database selection, focusing on performance metrics, storage capacity, and querying capabilities. In a detailed article, we've discussed how to align a NoSQL database with specific business needs.
In this article, we will delve into strategies to ensure that your data pipeline is resource-efficient, cost-effective, and time-efficient. Spark takes full advantage of this storage property by exclusively reading the columns that are involved in subsequent computations.
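The column-pruning behavior the excerpt relies on can be seen in a small PySpark sketch; the path and column names below are hypothetical.

```python
# Minimal sketch: Spark reads only the columns a query touches
# from a columnar (Parquet) source. Path/columns are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pruning-demo").getOrCreate()

df = spark.read.parquet("s3a://my-bucket/events/")

# Only user_id and spend are fetched from storage; all other columns
# in the files are skipped entirely.
result = (
    df.select("user_id", "spend")
      .where(F.col("spend") > 100)
      .groupBy("user_id")
      .agg(F.sum("spend").alias("total_spend"))
)

result.explain()  # ReadSchema in the plan shows just the two columns
```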
Alluxio workers are responsible for managing local resources, and they store data as blocks. Users can allocate different storage tiers as the resources for Alluxio workers, including MEM/SSD/HDD, which are further composed of directories. In this article, we analyze the block allocation policies from the source code.
Moreover, the process of collecting these profiles introduces overhead during application runtime and necessitates the storage and visualization of significantly large datasets. This article explores the concept of low overhead high-frequency profilers, which offer a solution to these challenges.
Load testing for a database usually needs to be conducted so that its impact on the system can be monitored in different scenarios, such as query language rule optimization, storage engine parameter adjustment, etc. The operating system used in this article is x86 CentOS 7.8.
What’s in this article? JSONB storage has some drawbacks vs. traditional columns: PostgreSQL does not store column statistics for JSONB columns, JSONB storage results in a larger storage footprint, and JSONB storage does not deduplicate the key names in the JSON. Why store JSON in PostgreSQL? JSONB indexes.
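For readers who want to try the trade-offs directly, here is a minimal psycopg2 sketch that creates a JSONB column with a GIN index and runs a containment query; the table, DSN, and payloads are hypothetical.

```python
# Minimal sketch: JSONB column + GIN index + containment (@>) query.
# DSN, table, and documents are illustrative only.
import json
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # assumed connection string
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id   bigserial PRIMARY KEY,
        body jsonb NOT NULL
    );
    -- GIN index speeds up containment and existence operators on JSONB
    CREATE INDEX IF NOT EXISTS events_body_gin ON events USING gin (body);
""")

cur.execute(
    "INSERT INTO events (body) VALUES (%s)",
    [json.dumps({"type": "click", "page": "/home"})],
)

# Containment query that the GIN index can serve.
cur.execute("SELECT id FROM events WHERE body @> %s",
            [json.dumps({"type": "click"})])
print(cur.fetchall())
conn.commit()
```

Note the drawbacks listed above still apply: every row stores full key names, and the planner has no per-key column statistics to work with.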
This article explores the concepts of Medallion Architecture and demonstrates how to implement batch and stream processing pipelines using Azure Databricks and Delta Lake. In Azure Databricks, this architecture can be implemented using Delta Lake to provide reliable data storage and processing capabilities.
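A minimal PySpark sketch of the bronze/silver/gold flow is below; it assumes a Delta-enabled Spark session (as on Azure Databricks), and the paths and cleansing rules are hypothetical.

```python
# Minimal sketch: medallion layers on Delta Lake.
# Assumes Delta Lake is configured; paths/rules are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion").getOrCreate()

# Bronze: land the raw data as-is for replayability.
raw = spark.read.json("/mnt/landing/orders/")
raw.write.format("delta").mode("append").save("/mnt/bronze/orders")

# Silver: deduplicate and apply basic quality rules.
bronze = spark.read.format("delta").load("/mnt/bronze/orders")
silver = bronze.dropDuplicates(["order_id"]).where(F.col("amount") > 0)
silver.write.format("delta").mode("overwrite").save("/mnt/silver/orders")

# Gold: business-level aggregates for consumption.
gold = silver.groupBy("customer_id").agg(
    F.sum("amount").alias("lifetime_value"))
gold.write.format("delta").mode("overwrite").save("/mnt/gold/customer_ltv")
```

The same layering applies to streaming by swapping `read`/`write` for `readStream`/`writeStream` with checkpoints.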
This article will explore how these technologies can be used together to create an optimized data pipeline for data processing in the cloud. It provides built-in connectors for various data sources such as databases, file systems, cloud storage, and more.
Taking a proactive and efficient approach to Kubernetes cluster monitoring can help engineering teams identify and predict many critical problems, like CPU exhaustion, memory exhaustion, and storage issues, well in advance of these issues taking a toll on the business.
In this article we’ll share highlights about two increments that are likely to fall into the “barely noticeable” category, including easier rollout thanks to log storage best practices. What does this mean for existing installations? No problem.
In this article, we are going to compare three of the most popular cloud providers, AWS vs. Azure vs. DigitalOcean, on their database hosting costs for MongoDB® to help you decide which cloud is best for your business, using AWS EC2 instances, Azure VM instances, and DigitalOcean Droplets.
Our goal was to build a versatile and efficient data storage solution that could handle a wide variety of use cases, ranging from the simplest hashmaps to more complex data structures, all while ensuring high availability, tunable consistency, and low latency. Developers just provide their data problem rather than a database solution!
AI requires more compute and storage. Training AI models is resource-intensive and costly because of increased computational and storage requirements. As a result, AI observability supports cloud FinOps efforts by identifying how AI adoption spikes costs through increased usage of storage and compute resources.
They've posted about Anna's new superpowers in Going Fast and Cheap: How We Made Anna Autoscale: Using Anna v0 as an in-memory storage engine, we set out to address the cloud storage problems described above. Each storage server collects statistics about the requests it serves, the data it stores, etc.
In this article, we will learn how to test our storage subsystem's performance using Diskspd. The storage subsystem is one of the key performance factors for SQL Server because the SQL Server storage engine stores database objects, tables, and indexes in physical files.
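Diskspd is a command-line tool, so a test run is usually scripted; the sketch below shells out to it from Python. The tool path, target file, and all parameter values are assumptions for illustration, not recommendations.

```python
# Minimal sketch: drive a Diskspd run and capture its report.
# Tool path, test file, and knobs are hypothetical choices.
import subprocess

cmd = [
    r"C:\Tools\diskspd.exe",
    "-c10G",   # create a 10 GiB test file
    "-d60",    # run for 60 seconds
    "-r",      # random I/O
    "-w30",    # 30% writes / 70% reads
    "-t4",     # 4 worker threads
    "-o32",    # 32 outstanding I/Os per thread
    "-b8K",    # 8 KiB blocks, matching the SQL Server page size
    "-Sh",     # disable software and hardware caching
    "-L",      # collect latency statistics
    r"D:\test\diskspd.dat",
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)  # report includes IOPS, MB/s, and latency percentiles
```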
If you’re responsible for maintaining your organization’s OneAgent install process or you have particular interest in the OneAgent file footprint on your monitored hosts, this article is for you. You can read about these prior enhancements in the dedicated article. You can read about how this works in Dynatrace Help.
To share more thoughts and experiments on how Alluxio enhances Spark workloads, this article focuses on how Alluxio helps to optimize the memory utilization of Spark applications. In the previous tutorial, we demonstrated how to get started with Spark and Alluxio.
BindPlane OP is a powerful open-source tool that makes it easy to build and manage telemetry pipelines to ship data from IT environments of any kind and size to any analysis tool or storage destination.
This article explains what a software supply chain attack is and how Dynatrace protects its customers against such attacks by applying risk management and business continuity planning, along with security controls in the software development life cycle (SDL). Every storage location involving data at rest is encrypted as well.
Azure Entra ID, formerly Azure Active Directory, is a comprehensive Identity and Access Management offering from Microsoft. While it encompasses many functionalities, this article will focus on Managed Identities. Why Managed Identities? For instance, for a storage account named "Foo", its connection string might be "Bar".
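The payoff of a managed identity is that no connection string like "Bar" ever appears in code. Below is a minimal sketch with the Azure SDK for Python, reusing the article's hypothetical account name "Foo".

```python
# Minimal sketch: access a storage account via managed identity,
# no connection string involved. Account name "foo" is hypothetical.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# On Azure, DefaultAzureCredential resolves the managed identity;
# locally it falls back to developer credentials (CLI, VS Code, etc.).
credential = DefaultAzureCredential()
service = BlobServiceClient(
    account_url="https://foo.blob.core.windows.net",
    credential=credential,
)

for container in service.list_containers():
    print(container.name)
```

The identity still needs an RBAC role (e.g., Storage Blob Data Reader) on the account for the call to succeed.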
One of the top trending open-source data stores that answers most of these use cases is Elasticsearch. Elasticsearch is a distributed data store and search engine with fault-tolerance and high-availability capabilities. This article will focus on the search-intensive initial and dynamic configurations of Elasticsearch.
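As a flavor of those configurations, here is a minimal sketch with the official Python client; the index name and every setting value are illustrative assumptions.

```python
# Minimal sketch: search-oriented index settings, static vs. dynamic.
# Cluster URL, index name, and values are illustrative only.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="articles",
    settings={
        "number_of_shards": 3,     # static: fixed at index creation
        "number_of_replicas": 1,   # dynamic: adjustable for availability
        "refresh_interval": "1s",  # dynamic: freshness vs. indexing cost
    },
    mappings={
        "properties": {
            "title": {"type": "text"},
            "published": {"type": "date"},
        }
    },
)

# Dynamic settings can be changed on a live index without reindexing.
es.indices.put_settings(index="articles",
                        settings={"refresh_interval": "30s"})
```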
The KV DAL allows applications to use a well-defined and storage-engine-agnostic HTTP/gRPC key-value data interface that in turn decouples applications from hard-to-maintain and backwards-incompatible datastore APIs. As most key-value storage engines support efficiently deleting a namespace (e.g.
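In spirit, such a data access layer boils down to a narrow interface that applications target while engines remain swappable. The sketch below is hypothetical, not the article's actual API.

```python
# Minimal sketch of a storage-engine-agnostic key-value interface.
# All names are hypothetical; a real DAL would sit behind HTTP/gRPC.
from abc import ABC, abstractmethod

class KeyValueStore(ABC):
    """Applications code against this, never a concrete datastore API."""

    @abstractmethod
    def get(self, namespace: str, key: bytes) -> bytes | None: ...

    @abstractmethod
    def put(self, namespace: str, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def delete_namespace(self, namespace: str) -> None: ...

class InMemoryStore(KeyValueStore):
    """Toy engine; production would plug in Cassandra, RocksDB, etc."""

    def __init__(self) -> None:
        self._data: dict[str, dict[bytes, bytes]] = {}

    def get(self, namespace, key):
        return self._data.get(namespace, {}).get(key)

    def put(self, namespace, key, value):
        self._data.setdefault(namespace, {})[key] = value

    def delete_namespace(self, namespace):
        # Cheap whole-namespace deletion, mirroring what most KV
        # engines support natively.
        self._data.pop(namespace, None)
```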
Details pertaining to HDR-VMAF exceed the scope of this article and will be covered in a future blog post; for now, suffice it to say that the first version of HDR-VMAF landed internally in 2021 and we have been improving the metric ever since. Summary Thanks to the arrival of HDR-VMAF, we were able to optimize our HDR encodes.
In this article, we take a closer look at Prometheus metrics and how we can ingest this data into Dynatrace. But often, we use additional services and solutions within our environment for backups, storage, networking, and more. As for the Collector, it will be our tool of choice for the implementation in this article.
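On the producing side, a service exposes Prometheus metrics over HTTP for a scraper (such as the Collector) to pick up. A minimal sketch with prometheus_client follows; the metric names and port are hypothetical.

```python
# Minimal sketch: expose Prometheus metrics for scraping.
# Metric names and port are illustrative only.
import random
import time
from prometheus_client import Counter, Gauge, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests served")
QUEUE_DEPTH = Gauge("app_queue_depth", "Items waiting in the work queue")

if __name__ == "__main__":
    start_http_server(8000)  # metrics at http://localhost:8000/metrics
    while True:
        REQUESTS.inc()
        QUEUE_DEPTH.set(random.randint(0, 50))
        time.sleep(1)
```

A Collector configured with a Prometheus receiver can then scrape this endpoint and forward the series to Dynatrace.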
They support PostgreSQL, MySQL, and Redis, but for the sake of this article we are going to focus on their PostgreSQL product. On average, ScaleGrid provides over 30% more storage than DigitalOcean for PostgreSQL at the same affordable price. So, which database service is right for your application?
An open-source distributed SQL query engine, Trino is widely used for data analytics on distributed data storage. But how do we tune it? In this article, we help you identify performance bottlenecks and provide tuning tips that you can put into practice.
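A good starting point is inspecting where time goes. The sketch below uses the trino Python client to run EXPLAIN ANALYZE; the host, catalog, query, and session property value are illustrative assumptions.

```python
# Minimal sketch: inspect a Trino query plan before tuning.
# Connection details and the session property are hypothetical.
import trino

conn = trino.dbapi.connect(
    host="trino.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
    # Example of a per-session tuning knob (assumed value):
    session_properties={"join_distribution_type": "AUTOMATIC"},
)
cur = conn.cursor()

# EXPLAIN ANALYZE reports per-stage rows, CPU, and wall time, which
# is where bottlenecks such as skewed joins or full scans show up.
cur.execute(
    "EXPLAIN ANALYZE "
    "SELECT count(*) FROM orders WHERE ship_date > DATE '2023-01-01'"
)
for row in cur.fetchall():
    print(row[0])
```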
In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques. I would like to thank Daniel Kirkdorffer who reviewed the article and cleaned up the grammar. The rest of this article describes concrete data modeling techniques and patterns.
Data engineering projects often require the setup and management of complex infrastructures that support data processing, storage, and analysis. In this article, we will explore the benefits of leveraging IaC for data engineering projects and provide detailed implementation steps to get started.
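As a taste of the approach, the sketch below declares two storage buckets with Pulumi's Python SDK; the AWS provider and resource names are assumptions, and the code runs inside `pulumi up` rather than as a plain script.

```python
# Minimal sketch: data-engineering storage declared as code (Pulumi).
# Provider choice and names are hypothetical.
import pulumi
import pulumi_aws as aws

# Raw landing zone for ingested files.
landing = aws.s3.Bucket("data-landing", force_destroy=True)

# Versioned bucket for curated, processed datasets.
curated = aws.s3.Bucket(
    "data-curated",
    versioning=aws.s3.BucketVersioningArgs(enabled=True),
)

pulumi.export("landing_bucket", landing.bucket)
pulumi.export("curated_bucket", curated.bucket)
```

The same declarations can be reviewed, versioned, and reproduced across environments, which is the core benefit over hand-built infrastructure.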
Nevertheless, there are related components and processes, for example, virtualization infrastructure and storage systems (see image below), that can lead to problems in your Kubernetes infrastructure. Configuring storage in Kubernetes is more complex than using a file system on your host.
Inputs: we start with AWS RDS for MySQL in us-east-1, 10 x db.r5.4xlarge, 200 GB storage each. The cost of RDS consists mostly of two things: compute and storage. We will not consider data transfer or backup costs in this article. Each db.r5.4xlarge is $46.08/day, and EBS gp2 storage is $0.10 per GB-month.
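The arithmetic behind those figures, as a sketch; the $1.92/hour on-demand rate for db.r5.4xlarge in us-east-1 is an assumption consistent with the $46.08/day figure above.

```python
# Minimal sketch: back-of-envelope RDS cost math (assumed list prices).
INSTANCES = 10
HOURLY_COMPUTE = 1.92        # $/hour per db.r5.4xlarge (assumed)
STORAGE_GB = 200             # GB per instance
GP2_PER_GB_MONTH = 0.10      # $/GB-month for EBS gp2

compute_per_day = INSTANCES * HOURLY_COMPUTE * 24              # $460.80/day
storage_per_month = INSTANCES * STORAGE_GB * GP2_PER_GB_MONTH  # $200.00/month

print(f"Compute: ${compute_per_day:,.2f}/day")
print(f"Storage: ${storage_per_month:,.2f}/month")
```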