Efficient data processing is crucial for businesses and organizations that rely on big data analytics to make informed decisions. One key factor that significantly affects data-processing performance is the storage format of the data.
High performance, query optimization, open-source licensing, and polymorphic data storage are the major advantages of Greenplum. When handling large amounts of complex data, or big data, chances are that your main machine will start getting crushed by all of the data it has to process in order to produce your analytics results.
In today's data-driven world, efficient data processing plays a pivotal role in the success of any project. Apache Spark, a robust open-source data processing framework, has emerged as a game-changer in this domain.
The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the big data community quite a long time ago. Pipelines can be stateful, so the engine's middleware should provide persistent storage to enable state checkpointing. Interoperability with Hadoop is another common requirement.
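To make the checkpointing idea concrete, here is a minimal sketch using Spark Structured Streaming as one example engine; the socket source, port, and /tmp checkpoint path are illustrative assumptions rather than details from the excerpt above.

```python
# A minimal sketch of state checkpointing in a streaming pipeline,
# using Spark Structured Streaming as one example engine. The socket
# source and the /tmp path are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("stateful-wordcount").getOrCreate()

# Unbounded input stream (here: lines read from a local socket).
lines = spark.readStream.format("socket") \
    .option("host", "localhost").option("port", 9999).load()

# Stateful aggregation: the running counts are state the engine must persist.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# checkpointLocation is the persistent storage that lets the engine
# recover the aggregation state after a failure or restart.
query = counts.writeStream.outputMode("complete") \
    .format("console") \
    .option("checkpointLocation", "/tmp/wordcount-checkpoint") \
    .start()
query.awaitTermination()
```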
At this scale, we can gain significant performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands in our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.
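As a hedged illustration of the kind of layout optimization described (AutoOptimize itself is internal to Netflix and not shown here), the sketch below partitions records on write so that readers can prune files; the column names and paths are made up.

```python
# Illustrative sketch: partitioning records at write time so that queries
# filtering on the partition column only touch matching files.
# Requires pandas with the pyarrow engine installed; schema is made up.
import pandas as pd

df = pd.DataFrame({
    "event_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "user_id": [1, 2, 3],
    "bytes_streamed": [1200, 3400, 560],
})

# partition_cols creates one directory per event_date value, so a reader
# that filters on event_date reads only the matching partition.
df.to_parquet("warehouse/events", partition_cols=["event_date"])
```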
These processes are only possible with the distributed architecture and parallel processing mechanisms that big data tools are built on. One of the top trending open-source data stores that addresses most of these use cases is Elasticsearch.
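For a sense of what working with Elasticsearch looks like, here is a minimal sketch using the official elasticsearch-py client; the localhost URL and the "logs" index name are assumptions for illustration.

```python
# Minimal sketch with the elasticsearch-py client; URL and index name
# are assumed for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index a document; Elasticsearch distributes documents across shards.
es.index(index="logs", document={"service": "api", "status": 500})

# Searches fan out in parallel across those shards.
resp = es.search(index="logs", query={"match": {"service": "api"}})
print(resp["hits"]["total"])
```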
Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3.
Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next. Storage provisioning is a key consideration.
At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. In this article, I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques. General Notes on NoSQL Data Modeling.
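One technique such comparisons commonly digest is denormalization by embedding; the toy schema below is a made-up example contrasting a relational layout with a document layout.

```python
# Illustrative example of denormalization by embedding, a common NoSQL
# modeling technique. The schema here is invented for demonstration.

# Relational style: two tables linked by a foreign key; answering
# "show a user with their orders" requires a join.
users = [{"id": 1, "name": "Ada"}]
orders = [{"id": 10, "user_id": 1, "total": 42.0}]

# Document style: related data is nested in one document, so a single
# lookup serves the same query without a join.
user_doc = {
    "id": 1,
    "name": "Ada",
    "orders": [{"id": 10, "total": 42.0}],
}
```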
The demand for more IT resource-intensive applications has significantly increased today, whether it is to process transactions more quickly, gain real-time insight, crunch big data sets, or meet customer expectations. That's because NVMe provides a 6x bandwidth and IOPS advantage over SAS/SATA SSDs.
This article will help you understand the core differences in data structure, scalability, and use cases. Whether you need a relational database for complex transactions or a NoSQL database for flexible data storage, we've got you covered. A relational database's structured schema allows for precise data manipulation and retrieval.
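To illustrate the "complex transactions" side of the comparison, here is a small sketch using SQLite as a stand-in relational database; the account-transfer schema is hypothetical.

```python
# A small relational-transaction sketch using SQLite as a stand-in;
# the accounts schema is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

# Both updates commit together or not at all (atomicity).
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # the whole transfer was rolled back as a unit

print(conn.execute("SELECT * FROM accounts").fetchall())
```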
In this article, I provide an overview of probabilistic data structures that allow one to estimate these and many other metrics, trading estimation precision for memory consumption. I would like to thank Mikhail Khludnev and Kirill Uvaev, who reviewed this article and provided valuable suggestions. Case Study.
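As a concrete taste of the precision-for-memory trade, below is a toy Bloom filter, a classic probabilistic structure for set membership. It is an illustrative sketch, not code from the article: queries can return false positives but never false negatives, and the bit-array size m and hash count k control the error rate.

```python
# Toy Bloom filter: constant memory, probabilistic membership answers.
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # an integer used as an m-bit array

    def _positions(self, item):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter()
bf.add("user-42")
print(bf.might_contain("user-42"))  # True
print(bf.might_contain("user-99"))  # almost certainly False
```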
With these goals in mind, two in-memory data stores, Redis and Memcached, have emerged as the top contenders. This article will explore how they handle data storage and scalability, perform in different scenarios, and, most importantly, how these factors influence your choice.
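For a feel of the APIs involved, here is a minimal sketch with the redis-py client; the host, port, and key names are assumptions. Equivalent Memcached usage via a client such as pymemcache follows the same set/get pattern, which is part of why the comparison turns on features like persistence and richer data structures rather than the basic API.

```python
# Minimal redis-py sketch; connection details and keys are assumed.
import redis

r = redis.Redis(host="localhost", port=6379)

# Simple cache usage: both Redis and Memcached cover this pattern.
r.set("session:123", "alice", ex=300)  # expire after 300 seconds
print(r.get("session:123"))

# Server-side data structures like sorted sets are a Redis differentiator.
r.zincrby("leaderboard", 10, "alice")
print(r.zrevrange("leaderboard", 0, 2, withscores=True))
```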
This article will explore hybrid cloud benefits and steps to craft a plan that aligns with your unique business challenges. Workloads from web content, bigdata analytics, and artificial intelligence stand out as particularly well-suited for hybrid cloud infrastructure owing to their fluctuating computational needs and scalability demands.
In an in-depth article on Streaming Media, Dan Rayburn analyzed the impact of Amazon CloudFront's move to GA: Amazon's CDN Gets More Competitive, Adds SLA, New Edge Locations, Lower Pricing. Understanding Throughput-Oriented Architectures - a background article in CACM on massively parallel and throughput- versus latency-oriented architectures.
I laid out some of these challenges in an article explaining the concept of eventual consistency. If you need to achieve high availability and scalable performance, you will need to resort to data replication techniques. Driving Storage Costs Down for AWS Customers. Expanding the Cloud - The AWS Storage Gateway.
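To ground the replication point, here is a toy sketch of quorum-based replication, one technique behind such systems: with N replicas, requiring W write acknowledgements and R read replies such that R + W > N guarantees a read overlaps the latest write. Everything below is a simplified assumption, not a description of any specific AWS system.

```python
# Toy quorum replication: N replicas, W write acks, R read replies,
# with R + W > N so reads and writes always overlap on one replica.
N, W, R = 3, 2, 2
replicas = [{} for _ in range(N)]  # each dict is one replica's store

def write(key, value, version):
    acks = 0
    for store in replicas:
        store[key] = (version, value)
        acks += 1
        if acks >= W:
            return True  # enough acks; remaining replicas may lag
    return False

def read(key):
    # Query R replicas and keep the reply with the highest version.
    replies = [s.get(key) for s in replicas[:R]]
    return max((rep for rep in replies if rep), default=None)

write("cart", ["book"], version=1)
print(read("cart"))  # (1, ['book'])
```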
This article sheds light on the often-underestimated capabilities of phone call tracking apps and why they deserve your undivided attention. By optimizing your marketing and customer service based on call data, you can outperform competitors who rely solely on digital analytics.
The rise of Big Data - the ability to store and analyze large volumes of structured and unstructured, internal and external data - promises to let companies react more nimbly than ever before. A megabyte of cloud-based disk storage is no different from a kilowatt of electricity. Nor is cloud computing.
In a partitioned massively parallel database system, the storage format and sorting algorithm may not be optimized for that operation, as we are reading multiple partitions in parallel. To do that, I'm using the ClickHouse function alphaTokens(body), which splits the "body" field into words.
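Assuming a client-side setup, the same tokenization query could be issued from Python via the clickhouse-driver package; the "messages" table name is a hypothetical stand-in for the article's actual table.

```python
# Hedged sketch: running the alphaTokens query through clickhouse-driver.
# The connection details and the "messages" table are assumptions.
from clickhouse_driver import Client

client = Client(host="localhost")

# alphaTokens(body) splits each body string into runs of a-z letters.
rows = client.execute("SELECT alphaTokens(body) FROM messages LIMIT 3")
for tokens in rows:
    print(tokens)
```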