The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. In many cases, a join is performed over a finite time window or another type of buffer, e.g., an LFU cache that holds the most frequent tuples in the stream. Towards Unified Big Data Processing.
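As a rough sketch of the windowed-join idea mentioned above (illustrative only: the `WindowedJoin` class, window size, and stream shapes are our assumptions, not taken from the article), a join over a finite time window might look like this:

```python
from collections import defaultdict, deque
import time

class WindowedJoin:
    """Join two streams on a key, buffering only tuples that fall
    inside a finite time window (one alternative to an LFU cache)."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        # key -> deque of (timestamp, value) tuples from the left stream
        self.left = defaultdict(deque)

    def _evict(self, buf, now):
        # Drop buffered tuples that have fallen out of the window.
        while buf and now - buf[0][0] > self.window:
            buf.popleft()

    def add_left(self, key, value, now=None):
        now = time.time() if now is None else now
        buf = self.left[key]
        self._evict(buf, now)
        buf.append((now, value))

    def probe_right(self, key, value, now=None):
        """Emit (left, right) pairs for every buffered left tuple
        that shares this key and is still inside the window."""
        now = time.time() if now is None else now
        buf = self.left[key]
        self._evict(buf, now)
        return [(left_value, value) for _, left_value in buf]
```

An LFU-cache buffer would swap the time-based eviction in `_evict` for frequency-based eviction, trading join completeness for bounded memory on skewed streams.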
Through effortless provisioning, a larger number of small hosts provides a cost-effective and scalable platform. On-premises data centers invest in higher-capacity servers since those provide more flexibility in the long run, while the procurement price of the hardware is only one of many cost factors.
The first phase involves validating functional correctness, scalability, and performance, and ensuring the new systems’ resilience before the migration. Additionally, for mismatches, we record the normalized and unnormalized responses from both sides to another big data table, along with other relevant parameters, such as the diff.
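A minimal sketch of that mismatch-recording step (the `diff_table` sink, the `normalize` function, and the dict-shaped responses are hypothetical stand-ins, not the system's actual interfaces):

```python
def record_mismatch(diff_table, request, legacy_resp, new_resp, normalize):
    """Shadow-compare the legacy and new systems' responses; on a
    mismatch, persist both raw and normalized forms plus the
    field-level diff to a big data table for offline analysis."""
    legacy_norm = normalize(legacy_resp)   # e.g., strip timestamps, sort lists
    new_norm = normalize(new_resp)
    if legacy_norm == new_norm:
        return
    diff = {
        field: (legacy_norm.get(field), new_norm.get(field))
        for field in set(legacy_norm) | set(new_norm)
        if legacy_norm.get(field) != new_norm.get(field)
    }
    diff_table.write({
        "request": request,
        "legacy_raw": legacy_resp,
        "new_raw": new_resp,
        "legacy_normalized": legacy_norm,
        "new_normalized": new_norm,
        "diff": diff,
    })
```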
Werner Vogels' weblog on building scalable and robust distributed systems. Today AWS has launched Amazon ElastiCache, a new service that makes it easy to add distributed in-memory caching to any application. Systems that make extensive use of caching almost all report a significant reduction in the cost of their database tier.
In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios.
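A small illustration of that split, assuming default local instances of both stores and the `redis` and `pymemcache` Python clients (the keys and values are ours, not from the comparison):

```python
import redis
from pymemcache.client.base import Client as MemcacheClient

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
mc = MemcacheClient(("localhost", 11211))

# Redis: a structured user profile stored as a hash, with field-level updates.
r.hset("user:42", mapping={"name": "Ada", "plan": "pro", "logins": 7})
r.hincrby("user:42", "logins", 1)          # increment one field in place
print(r.hgetall("user:42"))

# Memcached: flat string values, suited to high-throughput blob caching.
mc.set("page:/home", "<html>...</html>", expire=300)
print(mc.get("page:/home"))
```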
Key Takeaways: Distributed storage systems benefit organizations by enhancing data availability, fault tolerance, and system scalability, leading to cost savings from reduced hardware, energy, and personnel needs. Distributed file systems are a common variation of these storage systems.
Heading into 2024, SQL databases will remain essential to data management, increasingly relying on distributed architectures to meet growing demands for scalability and reliability. They keep the features that developers like while handling much more data, similar to NoSQL systems.
Werner Vogels' weblog on building scalable and robust distributed systems. Often these namespaces are hierarchical, which makes them easier to manage and decentralizes control, making the system more scalable. There are two main types of DNS servers: authoritative servers and caching resolvers.
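To make the caching-resolver idea concrete, here is a toy sketch (the `authoritative_lookup` callable is a stand-in for a real query to an authoritative server, not a real DNS API):

```python
import time

class CachingResolver:
    """Toy caching resolver: answers repeat queries from a local cache
    until the record's TTL expires, shielding authoritative servers."""

    def __init__(self, authoritative_lookup):
        self.lookup = authoritative_lookup   # name -> (records, ttl_seconds)
        self.cache = {}                      # name -> (expires_at, records)

    def resolve(self, name):
        entry = self.cache.get(name)
        if entry and entry[0] > time.time():
            return entry[1]                  # cache hit: no upstream query
        records, ttl = self.lookup(name)     # cache miss: ask authoritative
        self.cache[name] = (time.time() + ttl, records)
        return records
```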
After the launch of the AWS Asia Pacific (Hong Kong) Region, there will be 19 Availability Zones in Asia Pacific for customers to build flexible, scalable, secure, and highly available applications. In 2010, we opened our first AWS Region in Singapore, and since then have opened additional regions in Japan, Australia, China, Korea, and India.
Werner Vogels' weblog on building scalable and robust distributed systems. The storage systems we've pioneered demonstrate extreme scalability while maintaining tight control over performance, availability, and cost. A Fast and Scalable NoSQL Database Service Designed for Internet-Scale Applications. All Things Distributed.
Werner Vogels' weblog on building scalable and robust distributed systems. If you have a largely static site, you can rely on the enormous power of S3 to make serving your content highly scalable and storing it extremely durable. My templates and blog posts are now located in Dropbox and thus locally cached on each machine I use.
Generally, to cache data (including non-persistent data that never sees a backing store) and to share non-persistent data across application services. If you want to store time-expiring data that should be shared across application processes, use Memcached or Redis. Fetching too much data in a single query…
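As a hedged sketch of that advice, a cache-aside read of time-expiring session data shared across processes might look like this with the `redis` Python client (the key scheme, 30-minute TTL, and `load_from_db` loader are assumptions for illustration):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_session(session_id, load_from_db):
    """Cache-aside lookup: serve from Redis when present, otherwise
    load from the backing store and cache the value with an expiry."""
    key = f"session:{session_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    value = load_from_db(session_id)         # hypothetical DB loader
    r.setex(key, 1800, json.dumps(value))    # expire after 30 minutes
    return value
```

Because every application process talks to the same Redis instance, the cached session outlives any single process and expires on its own.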
LinkedIn introduced Couchbase as a centralized caching tier to scale member-profile reads and handle increasing traffic that had outgrown their existing database cluster. The new solution achieved an over-99% hit rate, reduced tail latencies by more than 60%, and cut costs by 10% annually. By Rafal Gancarz
Alongside more traditional sessions such as Real-World Deployed Systems and Big Data Programming Frameworks, there were many papers focusing on emerging hardware architectures, including embedded multi-accelerator SoCs, in-network and in-storage computing, FPGAs, GPUs, and low-power devices. ATC ’19 was refreshingly different.
Real-Time Digital Twins Can Add Important New Capabilities to Telematics Systems and Eliminate Scalability Bottlenecks. At the same time, telemetry snapshots are stored in a data lake, such as HDFS, for offline batch analysis and visualization using big data tools like Spark.
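A brief PySpark sketch of that offline batch analysis (the HDFS path, Parquet layout, and column names are assumptions, not from the article):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("telemetry-batch").getOrCreate()

# Telemetry snapshots assumed to be written as Parquet files in the data lake.
snapshots = spark.read.parquet("hdfs:///telematics/snapshots/")

# Example analysis: average speed per vehicle per day.
daily = (
    snapshots
    .groupBy("vehicle_id", F.to_date("ts").alias("day"))
    .agg(F.avg("speed").alias("avg_speed"))
)
daily.show()
```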
Heterogeneous and Composable Memory (HCM) offers a feasible solution for terabyte- or petabyte-scale systems, addressing the performance and efficiency demands of emerging big data applications. Memory bandwidth will be a key factor, because the traditional method of adding bandwidth through more memory channels does not scale.
Hyper Dimension Shuffle describes how Microsoft improved the cost of data shuffling, one of the most costly operations, in their petabyte-scale internal big data analytics platform, SCOPE. BlockchainDB is a blockchain underneath and a database on top.