Big Data and Cache - Technology Performance Pulse

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. In many cases, a join is performed on a finite time window or another type of buffer, e.g., an LFU cache that contains the most frequent tuples in the stream. Towards Unified Big Data Processing.

Big Data

Big Data Processing Lambda Database
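
The In-Stream Big Data Processing excerpt above mentions joins performed over a finite time window. As a rough illustration only, here is a minimal sketch of such a join. The class name `WindowedJoin`, its `push` method, and the tuple layout are hypothetical and not taken from the article, and the sketch assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class WindowedJoin:
    """Join two streams on a key, matching only tuples within `window` time units."""

    def __init__(self, window):
        self.window = window
        # One buffer per side; each holds (timestamp, key, value) in arrival order.
        self.buffers = {"left": deque(), "right": deque()}

    def _evict(self, now):
        # Drop tuples that have fallen out of the time window.
        for buf in self.buffers.values():
            while buf and now - buf[0][0] > self.window:
                buf.popleft()

    def push(self, side, ts, key, value):
        """Add a tuple to one side and return the (left, right) pairs it produces."""
        other = "right" if side == "left" else "left"
        self._evict(ts)
        matches = [v for (_, k, v) in self.buffers[other] if k == key]
        self.buffers[side].append((ts, key, value))
        if side == "left":
            return [(value, m) for m in matches]
        return [(m, value) for m in matches]


join = WindowedJoin(window=10)
print(join.push("left", 0, "user1", "click"))      # []
print(join.push("right", 5, "user1", "purchase"))  # [('click', 'purchase')]
print(join.push("right", 20, "user1", "refund"))   # [] (the click has expired)
```

Bounding the buffers by time keeps memory finite on an unbounded stream, at the cost of missing matches that arrive further apart than the window.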

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Of the organizations in the Kubernetes survey, 71% run databases and caches in Kubernetes, representing a +48% year-over-year increase. Together with messaging systems (+36% growth), organizations are increasingly using databases and caches to persist application workload states.

Open Source

Open Source Java Operating System Programming

Comparing Apache Ignite In-Memory Cache Performance With Hazelcast In-Memory Cache and Java Native Hashmap

DZone

MARCH 16, 2020

This article compares different options for the in-memory maps and their performances in order for an application to move away from traditional RDBMS tables for frequently accessed data.

Cache

Cache Java Performance Database

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

JULY 21, 2022

We at Netflix, as a streaming service running on millions of devices, have a tremendous amount of data about device capabilities/characteristics and runtime data in our big data platform. With large data comes the opportunity to leverage the data for predictive and classification-based analysis.

Big Data

Big Data Cache Engineering Data Engineering

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

While data lakehouses combine the flexibility and cost-efficiency of data lakes with the querying capabilities of data warehouses, it’s important to understand how these storage environments differ. Data warehouses: Data warehouses were the original big data storage option.

Artificial Intelligence

Artificial Intelligence Storage Analytics Government

Migrating Critical Traffic At Scale with No Downtime – Part 1

The Netflix TechBlog

MAY 4, 2023

Additionally, for mismatches, we record the normalized and unnormalized responses from both sides to another big data table along with other relevant parameters, such as the diff. It helps expose memory leaks, deadlocks, caching issues, and other system issues.

Traffic

Traffic Latency Tuning Systems

Expanding the Cloud - Introducing Amazon ElastiCache - All Things.

All Things Distributed

AUGUST 22, 2011

Today AWS has launched Amazon ElastiCache, a new service that makes it easy to add distributed in-memory caching to any application. Amazon ElastiCache handles the complexity of creating, scaling and managing an in-memory cache to free up brainpower for more differentiating activities.

Cloud

Cloud Cache AWS Storage

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

Key Takeaways: Redis offers complex data structures and additional features for versatile data handling, while Memcached excels in simplicity with a fast, multi-threaded architecture for basic caching needs. Introduction: Caching serves a dual purpose in web development – speeding up client requests and reducing server load.

Cache

Cache Storage Architecture Scalability

Helios: hyperscale indexing for the cloud & edge – part 1

The Morning Paper

OCTOBER 26, 2020

Helios also serves as a reference architecture for how Microsoft envisions its next generation of distributed big-data processing systems being built. What follows is a discussion of where big data systems might be heading, heavily inspired by the remarks in this paper, but with several of my own thoughts mixed in.

Cloud

Cloud Big Data Latency Architecture

Expanding the Cloud with DNS - Introducing Amazon Route 53 - All.

All Things Distributed

DECEMBER 5, 2010

There are two main types of DNS servers: authoritative servers and caching resolvers. But the real robustness of the DNS system comes through the way lookups are handled, which is what caching resolvers do. Caching techniques ensure that the DNS system doesn't get overloaded with queries.

Cloud

Cloud Internet Internet AWS
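
The Route 53 excerpt above notes that caching resolvers keep the DNS system from being overloaded with queries. The sketch below illustrates the general idea with a time-to-live cache in front of an upstream lookup. It is not how any real resolver or Route 53 is implemented: `CachingResolver`, `fake_lookup`, and the `(address, ttl)` return shape are all assumptions made for illustration.

```python
import time


class CachingResolver:
    """Answer repeated lookups from a local cache until each record's TTL expires."""

    def __init__(self, lookup, clock=time.monotonic):
        self.lookup = lookup          # hypothetical upstream: name -> (address, ttl_seconds)
        self.clock = clock            # injectable so expiry can be simulated
        self.cache = {}               # name -> (address, expires_at)
        self.upstream_queries = 0     # how many lookups reached the upstream

    def resolve(self, name):
        now = self.clock()
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]           # cache hit: no upstream query needed
        self.upstream_queries += 1
        address, ttl = self.lookup(name)
        self.cache[name] = (address, now + ttl)
        return address


def fake_lookup(name):
    # Stand-in for an authoritative answer: a documentation address and a 60 s TTL.
    return "192.0.2.1", 60


resolver = CachingResolver(fake_lookup)
resolver.resolve("example.com")
resolver.resolve("example.com")
print(resolver.upstream_queries)  # 1
```

Only the first lookup reaches the upstream; later ones are served locally until the TTL runs out, which is the load-shedding effect the excerpt describes.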

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

The Morning Paper

MAY 14, 2019

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices, Gan et al., ASPLOS’19. An equally large fraction are due to compute contention, followed by network, cache, memory, and disk contention. Distributed tracing and instrumentation.

Big Data

Big Data Cloud Performance Hardware

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

Handling a storage system spread across multiple physical servers introduces complexities such as unpredictability in behavior, difficulties with testing procedures, and an overall increase in administrative complexity due to the dispersed nature of data.

Storage

Storage Systems Big Data Azure

Expanding the Cloud – An AWS Region is coming to Hong Kong

All Things Distributed

JUNE 20, 2017

Beyond running their web properties and applications, Next Digital also uses Amazon RDS (database), Amazon ElastiCache (caching), and Amazon Redshift (data warehousing). Next Digital operates on AWS in a more highly available and fault-tolerant environment than their previous colocation solution.

AWS

AWS Logistics Cloud Social Media

Fast key-value stores: an idea whose time has come and gone

The Morning Paper

JUNE 23, 2019

Generally to cache data (including non-persistent data that never sees a backing store), to share non-persistent data across application services (e.g. If you want to store time-expiring data that should be shared across application processes, use Memcached or Redis. Fetching too much data in a single query (i.e.,

Cache

Cache Latency Google Network

How LinkedIn Serves Over 4.8 Million Member Profiles per Second

InfoQ

JULY 3, 2023

LinkedIn introduced Couchbase as a centralized caching tier for scaling member profile reads to handle increasing traffic that has outgrown their existing database cluster. The new solution achieved over 99% hit rate, helped reduce tail latencies by more than 60% and costs by 10% annually. By Rafal Gancarz

Cache

Cache Latency Traffic Database

Mastering Distributed SQL™ Databases in 2025

Scalegrid

JANUARY 10, 2025

They keep the features that developers like but can handle much more data, similar to NoSQL systems. Notably, they simplify handling big data flows, offer consistent transactions, and sustain high performance even when they’re used for real-time data analysis and complex queries.

Database

Database Scalability Best Practices Blockchain

I Used The Web For A Day On A 50 MB Budget

Smashing Magazine

JULY 29, 2019

MB, that suggests I’ve got around 29 pages in my budget, although probably a few more than that if I’m able to stay on the same sites and leverage browser caching. There’s a trade-off to be made here, as external stylesheets can be cached but inline ones cannot (unless you get clever with JavaScript). Let’s talk about caching.

Cache

Cache Mobile Google Network

No Server Required - Jekyll & Amazon S3 - All Things Distributed

All Things Distributed

AUGUST 17, 2011

My templates and blog posts are now located in DropBox and thus locally cached at each machine I use. There are a number of pages in the "categories" section that have not been regenerated because, according to the website statistics, not many of those were accessed.

Servers

Servers Social Media AWS Website

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

We use high-performance transactions systems, complex rendering and object caching, workflow and queuing systems, business intelligence and data analytics, machine learning and pattern recognition, neural networks and probabilistic decision making, and a wide variety of other techniques.

Technology

Technology Technology AWS Storage

5 data integration trends that will define the future of ETL in 2018

Abhishek Tiwari

DECEMBER 27, 2017

In 2018, we will see new data integration patterns that rely either on a shared high-performance distributed storage interface (Alluxio) or a common data format (Apache Arrow) sitting between compute and storage. For instance, Alluxio, originally known as Tachyon, can potentially use Arrow as its in-memory data structure.

Big Data

Big Data Artificial Intelligence Storage Hardware

The Winds of Architecture Changes at the USENIX ATC 2019

ACM Sigarch

NOVEMBER 1, 2019

Alongside more traditional sessions such as Real-World Deployed Systems and Big Data Programming Frameworks, there were many papers focusing on emerging hardware architectures, including embedded multi-accelerator SoCs, in-network and in-storage computing, FPGAs, GPUs, and low-power devices. ATC ’19 was refreshingly different.

Architecture

Architecture Hardware Cache Storage

Current status, needs, and challenges in Heterogeneous and Composable Memory from the HCM workshop (HPCA’23)

ACM Sigarch

MAY 31, 2023

Heterogeneous and Composable Memory (HCM) offers a feasible solution for terabyte- or petabyte-scale systems, addressing the performance and efficiency demands of emerging big-data applications. However, building and utilizing HCM presents challenges, including interconnecting various memory technologies (e.g.,

Latency

Latency Hardware Cache Architecture

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

The Morning Paper

SEPTEMBER 19, 2019

Hyper Dimension Shuffle describes how Microsoft improved the cost of data shuffling, one of the most costly operations, in their petabyte-scale internal big data analytics platform, SCOPE. BlockchainDB – it’s a blockchain underneath, and a database on top.

Blockchain

Blockchain Hardware Google Speed

Use Digital Twins for the Next Generation in Telematics

ScaleOut Software

NOVEMBER 24, 2020

It sends messages over the cell network to the telematics system, which uses its compute servers (that is, web and application servers) to store incoming messages as snapshots in an in-memory data grid, also known as a distributed cache. The results of batch analysis are typically produced after an hour’s delay or more.

Analytics

Analytics Architecture Scalability Software Architecture

Delta: A Data Synchronization and Enrichment Platform

The Netflix TechBlog

OCTOBER 15, 2019

Part I: Overview. Andreas Andreakis, Falguni Jhaveri, Ioannis Papapanagiotou, Mark Cho, Poorna Reddy, Tongliang Liu. Overview: It is a commonly observed pattern for applications to utilize multiple datastores where each is used to serve a specific need such as storing the canonical form of data (MySQL etc.), caching (Memcached etc.),

Transportation

Transportation Architecture Processing Storage
