The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the big data community quite a long time ago. In many cases the join is performed over a finite time window or another type of buffer, e.g. an LFU cache that holds the most frequent tuples in the stream. Towards Unified Big Data Processing.
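A minimal sketch of such a windowed stream-stream join, in plain Python with illustrative names (`WindowedJoin`, `on_left`, `on_right`) and the simplifying assumption that events arrive roughly in event-time order:

```python
from collections import defaultdict, deque

class WindowedJoin:
    """Join two keyed streams when their event times fall within `window` seconds."""

    def __init__(self, window=60.0):
        self.window = window
        self.left = defaultdict(deque)   # key -> deque of (value, event_time)
        self.right = defaultdict(deque)

    def _evict(self, buf, now):
        # drop buffered tuples that have fallen out of the join window
        while buf and now - buf[0][1] > self.window:
            buf.popleft()

    def on_left(self, key, value, ts):
        self._evict(self.left[key], ts)
        self._evict(self.right[key], ts)
        self.left[key].append((value, ts))
        # emit one joined record per matching right-side tuple still in the window
        return [(key, value, rv) for rv, _ in self.right[key]]

    def on_right(self, key, value, ts):
        self._evict(self.left[key], ts)
        self._evict(self.right[key], ts)
        self.right[key].append((value, ts))
        return [(key, lv, value) for lv, _ in self.left[key]]

join = WindowedJoin(window=60.0)
join.on_left("user42", "click:/home", 100.0)
print(join.on_right("user42", "purchase:19.99", 130.0))
# [('user42', 'click:/home', 'purchase:19.99')]
```

An LFU-style buffer would keep the same join logic but swap the time-based eviction for frequency-based eviction of the least-used keys.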
Of the organizations in the Kubernetes survey, 71% run databases and caches in Kubernetes, representing a +48% year-over-year increase. Together with messaging systems (+36% growth), organizations are increasingly using databases and caches to persist application workload states.
While data lakes and data warehouses are commonly used architectures for storing and analyzing data, a data lakehouse is an efficient third way to store and analyze data that unifies the two while preserving the benefits of both. What is a data lakehouse? Data warehouses.
When undertaking system migrations, one of the main challenges is establishing confidence and seamlessly transitioning the traffic to the upgraded architecture without adversely impacting the customer experience. Exercising the upgraded system against production-like traffic helps expose memory leaks, deadlocks, caching issues, and other system issues.
Key Takeaways: Redis offers complex data structures and additional features for versatile data handling, while Memcached excels in simplicity with a fast, multi-threaded architecture for basic caching needs. With these goals in mind, two in-memory data stores, Redis and Memcached, have emerged as the top contenders.
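A minimal sketch of the contrast, assuming local servers and the `redis` and `pymemcache` Python client packages (key names and values are illustrative):

```python
import redis
from pymemcache.client.base import Client as Memcached

r = redis.Redis(host="localhost", port=6379)
mc = Memcached(("localhost", 11211))

# Memcached: flat key/value pairs with a TTL -- simple and fast.
mc.set("profile:42", b'{"name": "Ada"}', expire=300)
print(mc.get("profile:42"))

# Redis: richer structures, e.g. a sorted set used as a leaderboard.
r.zadd("leaderboard", {"ada": 1200, "grace": 1500})
print(r.zrevrange("leaderboard", 0, 1, withscores=True))
```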
Helios also serves as a reference architecture for how Microsoft envisions its next generation of distributed big-data processing systems being built. These two narratives of reference architecture and ingestion/indexing system are interwoven throughout the paper. Why do we need a new reference architecture?
This blog post gives a glimpse of the computer systems research papers presented at the USENIX Annual Technical Conference (ATC) 2019, with an emphasis on systems that use new hardware architectures. As a consequence, the vast majority of past papers have usually focused on conventional x86 or GPU-accelerated architectures.
Their design emphasizes increasing availability by spreading files across different nodes or servers; this approach significantly reduces the risk of losing or corrupting data due to node failure. These distributed storage services also play a pivotal role in big data and analytics operations.
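A minimal sketch of that placement idea in plain Python, with an assumed fixed node list and replication factor (`NODES`, `REPLICAS`, and `place` are illustrative names, not any particular system's API):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]
REPLICAS = 3  # each file is stored on three distinct nodes

def place(file_id: str, nodes=NODES, replicas=REPLICAS):
    # hash the file id to pick a starting node, then take the next
    # `replicas` distinct nodes around the ring
    start = int(hashlib.sha256(file_id.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

# losing any single node still leaves two copies of the file
print(place("videos/cat.mp4"))
```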
While registrars manage the namespace in the DNS naming architecture, DNS servers provide the mapping between names and the addresses used to identify an access point. There are two main types of DNS servers: authoritative servers and caching resolvers. Driving down the cost of big data analytics.
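A minimal sketch of the caching-resolver idea, assuming a single fixed TTL for simplicity (a real resolver queries authoritative servers and caches each answer for the TTL they advertise); `resolve` and `CACHE_TTL` are illustrative names:

```python
import socket
import time

_cache = {}        # hostname -> (address, expires_at)
CACHE_TTL = 300.0  # assumed fixed TTL in seconds

def resolve(name: str) -> str:
    now = time.time()
    entry = _cache.get(name)
    if entry and entry[1] > now:
        return entry[0]                     # answered from the cache
    addr = socket.gethostbyname(name)       # delegate to the system resolver
    _cache[name] = (addr, now + CACHE_TTL)
    return addr

print(resolve("example.com"))   # first call goes out over the network
print(resolve("example.com"))   # second call is served from the cache
```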
To our shareowners: Random forests, naïve Bayesian estimators, RESTful services, gossip protocols, eventual consistency, data sharding, anti-entropy, Byzantine quorum, erasure coding, vector clocks. Look inside a current textbook on software architecture, and you'll find few patterns that we don't apply at Amazon.
They keep the features that developers like but can handle much more data, similar to NoSQL systems. Notably, they simplify handling big data flows, offer consistent transactions, and sustain high performance even when they're used for real-time data analysis and complex queries.
Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices, Gan et al. We're not told how Seer figures out that a major architectural change has happened. An equally large fraction are due to compute contention, followed by network, cache, memory, and disk contention.
Generally, to cache data (including non-persistent data that never sees a backing store) and to share non-persistent data across application services. If you want to store time-expiring data that should be shared across application processes, use Memcached or Redis. Fetching too much data in a single query (i.e.,
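A minimal sketch of storing time-expiring shared data in Redis, assuming a local instance and the `redis` Python client (the key name and TTL are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# any process can write a value with a TTL...
r.setex("session:abc123", 1800, "user-42")   # expires after 30 minutes

# ...and any other process sharing the Redis instance can read it until it expires
print(r.get("session:abc123"))  # b'user-42'
print(r.ttl("session:abc123"))  # seconds remaining, e.g. 1799
```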
A common theme across all these trends is to remove the complexity by simplifying data management as a whole. In 2018, we anticipate that ETL will either lose relevance or the ETL process will disintegrate and be consumed by new data architectures. Unified data management architecture.
LinkedIn introduced Couchbase as a centralized caching tier to scale member profile reads and handle increasing traffic that had outgrown their existing database cluster. The new solution achieved a hit rate of over 99%, reduced tail latencies by more than 60%, and cut costs by 10% annually. By Rafal Gancarz
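A hypothetical read-through cache sketch illustrating the hit-rate idea; it is not LinkedIn's implementation (they used Couchbase, this uses an in-process dict), and `load_profile_from_db` is a stand-in for the source-of-truth read:

```python
import time

class ReadThroughCache:
    def __init__(self, loader, ttl=300.0):
        self.loader = loader
        self.ttl = ttl
        self.store = {}          # key -> (value, expires_at)
        self.hits = self.misses = 0

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self.loader(key)                        # fall through to the database
        self.store[key] = (value, time.time() + self.ttl)
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

def load_profile_from_db(member_id):
    # stand-in for a read against the canonical profile store
    return {"id": member_id, "name": f"member-{member_id}"}

cache = ReadThroughCache(load_profile_from_db)
cache.get("42"); cache.get("42"); cache.get("42")
print(f"hit rate: {cache.hit_rate():.0%}")  # 67% after one miss and two hits
```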
Introduction: Memory systems are evolving into heterogeneous and composable architectures. Heterogeneous and Composable Memory (HCM) offers a feasible solution for terabyte- or petabyte-scale systems, addressing the performance and efficiency demands of emerging big-data applications.
However, telematics architectures face challenges in responding to telemetry in real time. Current Telematics Architecture. The volume of incoming telemetry challenges current telematics systems to keep up and quickly make sense of all the data. Challenges for Current Architectures.
Part I: Overview. Andreas Andreakis, Falguni Jhaveri, Ioannis Papapanagiotou, Mark Cho, Poorna Reddy, Tongliang Liu. It is a commonly observed pattern for applications to utilize multiple datastores, where each is used to serve a specific need such as storing the canonical form of data (MySQL etc.) or caching (Memcached etc.),