In this blog post, we explain what Greenplum is and break down its architecture, advantages, major use cases, and how to get started. Greenplum's architecture was designed for large-scale data warehouses and business intelligence workloads, giving you the ability to spread your data across a multitude of servers.
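As a minimal sketch of what that distribution looks like in practice (the connection details and table are hypothetical; `DISTRIBUTED BY` is Greenplum's actual DDL clause for spreading rows across segment servers):

```python
# Sketch: creating a hash-distributed table in Greenplum via psycopg2.
# Host, database, user, and table definition are hypothetical examples.
import psycopg2

conn = psycopg2.connect(host="gp-master", dbname="analytics", user="gpadmin")
with conn, conn.cursor() as cur:
    # DISTRIBUTED BY tells Greenplum which column to hash when spreading
    # rows across segment servers; queries then run on all segments in parallel.
    cur.execute("""
        CREATE TABLE page_views (
            user_id   BIGINT,
            url       TEXT,
            viewed_at TIMESTAMP
        ) DISTRIBUTED BY (user_id);
    """)
conn.close()
```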
Having a distributed and scalable graph database system is highly sought after in many enterprise scenarios. Do not be misled: designing and implementing a scalable graph database system has never been a trivial task.
The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the big data community quite a long time ago, and in-stream processing emerged in response. In such systems, the input data typically goes from the data source to the in-stream pipeline via a persistent buffer that allows clients to move their reading pointers back and forth.
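Kafka is the canonical example of such a buffer. A minimal sketch with the kafka-python client (broker address, topic, and offset are hypothetical) of a consumer moving its reading pointer back to reprocess earlier events:

```python
# Sketch: rewinding a reading pointer in a persistent buffer (Kafka).
# Broker address, topic, and offset are hypothetical examples.
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="broker:9092", enable_auto_commit=False)
tp = TopicPartition("events", 0)
consumer.assign([tp])

consumer.seek(tp, 1000)  # move the pointer back; records from offset 1000 are re-read
batch = consumer.poll(timeout_ms=1000)
for records in batch.values():
    for record in records:
        print(record.offset, record.value)
```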
Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information. Here are the six steps of a typical ITOA process: Define the data infrastructure strategy. Identify data use cases and develop a scalable delivery model with documentation.
We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” Nonetheless, Netflix's data landscape (see below) is complex, and many teams collaborate to share responsibility for managing our data systems.
Limited data availability constrains value creation. Modern IT environments — whether multicloud, on-premises, or hybrid-cloud architectures — generate exponentially increasing data volumes. Grail addresses today's challenges of big data and cloud everywhere: Grail is highly scalable, cost-effective, and super-fast.
By collecting, accessing, and analyzing network data from a variety of sources, such as VPC Flow Logs, ELB access logs, and eBPF flow logs on the instances, we can provide network insight to users and central teams through multiple data visualization tools like Lumen and Atlas. What is BPF?
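To make that concrete, here is a small parser for the default (version 2) VPC Flow Log record format; the sample line is fabricated purely for illustration:

```python
# Sketch: parsing a default-format (version 2) VPC Flow Log record.
# The sample line is made up for illustration, not real traffic.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def parse_flow_log(line: str) -> dict:
    return dict(zip(FIELDS, line.split()))

sample = ("2 123456789012 eni-0a1b2c3d 10.0.0.5 10.0.1.7 "
          "443 49152 6 10 8400 1609459200 1609459260 ACCEPT OK")
record = parse_flow_log(sample)
print(record["srcaddr"], "->", record["dstaddr"], record["bytes"], "bytes")
```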
As cloud and big data complexity scales beyond what traditional monitoring tools can handle, next-generation cloud monitoring and observability are becoming necessities for IT teams. Hybrid cloud combines an on-premises or private data center with public cloud infrastructure. What is cloud monitoring?
Logs highlight observability challenges: ingesting, storing, and processing the unprecedented explosion of data from sources such as software as a service, multicloud environments, containers, and serverless architectures can be overwhelming for today's organizations.
Through effortless provisioning, a larger number of small hosts provides a cost-effective and scalable platform. On-premises data centers invest in higher-capacity servers since they provide more flexibility in the long run, while the procurement price of hardware is only one of many cost factors.
As big data and ML became more prevalent and impactful, the scalability, reliability, and usability of the orchestration ecosystem became increasingly important for our data scientists and the company. Meson was based on a single-leader architecture with high availability.
When undertaking system migrations, one of the main challenges is establishing confidence and seamlessly transitioning the traffic to the upgraded architecture without adversely impacting the customer experience. This blog series will examine the tools, techniques, and strategies we have utilized to achieve this goal.
Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next. Key challenges: performance.
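As a rough sketch of how a data team might submit a batch workload through the official Kubernetes Python client (the image, command, job name, and namespace are hypothetical):

```python
# Sketch: submitting a batch data-processing Job to Kubernetes.
# Image, command, job name, and namespace are hypothetical examples.
from kubernetes import client, config

config.load_kube_config()  # reads your local kubeconfig

container = client.V1Container(
    name="wordcount",
    image="registry.example.com/jobs/wordcount:latest",
    command=["python", "wordcount.py", "--input", "s3://bucket/logs/"],
)
job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="wordcount"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        ),
        backoff_limit=2,  # retry a failed pod at most twice
    ),
)
client.BatchV1Api().create_namespaced_job(namespace="data-jobs", body=job)
```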
To address this, we propose developing an intelligent agent that can automatically discover, map, and query all data within an enterprise. This “Enterprise Data Model/Architect Agent” employs generative AI techniques for autonomous enterprise data modeling and architecture.
Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3. Moving data with Bulldozer at Netflix.
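Bulldozer itself is Netflix-internal, but the warehouse side it reads from is plain S3. A minimal boto3 sketch of inspecting the files behind one table partition (bucket and prefix are hypothetical):

```python
# Sketch: listing the files that make up a warehouse table partition in S3.
# Bucket name and key prefix are hypothetical examples.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(
    Bucket="analytics-warehouse",
    Prefix="tables/user_engagement/dateint=20240101/",
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"], "bytes")
```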
Using Marathon, its data center operating system (DC/OS) plugin, Mesos becomes a full container orchestration environment that, like Kubernetes and Docker Swarm, discovers services, balances loads, and manages application containers. Mesos also supports other orchestration engines, including Kubernetes and Docker Swarm.
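Deployment in that environment goes through Marathon's REST API. A minimal sketch (the Marathon URL and image are hypothetical) that asks Marathon to keep three containers of an app running:

```python
# Sketch: deploying a Docker container through Marathon's /v2/apps endpoint.
# The Marathon URL and image name are hypothetical examples.
import requests

app = {
    "id": "/web/frontend",
    "cpus": 0.5,
    "mem": 256,
    "instances": 3,  # Marathon restarts containers to keep three running
    "container": {
        "type": "DOCKER",
        "docker": {"image": "registry.example.com/frontend:1.0"},
    },
}
resp = requests.post("http://marathon.example.com:8080/v2/apps", json=app)
resp.raise_for_status()
```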
Key takeaways: distributed storage systems benefit organizations by enhancing data availability, fault tolerance, and system scalability, leading to cost savings from reduced hardware needs, energy consumption, and personnel. Distributed file systems are one common variation of these storage systems.
This article will help you understand the core differences in data structure, scalability, and use cases. Whether you need a relational database for complex transactions or a NoSQL database for flexible data storage, we've got you covered. Diverging from MySQL's methodology, MongoDB employs an architecture without fixed schemas.
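The schema difference shows up immediately in code: MongoDB happily stores differently shaped documents in one collection, where MySQL would first require an ALTER TABLE. A small pymongo sketch (connection string and documents are hypothetical):

```python
# Sketch: MongoDB's fixed-schema-free collections in action.
# Connection string and documents are hypothetical examples.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

# Two documents with different fields in the same collection -- no ALTER TABLE.
db.products.insert_one({"name": "laptop", "ram_gb": 16, "cpu": "arm64"})
db.products.insert_one({"name": "t-shirt", "size": "M", "colors": ["red", "blue"]})

for doc in db.products.find({"name": "laptop"}):
    print(doc)
```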
Integrating such a backend service system supported by RabbitMQ into a web application’s architecture can drastically alter its operational dynamics. It enables the smooth processing of various actions like uploading user content, sending notifications, or performing heavy-duty data operations.
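A minimal sketch of that hand-off with the pika client (host, queue name, and payload are hypothetical): the web request publishes a message and returns immediately, and a background worker consumes it later.

```python
# Sketch: offloading a heavy task to a background worker via RabbitMQ.
# Host, queue name, and payload are hypothetical examples.
import json
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = conn.channel()
channel.queue_declare(queue="thumbnails", durable=True)

channel.basic_publish(
    exchange="",
    routing_key="thumbnails",
    body=json.dumps({"upload_id": 42, "sizes": [128, 512]}),
    properties=pika.BasicProperties(delivery_mode=2),  # persist across broker restarts
)
conn.close()
```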
Heading into 2024, SQL databases will remain essential in data management, increasingly using distributed systems to meet growing needs for scalability and reliability. They keep the features that developers like but can handle much more data, similar to NoSQL systems.
The Data Mesh SQL Processor is a platform-managed, parameterized Flink Job that takes schematized sources and a Flink SQL query that will be executed against those sources. By leveraging Flink SQL within a Data Mesh Processor, we were able to support the streaming SQL functionality without changing the architecture of Data Mesh.
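The Data Mesh SQL Processor is Netflix-internal, but the underlying mechanics, running user-supplied SQL as a managed Flink job, can be sketched with PyFlink (the table definition and query are hypothetical; a real processor would inject the schematized sources and parameters):

```python
# Sketch: running streaming SQL on Flink, in the spirit of a platform-managed
# SQL processor. The source table and query are hypothetical examples.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Stand-in source; a platform would wire in real schematized sources instead.
t_env.execute_sql("""
    CREATE TABLE plays (title STRING, region STRING)
    WITH ('connector' = 'datagen', 'number-of-rows' = '100')
""")

# The user-supplied SQL, executed against the registered sources.
t_env.execute_sql("""
    SELECT region, COUNT(*) AS play_count FROM plays GROUP BY region
""").print()
```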
This approach allows companies to combine the security and control of private clouds with public clouds' scalability and innovation potential. Defining a hybrid cloud strategy: the decision-making process about where to situate data and applications is vital to any hybrid cloud solution. A hybrid cloud strategy could be your answer.
In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios.
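The contrast is visible in a few lines (hosts and ports are the defaults; keys and values are hypothetical): Memcached speaks flat keys and values, while Redis exposes richer structures such as hashes.

```python
# Sketch: Memcached's flat key/value model vs. Redis data structures.
# Keys and values are hypothetical examples; hosts/ports are defaults.
import redis
from pymemcache.client.base import Client as Memcached

mc = Memcached(("localhost", 11211))
r = redis.Redis(host="localhost", port=6379)

# Memcached: string-based caching with a TTL -- simple and very fast.
mc.set("page:/home", "<html>...</html>", expire=60)

# Redis: a hash models a structured object under a single key.
r.hset("session:42", mapping={"user": "ada", "cart_items": 3})
print(r.hgetall("session:42"))
```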
Whether in analyzing A/B tests, optimizing studio production, training algorithms, investing in content acquisition, detecting security breaches, or optimizing payments, well-structured and accurate data is foundational. Backfill: backfilling datasets is a common operation in big data processing, run with a chosen write mode (append, overwrite, etc.).
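A backfill usually just re-runs the same pipeline over a range of historical partitions. A generic sketch (run_job and the table name are hypothetical placeholders for the real pipeline launcher):

```python
# Sketch: backfilling a dataset one day-partition at a time.
# run_job() and the table name are hypothetical placeholders.
from datetime import date, timedelta

def run_job(table: str, partition: str, mode: str):
    # Placeholder: a real pipeline would launch a Spark/Flink job here.
    print(f"rebuilding {table}/{partition} in {mode} mode")

def backfill(table: str, start: date, end: date, mode: str = "overwrite"):
    day = start
    while day <= end:
        # "overwrite" keeps the backfill idempotent; "append" would duplicate
        # rows if a partition were processed twice.
        run_job(table, day.isoformat(), mode)
        day += timedelta(days=1)

backfill("playback_sessions", date(2021, 1, 1), date(2021, 1, 31))
```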
This blog post gives a glimpse of the computer systems research papers presented at the USENIX Annual Technical Conference (ATC) 2019, with an emphasis on systems that use new hardware architectures. By contrast, the vast majority of papers in the past focused on conventional x86 or GPU-accelerated architectures.
Werner Vogels' weblog on building scalable and robust distributed systems. To our shareowners: Random forests, naïve Bayesian estimators, RESTful services, gossip protocols, eventual consistency, data sharding, anti-entropy, Byzantine quorum, erasure coding, vector clocks.
Expanding the Cloud - Adding the Incredible Power of the Amazon EC2 Cluster GPU Instances: These trade-offs have even impacted the way the lowest-level building blocks in our computer architectures have been designed.
Given this, enterprises, public sector bodies, startups, and small businesses are looking to adopt agile, scalable, and secure public cloud solutions. Access to secure, scalable, low-cost AWS infrastructure in Canada allows customers to innovate and provide tools to meet privacy, sovereignty, and compliance requirements.
Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: a large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc.
The scalability, reliability, and durability requirements for Cloud Drive are very high, which is why they decided to make use of the Amazon Simple Storage Service (S3) as the core component of their service.
Today’s streaming analytics architectures are not equipped to make sense of this rapidly changing information and react to it as it arrives. This data is also periodically uploaded to a data lake for offline batch analysis that calculates key statistics and looks for big trends that can help optimize operations.
We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits. This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture.
Additionally, many high-end HPC applications take advantage of knowing their in-house hardware platforms to achieve major speedups by exploiting the specific processor architecture.
Amazon DynamoDB - a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications: Often these namespaces are hierarchical in nature, such that it becomes easier to manage them and to decentralize control, which makes the system more scalable.
There are sessions in many different categories: Architecture, Big Data, HPC, Compute & Networking, Storage, Databases, Security, Tools & Languages, Media Sharing & Content Delivery, Managing AWS Resources, Enterprise IT, Mobile, Start-up, and more.
Shell leverages AWS for big data analytics to help achieve these goals. Due to the exponential growth of the biology and informatics fields, Unilever needs to maintain this new program within a highly scalable environment that supports parallel computation and heavy data storage demands.
This team is constantly rethinking the assumptions behind how traditional databases were built and working on building the right database architectures suited for the cloud environment.
Visiting future customers is equally exciting, as you get a chance to understand their current architecture, whether it is a migration, and how they plan to exploit cloud services in their new setup.
Big news this week was of course the launch of Cluster GPU instances for Amazon EC2. Understanding Throughput-Oriented Architectures - a background article in CACM on massively parallel and throughput- vs. latency-oriented architectures.
The stateless + RInk (S+RInK) architecture attempts to provide the best of both worlds: to simultaneously offer both the implementation and operational simplicity of stateless application servers and the performance benefits of servers caching state in RAM. We’ve seen similar high marshalling overheads in bigdata systems too.)
Real-Time Digital Twins Can Add Important New Capabilities to Telematics Systems and Eliminate Scalability Bottlenecks. However, telematics architectures face challenges in responding to telemetry in real time.
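One way to picture a real-time digital twin: a small in-memory object per vehicle that absorbs each telemetry message and keeps only the state needed for immediate analysis. A simplified sketch (field names and the alert threshold are hypothetical):

```python
# Sketch: a per-vehicle digital twin analyzing telemetry as it arrives.
# Field names and the alert threshold are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class VehicleTwin:
    vehicle_id: str
    speeds: list = field(default_factory=list)

    def on_telemetry(self, msg: dict):
        self.speeds = (self.speeds + [msg["speed_mph"]])[-10:]  # rolling window
        avg = sum(self.speeds) / len(self.speeds)
        if avg > 80:
            print(f"ALERT {self.vehicle_id}: sustained speed {avg:.0f} mph")

twins: dict = {}

def dispatch(msg: dict):
    # Messages are routed to the twin instance for that vehicle.
    twin = twins.setdefault(msg["id"], VehicleTwin(msg["id"]))
    twin.on_telemetry(msg)

dispatch({"id": "truck-17", "speed_mph": 85})
```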