This feature-packed database provides powerful and rapid analytics on data that scales up to petabyte volumes. Greenplum uses an MPP (massively parallel processing) database design that can help you develop a scalable, high-performance deployment. High performance, query optimization, open source, and polymorphic data storage are Greenplum's major advantages.
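As a rough illustration of what the MPP design means in practice, here is a minimal sketch of creating a table distributed across Greenplum segments. The connection string, table, and column names are made up, and it assumes a reachable Greenplum coordinator plus the psycopg2 package:

```python
# Hypothetical sketch: creating a distributed table in Greenplum.
# The DSN, table, and columns are placeholders for illustration.
import psycopg2

conn = psycopg2.connect("host=localhost dbname=analytics user=gpadmin")
with conn, conn.cursor() as cur:
    # DISTRIBUTED BY tells Greenplum how to spread rows across MPP
    # segments, so joins and aggregations on the distribution key
    # stay segment-local.
    cur.execute("""
        CREATE TABLE page_views (
            view_id   bigint,
            user_id   bigint,
            viewed_at timestamp
        ) DISTRIBUTED BY (user_id);
    """)
conn.close()
```

Choosing a distribution key that your joins and aggregations actually use is where the MPP speedup comes from, since matching rows land on the same segment.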
By Alok Tiagi, Hariharan Ananthakrishnan, Ivan Porto Carrero and Keerti Lakshminarayan. Netflix has developed a network observability sidecar called Flow Exporter that uses eBPF tracepoints to capture TCP flows in near real time. Without network visibility, it's difficult to improve our reliability, security and capacity posture.
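Flow Exporter itself is not shown in the excerpt; as a loose sketch of the mechanism it is described as using, the following bcc program attaches to the sock:inet_sock_set_state tracepoint and logs TCP connections as they become established. It assumes root privileges, a 4.16+ kernel, and the bcc Python bindings; the printed fields are illustrative only:

```python
# Loose sketch (not Netflix's Flow Exporter): log TCP connections as
# they reach ESTABLISHED, via the sock:inet_sock_set_state tracepoint.
from bcc import BPF

prog = r"""
TRACEPOINT_PROBE(sock, inet_sock_set_state) {
    // 6 == IPPROTO_TCP, 1 == TCP_ESTABLISHED (numeric literals used
    // here to avoid pulling in extra kernel headers)
    if (args->protocol != 6 || args->newstate != 1)
        return 0;
    bpf_trace_printk("tcp flow established sport=%d dport=%d\n",
                     args->sport, args->dport);
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing new TCP flows... Ctrl-C to stop")
b.trace_print()  # stream the bpf_trace_printk output
```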
Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. PySpark applications can be tuned to optimize performance and achieve better execution time, scalability, and resource utilization.
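As a concrete if simplified example of such tuning, here is a sketch of a few common knobs; the S3 paths, table layout, and join key are placeholders:

```python
# Minimal sketch of common PySpark tuning knobs; paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    # Size shuffle parallelism to the cluster instead of the default 200.
    .config("spark.sql.shuffle.partitions", "64")
    # Let the engine coalesce partitions and mitigate skew at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

events = spark.read.parquet("s3://bucket/events/")  # placeholder path
users = spark.read.parquet("s3://bucket/users/")    # placeholder path

# Cache a DataFrame that is reused across several actions.
events.cache()

# Broadcast the small side of the join to avoid a full shuffle.
joined = events.join(broadcast(users), "user_id")
joined.write.mode("overwrite").parquet("s3://bucket/joined/")
```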
The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the big data community quite a long time ago. In the previous section, we noted that many distributed query processing algorithms resemble message-passing networks, which are conceptually similar to in-stream processing pipelines.
IT operations analytics is the process of unifying, storing, and contextually analyzing operational data to understand the health of applications, infrastructure, and environments and streamline everyday operations. ITOA collects operational data to identify patterns and anomalies for faster incident management and near-real-time insights.
As cloud and big data complexity scales beyond the ability of traditional monitoring tools to handle, next-generation cloud monitoring and observability are becoming necessities for IT teams. With agent monitoring, third-party software collects data and reports from the component the agent is attached to.
But managing the deployment, modification, networking, and scaling of multiple containers can quickly outstrip the capabilities of development and operations teams. Container orchestration addresses this: it covers provisioning, scheduling, networking, ensuring availability, and monitoring container lifecycles. How does container orchestration work?
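One way to make that concrete: with the official Kubernetes Python client (the `kubernetes` package), a single Deployment object asks the orchestrator to keep three replicas of a container running, and the control plane handles scheduling, networking, and restarts. A minimal sketch assuming a reachable cluster and a configured kubeconfig; names and image are placeholders:

```python
# Sketch: declaring a replicated, self-healing service via the
# Kubernetes API. Requires a cluster and the `kubernetes` package.
from kubernetes import client, config

config.load_kube_config()  # reads ~/.kube/config

container = client.V1Container(
    name="web",
    image="nginx:1.25",
    ports=[client.V1ContainerPort(container_port=80)],
)
template = client.V1PodTemplateSpec(
    metadata=client.V1ObjectMeta(labels={"app": "web"}),
    spec=client.V1PodSpec(containers=[container]),
)
spec = client.V1DeploymentSpec(
    replicas=3,  # the orchestrator keeps three pods running at all times
    selector=client.V1LabelSelector(match_labels={"app": "web"}),
    template=template,
)
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="web"),
    spec=spec,
)
client.AppsV1Api().create_namespaced_deployment(
    namespace="default", body=deployment
)
```

If a pod dies or a node drains, the controller notices the replica count is off and reschedules, which is the self-healing behavior the blurb alludes to.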
Modern IT environments, whether multicloud, on-premises, or hybrid-cloud architectures, generate exponentially increasing data volumes. The number and variety of applications, network devices, serverless functions, and ephemeral containers grow continuously. And this expansion shows no sign of slowing down.
Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next.
Key Takeaways Distributed storage systems benefit organizations by enhancing data availability, fault tolerance, and system scalability, leading to cost savings from reduced hardware needs, energy consumption, and personnel. Variations within these storage systems are called distributed file systems.
With the launch of the AWS Europe (London) Region, AWS can enable many more UK enterprise, public sector and startup customers to reduce IT costs, address data locality needs, and embark on rapid transformations in critical new areas, such as big data analysis and the Internet of Things. Fraud.net is a good example of this.
In distributed systems' sprawling networks, RabbitMQ is the glue that holds disparate components together. Its messaging model allows for scalability and efficiency, demonstrating RabbitMQ's versatility in real-world applications where speed and reliability are crucial.
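For a flavor of how that glue looks in code, here is a minimal producer/consumer sketch using the pika client; the queue name, broker address, and message body are placeholders:

```python
# Minimal RabbitMQ producer/consumer sketch with pika; assumes a broker
# on localhost. Queue name and payload are placeholders.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue="tasks", durable=True)  # survive broker restarts

# Producer: publish a persistent message to the queue.
ch.basic_publish(
    exchange="",
    routing_key="tasks",
    body=b"resize-image:42",
    properties=pika.BasicProperties(delivery_mode=2),  # persistent
)

# Consumer: ack only after the work is done, so failed work is redelivered.
def handle(channel, method, properties, body):
    print("got", body)
    channel.basic_ack(delivery_tag=method.delivery_tag)

ch.basic_consume(queue="tasks", on_message_callback=handle)
ch.start_consuming()
```

Durable queues plus explicit acks are what give the decoupling its reliability: the producer and consumer never need to be up at the same time.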
Boris has unique expertise in that area, especially in big data applications. To facilitate discussions, in addition to Q&A, we have panels, "Meeting of the Minds" sessions, and networking events. How to Select Appropriate IT Infrastructure to Support Digital Transformation, by Boris Zibitsker, BEZNext.
Heading into 2024, SQL databases will remain essential in data management, increasingly using distributed systems to meet growing needs for scalability and reliability. They keep the features that developers like but can handle much more data, similar to NoSQL systems.
After the launch of the AWS APAC (Hong Kong) Region, there will be 19 Availability Zones in Asia Pacific for customers to build flexible, scalable, secure, and highly available applications. In addition to AWS Regions, we have 21 AWS Edge Network Locations in Asia Pacific.
Given this, enterprises, public sector bodies, startups, and small businesses are looking to adopt agile, scalable, and secure public cloud solutions. Access to secure, scalable, low-cost AWS infrastructure in Canada allows customers to innovate and provide tools to meet privacy, sovereignty, and compliance requirements.
This approach allows companies to combine the security and control of private clouds with the scalability and innovation potential of public clouds. Mastering Hybrid Cloud Strategy: Are you looking to leverage the best of the private and public cloud worlds to propel your business forward? A hybrid cloud strategy could be your answer.
Japanese companies and consumers have become used to low latency and high-speed networking available between their businesses, residences, and mobile devices.
In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store's performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios.
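A small side-by-side sketch of that contrast, assuming local Redis and Memcached instances and the redis and pymemcache packages (keys and values are placeholders):

```python
# Sketch: structured data in Redis vs. flat key/value caching in
# Memcached. Assumes both servers running locally on default ports.
import redis
from pymemcache.client.base import Client

# Redis: rich data structures, e.g. a hash per user.
r = redis.Redis(host="localhost", port=6379)
r.hset("user:42", mapping={"name": "Ada", "plan": "pro"})
print(r.hgetall("user:42"))

# Memcached: flat keys and byte values, ideal for high-throughput caching.
mc = Client(("localhost", 11211))
mc.set("page:/home", b"<html>...</html>", expire=60)
print(mc.get("page:/home"))
```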
After the launch of the AWS EU (Stockholm) Region, there will be 13 Availability Zones in Europe for customers to build flexible, scalable, secure, and highly available applications. It will also give customers another region where they can store their data with the knowledge that it will not leave the EU unless they move it.
Often these namespaces are hierarchical in nature such that it becomes easier to manage them and to decentralize control, which makes the system more scalable.
The storage systems we've pioneered demonstrate extreme scalability while maintaining tight control over performance, availability, and cost.
During my academic career, I spent many years working on HPC technologies such as user-level networking interfaces, large-scale high-speed interconnects, HPC software stacks, etc.
If you have a largely static site you can rely on the enormous power of S3 to make serving your content highly scalable and storing it extremely durable.
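A minimal boto3 sketch of that setup; the bucket name is a placeholder, and real deployments also need region and public-access settings, which are omitted here:

```python
# Sketch: serving a static site from S3. Bucket name is a placeholder
# and must be globally unique; outside us-east-1, create_bucket also
# needs a CreateBucketConfiguration with a LocationConstraint.
import boto3

s3 = boto3.client("s3")
bucket = "my-static-site-example"  # placeholder

s3.create_bucket(Bucket=bucket)
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
s3.put_object(
    Bucket=bucket,
    Key="index.html",
    Body=b"<h1>Hello from S3</h1>",
    ContentType="text/html",  # so browsers render it instead of downloading
)
```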
Government and Big Data. One particular early use case for AWS GovCloud (US) will be massive data processing and analytics. The scalability, flexibility and elasticity of AWS make it an ideal environment for the agencies to run their analytics.
Using local SSDs inside the GPU node delivers fast access to data during training, but introduces challenges that impact the overall solution in terms of scalability, data access, and data protection.
Shell leverages AWS for big data analytics to help achieve these goals. Due to the exponential growth of the biology and informatics fields, Unilever needs to maintain this new program within a highly scalable environment that supports parallel computation and heavy data storage demands.
If a cyber network agent has observed an unusual pattern of failed login attempts, it needs to alert downstream network nodes (servers and routers) to block the kill chain in a potential attack. The list goes on. The Limitations of Today’s Streaming Analytics. A New Approach: Real-Time Device Tracking.
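As a toy sketch of that kind of per-source tracking (not the article's actual implementation), a sliding window plus a threshold is enough to flag a burst of failed logins; the function names, window size, and threshold are all invented for illustration:

```python
# Toy sketch: count failed logins per source IP in a sliding window and
# alert past a threshold. Names and constants are hypothetical.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 5
failures = defaultdict(deque)  # source ip -> timestamps of recent failures

def record_failed_login(source_ip, now=None):
    now = now or time.time()
    q = failures[source_ip]
    q.append(now)
    # Drop events that have fallen out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= THRESHOLD:
        alert_downstream(source_ip, len(q))

def alert_downstream(source_ip, count):
    # In a real system this would notify servers/routers to block the
    # source; here we just print.
    print(f"ALERT: {count} failed logins from {source_ip} "
          f"in {WINDOW_SECONDS}s")

# Simulated burst of failures from one source.
for _ in range(6):
    record_failed_login("10.0.0.7")
```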
AutoOptimize relies on Iceberg-specific features such as snapshots and atomic operations to perform the optimizations in an accurate and scalable manner. AutoOptimize reduces end-to-end lag in data processing by optimizing as we go. We use Apache Iceberg as the table format.
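AutoOptimize itself is Netflix-internal, but the Iceberg primitives it leans on are exposed as Spark SQL procedures. A hedged sketch, with catalog and table names as placeholders, assuming the Iceberg Spark runtime and SQL extensions are on the classpath and the catalog is configured:

```python
# Not AutoOptimize itself; a sketch of the Iceberg maintenance
# procedures it builds on. Catalog/table names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    # Catalog configuration (spark.sql.catalog.my_catalog=...) omitted.
    .getOrCreate()
)

# Compact small files into larger ones as an atomic snapshot commit.
spark.sql("CALL my_catalog.system.rewrite_data_files(table => 'db.events')")

# Expire old snapshots to bound metadata and storage growth.
spark.sql("CALL my_catalog.system.expire_snapshots(table => 'db.events')")
```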
AWS Import/Export transfers data off storage devices using Amazon's high-speed internal network, bypassing the Internet. With this new functionality, AWS Import/Export now supports importing data directly into Amazon EBS snapshots.
With Amazon Glacier, any organization now has access to the same data archiving capabilities as the world's largest organizations. For datasets that are too large to transmit via the network, AWS offers the ability to upload and download data on disks that can be shipped.
Elastic Beanstalk makes it easy for developers to deploy and manage scalable and fault-tolerant applications on the AWS cloud.
There are sessions in many different categories: Architecture, Big Data, HPC, Compute & Networking, Storage, Databases, Security, Tools & Languages, Media Sharing & Content Delivery, Managing AWS Resources, Enterprise IT, Mobile, Start-up, and more.
Big Just Got Bigger - 5 Terabyte Object Support in Amazon S3. Amazon S3 has always been a scalable, durable and available data repository for almost any customer workload.
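A single PUT is still capped at 5 GB, so objects approaching the 5 TB limit go through multipart upload; boto3 handles the part management automatically once a transfer crosses the configured threshold. A minimal sketch with placeholder bucket, key, and sizes:

```python
# Sketch: uploading a very large object to S3. boto3 switches to
# multipart upload above the threshold; bucket/key are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # use multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,  # 64 MB parts
    max_concurrency=8,                     # upload parts in parallel
)
boto3.client("s3").upload_file(
    "huge-dataset.bin", "my-bucket", "datasets/huge-dataset.bin",
    Config=config,
)
```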
Starting today, Amazon EMR can take advantage of Cluster Compute and Cluster GPU instances, giving customers ever more powerful components on which to base large-scale data processing and analysis.
In other words, it can be more efficient to sort data once during insertion than to sort it for each MapReduce query. Applications: ETL, Data Analysis. Not-So-Basic MapReduce Patterns: Iterative Message Passing (Graph Processing). Problem Statement: There is a network of entities and relationships between them.
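To ground the pattern, here is a toy, in-memory rendition of iterative message passing: each node sends its current label to its neighbors (the map step), every node keeps the minimum label it receives (the reduce step), and the rounds repeat until nothing changes, which computes connected components. The graph is a placeholder example:

```python
# Toy, single-process rendition of the iterative message-passing
# pattern (label propagation for connected components).
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
label = {node: node for node in graph}  # initial state: own id

changed = True
while changed:
    # Map: emit (neighbor, my_label) messages along every edge.
    messages = [(nbr, label[node]) for node in graph for nbr in graph[node]]
    # Reduce: each node keeps the minimum of its label and incoming ones.
    changed = False
    for node, incoming in messages:
        if incoming < label[node]:
            label[node] = incoming
            changed = True

print(label)  # {'a': 'a', 'b': 'a', 'c': 'a', 'd': 'd'}
```

In a real MapReduce deployment each round is a full job and the node states are re-read from storage, which is exactly why sorting or partitioning data well at insertion time pays off across many iterations.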
(We've seen similar high marshalling overheads in big data systems too.) Fetching too much data in a single query is also costly; if you decompose data across multiple keys to avoid this, you then typically run into cross-key atomicity issues. Over and above RTT times, the size of the data to be transferred also matters.
Alongside more traditional sessions such as Real-World Deployed Systems and Big Data Programming Frameworks, there were many papers focusing on emerging hardware architectures, including embedded multi-accelerator SoCs, in-network and in-storage computing, FPGAs, GPUs, and low-power devices.
This new storage option enables customers to reduce their costs by storing non-critical, reproducible data at lower levels of redundancy.
Choosing Consistency. There are many factors that come into play when you need to meet stringent availability and performance requirements under ultra-scalable conditions.
Real-Time Digital Twins Can Add Important New Capabilities to Telematics Systems and Eliminate Scalability Bottlenecks. At the same time, telemetry snapshots are stored in a data lake, such as HDFS, for offline batch analysis and visualization using big data tools like Spark.
I don't think so in this case, but this paper will take you down into the nitty-gritty of getting the best out of modern processors and networks, with up to two orders of magnitude in single-node throughput gains to be had. What if the network was no longer the bottleneck? Maybe we should be switching to active-memory replication?
Apart from networking, attending conferences like LISA in person is an effective way to upgrade your skills: you can block out work interruptions and absorb new knowledge that's been neatly summarized into sessions. We first met each other at LISA, in addition to making many other important industry connections over the years.