Greenplum Database is a massively parallel processing (MPP) SQL database built on PostgreSQL. It can scale to multi-petabyte data workloads, presenting a cluster of powerful servers through a single SQL interface from which you can query all of the data.
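As a rough illustration of that single-interface model (an assumption-laden sketch, not from the article): because Greenplum speaks the PostgreSQL wire protocol, a plain psycopg2 connection works; the host, credentials, and table below are hypothetical.

```python
# Minimal sketch: talking to a Greenplum cluster through its master with
# psycopg2. The DISTRIBUTED BY clause tells Greenplum how to spread rows
# across segment hosts; all connection details here are placeholders.
import psycopg2

conn = psycopg2.connect(host="gp-master.example.com", port=5432,
                        dbname="analytics", user="gpadmin", password="secret")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS page_views (
            user_id BIGINT,
            url     TEXT,
            ts      TIMESTAMP
        ) DISTRIBUTED BY (user_id);
    """)
    # Issued on one SQL interface, executed in parallel on every segment.
    cur.execute("SELECT count(*) FROM page_views;")
    print(cur.fetchone()[0])
conn.close()
```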
The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing are immediate needs in many practical applications. Fault-tolerance.
Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. In addition, PySpark applications can be tuned to optimize performance and achieve better execution time, scalability, and resource utilization.
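For a concrete feel, here is a hedged sketch of a few common tuning knobs; the values and s3a paths are illustrative assumptions, not recommendations, and the right settings depend entirely on the workload and cluster.

```python
# A small sketch of common PySpark tuning knobs (values are illustrative).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned-etl")
    .config("spark.sql.shuffle.partitions", "400")    # match shuffle width to data size
    .config("spark.executor.memory", "8g")            # per-executor heap
    .config("spark.executor.cores", "4")              # concurrent tasks per executor
    .config("spark.sql.adaptive.enabled", "true")     # let AQE re-plan at runtime
    .getOrCreate()
)

df = spark.read.parquet("s3a://bucket/events/")       # hypothetical input path
daily = df.groupBy("event_date").count()
daily.cache()                                         # reuse across multiple actions
daily.write.mode("overwrite").parquet("s3a://bucket/daily_counts/")
```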
Having a distributed and scalable graph database system is highly sought after in many enterprise scenarios. Do not be misled: designing and implementing a scalable graph database system has never been a trivial task.
With Azure Event Hubs, a big data streaming platform and event ingestion service, millions of events can be received and processed in a single second. Any real-time analytics provider or batching/storage adaptor can transform and store data supplied to an event hub.
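A minimal sketch of publishing to an event hub with the azure-eventhub Python SDK; the connection string and hub name are placeholders.

```python
# Batch-and-send sketch using EventHubProducerClient; one network call
# carries the whole batch. Connection details are placeholders.
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...",
    eventhub_name="telemetry",
)
with producer:
    batch = producer.create_batch()            # respects the hub's size limit
    for i in range(100):
        batch.add(EventData(f'{{"reading": {i}}}'))
    producer.send_batch(batch)
```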
by Jun He, Yingyi Zhang, and Pawan Dixit. Incremental processing is an approach to process new or changed data in workflows. The key advantage is that it only incrementally processes data that is newly added or updated in a dataset, instead of re-processing the complete dataset.
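The core idea fits in a few lines of Python; the toy sketch below is an illustration of the concept, not the authors' actual tooling: persist a watermark and touch only rows newer than it on each run.

```python
# Toy incremental processing: keep a watermark on disk and process only
# rows whose updated_at is newer than it. All names are hypothetical.
import json
import pathlib

STATE = pathlib.Path("watermark.json")

def load_watermark() -> str:
    return json.loads(STATE.read_text())["ts"] if STATE.exists() else "1970-01-01T00:00:00"

def save_watermark(ts: str) -> None:
    STATE.write_text(json.dumps({"ts": ts}))

def process(row: dict) -> None:
    print("processing", row)          # stand-in for the real transform

def run_incremental(rows):
    """rows: iterable of dicts carrying an ISO-8601 'updated_at' field."""
    wm = load_watermark()
    new_rows = [r for r in rows if r["updated_at"] > wm]   # only new/changed data
    for r in new_rows:
        process(r)
    if new_rows:
        save_watermark(max(r["updated_at"] for r in new_rows))

run_incremental([{"id": 1, "updated_at": "2024-01-02T08:00:00"}])
```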
IT operations analytics (ITOA) is the process of unifying, storing, and contextually analyzing operational data to understand the health of applications, infrastructure, and environments, and to streamline everyday operations. ITOA automates repetitive cloud operations tasks and streamlines the flow of analytics into decision-making processes.
Werner Vogels' weblog on building scalable and robust distributed systems. Driving down the cost of Big Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud.
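In boto3 terms, requesting Spot capacity for an EMR cluster looks roughly like the sketch below; the instance types, counts, and role names are illustrative assumptions.

```python
# Sketch: an EMR cluster whose core nodes come from the Spot market.
import boto3

emr = boto3.client("emr", region_name="us-east-1")
emr.run_job_flow(
    Name="spot-analytics",
    ReleaseLabel="emr-6.15.0",
    Instances={
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1,
             "Market": "ON_DEMAND"},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 4,
             "Market": "SPOT"},               # Spot capacity cuts the compute bill
        ],
        "KeepJobFlowAliveWhenNoSteps": False, # terminate when the steps finish
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```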
Setting up a data warehouse is the first step towards fully utilizing big data analysis. Still, it is only one of many steps that need to be taken before you can generate value from the data you gather. An important step in that chain is data modeling and transformation.
Central engineering teams provide paved paths (secure, vetted and supported options) and guard rails to help reduce variance in choices available for tools and technologies to support the development of scalable technical architectures.
As cloud and big data complexity scales beyond the ability of traditional monitoring tools to handle, next-generation cloud monitoring and observability are becoming necessities for IT teams. Website monitoring examines a cloud-hosted website’s processes, traffic, availability, and resource use. What is cloud monitoring?
Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next. Key challenges.
-based financial services group, discussed how the bank uses log monitoring on the Dynatrace platform with an emphasis on observability and security data. To grasp the challenges of multifeatured, cross-team cooperation dealing with observability data, consider the content of the logs generated. Dissolving data silos.
Stop worrying about log data ingest and storage — start creating value instead. Dynatrace® Grail, an additional core technology for the Dynatrace® Software Intelligence platform, is the world’s first data lakehouse with massively parallel processing (MPP) for context-rich observability, business, and security analytics.
Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3. Moving data with Bulldozer at Netflix.
Through effortless provisioning, a larger number of small hosts provide a cost-effective and scalable platform. On-premises data centers invest in higher capacity servers since they provide more flexibility in the long run, while the procurement price of hardware is only one of many cost factors.
Berkeley Packet Filter (BPF) is an in-kernel execution engine that processes a virtual instruction set, and has been extended as eBPF to provide a safe way to extend kernel functionality. The data is also used by security and other partner teams for insight and incident analysis. What is BPF?
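The classic bcc "hello world" gives a feel for this: a tiny BPF program attached to the clone syscall from Python. It requires the bcc toolkit and root, and the exact syscall symbol varies by kernel version.

```python
# bcc sketch: print a trace line every time a process is cloned.
from bcc import BPF

prog = """
int kprobe__sys_clone(void *ctx) {
    bpf_trace_printk("process created\\n");
    return 0;
}
"""
b = BPF(text=prog)   # compiles the C snippet and loads it into the kernel
b.trace_print()      # streams the kernel trace pipe until interrupted
```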
As big data and ML became more prevalent and impactful, the scalability, reliability, and usability of the orchestrating ecosystem have become increasingly important for our data scientists and the company. Another dimension of scalability to consider is the size of the workflow.
The first phase involves validating functional correctness, scalability, and performance concerns, and ensuring the new systems’ resilience before the migration. Additionally, for mismatches, we record the normalized and unnormalized responses from both sides to another big data table, along with other relevant parameters such as the diff.
Container orchestration is a process that automates the deployment and management of containerized applications and services at scale. Container orchestration enables organizations to manage and automate the many processes and services that comprise workflows.
NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. Besides this, the elimination of these features had an extremely important influence on the performance and scalability of these stores. Data duplication and denormalization are first-class citizens.
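A toy Python illustration of that trade-off (the data shapes are hypothetical): the customer record is duplicated into each order so a read needs no join, at the cost of updating every copy when the customer changes.

```python
# Normalized: reading an order requires a second lookup.
customers = {42: {"name": "Ada", "city": "London"}}
orders_normalized = [{"order_id": 1, "customer_id": 42}]

# Denormalized: the customer data is copied into the order document.
orders_denormalized = [
    {"order_id": 1,
     "customer": {"id": 42, "name": "Ada", "city": "London"}},
]

# One read returns everything, with no join.
print(orders_denormalized[0]["customer"]["name"])
```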
This talk will delve into the creative solutions Netflix deploys to manage this high-volume, real-time data requirement while balancing scalability and cost. To handle errors efficiently, Netflix developed a rule-based classifier for error classification called “Pensive.”
Key takeaways: Distributed storage systems benefit organizations by enhancing data availability, fault tolerance, and system scalability, leading to cost savings from reduced hardware needs, energy consumption, and personnel. This process effectively duplicates essential parts of information to safeguard against potential loss.
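A toy sketch of that duplication idea in plain Python; real systems replicate across machines and wait for a write quorum, but the shape is the same.

```python
# Every write is copied to N stand-in "nodes" (plain dicts), so the value
# survives the loss of any single node.
N = 3
nodes = [dict() for _ in range(N)]

def replicated_put(key, value):
    for node in nodes:            # fan the write out to every replica
        node[key] = value

def replicated_get(key):
    for node in nodes:            # read from the first replica that has it
        if key in node:
            return node[key]
    raise KeyError(key)

replicated_put("sensor:7", 21.5)
nodes[0].clear()                  # simulate losing one node
print(replicated_get("sensor:7")) # still served from a surviving replica
```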
To solve the challenges mentioned above and meet our rapidly evolving business needs, we re-architected the legacy SKU catalog from the ground up and partnered with the Growth Engineering team to build a scalable SKU platform. The legacy paradigm involved a fixed process for applying code changes, and a service release could take a couple of days.
With the launch of the AWS Europe (London) Region, AWS can enable many more UK enterprise, public sector and startup customers to reduce IT costs, address data locality needs, and embark on rapid transformations in critical new areas, such as big data analysis and the Internet of Things.
It is widely utilized across various industries, such as finance, telecommunications, and e-commerce, for managing activities including transaction processing, data streaming, and instantaneous messaging. Key takeaways: RabbitMQ is an open-source message broker facilitating seamless data exchange across diverse systems.
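A minimal sketch with the pika client: declare a durable queue and publish a persistent message. The host and queue name are placeholders.

```python
# Publish one persistent message to a durable RabbitMQ queue.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="payments", durable=True)    # survives broker restarts
channel.basic_publish(
    exchange="",                                         # default exchange
    routing_key="payments",                              # routed to the queue
    body=b'{"txn": 1001, "amount": 25.0}',
    properties=pika.BasicProperties(delivery_mode=2),    # persist to disk
)
connection.close()
```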
Choosing the right database often comes down to MongoDB vs MySQL. This article will help you understand the core differences in data structure, scalability, and use cases. Whether you need a relational database for complex transactions or a NoSQL database for flexible data storage, we've got you covered.
Heading into 2024, SQL databases will remain essential in data management, increasingly using distributed systems to meet growing needs for scalability and reliability. They keep the features that developers like but can handle much more data, similar to NoSQL systems. This process needs careful planning and know-how.
Such optimizations bring several benefits: savings on storage, faster query times, cheaper downstream processing, and an increase in developer productivity by removing additional ETLs written only for query performance improvement. Some of the optimizations are prerequisites for a high-performance data warehouse.
At Netflix, the work that data engineers do to produce data in a robust, scalable way is incredibly important to provide the best experience to our members as they interact with our service. It’s been incredible to see how gracious people are with their time and knowledge.
This approach allows companies to combine the security and control of private clouds with public clouds’ scalability and innovation potential. Defining Hybrid Cloud Strategy The decision-making process about where to situate data and applications is vital to any hybrid cloud solution.
Werner Vogels' weblog on building scalable and robust distributed systems. There are a few small pieces of the blogging process that I still need a server for: editing postings, managing comments, and serving search, all of which can easily be run out of a single Amazon EC2 micro instance. All Things Distributed.
In such a data-intensive environment, making key business decisions such as running marketing and sales campaigns, logistics planning, financial analysis, and ad targeting requires deriving insights from this data. However, the data infrastructure to collect, store, and process data is geared toward developers (e.g.,
Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: a large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms and large datasets, creating a development platform for other company departments, etc.
Werner Vogels' weblog on building scalable and robust distributed systems. The storage systems we've pioneered demonstrate extreme scalability while maintaining tight control over performance, availability, and cost. a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications. All Things Distributed.
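With today's SDKs that service is a few lines of boto3; in the sketch below the table and attribute names are assumptions, and the table is assumed to already exist with user_id as its partition key.

```python
# Key-value access against DynamoDB via the boto3 resource API.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("user_sessions")       # hypothetical existing table

table.put_item(Item={"user_id": "u42", "last_seen": "2024-01-18T12:00:00Z"})
resp = table.get_item(Key={"user_id": "u42"}) # lookup by primary key
print(resp.get("Item"))
```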
In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios. Data transfer technology.
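A side-by-side sketch (redis-py and pymemcache, both servers assumed to be running locally) shows the difference in data models: Redis stores a structured hash natively, while Memcached stores an opaque string.

```python
import redis
from pymemcache.client.base import Client

# Redis: a native hash, with per-field reads and writes.
r = redis.Redis(host="localhost", port=6379)
r.hset("user:42", mapping={"name": "Ada", "plan": "pro"})
print(r.hget("user:42", "plan"))               # b'pro'

# Memcached: one flat string blob per key.
mc = Client(("localhost", 11211))
mc.set("user:42", '{"name": "Ada", "plan": "pro"}')
print(mc.get("user:42"))
```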
Werner Vogels' weblog on building scalable and robust distributed systems. ITAR is the International Traffic in Arms Regulations, a framework which stipulates, for example, that data must be stored in an environment where physical and logical access is restricted to US Persons. Government and Big Data. All Things Distributed.
Werner Vogels' weblog on building scalable and robust distributed systems. The methods for accessing these objects are also rapidly changing; where in the past you needed a PC or a laptop to access these objects, now many of our electronic devices have become capable of processing them. Driving down the cost of Big Data analytics.
And it can maintain contextual information about every data source (like the medical history of a device wearer or the maintenance history of a refrigeration system) and keep it immediately at hand to enhance the analysis. Conventional streaming analytics architectures have not kept up with the growing demands of IoT.
Given this, enterprises, public sector bodies, startups, and small businesses are looking to adopt agile, scalable, and secure public cloud solutions. Access to secure, scalable, low-cost AWS infrastructure in Canada allows customers to innovate and provide tools to meet privacy, sovereignty, and compliance requirements. Scalability.
Werner Vogels' weblog on building scalable and robust distributed systems. If you have a largely static site, you can rely on the enormous power of S3 to make serving your content highly scalable and storing it extremely durable. a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications.
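A boto3 sketch of that setup, assuming an existing bucket whose name is a placeholder (newer buckets also need their public-access settings adjusted, which is omitted here).

```python
# Turn an S3 bucket into a static website and upload an index page.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_website(
    Bucket="my-static-blog",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "404.html"},
    },
)
s3.put_object(
    Bucket="my-static-blog", Key="index.html",
    Body=b"<html><body>Hello from S3</body></html>",
    ContentType="text/html",
)
```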