Big Data, Data and Scalability - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

It can scale towards a multi-petabyte level data workload without a single issue, and it allows access to a cluster of powerful servers that will work together within a single SQL interface where you can view all of the data. This feature-packed database provides powerful and rapid analytics on data that scales up to petabyte volumes.

Big Data

Big Data Database Artificial Intelligence Open Source

Write Optimized Spark Code for Big Data Applications

DZone

MARCH 7, 2023

Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. In addition, PySpark applications can be tuned to optimize performance and achieve better execution time, scalability, and resource utilization.

Big Data

Big Data Code Tuning Open Source

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. This system has been designed to supplement and succeed the existing Hadoop-based system that had too high latency of data processing and too high maintenance costs.

Big Data

Big Data Processing Lambda Database

Scaling for Success: Why Scalability Is the Forefront of Modern Applications

DZone

JUNE 13, 2023

Scalability has become the biggest buzzword in the world of Modern Applications for a good reason. In short, it is the ability to handle more data, more users, and more demand without sacrificing performance, reliability, or security. It is not uncommon to question why scalability has grabbed the attention of the masses these days.

Scalability

Scalability IoT Big Data Internet

What Should You Know About Graph Database’s Scalability?

DZone

JANUARY 20, 2023

Having a distributed and scalable graph database system is highly sought after in many enterprise scenarios. Do Not Be Misled: Designing and implementing a scalable graph database system has never been a trivial task.

Scalability

Scalability Big Data Hardware Internet

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency. By: Di Lin, Girish Lingappa, Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard, about to make a critical business decision but pausing to ask a question: “Can

Infrastructure

Infrastructure Big Data Transportation Architecture

What is IT operations analytics? Extract more data insights from more sources

Dynatrace

MAY 1, 2023

IT operations analytics is the process of unifying, storing, and contextually analyzing operational data to understand the health of applications, infrastructure, and environments and streamline everyday operations. ITOA collects operational data to identify patterns and anomalies for faster incident management and near-real-time insights.

Analytics

Analytics Artificial Intelligence Big Data Open Source

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers that generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. Many techniques that are described below are perfectly applicable to this model.

Database

Database Ecommerce Efficiency Engineering

Microsoft Azure Event Hubs

DZone

FEBRUARY 23, 2023

Introduction: With the big data streaming platform and event ingestion service Azure Event Hubs, millions of events can be received and processed in a single second. Any real-time analytics provider or batching/storage adaptor can transform and store data supplied to an event hub.

Azure

Azure Big Data Storage Analytics

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Modern organizations ingest petabytes of data daily, but legacy approaches to log analysis and management cannot accommodate this volume of data. based financial services group, discussed how the bank uses log monitoring on the Dynatrace platform with an emphasis on observability and security data.

Analytics

Analytics Infrastructure Storage Architecture

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

By Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, Pawan Dixit. At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations.

Java

Java Scalability Traffic Architecture

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Werner Vogels' weblog on building scalable and robust distributed systems. Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud.

Big Data

Big Data Analytics AWS Cloud

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3, and each day we ingest and create additional Petabytes. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, a widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges. Performance.

Big Data

Big Data Storage Benchmarking Hardware

Optimizing dbt and Google’s BigQuery

DZone

DECEMBER 21, 2020

Setting up a data warehouse is the first step towards fully utilizing big data analysis. Still, it is one of many that need to be taken before you can generate value from the data you gather. An important step in that chain of the process is data modeling and transformation.

Big Data

Big Data Google Scalability Processing

What is cloud monitoring? How to improve your full-stack visibility

Dynatrace

JANUARY 11, 2023

As cloud and big data complexity scales beyond the ability of traditional monitoring tools to handle, next-generation cloud monitoring and observability are becoming necessities for IT teams. With agent monitoring, third-party software collects data and reports from the component that’s attached to the agent.

Cloud

Cloud Monitoring Best Practices Infrastructure

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

At much less than 1% of CPU and memory on the instance, this highly performant sidecar provides flow data at scale for network insight. Flow Collector consumes two data streams, the IP address change events from Sonar via Kafka and eBPF flow log data from the Flow Exporter sidecars.

Network

Network Transportation AWS Cloud

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

The study analyzes factual Kubernetes production data from thousands of organizations worldwide that are using the Dynatrace Software Intelligence Platform to keep their Kubernetes clusters secure, healthy, and high performing. Through effortless provisioning, a larger number of small hosts provide a cost-effective and scalable platform.

Open Source

Open Source Java Operating System Programming

Data Engineers of Netflix - Interview with Dhevi Rajendran

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix - Interview with Dhevi Rajendran. This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix.

Data Engineering

Data Engineering Engineering Software Engineering Big Data

Moving HPC to the Cloud: A Guide for 2020

High Scalability

SEPTEMBER 14, 2020

This is a guest post by Limor Maayan-Wainstein, a senior technical writer with 10 years of experience writing about cybersecurity, big data, cloud computing, web development, and more. High performance computing (HPC) enables you to solve complex problems which cannot be solved by regular computing.

Cloud

Cloud Big Data Virtualization Efficiency

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Several pain points have made it difficult for organizations to manage their data efficiently and create actual value. Limited data availability constrains value creation. Modern IT environments — whether multicloud, on-premises, or hybrid-cloud architectures — generate exponentially increasing data volumes.

Analytics

Analytics Artificial Intelligence Storage Serverless

Data Mining Problems in Retail

Highly Scalable

MARCH 10, 2015

Retail is one of the most important business domains for data science and data mining applications because of its prolific data and numerous optimization problems such as optimal prices, discounts, recommendations, and stock levels that can be solved using data analysis methods.

Retail

Retail C++ Analytics Metrics

Snowflake Workload Optimization

DZone

AUGUST 23, 2023

In the era of big data, efficient data management and query performance are critical for organizations that want to get the best operational performance from their data investments.

Big Data

Big Data Analytics Innovation Scalability

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

By Jun He, Yingyi Zhang, and Pawan Dixit. Incremental processing is an approach to process new or changed data in workflows. The key advantage is that it only incrementally processes data that are newly added or updated to a dataset, instead of re-processing the complete dataset.

Processing

Processing Big Data Efficiency Engineering

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

It can happen on an edge API system servicing customer devices, between the edge and mid-tier services, or from mid-tiers to data stores. The first phase involves validating functional correctness, scalability, and performance concerns and ensuring the new systems’ resilience before the migration.

Traffic

Traffic Latency Tuning Systems

What is container orchestration?

Dynatrace

MARCH 24, 2023

Generally, container orchestration tools communicate with a user-created YAML or JSON file — formats that enable data exchange between applications and languages — that describes the configuration of the application or service.

Infrastructure

Infrastructure Open Source Operating System Cloud

Building a Rule-Based Platform to Manage Netflix Membership SKUs at Scale

The Netflix TechBlog

FEBRUARY 16, 2021

To solve the challenges mentioned above and meet our rapidly evolving business needs, we re-architected the legacy SKU catalog from the ground up and partnered with the Growth Engineering team to build a scalable SKU platform. SKUDB: SKU catalog data was migrated from the metadata configuration files to a relational database.

Mobile

Mobile Engineering Infrastructure Scalability

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

Expanding the AWS Cloud: Introducing the AWS Europe (London) Region

All Things Distributed

DECEMBER 13, 2016

The council has deployed IoT Weather Stations in Schools across the City and is using the sensor information collated in a Data Lake to gain insights on whether the weather or pollution plays a part in learning outcomes. The British Government is also helping to drive innovation and has embraced a cloud-first policy for technology adoption.

AWS

AWS Cloud Artificial Intelligence IoT

MySQL vs MongoDB: Best Choice for You

Scalegrid

FEBRUARY 11, 2025

This article will help you understand the core differences in data structure, scalability, and use cases. Whether you need a relational database for complex transactions or a NoSQL database for flexible data storage, we've got you covered.

Scalability

Scalability Database Storage IoT

Mastering Distributed SQL™ Databases in 2025

Scalegrid

JANUARY 10, 2025

Heading into 2024, SQL databases will remain essential in data management, increasingly using distributed systems to meet growing needs for scalability and reliability. They keep the features that developers like but can handle much more data, similar to NoSQL systems.

Database

Database Scalability Best Practices Blockchain

DROAM - Dreaming about Cheap Data Roaming - All Things.

All Things Distributed

JANUARY 11, 2011

Werner Vogels' weblog on building scalable and robust distributed systems. DROAM - Dreaming about Cheap Data Roaming. The one thing that I have always struggled with during my travels are the data plans of the cell phone companies. for the device rental and the first day of data, and €3.50 All Things Distributed.

Wireless

Wireless AWS Internet

What is RabbitMQ Used For

Scalegrid

JUNE 28, 2024

It is widely utilized across various industries, such as finance, telecommunications, and e-commerce, for managing activities, including transaction processing, data streaming, and instantaneous messaging. Key Takeaways: RabbitMQ is an open-source message broker facilitating seamless data exchange across diverse systems.

IoT

IoT Healthcare Programming Open Source

Ensuring Performance, Efficiency, and Scalability of Digital Transformation

Alex Podelko

DECEMBER 19, 2019

Boris has unique expertise in that area – especially in Big Data applications. How to select appropriate IT Infrastructure to support Digital Transformation by Boris Zibitsker, BEZNext. – Optimizing IT infrastructure – with specific use cases. – And probably the hottest question for operations – anomaly detection.

Efficiency

Efficiency Artificial Intelligence Scalability Performance

QCon London: Lessons Learned From Building LinkedIn’s AI/ML Data Platform

InfoQ

APRIL 15, 2024

He specifically delved into Venice DB, the NoSQL data store used for feature persistence. At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. By Rafal Gancarz

Artificial Intelligence

Artificial Intelligence Big Data Data Engineering Latency

Redis vs Memcached in 2024

Scalegrid

MARCH 28, 2024

In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios.

Cache

Cache Storage Architecture Scalability

Mastering Hybrid Cloud Strategy

Scalegrid

MARCH 14, 2024

This approach allows companies to combine the security and control of private clouds with public clouds’ scalability and innovation potential. This combination allows for the fluid movement of data and applications across different environments, facilitating shared workloads seamlessly.

Strategy

Strategy Cloud Infrastructure Artificial Intelligence

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc.

Education

Education Software Engineering Scalability Engineering

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

We live in a world where massive volumes of data are generated from websites, connected devices and mobile apps. In such a data intensive environment, making key business decisions such as running marketing and sales campaigns, logistic planning, financial analysis and ad targeting require deriving insights from these data.

Cloud

Cloud Big Data AWS Analytics

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

Werner Vogels' weblog on building scalable and robust distributed systems. To our shareowners: Random forests, naïve Bayesian estimators, RESTful services, gossip protocols, eventual consistency, data sharding, anti-entropy, Byzantine quorum, erasure coding, vector clocks. All Things Distributed.

Technology

Technology AWS Storage

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc.

Education

Education Software Engineering Scalability Engineering
