Its architecture was specially designed to manage large-scale data warehouse and business intelligence workloads by letting you spread your data across a multitude of servers. This feature-packed database provides powerful, rapid analytics on data that scales up to petabyte volumes.
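To make that idea concrete, here is a minimal, hypothetical sketch of hash-based sharding: rows are routed to a server by hashing a distribution key, which is the basic mechanism that lets an MPP warehouse scan all shards in parallel. The server names and key choice are illustrative assumptions, not any particular product's API.

```python
import hashlib

SERVERS = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical shard hosts

def shard_for(key: str) -> str:
    """Deterministically map a distribution key to one server."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

for order_id in ["order-17", "order-18", "order-19"]:
    print(order_id, "->", shard_for(order_id))
```

Because the mapping is deterministic, every node can answer "which shard owns this row?" without consulting a coordinator; real systems layer rebalancing and replication on top of this.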
The shortcomings and drawbacks of batch-oriented data processing were recognized by the big data community long ago. This system was designed to supplement, and eventually succeed, the existing Hadoop-based system, whose data-processing latency and maintenance costs were too high.
Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency. By: Di Lin, Girish Lingappa, Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker: staring at a metric on a dashboard, about to make a critical business decision, but pausing to ask, "Can…"
Maintaining Uber's large-scale data warehouse comes with an operational cost in terms of ETL functions and storage. In our experience, optimizing for operational efficiency requires answering one key question: for which tables does the maintenance cost exceed their utility?
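A toy illustration of that question, with made-up cost and usage fields (Uber's actual signals are far richer):

```python
# Hypothetical sketch: flag warehouse tables whose estimated maintenance
# cost (ETL compute + storage) exceeds a simple utility score derived
# from query counts. All numbers and fields are illustrative assumptions.
tables = [
    {"name": "trips_daily",   "etl_cost": 40.0, "storage_cost": 12.0, "queries_30d": 900},
    {"name": "legacy_events", "etl_cost": 75.0, "storage_cost": 60.0, "queries_30d": 3},
]

VALUE_PER_QUERY = 0.5  # assumed dollar value of serving one query

for t in tables:
    cost = t["etl_cost"] + t["storage_cost"]
    utility = t["queries_30d"] * VALUE_PER_QUERY
    if cost > utility:
        print(f"deprecation candidate: {t['name']} (cost={cost}, utility={utility})")
```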
Driving down the cost of big data analytics: the Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud. However, this cannot be done without efficient, scalable data analytics.
Netflix's unique work culture and petabyte-scale data problems are what drew me to Netflix. During the earlier years of my career, I primarily worked as a backend software engineer, designing and building the backend systems that enable big data analytics.
Operational automation, including but not limited to auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing, is key to the success of modern data platforms. Multi-objective optimizations weigh competing goals such as retry success probability and compute cost efficiency.
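As a sketch of what such a multi-objective trade-off can look like, the toy model below weighs retry success probability against expected compute cost; the failure rate, per-attempt cost, and weight are assumptions, not Netflix's actual model.

```python
# Toy multi-objective optimization: choose a retry budget that balances
# the probability of eventual success against expected compute cost.
P_FAIL = 0.3       # assumed probability that a single attempt fails
COST = 1.0         # assumed compute cost per attempt
WEIGHT_COST = 0.05 # assumed relative weight of cost vs. success

def objective(retries: int) -> float:
    p_success = 1 - P_FAIL ** (retries + 1)  # succeeds if any attempt succeeds
    # expected attempts = sum of probabilities that attempt k is reached
    expected_cost = COST * sum(P_FAIL ** k for k in range(retries + 1))
    return p_success - WEIGHT_COST * expected_cost

best = max(range(6), key=objective)
print("best retry budget:", best)
```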
AIOps combines big data and machine learning to automate key IT operations processes, including anomaly detection and identification, event correlation, and root-cause analysis. Like the development and design phases, these applications generate massive data volumes that offer relevant and actionable insights.
Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3. Moving data with Bulldozer at Netflix.
While data lakes and data warehousing architectures are commonly used modes for storing and analyzing data, a data lakehouse is an efficient third way to store and analyze data that unifies the two architectures while preserving the benefits of both. What is a data lakehouse?
We will show how we are building a clean and efficient incremental processing solution (IPS) using Netflix Maestro and Apache Iceberg. IPS provides incremental processing support with data accuracy, data freshness, and backfill for users, and addresses many of the challenges in workflows (e.g., reprocessing only the past 3 hours or 10 days).
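A rough sketch of the core idea, assuming a simple partition-metadata list rather than the real Maestro/Iceberg APIs: only partitions updated within the lookback window are selected for reprocessing.

```python
# Hedged sketch of incremental processing: instead of recomputing the whole
# table, reprocess only partitions changed within a lookback window.
# Partition naming and metadata shape here are illustrative assumptions.
from datetime import datetime, timedelta, timezone

def partitions_to_process(partitions, lookback: timedelta):
    cutoff = datetime.now(timezone.utc) - lookback
    return [p for p in partitions if p["last_updated"] >= cutoff]

parts = [
    {"name": "dt=2024-05-01/hr=10",
     "last_updated": datetime.now(timezone.utc)},
    {"name": "dt=2024-04-01/hr=00",
     "last_updated": datetime.now(timezone.utc) - timedelta(days=30)},
]
print([p["name"] for p in partitions_to_process(parts, timedelta(hours=3))])
```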
To handle errors efficiently, Netflix developed a rule-based classifier for error classification called "Pensive." Going further, we propose developing an intelligent agent that can automatically discover, map, and query all data within an enterprise.
Organizations adopt DevOps, where developers and operations work together in a continuous loop, so they can develop software and resolve issues efficiently before they affect users. He meant that more and more developers are now becoming responsible for operations, and operations are becoming ingrained in developers’ job descriptions.
Various software systems are needed to design, build, and operate this CDN infrastructure, and a significant number of them are written in Python. Demand Engineering is responsible for Regional Failovers, Traffic Distribution, Capacity Operations, and Fleet Efficiency of the Netflix cloud.
The following figure depicts an imaginary "evolution" of the major NoSQL system families, namely Key-Value stores, BigTable-style databases, Document databases, Full-Text Search Engines, and Graph databases (figure: NoSQL Data Models). The main design theme is "What answers do I have?", i.e., precompute answers at write time rather than derive them at query time with GROUP BY category.
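A minimal sketch of that theme: instead of answering SELECT category, SUM(amount) ... GROUP BY category at query time, a NoSQL-style design maintains the aggregate at write time. The schema and values are hypothetical.

```python
# The "What answers do I have?" design theme: maintain the aggregate
# (the answer) on every write, so reads are a simple key lookup.
from collections import defaultdict

totals_by_category = defaultdict(float)  # the precomputed "answer"

def record_sale(category: str, amount: float) -> None:
    totals_by_category[category] += amount  # update the answer on write

record_sale("books", 12.0)
record_sale("books", 8.0)
record_sale("games", 60.0)
print(dict(totals_by_category))  # {'books': 20.0, 'games': 60.0}
```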
We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits. This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture.
Variations within these storage systems are called distributed file systems. Their design emphasizes increasing availability by spreading files among different nodes or servers, an approach that significantly reduces the risk of losing or corrupting data due to node failure.
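A quick back-of-envelope shows why spreading replicas across nodes helps: data is lost only if every node holding a replica fails. The failure probability below is an assumed, illustrative number, and independence of failures is itself an assumption.

```python
# Toy reliability estimate: with k independent replicas, data is lost
# only if all k nodes fail. Numbers are assumptions for illustration.
p_node_failure = 0.01  # assumed independent per-node failure probability

for replicas in (1, 2, 3):
    p_loss = p_node_failure ** replicas
    print(f"{replicas} replica(s): P(data loss) = {p_loss:.6f}")
```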
On the other hand, when one is interested only in simple additive metrics like total page views or average conversion price, it is obvious that raw data can be efficiently summarized, for example on a daily basis or using simple in-stream counters. Non-additive questions (e.g., what is the cardinality of the data set?) require probabilistic summaries that cost only a few bits per unique value.
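For the additive case, an in-stream counter can be as simple as the sketch below; cardinality-style questions instead need probabilistic sketches (e.g., HyperLogLog), since distinct counts are not additive across events.

```python
# Minimal in-stream summarization: rather than storing raw events,
# keep simple daily counters for additive metrics.
from collections import Counter

page_views = Counter()

def on_event(day: str, page: str) -> None:
    page_views[(day, page)] += 1  # O(1) update; the raw event is discarded

for ev in [("2024-05-01", "/home"), ("2024-05-01", "/home"),
           ("2024-05-01", "/pricing")]:
    on_event(*ev)

print(page_views[("2024-05-01", "/home")])  # 2
```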
The healthcare industry is embracing cloud technology to improve the efficiency, quality, and security of patient care, a shift on display at this year's HIMSS Conference in Orlando, Fla. AIOps (or "AI for IT operations") uses artificial intelligence so that big data can help IT teams work faster and more effectively.
Key features of RabbitMQ include message persistence to prevent data loss, flexible routing capabilities, and support for multiple messaging protocols such as AMQP, MQTT, and STOMP, enhancing its adaptability and reliability. Businesses can maintain a reliable and efficient communication system by utilizing message queues.
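As a hedged sketch of the persistence features mentioned above, using the pika Python client against an assumed local broker:

```python
# Declare a durable queue and publish a persistent message so it can
# survive a broker restart. Connection details assume a local RabbitMQ.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

channel.queue_declare(queue="task_queue", durable=True)  # queue survives restarts
channel.basic_publish(
    exchange="",                                   # default direct exchange
    routing_key="task_queue",
    body=b"process order 42",
    properties=pika.BasicProperties(delivery_mode=2),  # persist message to disk
)
connection.close()
```

Durability is two separate switches by design: the queue must be declared durable, and each message must be marked persistent; either alone is not enough.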
However, with our rapid product innovation speed, the whole approach experienced significant challenges. Business complexity: the existing SKU management solution was designed years ago, when the engagement rules were simple (three plans and one offer applied homogeneously to all regions).
ITOps refers to the process of acquiring, designing, deploying, configuring, and maintaining equipment and services that support an organization’s desired business outcomes. Adding application security to development and operations workflows increases efficiency. CloudOps teams are one step further in the digital supply chain.
Working on my PhD, I used optimization techniques to design radiotherapy fractionation schemes that improved the results of clinical practice. Later I enrolled in a data science program focused on helping academics transition to industry roles, driven by a passion for making informed decisions based on data.
With the launch of the AWS Europe (London) Region, AWS can enable many more UK enterprise, public sector, and startup customers to reduce IT costs, address data locality needs, and embark on rapid transformations in critical new areas such as big data analysis and the Internet of Things. Fraud.net is a good example of this.
by Jun He , Akash Dwivedi , Natallia Dzenisenka , Snehal Chennuru , Praneeth Yenugutala , Pawan Dixit At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations.
Clinical data was often small enough to fit into memory on an average computer, and only in rare cases would its computation require any technical ingenuity or massive computing power. There was not enough scope to explore the distributed and large-scale computing challenges that usually come with big data processing.
Redis Data Types and Structures: the design of Redis's data structures emphasizes versatility, offering fast read and write access to frequently accessed data. Snapshots provide point-in-time captures of the dataset, which are efficient for recovery on startup.
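A short sketch with the redis-py client, assuming a local Redis server; the keys and values are illustrative:

```python
# A few of the core data types the excerpt mentions, plus a snapshot
# request. Assumes a Redis server on localhost:6379.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

r.set("session:42", "alice")                                  # string
r.lpush("recent:views", "home", "pricing")                    # list
r.hset("user:42", mapping={"name": "alice", "plan": "pro"})   # hash

print(r.get("session:42"))  # b'alice'
r.bgsave()                  # ask the server for a point-in-time RDB snapshot
```

In practice snapshotting is usually configured server-side via `save` rules; `BGSAVE` just triggers one on demand.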
With answers at your fingertips, data-backed decisions, and real-time visibility into business KPIs, Dynatrace enables you to consistently deliver better digital business outcomes across all your channels more efficiently than ever before. Dynatrace APM: named a Leader in APM, and yet we're much more.
On the surface this is a paper about fast data ingestion from high-volume streams, with indexing to support efficient querying. Helios also serves as a reference architecture for how Microsoft envisions its next generation of distributed big-data processing systems being built. PVLDB’20.
But what is AIOps, exactly? And how can it support your organization? Gartner defines AIOps as the combination of "big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination."
In practice, a hybrid cloud operates by melding resources and services from multiple computing environments, which necessitates effective coordination, orchestration, and integration to work efficiently. Tailoring resource allocation efficiently ensures faster application performance in alignment with organizational demands.
has hours of system design content. They also do live system design discussions every week. Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). Learn to balance architecture trade-offs and design scalable enterprise-level software. How to screen candidates efficiently, effectively, and without bias.
These trade-offs have even impacted the way the lowest level building blocks in our computer architectures have been designed. Graphics processing is one such area with huge computational requirements, but where each of the tasks is relatively small and often a set of operations are performed on data in the form of a pipeline.
Level up on in-demand technologies and prep for your interviews on Educative.io, featuring popular courses like the bestselling Grokking the System Design Interview.
Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: a large-scale web crawling platform, integrating Deep Learning-based web data extraction components, working on queue algorithms and large datasets, creating a development platform for other company departments, etc.
Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices, Gan et al., ASPLOS'19. Finally, we show that Seer can identify application-level design bugs, and provide insights on how to better architect microservices to achieve predictable performance.
Overview: At Netflix, the Analytics and Developer Experience organization, part of the Data Platform, offers a product called Workbench. Workbench is a remote development workspace based on Titus that allows data practitioners to work with big data and machine learning use cases at scale. Specifically, pystan uses asyncio.
AWS also applies the same customer-oriented pricing strategy: as the AWS platform grows, our scale enables us to operate more efficiently, and we choose to pass the benefits back to customers in the form of cost savings. Expanding the Cloud: a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications.
For instance, in Percona Managed Services, we have many clients with terabytes worth of data that perform well. In this blog post, we will review key topics to consider for managing large datasets more efficiently in MySQL. InnoDB sorts the data in primary key order, and the primary key serves to reference the actual data pages on disk.
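Because InnoDB clusters rows by primary key, a monotonically increasing key keeps inserts appending to the end of the clustered index instead of splitting pages randomly. A hedged sketch, with assumed connection parameters and a hypothetical schema:

```python
# Illustrative DDL executed via the mysql-connector-python client.
# Host, credentials, and the schema are assumptions for the example.
import mysql.connector  # pip install mysql-connector-python

cnx = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="shop"
)
cur = cnx.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,  -- clustered index order
        created_at DATETIME NOT NULL,
        payload JSON,
        KEY idx_created_at (created_at)  -- secondary index for range scans
    ) ENGINE=InnoDB
""")
cnx.commit()
cur.close()
cnx.close()
```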
This article describes six major optimization problems related to marketing and pricing that can be solved leveraging data mining techniques. Although these problems are very different, we are trying to establish a common framework that helps to design optimization and data mining tasks required for solutions.
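As a toy instance of one such problem, the snippet below grid-searches for the revenue-maximizing price under an assumed linear demand curve; in practice the demand model would be fitted by the data mining step rather than assumed.

```python
# Toy price optimization: maximize revenue p * demand(p) over a price grid.
def demand(price: float) -> float:
    return max(0.0, 100.0 - 2.0 * price)  # assumed: demand falls linearly with price

prices = [p / 2 for p in range(0, 101)]    # candidate prices 0.0 .. 50.0
best_price = max(prices, key=lambda p: p * demand(p))
print(best_price, best_price * demand(best_price))  # 25.0 1250.0
```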