Big Data, Design and Engineering - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

It’s architecture was specially designed to manage large-scale data warehouses and business intelligence workloads by giving you the ability to spread your data out across a multitude of servers. This feature-packed database provides powerful and rapid analytics on data that scales up to petabyte volumes.

Big Data

Big Data Database Artificial Intelligence Open Source

Sustainability: Thoughts from a software engineer

Dynatrace

MARCH 17, 2025

Until recently, improvements in data center power efficiency compensated almost entirely for the increasing demand for computing resources. The rise of big data, cryptocurrencies, and AI means the IT sector contributes significantly to global greenhouse gas emissions. However, this trend is now reversing.

Software Engineering

Software Engineering Engineering Software Software

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. This system has been designed to supplement and succeed the existing Hadoop-based system that had too high latency of data processing and too high maintenance costs.

Big Data

Big Data Processing Lambda Database

Data Engineers of Netflix?—?Interview with Kevin Wylie

The Netflix TechBlog

JULY 15, 2021

Data Engineers of Netflix?—?Interview Interview with Kevin Wylie This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin, what drew you to data engineering?

Data Engineering

Data Engineering Engineering Entertainment Big Data

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Data Engineers of Netflix?—?Interview Interview with Pallavi Phadnis This post is part of our “ Data Engineers of Netflix ” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024 The Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Now, imagine yourself in the role of a software engineer responsible for a micro-service which publishes data consumed by few critical customer facing services (e.g. You are about to make structural changes to the data and want to know who and what downstream to your service will be impacted.

Infrastructure

Infrastructure Big Data Transportation Architecture

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Uber Engineering

AUGUST 14, 2019

Once identified, … The post Less is More: Engineering Data Warehouse Efficiency with Minimalist Design appeared first on Uber Engineering Blog. In our experience, optimizing for operational efficiency requires answering one key question: for which tables does the maintenance cost supersede utility?

Efficiency

Efficiency Engineering Design Storage

Data Engineers of Netflix?—?Interview with Dhevi Rajendran

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Dhevi Rajendran Dhevi Rajendran This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix.

Data Engineering

Data Engineering Engineering Software Engineering Big Data

Data Engineers of Netflix?—?Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Samuel Setegne Samuel Setegne This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud. a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications.

Big Data

Big Data Analytics AWS Cloud

Python at Netflix

The Netflix TechBlog

APRIL 29, 2019

Various software systems are needed to design, build, and operate this CDN infrastructure, and a significant number of them are written in Python. Python has long been a popular programming language in the networking space because it’s an intuitive language that allows engineers to quickly solve networking problems.

Open Source

Open Source Network Infrastructure Big Data

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou Netflix has more than 195 million subscribers that generate petabytes of data everyday. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

Path to NoOps part 1: How modern AIOps brings NoOps within reach

Dynatrace

OCTOBER 25, 2022

NoOps is an advanced transformation of DevOps where many of the functions needed to manage, optimize and secure IT services and applications are automated within the design. Davis®— the Dynatrace purpose-built AI engine —further uses event proximity and baselined data for automatic correlation and noise suppression.

DevOps

DevOps Big Data Cloud Innovation

What is APM?

Dynatrace

JUNE 1, 2020

With our AI engine, Davis, at the core Dynatrace provides precise answers in real-time. Trying to manually keep up, configure, script and source data is beyond human capabilities and today everything must be automated and continuous. Some customers even say, having Davis is like having a whole team of engineers on their side.

Artificial Intelligence

Artificial Intelligence Social Media Monitoring IoT

DynatraceGo! APAC 2021: Lessons in thick data and keeping pace with the market

Dynatrace

AUGUST 10, 2021

Vikash Chhaganlal , GM of Engineering and Infrastructure at Kiwibank said it. BPAY is in the midst of its digital transformation journey in which it is discovering the critical importance of developing “contemporary ways of designing, operating, and using” its software. Investing in data is easy but using it is really hard”.

DevOps

DevOps Innovation Big Data Cloud

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

It also improves the engineering productivity by simplifying the existing pipelines and unlocking the new patterns. For example, a job would reprocess aggregates for the past 3 days because it assumes that there would be late arriving data, but data prior to 3 days isn’t worth the cost of reprocessing. past 3 hours or 10 days).

Processing

Processing Big Data Efficiency Engineering

Building a Rule-Based Platform to Manage Netflix Membership SKUs at Scale

The Netflix TechBlog

FEBRUARY 16, 2021

Membership Engineering at Netflix is responsible for the plan and pricing configurations for every market worldwide. However, with our rapid product innovation speed, the whole approach experienced significant challenges: Business Complexity: The existing SKU management solution was designed years ago when the engagement rules were simple?—?three

Mobile

Mobile Engineering Infrastructure Scalability

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

The Netflix TechBlog

MARCH 2, 2021

At Netflix, our data scientists span many areas of technical specialization, including experimentation, causal inference, machine learning, NLP, modeling, and optimization. Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group.

Analytics

Analytics C++ Innovation Engineering

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

The following figure depicts imaginary “evolution” of the major NoSQL system families, namely, Key-Value stores, BigTable-style databases, Document databases, Full Text Search Engines, and Graph databases: NoSQL Data Models. The main design theme is “ What answers do I have?” ” .

Database

Database Ecommerce Efficiency Engineering

Reimagining Experimentation Analysis at Netflix

The Netflix TechBlog

SEPTEMBER 10, 2019

Instead of relying on engineers to productionize scientific contributions, we’ve made a strategic bet to build an architecture that enables data scientists to easily contribute. The two main challenges with this approach are establishing an easy contribution framework and handling Netflix’s scale of data.

Metrics

Metrics Architecture Infrastructure Innovation

Turbocharge Your Apache Spark Jobs for Unmatched Performance

DZone

JULY 17, 2023

Apache Spark is a leading platform in the field of big data processing, known for its speed, versatility, and ease of use. Understanding Apache Spark Apache Spark is a unified computing engine designed for large-scale data processing.

Big Data

Big Data Performance Open Source Tuning

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. Rule Execution Engine is responsible for matching the collected logs against a set of predefined rules.

Tuning

Tuning Efficiency Big Data Engineering

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

ITOps refers to the process of acquiring, designing, deploying, configuring, and maintaining equipment and services that support an organization’s desired business outcomes. The Dynatrace AI engine, Davis , provides precise answers with automatic root-cause analysis to help DevOps and ITOps continuously improve workflows.

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

by Jun He , Akash Dwivedi , Natallia Dzenisenka , Snehal Chennuru , Praneeth Yenugutala , Pawan Dixit At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations.

Java

Java Scalability Traffic Architecture

Streaming SQL in Data Mesh

The Netflix TechBlog

NOVEMBER 3, 2023

However, this design decision led to a different set of challenges. Based on the sources that the processor is connected to, the SQL Processor will automatically convert the upstream sources as tables within Flink’s SQL engine. Some teams found the provided building blocks were not expressive enough.

Processing

Processing Engineering Infrastructure Latency

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits. This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture.

Storage

Storage Latency Efficiency Data Engineering

What is Application Performance Monitoring?

Dynatrace

JUNE 1, 2020

With our AI engine, Davis, at the core Dynatrace provides precise answers in real-time. Trying to manually keep up, configure, script and source data is beyond human capabilities and today everything must be automated and continuous. Some customers even say, having Davis is like having a whole team of engineers on their side.

Monitoring

Monitoring Performance Social Media Artificial Intelligence

AIOps observability adoption ascends in healthcare

Dynatrace

MARCH 14, 2022

AIOps (or “AI for IT operations”) uses artificial intelligence so that big data can help IT teams work faster and more effectively. As healthcare providers consider AIOps solutions, they should evaluate whether traditional AIOps approaches designed for correlation can enable long-term success.

Healthcare

Healthcare Artificial Intelligence Innovation Strategy

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

As with any sustainable engineering design, focusing on simplicity is very important. These characteristics allow for an on-call response time that is relaxed and more in line with traditional big data analytical pipelines. And excellent logging is needed for debugging purposes and supportability.

Network

Network Tuning AWS Traffic

What is AIOps? Everything you wanted to know

Dynatrace

OCTOBER 14, 2021

Gartner defines AIOps as the combination of “big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination.” But what is AIOps, exactly? And how can it support your organization? What is AIOps? Autonomous operations.

Artificial Intelligence

Artificial Intelligence DevOps Innovation Metrics

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

has hours of system design content. They also do live system design discussions every week. Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). this is going to be a challenging journey for any backend engineer! this is going to be a challenging journey for any backend engineer!

Education

Education Software Engineering Scalability Engineering

Helios: hyperscale indexing for the cloud & edge – part 1

The Morning Paper

OCTOBER 26, 2020

Helios also serves as a reference architecture for how Microsoft envisions its next generation of distributed big-data processing systems being built. What follows is a discussion of where big data systems might be heading, heavily inspired by the remarks in this paper, but with several of my own thoughts mixed in.

Cloud

Cloud Big Data Latency Architecture

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

has hours of system design content. They also do live system design discussions every week. Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). this is going to be a challenging journey for any backend engineer! this is going to be a challenging journey for any backend engineer!

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 30, 2020

has hours of system design content. They also do live system design discussions every week. Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). this is going to be a challenging journey for any backend engineer! this is going to be a challenging journey for any backend engineer!

Education

Education Software Engineering Engineering Big Data

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 24, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). this is going to be a challenging journey for any backend engineer! this is going to be a challenging journey for any backend engineer! Who's Hiring? Please apply here. Apply here. Apply here. Make your job search O (1), not O ( n ). Apply here.

Education

Education Software Engineering Engineering Big Data

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

All Things Distributed

NOVEMBER 15, 2016

They require teams of data engineers to spend months building complex data models and synthesizing the data before they can generate their first report. Finally, their complex user experiences are designed for power users and not suitable for the fast-growing segment of business users. Enter Amazon QuickSight.

Analytics

Analytics Availability Media Social Media

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

We use high-performance transactions systems, complex rendering and object caching, workflow and queuing systems, business intelligence and data analytics, machine learning and pattern recognition, neural networks and probabilistic decision making, and a wide variety of other techniques. Driving down the cost of Big-Data analytics.

Technology

Technology Technology AWS Storage

Expanding the Cloud ? Managing Cold Storage with Amazon Glacier

All Things Distributed

AUGUST 20, 2012

The service redundantly stores data in multiple facilities and on multiple devices within each facility, as Amazon Glacier is designed to provide average annual durability of 99.999999999% for each item stored. With Amazon Glacier any organization now has access to the same data archiving capabilities as the worldâ??s

Storage

Storage Cloud AWS Media

Music to my Ears - All Things Distributed

All Things Distributed

MARCH 28, 2011

Amazon S3 is used by enterprises of all sizes and is designed to handle scaling extremely well; it stores hundreds of billions of objects and easily performs several hundreds of thousands of storage transaction a second. a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications. More details at [link].

AWS

AWS Cloud Storage Internet

No Server Required - Jekyll & Amazon S3 - All Things Distributed

All Things Distributed

AUGUST 17, 2011

It is simple and elegant, as you would expect from someone who has won several design awards. Jekyll in written in Ruby and uses YAML for metadata management and uses the Liquid template engine to manipulate the content. a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications. Expanding the Cloud â??

Servers

Servers Social Media AWS Website

MySQL vs MongoDB: Best Choice for You

Scalegrid

FEBRUARY 11, 2025

Key Takeaways MySQL is a relational database management system ideal for structured data and complex relationships, ensuring data integrity and reliability. This structured approach makes relational databases ideal for applications requiring precise data management and complex transactions.

Scalability

Scalability Database Storage IoT

What is Greenplum Database? Intro to the Big Data Database

Sustainability: Thoughts from a software engineer

Trending Sources

In-Stream Big Data Processing

Data Engineers of Netflix?—?Interview with Kevin Wylie

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

A Recap of the Data Engineering Open Forum at Netflix

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Data Engineers of Netflix?—?Interview with Dhevi Rajendran

Data Engineers of Netflix?—?Interview with Samuel Setegne

Driving down the cost of Big-Data analytics - All Things Distributed

Python at Netflix

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

Path to NoOps part 1: How modern AIOps brings NoOps within reach

What is APM?

DynatraceGo! APAC 2021: Lessons in thick data and keeping pace with the market

Incremental Processing using Netflix Maestro and Apache Iceberg

Building a Rule-Based Platform to Manage Netflix Membership SKUs at Scale

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

NoSQL Data Modeling Techniques

Reimagining Experimentation Analysis at Netflix

Turbocharge Your Apache Spark Jobs for Unmatched Performance

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

Streaming SQL in Data Mesh

Optimizing data warehouse storage

What is Application Performance Monitoring?

AIOps observability adoption ascends in healthcare

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

What is AIOps? Everything you wanted to know

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Helios: hyperscale indexing for the cloud & edge – part 1

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Sponsored Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

Expanding the Cloud ? Managing Cold Storage with Amazon Glacier

Music to my Ears - All Things Distributed

No Server Required - Jekyll & Amazon S3 - All Things Distributed

MySQL vs MongoDB: Best Choice for You

Stay Connected