Engineering and Storage - Technology Performance Pulse

Part 1: A Survey of Analytics Engineering Work at Netflix

The Netflix TechBlog

DECEMBER 17, 2024

This article is the first in a multi-part series sharing a breadth of Analytics Engineering work at Netflix, recently presented as part of our annual internal Analytics Engineering conference. Subsequent posts will detail examples of exciting analytic engineering domain applications and aspects of the technical craft.

Analytics

Analytics Engineering Entertainment Metrics

Dynatrace + Metis: Helping developers & SREs solve Database issues with AI

Dynatrace

MARCH 5, 2025

Site Reliability Engineers (SREs) also face significant challenges in maintaining database reliability, ensuring performance, and preventing disruptions in highly dynamic and distributed environments. One slow query, an inefficient index, or a schema misstep can grind an application to a halt.

Database

Database Development Tuning DevOps

Build systems more reliably with Dynatrace: Chaos Engineering

Dynatrace

AUGUST 21, 2024

To enhance reliability, testing the software under these conditions is crucial to prepare for potential issues by leveraging chaos engineering or similar tools. Chaos engineering is a practice that extends beyond traditional failure testing by identifying unpredictable issues. It forms the cornerstone of chaos engineering experiments.

Engineering

Engineering Systems Latency Metrics

Catching up with OpenTelemetry in 2025

Dynatrace

FEBRUARY 27, 2025

To get a better idea of OpenTelemetry trends in 2025 and how to get the most out of it in your observability strategy, some of our Dynatrace open-source engineers and advocates picked out the innovations they find most interesting. Because it's constantly evolving, staying up to date with the latest in OpenTelemetry is no small feat.

Tuning

Tuning Open Source Innovation Monitoring

Platform engineering: Empowering key Kubernetes use cases with Dynatrace

Dynatrace

OCTOBER 30, 2023

Today, speed and DevOps automation are critical to innovating faster, and platform engineering has emerged as an answer to some of the most significant challenges DevOps teams are facing. It needs to be engineered properly as a product or service, and it needs automation, observability, and security in itself.

Engineering

Engineering DevOps Innovation Storage

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Cut costs and complexity: 5 strategies for reducing tool sprawl with Dynatrace

Dynatrace

APRIL 10, 2025

Simplify data ingestion and up-level storage for better, faster querying: With Dynatrace, petabytes of data are always hot for real-time insights, at a cold cost. Business-focused, unified platform approach: A unified platform approach enables platform engineering and self-service portals, simplifying operations and reducing costs.

Strategy

Strategy Storage Network Architecture

Mastering Disk Space Management with MongoDB® Storage Engines

Scalegrid

MAY 11, 2024

MongoDB offers several storage engines that cater to various use cases. The default storage engine in earlier versions was MMAPv1, which utilized memory-mapped files and collection-level locking. Choosing the appropriate storage engine can have a significant impact on application performance.

Storage

Storage Engineering Cache Database

60 seconds to self-upgrading observability on Google Kubernetes Engine

Dynatrace

MARCH 23, 2020

A decade ago, while working for a large hosting provider, I led a team that was thrown into turmoil over the purchasing of server and storage hardware in preparation for a multi-million dollar Super Bowl ad campaign. Rapid OneAgent rollouts on Google Kubernetes Engine. The OneAgent Helm chart is one-of-a-kind.

Google

Google Engineering Metrics Hardware

Empowering Developers With Scalable, Secure, and Customizable Storage Solutions

DZone

MARCH 22, 2024

As a developer, engineer, or architect, finding the right storage solution that seamlessly integrates with your infrastructure while providing the necessary scalability, security, and performance can be a daunting task. Whether you're a small startup or a large enterprise, StoneFly's storage solutions can grow with your business.

Storage

Storage Scalability Development Network

MezzFS: Mounting object storage in Netflix’s media processing platform

The Netflix TechBlog

MARCH 6, 2019

By Barak Alon (on behalf of Netflix’s Media Cloud Engineering team). MezzFS (short for “Mezzanine File System”) is a tool we’ve developed at Netflix that mounts cloud objects as local files via FUSE. MezzFS has a number of features, including: Stream objects ...

Media

Media Storage Processing Cache

New continuous compliance requirements drive the need to converge observability and security

Dynatrace

DECEMBER 12, 2024

For example, for companies with over 1,000 DevOps engineers, the potential savings are between $3.4 million and $5 million annually in increased developer efficiency with our vulnerability and exposure offering alone. We're challenging these preconceptions.

Analytics

Analytics Government Efficiency Innovation

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Observability engineering: Getting Prometheus metrics right for Kubernetes with Dynatrace and Kepler

Dynatrace

DECEMBER 18, 2023

For busy site reliability engineers, ensuring system reliability, scalability, and overall health is an imperative that’s getting harder to achieve in ever-expanding, cloud-native, container-based environments. Because of its adaptability, Prometheus has become an essential tool for observability engineering.

Metrics

Metrics Engineering Energy Tuning

Data Engineers of Netflix: Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

This standardization enhances adoption within the personalization stack, simplifies the system, and improves understanding and debuggability for engineers. They must also provide enough information for partner engineers to identify the problem with the underlying service in cases of system-level issues.

Traffic

Traffic Strategy Entertainment Innovation

Growth Engineering at Netflix: Automated Imagery Generation

The Netflix TechBlog

FEBRUARY 9, 2021

In the Growth Engineering team, we refer to this as the top of the signup funnel. For more background on the signup funnel and Growth Engineering’s role in it, please read our initial post on the topic.

Engineering

Engineering Storage Latency Entertainment

Designing Instagram

High Scalability

JANUARY 11, 2022

This is a guest post by Ankit Sirmorya, a Machine Learning Engineer at Amazon who has led several machine-learning initiatives across the Amazon ecosystem. FUN FACT: In this talk, Rodrigo Schmidt, director of engineering at Instagram, talks about the different challenges they have faced in scaling the data infrastructure at Instagram.

Design

Design Media Storage Logistics

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Every image you hover over isn't just a visual placeholder; it's a critical data point that fuels our sophisticated personalization engine. The enriched data is seamlessly accessible for both real-time applications via Kafka and historical analysis through storage in an Apache Iceberg table.

Tuning

Tuning Latency Efficiency Storage

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

High performance, query optimization, open source and polymorphic data storage are the major Greenplum advantages. Polymorphic Data Storage. Greenplum’s polymorphic data storage allows you to control the configuration for your table and partition storage with the freedom to execute and compress files within it at any time.

Big Data

Big Data Database Artificial Intelligence Open Source

Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System

Uber Engineering

SEPTEMBER 9, 2021

Uber deploys a few storage technologies to store business data based on their application model.

Storage

Storage Systems Engineering Technology

Distributed tracing with Dynatrace just got even better

Dynatrace

MARCH 11, 2025

Say hello to advanced trace analytics and new data storage and capture options. Site reliability engineers, performance architects, and developers can now leverage dynamic analysis tools like dashboards and workflows to explore trends, automate processes, and maintain control at an unprecedented level. But why stop there?

Analytics

Analytics Games Innovation Metrics

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Additionally, the tight coupling with multiple native database APIs — APIs that continually evolve and sometimes introduce backward-incompatible changes — resulted in org-wide engineering efforts to maintain and optimize our microservice’s data access. Each namespace may use different backends: Cassandra, EVCache, or combinations of multiple.

Latency

Latency Storage Cache Efficiency

Overcoming Challenges and Best Practices for Data Migration From On-Premise to Cloud

DZone

MARCH 29, 2023

Data migration involves transferring data from on-premise storage to the cloud. The article will also explore the role of data engineering in ensuring successful data transfer and integration and different approaches to data migration.

Best Practices

Best Practices Cloud Data Engineering Storage

Leveraging Infrastructure as Code for Data Engineering Projects: A Comprehensive Guide

DZone

JULY 3, 2023

Data engineering projects often require the setup and management of complex infrastructures that support data processing, storage, and analysis. In this article, we will explore the benefits of leveraging IaC for data engineering projects and provide detailed implementation steps to get started.

Data Engineering

Data Engineering Infrastructure Engineering Code

Dynatrace expands root cause analysis to Kubernetes with Davis AI

Dynatrace

SEPTEMBER 26, 2022

Progressive rollouts, rollbacks, storage orchestration, bin packing, self-healing, cost efficiency, and access to the Cloud Native Computing Foundation (CNCF) ecosystem carry heavy observability challenges. Such context is easy to understand using the Dynatrace Davis AI engine. Incidents are harder to solve.

Storage

Storage Engineering Traffic Infrastructure

Storage Autoscaling With Percona Operator for MongoDB

Percona

FEBRUARY 10, 2023

In the cloud era, however, developers and operation engineers started fully embracing automation tools making their job significantly easier. Today along with their team, we will see how pvc-autoresizer can automate storage scaling for MongoDB clusters on Kubernetes. In our lab we will use AWS EKS with a standard storage class.

Storage

Storage Blockchain AWS Cloud

Nine ways technology executives can get significant business value with the right observability platform

Dynatrace

MAY 21, 2024

That’s because it does not require any pre-prepared schemas, and access to cold/hot storage is fully automatic and with zero latency. Insights are therefore dispersed in a multitude of data lakes, storage systems, and reporting platforms. Moreover, it is fast, powered by its massively parallel processing data lakehouse.

Technology

Technology Analytics Storage

OpenPipeline: Simplify access to critical business data

Dynatrace

NOVEMBER 4, 2024

Reduced storage and query overhead for business use cases. Sensitive business data is separated from IT observability data. Improved data management. Fine-grained permission and retention policies can be tailored to individual business use cases. Business events are a small, often negligible subset of log data.

Analytics

Analytics Airlines Metrics Monitoring

AWS serverless services: Exploring your options

Dynatrace

OCTOBER 7, 2021

This means you no longer have to provision, scale, and maintain servers to run your applications, databases, and storage systems. Speed is next; serverless solutions are quick to spin up or down as needed, and there are no delays due to limited storage or resource access. AWS offers four serverless offerings for storage.

Serverless

Serverless AWS Lambda Storage

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

A Netflix member via Twitter. This is an example of a question our on-call engineers need to answer to help resolve a member issue. We needed to increase engineering productivity via distributed request tracing. That is the first question our engineering teams asked us when integrating the tracer library.

Infrastructure

Infrastructure Transportation Storage Open Source

The history of Grail: Why you need a data lakehouse

Dynatrace

OCTOBER 4, 2022

This architecture offers rich data management and analytics features (taken from the data warehouse model) on top of low-cost cloud storage systems (which are used by data lakes). This decoupling ensures the openness of data and storage formats, while also preserving data in context. Grail is built for such analytics, not storage.

Artificial Intelligence

Artificial Intelligence Analytics Storage Architecture

How To Deploy the ELK Stack on Kubernetes

DZone

OCTOBER 24, 2023

The ELK stack is an abbreviation for Elasticsearch, Logstash, and Kibana, which offers the following capabilities: Elasticsearch: a scalable search and analytics engine with a log analytics tool and application-formed database, perfect for data-driven applications.

Analytics

Analytics Storage Infrastructure Scalability

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

The Challenge of Title Launch Observability: As engineers, we're wired to track system metrics like error rates, latencies, and CPU utilization, but what about metrics that matter to a title's success? Additionally, the time-sensitive nature of these investigations precludes the use of cold storage, which cannot meet the stringent SLAs required.

Traffic

Traffic Scalability Strategy Monitoring

MySQL General Tablespaces: A Powerful Storage Option for Your Data

Percona

JANUARY 4, 2024

Managing storage and performance efficiently in your MySQL database is crucial, and general tablespaces offer flexibility in achieving this. In contrast to the single system tablespace that holds system tables by default, general tablespaces are user-defined storage containers for multiple InnoDB tables.

Storage

Storage Engineering Database Open Source

How a data lakehouse brings data insights to life

Dynatrace

OCTOBER 4, 2022

For IT infrastructure managers and site reliability engineers, or SREs , logs provide a treasure trove of data. In most data storage models, indexing engines enable faster access to query logs. But indexing requires schema management and additional storage to be effective, which adds cost and overhead.

Analytics

Analytics Storage Infrastructure Metrics

Improved Alerting with Atlas Streaming Eval

The Netflix TechBlog

APRIL 27, 2023

Engineers want their alerting system to be realtime, reliable, and actionable. A few years ago, we were paged by our SRE team due to our Metrics Alerting System falling behind — critical application health alerts reached engineers 45 minutes late! It opens doors to support more exciting use-cases.

Storage

Storage Cache Metrics Database

Using JSONB in PostgreSQL: How to Effectively Store & Index JSON Data in PostgreSQL

Scalegrid

JULY 17, 2020

JSONPath brings a powerful JSON query engine to PostgreSQL. JSONB storage has some drawbacks vs. traditional columns: PostgreSQL does not store column statistics for JSONB columns. JSONB storage results in a larger storage footprint. JSONB storage does not deduplicate the key names in the JSON.

Storage

Storage Database Efficiency Processing

How to Perform Load Testing Against Nebula Graph With K6

DZone

DECEMBER 17, 2021

Load testing for the database usually needs to be conducted so that the impact on the system can be monitored in different scenarios, such as query language rule optimization, storage engine parameter adjustment, etc. The operating system in this article is the x86 CentOS 7.8.

Testing

Testing Operating System Storage Performance

What is log analytics? How a modern observability approach provides critical business insight

Dynatrace

JULY 29, 2022

As development and site reliability engineering (SRE) teams strive to release software faster, log analytics can provide key insight into software quality as part of a broader DevOps observability and automation initiative. Cold storage and rehydration. Better-quality code. Inadequate context.

Analytics

Analytics Storage Retail DevOps

Remote Workstations for the Discerning Artists

The Netflix TechBlog

MARCH 8, 2021

As an engineer, I can work anywhere with a standard laptop as long as I have an IDE and access to Stack Overflow. They could need a GPU when doing graphics-intensive work or extra large storage to handle file management. To meet this need, the Studio Infrastructure team has created Netflix Workstations.

Entertainment

Entertainment Storage Open Source Hardware

Apache Kafka + Apache Flink = Match Made in Heaven

DZone

MAY 5, 2023

This blog post explores the benefits of combining both open-source frameworks, shows unique differentiators of Flink versus Kafka, and discusses when to use a Kafka-native streaming engine like Kafka Streams instead of Flink. The Tremendous Adoption of Apache Kafka and Apache Flink: Apache Kafka became the de facto standard for data streaming.

Open Source

Open Source Storage Innovation Engineering

The Anna Key-Value Store Now Has 355x the Performance of DynamoDB for the Dollar

High Scalability

SEPTEMBER 8, 2018

They've posted about Anna's new superpowers in Going Fast and Cheap: How We Made Anna Autoscale : Using Anna v0 as an in-memory storage engine, we set out to address the cloud storage problems described above. Each storage server collects statistics about the requests it serves, the data it stores, etc. Related Articles.

Storage

Storage Performance AWS Cloud
