Architecture, Data and Storage - Technology Performance Pulse

Dynatrace elevates data security with separated storage and unique encryption keys for each tenant

Dynatrace

NOVEMBER 29, 2024

Dynatrace continues to deliver on its commitment to keeping your data secure in the cloud. Enhancing data separation by partitioning each customer’s data on the storage level and encrypting it with a unique encryption key adds an additional layer of protection against unauthorized data access.

Storage

Storage AWS Azure Architecture

Cut costs and complexity: 5 strategies for reducing tool sprawl with Dynatrace

Dynatrace

APRIL 10, 2025

As an executive, I am always seeking simplicity and efficiency to make sure the architecture of the business is as streamlined as possible. Simplify data ingestion and up-level storage for better, faster querying: With Dynatrace, petabytes of data are always hot for real-time insights, at a cold cost.

Strategy

Strategy Storage Network Architecture

Efficient Multimodal Data Processing: A Technical Deep Dive

DZone

FEBRUARY 27, 2025

Multimodal data processing is the evolving need of the latest data platforms powering applications like recommendation systems, autonomous vehicles, and medical diagnostics. Handling multimodal data spanning text, images, videos, and sensor inputs requires resilient architecture to manage the diversity of formats and scale.

Efficiency

Efficiency Processing Latency Storage

Ready for changes with Hexagonal Architecture

The Netflix TechBlog

MARCH 10, 2020

We had an interesting challenge on our hands: we needed to build the core of our app from scratch, but we also needed data that existed in many different systems. Existing data was crucial to the behavior and business logic of our application.

Architecture

Architecture Transportation Java Strategy

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

It can scale towards a multi-petabyte level data workload without a single issue, and it allows access to a cluster of powerful servers that will work together within a single SQL interface where you can view all of the data. This feature-packed database provides powerful and rapid analytics on data that scales up to petabyte volumes.

Big Data

Big Data Database Artificial Intelligence Open Source

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of petabytes of data stored in AWS S3, and each day we ingest and create additional petabytes. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Medallion Architecture: Efficient Batch and Stream Processing Data Pipelines With Azure Databricks and Delta Lake

DZone

JULY 13, 2023

In today's data-driven world, organizations need efficient and scalable data pipelines to process and analyze large volumes of data. Medallion Architecture provides a framework for organizing data processing workflows into different zones, enabling optimized batch and stream processing.

Azure

Azure Architecture Efficiency Processing

The history of Grail: Why you need a data lakehouse

Dynatrace

OCTOBER 4, 2022

Some time ago, at a restaurant near Boston, three Dynatrace colleagues dined and discussed the growing data challenge for enterprises. At its core, this challenge involves a rapid increase in the amount—and complexity—of data collected within a company. Work with different and independent data types. Grail architectural basics.

Artificial Intelligence

Artificial Intelligence Analytics Storage Architecture

How a data lakehouse brings data insights to life

Dynatrace

OCTOBER 4, 2022

For IT infrastructure managers and site reliability engineers, or SREs, logs provide a treasure trove of data. But on their own, logs present just another data silo as IT professionals attempt to troubleshoot and remediate problems. Data volume explosion in multicloud environments poses log issues.

Analytics

Analytics Storage Infrastructure Metrics

Why and How We Built a Primary-Replica Architecture of ClickHouse

DZone

AUGUST 13, 2024

As our data grew, we had problems with AWS Redshift which was slow and expensive. But this also caused storage challenges like disk failures and data recovery. We innovatively use its snapshot feature to implement a primary-replica architecture for ClickHouse.

Artificial Intelligence

Artificial Intelligence Architecture AWS Storage

New continuous compliance requirements drive the need to converge observability and security

Dynatrace

DECEMBER 12, 2024

Move beyond logs-only security: Embrace a comprehensive, end-to-end approach that integrates all data from observability and security. More technology, more complexity The benefits of cloud-native architecture for IT systems come with the complexity of maintaining real-time visibility into security compliance and risk posture.

Analytics

Analytics Government Efficiency Innovation

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. Both serve distinct purposes, from managing message queues to ingesting large data volumes. What is RabbitMQ?

Latency

Latency Analytics Architecture Storage

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

While data lakes and data warehousing architectures are commonly used modes for storing and analyzing data, a data lakehouse is an efficient third way to store and analyze data that unifies the two architectures while preserving the benefits of both. What is a data lakehouse?

Artificial Intelligence

Artificial Intelligence Storage Analytics Government

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Second, developers had to constantly re-learn new data modeling practices and common yet critical data access patterns. To overcome these challenges, we developed a holistic approach that builds upon our Data Gateway Platform. Data Model At its core, the KV abstraction is built around a two-level map architecture.

Latency

Latency Storage Cache Servers
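The two-level map mentioned in the Key-Value Data Abstraction Layer excerpt above can be pictured with a small in-memory sketch. This is only an illustration, not Netflix's implementation: it assumes the outer level maps a record ID to an inner map of items, and that the inner items are read back in key order. The class name `TwoLevelMap` and its methods are hypothetical.

```python
from collections import defaultdict


class TwoLevelMap:
    """Toy two-level map: record ID -> {item key -> item value}.

    Hypothetical illustration only; the names and behavior here are
    assumptions, not the API of any real service.
    """

    def __init__(self):
        # Outer level: record ID. Inner level: a plain dict of items.
        self._records = defaultdict(dict)

    def put(self, record_id, key, value):
        """Insert or overwrite one item inside a record."""
        self._records[record_id][key] = value

    def get(self, record_id, key):
        """Return one item's value, or None if the record or key is absent."""
        return self._records.get(record_id, {}).get(key)

    def scan(self, record_id):
        """Return all (key, value) items of a record, sorted by key."""
        return sorted(self._records.get(record_id, {}).items())


store = TwoLevelMap()
store.put("user:1", b"b", b"2")
store.put("user:1", b"a", b"1")
print(store.scan("user:1"))
```

Grouping items under a record ID keeps lookups of a single item and ordered scans of a whole record as separate, simple operations.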

Dynatrace log collection for ARM unlocks power-efficient architecture for your enterprise

Dynatrace

DECEMBER 5, 2023

Without observability, the benefits of ARM are lost Over the last decade and a half, a new wave of computer architecture has overtaken the world. ARM architecture, based on a processor type optimized for cloud and hyperscale computing, has become the most prevalent on the planet, with billions of ARM devices currently in use.

Architecture

Architecture Efficiency Energy Monitoring

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Every image you hover over isn’t just a visual placeholder; it’s a critical data point that fuels our sophisticated personalization engine. This nuanced integration of data and technology empowers us to offer bespoke content recommendations. This queue ensures we are consistently capturing raw events from our global userbase.

Tuning

Tuning Latency Efficiency Storage

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

Rajiv Shringi Vinay Chella Kaidan Fullerton Oleksii Tkachuk Joey Lynch Introduction As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming , the ability to ingest and store vast amounts of temporal data — often reaching petabytes — with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Tuning

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

Part 3: System Strategies and Architecture. By: Varun Khaitan. With special thanks to my stunning colleagues: Mallika Rao, Esmir Mesic, Hugo Marques. This blog post is a continuation of Part 2, where we cleared the ambiguity around title launch observability at Netflix. Store the data in an optimized, highly distributed datastore.

Traffic

Traffic Strategy Entertainment Innovation

Data privacy by design: How an observability platform protects data security

Dynatrace

APRIL 19, 2023

Creating an ecosystem that facilitates data security and data privacy by design can be difficult, but it’s critical to securing information. When organizations focus on data privacy by design, they build security considerations into cloud systems upfront rather than as a bolt-on consideration.

Design

Design Storage Programming Analytics

IT automation central to navigating cloud complexity and data explosion

Dynatrace

SEPTEMBER 23, 2022

Organizations continue to turn to multicloud architecture to deliver better, more secure software faster. But IT teams need to embrace IT automation and new data storage models to benefit from modern clouds. As they enlist cloud models, organizations now confront increasing complexity and a data explosion.

Cloud

Cloud Artificial Intelligence Innovation Architecture

Analyze OpenTelemetry traces and log data at scale: Accelerate troubleshooting and optimize application performance

Dynatrace

OCTOBER 3, 2024

Considering the latest State of Observability 2024 report, it’s evident that multicloud environments not only come with an explosion of data beyond humans’ ability to manage it. It’s increasingly difficult to ingest, manage, store, and sort through this amount of data. You can find the list of use cases here.

Performance

Performance Architecture Innovation Latency

Measuring the importance of data quality to causal AI success

Dynatrace

JANUARY 4, 2024

While this approach can be effective if the model is trained with a large amount of data, even in the best-case scenarios, it amounts to an informed guess, rather than a certainty. But to be successful, data quality is critical. Teams need to ensure the data is accurate and correctly represents real-world scenarios. Consistency.

Government

Government Analytics Benchmarking Storage

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers that generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

Boost DevOps maturity with observability and a data lakehouse

Dynatrace

JUNE 9, 2023

In a world driven by macroeconomic uncertainty, businesses increasingly turn to data-driven decision-making to stay agile. They’re unleashing the power of cloud-based analytics on large data sets to unlock the insights they and the business need to make smarter decisions. All of these factors challenge DevOps maturity.

DevOps

DevOps Analytics Storage Metrics

Data ingestion pipeline with Operation Management

The Netflix TechBlog

MARCH 7, 2023

These media-focused machine learning algorithms, as well as other teams, generate a lot of data from the media files, which, as we described in our previous blog, are stored as annotations in Marken. Marken Architecture: Marken’s architecture diagram is as follows. in a video file.

Media

Media Latency Architecture Database

Getting answers from data starts with automated log acquisition, at any scale

Dynatrace

OCTOBER 14, 2022

Log data provides a unique source of truth for debugging applications, optimizing infrastructure, and investigating security incidents. This contextualization of log data enables AI-powered problem detection and root cause analysis at scale. Dynamic landscape and data handling requirements result in manual work.

Storage

Storage Government Network Infrastructure

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Modern organizations ingest petabytes of data daily, but legacy approaches to log analysis and management cannot accommodate this volume of data. based financial services group, discussed how the bank uses log monitoring on the Dynatrace platform with an emphasis on observability and security data.

Analytics

Analytics Infrastructure Storage Architecture

AWS serverless services: Exploring your options

Dynatrace

OCTOBER 7, 2021

To get a better understanding of AWS serverless, we’ll first explore the basics of serverless architectures, review AWS serverless offerings, and explore common use cases. Serverless architecture: A primer. Serverless architecture shifts application hosting functions away from local servers onto those managed by providers.

Serverless

Serverless AWS Lambda Storage

How unified data and analytics offers a new approach to software intelligence

Dynatrace

OCTOBER 4, 2022

Software and data are a company’s competitive advantage. But for software to work perfectly, organizations need to use data to optimize every phase of the software lifecycle. The only way to address these challenges is through observability data — logs, metrics, and traces. Teams interact with myriad data types.

Analytics

Analytics Software Software Storage

Enhance data management with Grail: Ultimate guide to custom buckets and security policies

Dynatrace

OCTOBER 6, 2023

Grail: Enterprise-ready data lakehouse Grail, the Dynatrace causational data lakehouse, was explicitly designed for observability and security data, with artificial intelligence integrated into its foundation. Tables are a physical data model, essentially the type of observability data that you can store.

Artificial Intelligence

Artificial Intelligence Metrics Analytics Storage

How to observe logs with Journald and Dynatrace

Dynatrace

APRIL 4, 2025

It supports multi-line logs, handles log rotation, and even includes mechanisms to check for data corruption. Grail, the Dynatrace schema-on-read data lakehouse, is at the heart of the Dynatrace platform. The Grail architecture ensures scalability, making log data accessible for detailed analysis regardless of volume.

Analytics

Analytics Operating System Scalability Infrastructure

Data lakehouse innovations advance the three pillars of observability for more collaborative analytics

Dynatrace

FEBRUARY 16, 2023

How do you get more value from petabytes of exponentially exploding, increasingly heterogeneous data? The short answer: The three pillars of observability (logs, metrics, and traces) converging on a data lakehouse. To solve this problem, Dynatrace launched Grail, its causational data lakehouse, in 2022.

Analytics

Analytics Innovation Metrics Database

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024 The Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System

Uber Engineering

SEPTEMBER 9, 2021

Uber deploys a few storage technologies to store business data based on their application model.

Storage

Storage Systems Engineering Technology

Designing Instagram

High Scalability

JANUARY 11, 2022

Architecture. from a client it performs two parallel operations: i) persisting the action in the data store ii) publish the action in a streaming data store for a pub-sub model. User Feed Service, Media Counter Service) read the actions from the streaming data store and performs their specific tasks. High Level Design.

Design

Design Media Storage Logistics

Time Series Analysis: VAR-Model-As-A-Service Using Flask and MinIO

DZone

SEPTEMBER 25, 2023

It is the second of a series of articles that is built on top of that project, representing experiments with various statistical and machine learning models, data pipelines implemented using existing DAG tools, and storage services, both cloud-based and alternative on-premises solutions.

Storage

Storage AWS Architecture Cloud

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Several pain points have made it difficult for organizations to manage their data efficiently and create actual value. Limited data availability constrains value creation. Modern IT environments — whether multicloud, on-premises, or hybrid-cloud architectures — generate exponentially increasing data volumes.

Analytics

Analytics Artificial Intelligence Storage Serverless

Improved Alerting with Atlas Streaming Eval

The Netflix TechBlog

APRIL 27, 2023

Atlas is an in-memory time-series database that ingests multiple billions of time-series per day and retains the last two weeks of data. To serve the increasing number of push down queries, the in-memory storage layer would need to scale up as well, and it became clear that this would push the already expensive storage costs far higher.

Storage

Storage Cache Metrics Database

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage

Storage Systems Big Data Azure

Open-Sourcing Metaflow, a Human-Centric Framework for Data Science

The Netflix TechBlog

DECEMBER 3, 2019

Netflix applies data science to hundreds of use cases across the company, including optimizing content delivery and video encoding. Data scientists at Netflix relish our culture that empowers them to work autonomously and use their judgment to solve problems independently. How could we improve the quality of life for data scientists?

Open Source

Open Source AWS Infrastructure Energy

Kubernetes security essentials: Understanding Kubernetes security misconfigurations

Dynatrace

APRIL 22, 2025

Kubernetes architecture and how security misconfigurations can affect each component. etcd database The etcd Database is the brain of your cluster, storing all configuration data that the API server uses to verify and maintain the cluster state. Every permission granted should be scrutinized and justified. Real-world impact.

Connect Fluentd logs with Dynatrace traces, metrics, and topology data to enhance Kubernetes observability

Dynatrace

APRIL 8, 2022

Fluentd is an open-source data collector that unifies log collection, processing, and consumption. Output plugins deliver logs to storage solutions, analytics tools, and observability platforms like Dynatrace. Built-in resiliency ensures data completeness and consistency even if Fluentd or an endpoint service goes down temporarily.

Metrics

Metrics Analytics Software Architecture Open Source

Netflix Cloud Packaging in the Terabyte Era

The Netflix TechBlog

SEPTEMBER 24, 2021

As described by the white paper Apple ProRes (link), the target data rate of the Apple ProRes HQ for 1920x1080 at 29.97 is 220 Mbps. Table 1: Movie and File Size Examples. Initial Architecture: A simplified view of our initial cloud video processing pipeline is illustrated in the following diagram.

Cloud

Cloud Media Storage Cache
