Infrastructure, Latency and Storage - Technology Performance Pulse

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency, Cache, Infrastructure, Strategy

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

Message brokers handle validation, routing, storage, and delivery, ensuring efficient and reliable communication. Message Broker vs. Distributed Event Streaming Platform RabbitMQ functions as a message broker, managing message confirmation, routing, storage, and delivery within a queue. What is RabbitMQ?

Latency, Analytics, Architecture, Storage

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

Now let’s look at how we designed the tracing infrastructure that powers Edgar. If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls.

Infrastructure, Transportation, Storage, Open Source

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

At this scale, we can gain a significant amount of performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands into our warehouse. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage, Latency, Efficiency, Data Engineering

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

Vidhya Arvind, Rajasekhar Ummadisetty, Joey Lynch, Vinay Chella. Introduction: At Netflix, our ability to deliver seamless, high-quality streaming experiences to millions of users hinges on robust, global backend infrastructure. Data Model: At its core, the KV abstraction is built around a two-level map architecture.

Latency, Storage, Cache, Servers

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

The Challenge of Title Launch Observability: As engineers, we’re wired to track system metrics like error rates, latencies, and CPU utilization, but what about metrics that matter to a title’s success? This approach provides a few advantages: Low burden on existing systems: Log processing imposes minimal changes to existing infrastructure.

Traffic, Scalability, Strategy, Monitoring

Comparing PostgreSQL DigitalOcean Performance & Pricing – ScaleGrid vs. DigitalOcean Managed Databases

Scalegrid

JUNE 4, 2020

As an open source database, it’s a highly popular choice for enterprise applications looking to modernize their infrastructure and reduce their total cost of ownership, along with startup and developer applications looking for a powerful, flexible and cost-effective database to work with. Compare Latency. At a glance – TLDR.

Database, Latency, Benchmarking, Performance

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.

Latency, Storage, Traffic, Infrastructure

Optimize Citrix platform performance and user experience with Dynatrace (GA)

Dynatrace

JANUARY 15, 2020

Therefore, it requires multidimensional and multidisciplinary monitoring: Infrastructure health —automatically monitor the compute, storage, and network resources available to the Citrix system to ensure a stable platform. OneAgent: Citrix infrastructure performance. OneAgent: SAP infrastructure performance. Citrix VDA.

Latency, Performance, Virtualization, Infrastructure

Designing Instagram

High Scalability

JANUARY 11, 2022

FUN FACT: In this talk, Rodrigo Schmidt, director of engineering at Instagram, talks about the different challenges they have faced in scaling the data infrastructure at Instagram. After that, the post gets added to the feed of all the followers in the columnar data storage. System Components. Fetching User Feed. Optimization.

Design, Media, Storage, Logistics

Best MySQL DigitalOcean Performance – ScaleGrid vs. DigitalOcean Managed Databases

Scalegrid

JUNE 22, 2020

Compare Latency. On average, ScaleGrid achieves almost 30% lower latency than DigitalOcean for the same deployment configurations. ScaleGrid provides 30% more storage on average vs. DigitalOcean for MySQL at the same affordable price. Read-Intensive Latency Benchmark. Balanced Workload Latency Benchmark.

Database, Benchmarking, Latency, Performance

The Power of Caching: Boosting API Performance and Scalability

DZone

AUGUST 16, 2023

Caching is the process of storing frequently accessed data or resources in a temporary storage location, such as memory or disk, to improve retrieval speed and reduce the need for repetitive processing.

Cache, Scalability, Performance, Latency

Optimize your environment: Unveiling Dynatrace Hyper-V extension for enhanced performance and efficient troubleshooting

Dynatrace

OCTOBER 23, 2023

Secondly, determining the correct allocation of resources (CPU, memory, storage) to each virtual machine to ensure optimal performance without over-provisioning can be difficult. Firstly, managing virtual networks can be complex as networking in a virtual environment differs significantly from traditional networking.

Efficiency, Virtualization, Hardware, Performance

Get seamless insights into Nutanix clusters with Dynatrace

Dynatrace

NOVEMBER 9, 2023

By integrating Nutanix metrics into Dynatrace, you can gain valuable insights into the performance and health of your Nutanix infrastructure. Performance monitoring Dynatrace can collect performance metrics from Nutanix clusters, including latency, IOPS (Input/Output Operations Per Second), and network throughput.

Virtualization, Storage, Metrics, Monitoring

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

Data warehouses offer a single storage repository for structured data and provide a source of truth for organizations. Unlike data warehouses, however, data is not transformed before landing in storage. A data lakehouse provides a cost-effective storage layer for both structured and unstructured data. Data management.

Artificial Intelligence, Storage, Analytics, Government

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

While clustering across wide-area networks (WANs) is discouraged due to latency issues, leased links can mitigate some connectivity challenges. Keeping queues short minimizes latency and enhances the overall efficiency of message delivery in RabbitMQ. Keeping queues short maintains a responsive and efficient RabbitMQ setup.

Best Practices, Traffic, Strategy, Efficiency

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.

Storage, Systems, Big Data, Azure

Best practices and key metrics for improving mobile app performance

Dynatrace

DECEMBER 13, 2023

This includes how quickly the application loads, how much load it is putting on the device, how much storage is being used, and how frequently it crashes. By monitoring metrics such as error rates, response times, and network latency, developers can identify trends and potential issues, so they don’t become critical. Issue remediation.

Best Practices, Mobile, Metrics, Performance

Dynatrace Managed turnkey Premium High Availability for globally distributed data centers (Early Adopter)

Dynatrace

JUNE 25, 2020

The network latency between cluster nodes should be around 10 ms or less. For Premium HA, this has been extended from 10 ms latency (in the same network region) to around 100 ms network latency due to asynchronous data replication between regions. In the image below, three downed nodes make an entire cluster unavailable.

Availability, Hardware, Latency, Traffic

Why growing AI adoption requires an AI observability strategy

Dynatrace

JANUARY 17, 2024

AI requires more compute and storage. Training AI data is resource-intensive and costly, again, because of increased computational and storage requirements. As a result, AI observability supports cloud FinOps efforts by identifying how AI adoption spikes costs because of increased usage of storage and compute resources.

Strategy, Artificial Intelligence, Storage, Cloud

Build systems more reliably with Dynatrace: Chaos Engineering

Dynatrace

AUGUST 21, 2024

These releases often assumed ideal conditions such as zero latency, infinite bandwidth, and no network loss, as highlighted in Peter Deutsch’s eight fallacies of distributed systems. With Dynatrace, teams can seamlessly monitor the entire system, including network switches, database storage, and third-party dependencies.

Engineering, Systems, Latency, Metrics

Managing risk for financial services: The secret to visibility and control during times of volatility

Dynatrace

APRIL 8, 2024

Optimize the IT infrastructure supporting risk management processes and controls for maximum performance and resilience. The IT infrastructure, services, and applications that enable processes for risk management must perform optimally. Once teams solidify infrastructure and application performance, security is the subsequent priority.

Analytics, Infrastructure, Efficiency, Technology

Service level objectives: 5 SLOs to get started

Dynatrace

JUNE 1, 2023

Note: you might hear the term latency used instead of response time. Both latency and response time are critical to ensure reliability. Latency typically refers to the time it takes for a single request to travel from its source to its destination. Latency primarily focuses on the time spent in transit.

Latency, Website, Traffic, DevOps

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

The data warehouse is not designed to serve point requests from microservices with low latency. Therefore, we must efficiently move data from the data warehouse to a global, low-latency and highly-reliable key-value store. Bulldozer abstracts the underlying infrastructure on how the data moves.

Latency, Storage, Big Data, Tuning

The AWS Storage Gateway - All Things Distributed

All Things Distributed

JANUARY 23, 2012

Expanding the Cloud - The AWS Storage Gateway. Today Amazon Web Services has launched the AWS Storage Gateway, making the power of secure and reliable cloud storage accessible from customers’ With the launch of the AWS Storage Gateway our customers can now integrate their on-premises IT environment with AWS’s

Storage, AWS, Virtualization, Cloud

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

This is particularly important as we build out new functionality that relies on Pushy; a strong, stable infrastructure foundation allows our partners to continue to build on top of Pushy with confidence. KeyValue is an abstraction over the storage engine itself, which allows us to choose the best storage engine that meets our SLO needs.

Latency, Cache, Tuning, Efficiency

Analyze OpenTelemetry traces and log data at scale: Accelerate troubleshooting and optimize application performance

Dynatrace

OCTOBER 3, 2024

Without distributed tracing, pinpointing the cause of increased latency could take hours or even days. There is no need to think about schema and indexes, re-hydration, or hot/cold storage. Interact with data intuitively and easily and benefit from immediate, AI-supported insights.

Performance, Architecture, Innovation, Latency

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

ITOps is an IT discipline involving actions and decisions made by the operations team responsible for an organization’s IT infrastructure. Besides the traditional system hardware, storage, routers, and software, ITOps also includes virtual components of the network and cloud infrastructure. What does IT operations do?

Artificial Intelligence, DevOps, Hardware, Virtualization

Scale up your Dynatrace Managed software-intelligence deployment with self-healing insights

Dynatrace

JUNE 8, 2020

As a software intelligence platform, Dynatrace is woven into the fabric of your business systems, actively managing and providing self-healing capabilities for all aspects of your applications and vital infrastructure. Metrics are provided for general host info like CPU usage and memory consumption, OneAgent traffic, and network latency.

Software

Software Software Programming Metrics

Optimize Citrix platform performance and user experience with a new extension (Preview)

Dynatrace

SEPTEMBER 25, 2019

Therefore, it requires multidimensional and multidisciplinary monitoring: Infrastructure health —automatically monitor the compute, storage, and network resources available to the Citrix system to ensure a stable platform. OneAgent: Citrix infrastructure performance. OneAgent: SAP infrastructure performance. Citrix VDA.

Latency, Performance, Virtualization, Infrastructure

Narrowing the gap between serverless and its state with storage functions

The Morning Paper

JANUARY 28, 2020

Narrowing the gap between serverless and its state with storage functions, Zhang et al., SoCC’19. Shredder is "a low-latency multi-tenant cloud store that allows small units of computation to be performed directly within storage nodes." Shredder’s implementation is built on top of Seastar.

Serverless, Storage, Latency, Cloud

How Edge and Industrial IoT Will Converge in 2025: A New Era for Smart Manufacturing

VoltDB

NOVEMBER 20, 2024

This proximity reduces latency and enables real-time decision-making. Edge computing will process and filter this data before sending only the most relevant insights to the cloud, making large-scale IIoT deployments more feasible and reducing cloud storage and bandwidth costs.

IoT, Energy, Latency, Automotive

Unlock the power of contextual log analytics

Dynatrace

OCTOBER 2, 2024

For instance, in a Kubernetes environment, if an application fails, logs in context not only highlight the error alongside corresponding log entries but also provide correlated logs from surrounding services and infrastructure components. There is no need to think about schema and indexes, re-hydration, or hot/cold storage.

Analytics, AWS, DevOps, Cloud

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

Netflix shares how Amazon EC2 Auto Scaling allows its infrastructure to automatically adapt to changing traffic patterns in order to keep its audience entertained and its costs on target. Netflix runs dozens of stateful services on AWS under strict sub-millisecond tail-latency requirements, which brings unique challenges. Wednesday - December

AWS, Entertainment, Open Source, Benchmarking

What’s New at ScaleGrid – September 2024

Scalegrid

SEPTEMBER 10, 2024

We’re proud to introduce AWS Outposts support, allowing you to manage cloud infrastructure on-premises while maintaining full AWS integration. Additionally, we’ve added the Philadelphia AWS Local Zone , helping to reduce latency for customers operating in the eastern U.S.

Latency, AWS, Storage, Tuning

Implementing AWS well-architected pillars with automated workflows

Dynatrace

SEPTEMBER 13, 2023

The Site Reliability Guardian helps automate release validation based on SLOs and important signals that define the expected behavior of your applications in terms of availability, performance errors, throughput, latency, etc. A study by Amazon found that increasing page load time by just 100 milliseconds costs 1% in sales.

AWS, Efficiency, Azure, Cloud

Observability platform vs. observability tools

Dynatrace

DECEMBER 22, 2021

Metrics are measures of critical system values, such as CPU utilization or average write latency to persistent storage. As a result, teams can gain full visibility into their applications and multicloud infrastructure. A database could start executing a storage management process that consumes database server resources.

Artificial Intelligence, Metrics, Architecture, DevOps

Netflix Drive

The Netflix TechBlog

MAY 5, 2021

Netflix Drive relies on a data store that will be the persistent storage layer for assets, and a metadata store which will provide a relevant mapping from the file system hierarchy to the data store entities. Finally, once the encoded copy is prepared, this copy can be persisted by Netflix Drive to a persistent storage tier in the cloud.

Media, Storage, Architecture, Cloud

Amazon DynamoDB – a Fast and Scalable NoSQL Database.

All Things Distributed

JANUARY 18, 2012

Amazon DynamoDB offers low, predictable latencies at any scale. In response, we began to develop a collection of storage and database technologies to address the demanding scalability and reliability requirements of the Amazon.com ecommerce platform. s read latency, particularly as dataset sizes grow. The growth of Amazon’s

Scalability, Database, Ecommerce, Latency

How digital experience monitoring helps deliver business observability

Dynatrace

APRIL 26, 2022

Gartner estimates that by 2025, 70% of digital business initiatives will require infrastructure and operations (I&O) leaders to include digital experience metrics in their business reporting. With DEM solutions, organizations can operate over on-premise network infrastructure or private or public cloud SaaS or IaaS offerings.

Monitoring, Social Media, IoT, Metrics

Evolution of ML Fact Store

The Netflix TechBlog

APRIL 26, 2022

fact logging client, ETL, query client, and data quality infrastructure. The first version of our logger library optimized for storage by deduplicating facts and optimized for network I/O using different compression methods for each fact. Design evolution. Axion fact store has four components: fact

Storage, Design, Scalability, Latency

Introducing Dynatrace built-in data observability on Davis AI and Grail

Dynatrace

JANUARY 31, 2024

million” – Gartner. Data observability is a practice that helps organizations understand the full lifecycle of data, from ingestion to storage and usage, to ensure data health and reliability. This requires monitoring of the upstream infrastructure, applications, or platform supporting those data streams.

DevOps, Analytics, Airlines, Metrics

Service level objective examples: 5 SLO examples for faster, more reliable apps

Dynatrace

JUNE 1, 2023

Note: you might hear the term latency used instead of response time. Both latency and response time are critical to ensure reliability. Latency typically refers to the time it takes for a single request to travel from its source to its destination. Latency primarily focuses on the time spent in transit.

Traffic, Website, Latency, DevOps

Redis® Monitoring Strategies for 2025

Scalegrid

JANUARY 21, 2025

Identifying key Redis metrics such as latency, CPU usage, and memory metrics is crucial for effective Redis monitoring. To monitor Redis instances effectively, collect Redis metrics focusing on cache hit ratio, memory allocated, and latency threshold.

Strategy, Monitoring, Latency, DevOps
