Efficiency, Engineering and Latency - Technology Performance Pulse

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

Stream processing enables software engineers to model their applications’ business logic as high-level representations in a directed acyclic graph without explicitly defining a physical execution plan. We designed experimental scenarios inspired by chaos engineering. This significantly increases event latency.

Engineering

Engineering Tuning Latency Open Source

How to Scale Elasticsearch to Solve Your Scalability Issues

DZone

FEBRUARY 26, 2025

One such open-source, distributed search and analytics engine is Elasticsearch, which is very efficient at handling data in large sets and high-velocity queries. This extra network overhead will easily result in increased latency compared to a single-node architecture where data access is straightforward.

Scalability

Scalability Open Source Latency Architecture

Site reliability engineering: 5 things you need to know

Dynatrace

FEBRUARY 4, 2021

What is site reliability engineering? Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. SRE focuses on automation.

Engineering

Engineering DevOps Government Latency

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Every image you hover over isn’t just a visual placeholder; it’s a critical data point that fuels our sophisticated personalization engine. This setup allows for efficient streaming of real-time data through Kafka and the preservation of historical data in Iceberg, providing a comprehensive and flexible data processing and storage solution.

Tuning

Tuning Latency Efficiency Storage

Site reliability engineering: 5 things you need to know

Dynatrace

FEBRUARY 4, 2021

Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. Organizations can then integrate these skilled engineers at key points in the DevOps life cycle.

Engineering

Engineering DevOps Government Latency

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

Yet, many are confined to a brief temporal window due to constraints in serving latency or training costs. Key insights from this shift include: A Data-Centric Approach: Shifting focus from model-centric strategies, which heavily rely on feature engineering, to a data-centric one.

Tuning

Tuning Efficiency Latency Strategy

Enhancing Kubernetes cluster management key to platform engineering success

Dynatrace

MARCH 29, 2024

Five of the most common include cluster instability, resource and cost management, security, observability, and stress on engineering teams. “Engineering teams are overwhelmed with stuff to do.” “You can ask for the best configuration to reduce latency or improve the user experience.”

Engineering

Engineering DevOps Operating System Cloud

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support…

The Netflix TechBlog

SEPTEMBER 29, 2022

By Kostas Christidis. Timestone is a high-throughput, low-latency priority queueing system we built in-house to support the needs of Cosmos, our media encoding platform. Over the past 2.5

Latency

Latency Systems Media Serverless

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination. This model supports both simple and complex data models, balancing flexibility and efficiency.

Latency

Latency Storage Cache Efficiency

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

The Challenge of Title Launch Observability: As engineers, we’re wired to track system metrics like error rates, latencies, and CPU utilization, but what about metrics that matter to a title’s success? The stakes are even higher when ensuring every title launches flawlessly.

Traffic

Traffic Scalability Strategy Monitoring

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

This guide will cover how to distribute workloads across multiple nodes, set up efficient clustering, and implement robust load-balancing techniques. While clustering across wide-area networks (WANs) is discouraged due to latency issues, leased links can mitigate some connectivity challenges.

Best Practices

Best Practices Traffic Strategy Efficiency

Noisy Neighbor Detection with eBPF

The Netflix TechBlog

SEPTEMBER 10, 2024

By Jose Fernandez, Sebastien Dabdoub, Jason Koch, and Artem Tkachuk. The Compute and Performance Engineering teams at Netflix regularly investigate performance issues in our multi-tenant environment. To emit a run queue latency metric, we leveraged three eBPF hooks: sched_wakeup, sched_wakeup_new, and sched_switch.

Latency

Latency Metrics Programming Monitoring

Optimizing your Kubernetes clusters without breaking the bank

Dynatrace

JANUARY 14, 2022

The Akamas vision is that only an autonomous optimization approach powered by AI can effectively enable performance engineers, SREs, and architects to identify the best configurations that ensure maximum service performance and resilience, at the lowest possible cost and at business speed. below 500ms) and error rates (e.g. lower than 2%).

Latency

Latency Tuning Efficiency AWS

Edgar: Solving Mysteries Faster with Observability

The Netflix TechBlog

SEPTEMBER 2, 2020

Edgar helps Netflix teams troubleshoot distributed systems efficiently with the help of a summarized presentation of request tracing, logs, analysis, and metadata. For engineers, instead of whodunit, the question is often “what failed and why?” That alone might give an engineer the knowledge she needs to reproduce the issue.

Latency

Latency Transportation Engineering Traffic

For your eyes only: improving Netflix video quality with neural networks

The Netflix TechBlog

NOVEMBER 17, 2022

While conventional video codecs remain prevalent, NN-based video encoding tools are flourishing and closing the performance gap in terms of compression efficiency. How do we apply neural networks at scale efficiently? In order to have a viable solution, we took several steps to improve efficiency.

Network

Network Media Innovation Efficiency

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The data warehouse is not designed to serve point requests from microservices with low latency. Moving data with Bulldozer at Netflix.

Latency

Latency Storage Big Data Tuning

Implementing AWS well-architected pillars with automated workflows

Dynatrace

SEPTEMBER 13, 2023

This is a set of best practices and guidelines that help you design and operate reliable, secure, efficient, cost-effective, and sustainable systems in the cloud. The framework comprises six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability.

AWS

AWS Efficiency Azure Cloud

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

By Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, and Joey Lynch. As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data (often reaching petabytes) with millisecond access latency has become increasingly vital.

Latency

Latency Storage Traffic Tuning

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

With these clear benefits, we continued to build out this functionality for more devices, enabling the same efficiency wins. It was very efficient, but it had a set job size, requiring manual intervention if we wanted to horizontally scale it, and it required manual intervention when rolling out a new version.

Latency

Latency Cache Tuning Efficiency

Dynatrace supports the newly released AWS Lambda Response Streaming

Dynatrace

APRIL 7, 2023

Dynatrace is a launch partner in support of AWS Lambda Response Streaming, a new capability enabling customers to improve the efficiency and performance of their Lambda functions. Customers can use AWS Lambda Response Streaming to improve performance for latency-sensitive applications and return larger payload sizes.

Lambda

Lambda AWS Serverless Latency

The Netflix Cosmos Platform

The Netflix TechBlog

MARCH 1, 2021

It supports both high throughput services that consume hundreds of thousands of CPUs at a time, and latency-sensitive workloads where humans are waiting for the results of a computation. The subsystems all communicate with each other asynchronously via Timestone, a high-scale, low-latency priority queuing system. Warm capacity.

Serverless

Serverless Media Latency Social Media

Predictive CPU isolation of containers at Netflix

The Netflix TechBlog

JUNE 4, 2019

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. This avoids thrashing caches too much for B and evens out the pressure on the L3 caches of the machine.

Cache

Cache Latency Airlines Logistics

SRE vs DevOps: What you need to know

Dynatrace

FEBRUARY 24, 2021

SRE is the transformation of traditional operations practices by using software engineering and DevOps principles to improve the availability, performance, and scalability of releases by building resiliency into apps and infrastructure. Reduced latency. Efficiency. Investing in automation and tooling to avoid toil.

DevOps

DevOps Software Engineering Speed Google

Mastering Disk Space Management with MongoDB® Storage Engines

Scalegrid

MAY 11, 2024

MongoDB offers several storage engines that cater to various use cases. The default storage engine in earlier versions was MMAPv1, which utilized memory-mapped files and collection-level locking. The newer, pluggable storage engine, WiredTiger, addresses this by using prefix compression, document-level locking, and row-based storage.

Storage

Storage Engineering Cache Database

Service level objectives: 5 SLOs to get started

Dynatrace

JUNE 1, 2023

Note: you might hear the term latency used instead of response time. Both latency and response time are critical to ensure reliability. Latency typically refers to the time it takes for a single request to travel from its source to its destination. Latency primarily focuses on the time spent in transit.

Latency

Latency Website Traffic DevOps

Dynatrace accelerates business transformation with new AI observability solution

Dynatrace

JANUARY 31, 2024

Model observability provides visibility into resource consumption and operation costs, aiding in optimization and ensuring the most efficient use of available resources. Observing AI models: Running AI models at scale can be resource-intensive. However, organizations must consider which use cases will bring them the biggest ROI.

Cache

Cache Azure Infrastructure Monitoring

Netflix Cloud Packaging in the Terabyte Era

The Netflix TechBlog

SEPTEMBER 24, 2021

Figure 1: A Simplified Video Processing Pipeline. With this architecture, chunk encoding is very efficient and processed in distributed cloud computing instances. Uploading and downloading data always come with a penalty, namely latency. In order to do that, the storage cloud object is modeled as a number of fixed-size parts.

Cloud

Cloud Media Storage Cache

Taming DORA compliance with AI, observability, and security

Dynatrace

AUGUST 27, 2024

This can require process re-engineering to fill gaps and ensuring clear communication and collaboration across security, operations, and development teams. Moreover, the Davis AI engine assists in prioritizing what needs to be fixed first. Dynatrace Security Analytics can also improve the effectiveness and efficiency of threat hunts.

Best Practices

Best Practices Government DevOps Analytics

Rebuilding Netflix Video Processing Pipeline with Microservices

The Netflix TechBlog

JANUARY 10, 2024

This architecture shift greatly reduced the processing latency and increased system resiliency. We expanded pipeline support to serve our studio/content-development use cases, which had different latency and resiliency requirements as compared to the traditional streaming use case. divide the input video into small chunks 2.

Processing

Processing Media Latency Innovation

What is AWS Lambda?

Dynatrace

APRIL 5, 2021

The 2014 launch of AWS Lambda marked a milestone in how organizations use cloud services to deliver their applications more efficiently, by running functions at the edge of the cloud without the cost and operational overhead of on-premises servers. AWS continues to improve how it handles latency issues.

Lambda

Lambda AWS Serverless Hardware

The Anna Key-Value Store Now Has 355x the Performance of DynamoDB for the Dollar

High Scalability

SEPTEMBER 8, 2018

Anna is not only incredibly fast, it’s incredibly efficient and elastic too: an autoscaling, multi-tier, selectively-replicating cloud service. The issue is that Anna is now orders of magnitude more efficient than competing systems, in addition to being orders of magnitude faster. What's changed?

Storage

Storage Performance AWS Cloud

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

a Netflix member via Twitter. This is an example of a question our on-call engineers need to answer to help resolve a member issue. We needed to increase engineering productivity via distributed request tracing. That is the first question our engineering teams asked us when integrating the tracer library.

Infrastructure

Infrastructure Transportation Storage Open Source

Engineering dependability and fault tolerance in a distributed system

High Scalability

FEBRUARY 19, 2021

This means a system that is not merely available but is also engineered with extensive redundant measures to continue to work as its users expect. reliability situations, where continuity of service is essential, with redundant elements continuously in-service, such as with airplane engines. This ensures reliability.

Engineering

Engineering Systems Availability Scalability

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

We have deployed Auto Remediation in production for handling memory configuration errors and unclassified errors of Spark jobs and observed its efficiency and effectiveness (e.g., For efficient error handling, Netflix developed an error classification service, called Pensive, which leverages a rule-based classifier for error classification.

Tuning

Tuning Efficiency Big Data Engineering

Automated observability, security, and reliability at scale

Dynatrace

JULY 18, 2023

However, scaling up software development requires more tools along the software product lifecycle, which must be configured promptly and efficiently. Efficient environment configuration at scale: One of software engineers’ most significant challenges is managing the numerous tools and technologies required for the software product lifecycle.

Best Practices

Best Practices Code Infrastructure Latency

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits. This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture.

Storage

Storage Latency Efficiency Data Engineering

DevOps observability: A guide for DevOps and DevSecOps teams

Dynatrace

JANUARY 18, 2023

From site reliability engineering to service-level objectives and DevSecOps, these resources focus on how organizations are using these best practices to innovate at speed without sacrificing quality, reliability, or security. SRE applies software engineering principles to operations and infrastructure processes. What is DevOps?

DevOps

DevOps Best Practices Innovation Strategy

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

By Shefali Vyas Dalal. AWS re:Invent is a couple of weeks away and our engineers & leaders are thrilled to be in attendance yet again this year! Over the years, this platform took on support for both elastic online services and fully featured batch workloads supporting use cases across Netflix engineering.

AWS

AWS Entertainment Open Source Benchmarking

How BizDevOps can “shift left” using SLOs to automate quality gates

Dynatrace

MAY 5, 2021

At Perform 2021, Dynatrace’s Kristof Renders, Services Practice Manager for Autonomous Cloud Enablement, joined Sumit Nagal, Principal Engineer at Intuit, to demonstrate how service-level objectives (SLOs) and business-level objectives (BLOs) can “shift left.” For example, improving latency by as little as 0.1

Benchmarking

Benchmarking Latency Speed Software

DevOps automation: From event-driven automation to answer-driven automation [with causal AI]

Dynatrace

JULY 24, 2023

In the world of DevOps and SRE, DevOps automation answers the undeniable need for efficiency and scalability. This evolution in automation, referred to as answer-driven automation, empowers teams to address complex issues in real time, optimize workflows, and enhance overall operational efficiency.

DevOps

DevOps Traffic Efficiency Servers

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. As data streams grow in complexity, processing efficiency can decline. Increased latency during peak loads. Balancing efficiency with carbon footprint reduction goals.

IoT

IoT Energy Logistics Latency

Seamlessly Swapping the API backend of the Netflix Android app

The Netflix TechBlog

SEPTEMBER 8, 2020

This allowed Android engineers to have much more control and observability over how we get our data. For each route we migrated, we wanted to make sure we were not introducing any regressions: either in the form of missing (or worse, wrong) data, or by increasing the latency of each endpoint. This meant that data that was static (e.g.

Latency

Latency Cache Java Traffic

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

It also improves engineering productivity by simplifying existing pipelines and unlocking new patterns. We will show how we are building a clean and efficient incremental processing solution (IPS) by using Netflix Maestro and Apache Iceberg. There are three common issues that dataset owners usually face.

Processing

Processing Big Data Efficiency Engineering

Practical API Design at Netflix, Part 1: Using Protobuf FieldMask

The Netflix TechBlog

SEPTEMBER 3, 2021

Remote calls are never free; they impose extra latency, increase the probability of an error, and consume network bandwidth. The solution we use within Netflix Studio Engineering is protobuf FieldMask. This (alongside some other techniques like ZigZag encoding for signed types) makes protobuf messages space-efficient.

Design

Design Java Code Servers
