By Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix's TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we're excited to present the Distributed Counter Abstraction.
Apache Kafka's partitioned log architecture supports both queuing and publish-subscribe models, allowing it to handle large-scale event processing with minimal latency. Kafka uses a custom binary protocol over TCP for high throughput and low latency, and, being designed for distributed event streaming, it maintains low latency at scale.
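A minimal sketch of the two consumption models using the kafka-python client (my own illustration, not from the cited post; the broker address and the "events" topic are placeholders):

```python
# Minimal sketch using kafka-python; assumes a broker at localhost:9092
# and an "events" topic (both hypothetical).
from kafka import KafkaProducer, KafkaConsumer

# Publish: the producer appends records to a partitioned log.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", key=b"user-42", value=b"page_view")
producer.flush()

# Subscribe: consumers in the same group share partitions (queuing),
# while separate groups each receive every record (publish-subscribe).
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    group_id="analytics",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if no records arrive
)
for record in consumer:
    print(record.partition, record.offset, record.value)
```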
Traces are used for performance analysis, latency optimization, and root cause analysis. The OpenTelemetry website provides detailed documentation for each language to guide you through the necessary steps to set up your environment. Capture critical performance indicators such as request latency, error rates, and resource usage.
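For example, a minimal tracing setup with the OpenTelemetry Python SDK might look like this sketch (span and attribute names are illustrative, not from the original post):

```python
# Minimal OpenTelemetry tracing sketch; exports spans to the console.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("docs.example")

with tracer.start_as_current_span("handle_request") as span:
    # Latency is captured by the span duration; errors and sizes can be
    # attached as attributes (names here are hypothetical).
    span.set_attribute("http.route", "/search")
    span.set_attribute("response.size_bytes", 1024)
```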
By Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, Joey Lynch. Introduction: As Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data, often reaching petabytes, with millisecond access latency has become increasingly vital.
The challenges with service management included stale documentation, OS updates, high cognitive overhead, and a lack of continuous testing. Scaling Image Creation: our existing AMI baking tool, Aminator, does not support Windows, so we had to leverage other tools. Services are now more reliable, testable, and documented.
Whether tracking internal, workload-centric indicators such as errors, duration, or saturation, or focusing on the golden signals and other user-centric views such as availability, latency, traffic, or engagement, SLOs-as-code enables coherent and consistent monitoring throughout the environment at scale.
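A toy illustration of the SLOs-as-code idea (my own sketch, not from the source): the SLO is plain data that tooling can version, review, and evaluate, and the error budget falls out of the objective.

```python
# Hypothetical SLO-as-code sketch; names and numbers are illustrative.
from dataclasses import dataclass

@dataclass
class Slo:
    name: str
    objective: float      # e.g. 0.999 = 99.9% of requests succeed
    window_days: int      # rolling evaluation window

    def error_budget_minutes(self) -> float:
        # Budget = the fraction of the window allowed to violate the objective.
        return self.window_days * 24 * 60 * (1 - self.objective)

availability = Slo(name="checkout-availability", objective=0.999, window_days=30)
print(f"{availability.name}: {availability.error_budget_minutes():.1f} min/month budget")
# -> checkout-availability: 43.2 min/month budget
```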
While there are plenty of well-documented benefits to using a connection pooler, there are some arguments to be made against using one: introducing a middleware into the communication path inevitably adds some latency. Still, a connection pooler is an almost indispensable part of a production-ready PostgreSQL setup.
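A minimal sketch of client-side pooling with psycopg2 (my own illustration; the connection parameters are placeholders, and a server-side pooler such as PgBouncer is the other common option):

```python
# Minimal client-side connection pool sketch using psycopg2;
# the DSN below is a placeholder for a real database.
from psycopg2 import pool

db_pool = pool.SimpleConnectionPool(
    minconn=1,
    maxconn=10,
    dsn="dbname=app user=app password=secret host=127.0.0.1",
)

conn = db_pool.getconn()          # borrow a connection instead of opening one
try:
    with conn.cursor() as cur:
        cur.execute("SELECT now()")
        print(cur.fetchone())
finally:
    db_pool.putconn(conn)         # return it to the pool for reuse

db_pool.closeall()
```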
Prodicle Distribution Prodicle Distribution allows a production office coordinator to send secure, watermarked documents, such as scripts, to crew members as attachments or links, and track delivery. One distribution job might result in several thousand watermarked documents and links being created.
By collecting and analyzing key performance metrics of the service over time, we can assess the impact of the new changes and determine whether they meet the availability, latency, and performance requirements. These metrics also enable us to further fine-tune and configure the system, ensuring the new changes are integrated smoothly.
Amazon ElastiCache (see AWS documentation for Memcached and Redis). The example below visualizes average latency by API name and stage for a specific AWS API Gateway. Stay tuned for updates in Q1 2020. Amazon CloudFront, Amazon Cognito, Amazon EC2 Spot Fleet, Amazon Elastic Container Service (ECS), Amazon EMR. Requirements.
Any scenario in which a student is looking for information that the corpus of documents can answer. Wrong document retrieval: debug the chunking strategy and retrieval method. For example, if you're building a document QA tool, upgrading from basic OCR to AI-powered extraction (think Mistral OCR) might give the biggest lift on your system!
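To make "debug the chunking strategy and retrieval method" concrete, here is a deliberately naive sketch (my own illustration, not the tool described above): fixed-size overlapping chunks ranked by word overlap with the question. A real system would use embeddings and a proper index.

```python
# Toy retrieval sketch for a document QA pipeline: chunk the corpus and
# rank chunks by word overlap with the question. Purely illustrative.
def chunk(text: str, size: int = 80, overlap: int = 20) -> list[str]:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    q = set(question.lower().split())
    scored = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return scored[:k]

corpus = "Course syllabus ... office hours are Tuesday 2-4pm in room 310 ..."
top = retrieve("When are office hours?", chunk(corpus))
print(top[0])
```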
The eval process combines human review, model-based evaluation, and A/B testing. The results then inform two parallel streams: fine-tuning with carefully curated data, and prompt engineering improvements. Both feed into model improvements, which starts the cycle again. We're experiencing high latency in responses.
This enables us to use our scale to increase throughput and reduce latencies. Here, based on the video length, the throughput and latency requirements, available scale etc., To aid our transition, we introduced another Cosmos microservice: the Document Conversion Service (DCS). VQS is called using the measureQuality endpoint.
The maintenance costs over time for an Alpakka-Kafka-based solution are much lower than those for the other solutions, as both Akka and Alpakka-Kafka are mature ecosystems in terms of documentation and community support, having been around for at least 12 and 6 years, respectively. million elements.
Improved performance: MongoDB continually fine-tunes its database engine, resulting in faster query execution and reduced latency. MongoDB upgrades follow a well-documented and structured approach, ensuring the process goes smoothly.
In addition to availability, our respondents focus most heavily on supporting the following data attributes: “accessibility, accuracy, authoritativeness, freshness, latency, structuredness, ontological typing, connectedness, and semantic joinability.” To address this, rigorous rollout processes are required.
Passive instances across regions are also possible, though it is recommended to operate in the same region as the database host in order to keep the change capture latencies low. Stay Tuned: DBLog has additional capabilities that are not covered by this blog post, such as the ability to capture table schemas without using locks.
Here's some output from my zfsdist tool, in bcc/BPF, which measures ZFS latency as a histogram on Linux:

# zfsdist
Tracing ZFS operation latency... Hit Ctrl-C to end.
^C

There's a lot about Linux containers that isn't well documented yet, especially since it's a moving target.
However, in the Skylake microarchitecture (you can see a list of CPUs here) the PAUSE instruction changed; the documentation says “the latency of the PAUSE instruction in prior generation microarchitectures is about 10 cycles, whereas in Skylake microarchitecture it has been extended to as many as 140 cycles.”
The main objective of this post is to share my experience over the past years tuning MongoDB and to centralize in one place the diverse sources I came across along the way. The CFQ I/O scheduler works well for many general use cases but lacks latency guarantees. MongoDB schema design, on the other hand, takes a document-oriented approach.
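A small illustration of that document-oriented approach (my own sketch; the database, collection, and field names are hypothetical, and a local mongod instance is assumed): related data is embedded in a single document so one read serves the query.

```python
# Document-oriented schema sketch with pymongo; names are hypothetical.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
orders = client.shop.orders

# Embed line items in the order document instead of normalizing them
# into a separate collection, so one read returns the whole order.
orders.insert_one({
    "order_id": 1001,
    "customer": "c-42",
    "items": [
        {"sku": "A1", "qty": 2, "price": 9.99},
        {"sku": "B7", "qty": 1, "price": 24.50},
    ],
})

orders.create_index([("customer", ASCENDING)])
print(orders.find_one({"customer": "c-42"}, {"_id": 0}))
```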
Tip: When evaluating quality, compression, and fine-tuning of modern formats, Squoosh.app’s ability to perform a visual side-by-side comparison is helpful. CDN servers are often located closer to users than origin servers and can have shorter round-trip times (RTT), improving network latency.
Here are 8 fallacies of data pipelines:
1. The pipeline is reliable
2. Topology is stateless
3. The pipeline is infinitely scalable
4. Processing latency is minimal
5. Everything is observable
6. There is no domino effect
7. The pipeline is cost-effective
8. Data is homogeneous
The pipeline is reliable: the inconvenient truth is that the pipeline is not reliable.
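As a small illustration of designing around the first fallacy (my own sketch, not from the post): retry a flaky stage with exponential backoff and route repeated failures to a dead-letter list rather than assuming every step succeeds.

```python
# Illustrative only: retry a flaky pipeline stage with exponential backoff
# and keep failed records for later replay instead of dropping them.
import random
import time

dead_letters = []

def run_with_retries(stage, record, attempts: int = 3, base_delay: float = 0.5):
    for attempt in range(attempts):
        try:
            return stage(record)
        except Exception:
            if attempt == attempts - 1:
                dead_letters.append(record)   # keep the data; fix and replay later
                return None
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

def flaky_transform(record):
    if random.random() < 0.3:
        raise RuntimeError("transient downstream error")
    return {**record, "processed": True}

print(run_with_retries(flaky_transform, {"id": 1}))
```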
A latency outlier issue that happened every 15 minutes. This is documented in the Netflix culture deck, and after three years I still find it true. For Linux, I've also done tuning, kernel analysis, gdb, testing of hist triggers, testing of some perf patches, and contributed a few trivial patches of my own.
Finally, not inlining resources has an added latency cost because the file needs to be requested. As this is, again, a micro-optimization, you probably need to fine-tune things on a low level to really benefit from it. hundreds of pages spread over more than seven documents. What Does It All Mean?
Copyright: The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document.
Because we are dealing with network protocols here, we will mainly look at network aspects, of which two are most important: latency and bandwidth. Latency can be roughly defined as the time it takes to send a packet from point A (say, the client) to point B (the server). Two-way latency is often called round-trip time (RTT).
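A rough way to see latency in practice (my own sketch, not from the article): a TCP connect completes after roughly one round trip, so timing it gives a crude RTT estimate.

```python
# Crude RTT estimate: a TCP connect takes roughly one round trip, so timing
# it approximates network latency to the host (example.com is a placeholder).
import socket
import time

def rough_rtt_ms(host: str, port: int = 443) -> float:
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass
    return (time.perf_counter() - start) * 1000

print(f"~{rough_rtt_ms('example.com'):.1f} ms")
```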
A lower value gives faster detection of a bad connection, while latency or intermittent network issues may be better served by increasing it slightly, although overly long timeouts can result in application hang-ups with little chance of an established link. Careful consideration must be given before making changes.
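A small sketch of that trade-off in generic socket terms (my own illustration, not the system discussed above; the host and values are placeholders): a short connect timeout fails fast on a dead link, while a longer read timeout tolerates intermittent slowness.

```python
# Illustrative timeout tuning; host, port, and values are placeholders.
import socket

CONNECT_TIMEOUT = 2.0   # short: detect an unreachable peer quickly
READ_TIMEOUT = 15.0     # longer: tolerate intermittent slowness once connected

sock = socket.create_connection(("db.internal.example", 9000), timeout=CONNECT_TIMEOUT)
sock.settimeout(READ_TIMEOUT)
try:
    sock.sendall(b"ping")
    reply = sock.recv(1024)   # raises socket.timeout if the peer stalls too long
finally:
    sock.close()
```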
Each of the two vector units can issue one FMA instruction per cycle, assuming that there are enough independent accumulators to tolerate the 6-cycle dependent-operation latency. of the “adjusted peak performance”, there is no longer a significant upside to performance tuning. vfmadd213pd %zmm16, %zmm17, %zmm26.
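As a quick back-of-the-envelope check (my own arithmetic, using the 6-cycle latency and two FMA pipes quoted above), the number of independent accumulators needed to keep both pipes full is the latency-throughput product:

\[ N_{\text{accumulators}} \ge 6~\text{cycles} \times 2~\text{FMA/cycle} = 12. \]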
Copyright: The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. After reading this document, you will better understand SQL Server I/O needs and capabilities.
Discover how their solution saves customers hours of manual effort by automating the analysis of tens of thousands of documents to better manage investor events, report internally to executive teams, and find new investors to target. This lightning talk provides environmental sustainability insights that are specific to large language models.