Migrating Critical Traffic At Scale with No Downtime — Part 1

The Netflix TechBlog

Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah

Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.

Traffic 347

Migrating Critical Traffic At Scale with No Downtime — Part 2

The Netflix TechBlog

Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah

Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic 285

Optimising for High Latency Environments

CSS Wizardry

This gives fascinating insights into the network topography of our visitors, and how much we might be impacted by high latency regions. Round-trip time (RTT) is basically a measure of latency: how long did it take to get from one endpoint to another and back again? RTT data should be seen as an insight and not a metric.

Latency 222
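The RTT definition above (one endpoint to another and back) can be observed directly by timing a small message's round trip. This is a minimal sketch, assuming Python's standard library and a throwaway echo server on localhost; the helper names `start_echo_server` and `measure_rtts` are hypothetical and not taken from the CSS Wizardry article.

```python
import socket
import statistics
import threading
import time


def start_echo_server() -> tuple[socket.socket, int]:
    """Start a throwaway TCP echo server on localhost (hypothetical helper).

    Returns the listening socket and the port the OS assigned.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    def serve() -> None:
        conn, _ = listener.accept()
        with conn:
            # Echo every chunk back until the client closes the connection.
            while data := conn.recv(64):
                conn.sendall(data)

    threading.Thread(target=serve, daemon=True).start()
    return listener, listener.getsockname()[1]


def measure_rtts(port: int, samples: int = 20) -> list[float]:
    """Time `samples` request/echo round trips to localhost:port, in milliseconds."""
    payload = b"ping"
    rtts = []
    with socket.create_connection(("127.0.0.1", port)) as sock:
        # Disable Nagle's algorithm so tiny writes are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(samples):
            start = time.perf_counter()
            sock.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = sock.recv(len(payload) - len(received))
                if not chunk:
                    raise ConnectionError("echo server closed the connection")
                received += chunk
            rtts.append((time.perf_counter() - start) * 1000)
    return rtts


if __name__ == "__main__":
    listener, port = start_echo_server()
    rtts = measure_rtts(port)
    listener.close()
    print(f"median RTT: {statistics.median(rtts):.3f} ms")
```

On loopback the values are tiny; the same timing loop against a remote endpoint would reflect network distance. Reporting a median over many samples is less sensitive to a single slow round trip than a mean.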

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

So, we relied on higher-level metrics-based testing: A/B Testing and Sticky Canaries. The control group’s traffic utilized the legacy Falcor stack, while the experiment population leveraged the new GraphQL client and was directed to the GraphQL Shim. The Replay Tester tool samples raw traffic streams from Mantis.

Traffic 358
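A common way to keep a control and experiment population stable is to derive each client's group deterministically from a stable identifier, so the same client always lands on the same stack. The sketch below assumes a hypothetical string `client_id`, a hypothetical experiment name, and SHA-256 bucketing; it illustrates the general technique and is not Netflix's actual allocation logic.

```python
import hashlib


def assign_group(client_id: str, experiment: str, experiment_pct: int = 50) -> str:
    """Deterministically map a client to 'control' or 'experiment'.

    Hypothetical helper: 'control' would be served by the legacy stack and
    'experiment' by the new one. The same inputs always give the same group.
    """
    # Salt with the experiment name so separate experiments split clients independently.
    digest = hashlib.sha256(f"{experiment}:{client_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in [0, 100)
    return "experiment" if bucket < experiment_pct else "control"


if __name__ == "__main__":
    group = assign_group("client-123", "graphql-migration")
    print(f"client-123 is in the {group} group")
```

Because assignment depends only on the hashed identifier, no per-client state has to be stored, and raising `experiment_pct` only moves clients from control into experiment.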

Seeing through hardware counters: a journey to threefold performance increase

The Netflix TechBlog

A quick canary test was free of errors and showed lower latency, which is expected given that our standard canary setup routes an equal amount of traffic to both the baseline running on 4xl and the canary on 12xl. What’s worse, average latency degraded by more than 50%, with both CPU and latency patterns becoming more “choppy.”

Hardware 363

Investigation of a Workbench UI Latency Issue

The Netflix TechBlog

Using this approach, we observed latencies ranging from 1 to 10 seconds, averaging 7.4 seconds.

Blame The Notebook

Now that we have an objective metric for the slowness, let’s officially start our investigation. Meanwhile, traffic from other ports, such as port 22 for SSH, remained unaffected. We then exported the .har …

Latency 215

Noisy Neighbor Detection with eBPF

The Netflix TechBlog

Continuous Instrumentation of the Linux Scheduler To ensure the reliability of our workloads that depend on low latency responses, we instrumented the run queue latency for each container, which measures the time processes spend in the scheduling queue before being dispatched to the CPU.

Latency 262
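Run queue latency, as described above, is the gap between a task becoming runnable and being dispatched to a CPU. The eBPF approach collects this inside the kernel; the sketch below only shows the per-container arithmetic in user space. It assumes a hypothetical event stream of `(timestamp_ns, kind, pid, container)` tuples, where `"wakeup"` marks enqueue and `"switch_in"` marks dispatch; these names are our own, loosely modeled on the kernel's `sched_wakeup` and `sched_switch` tracepoints.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical event schema: (timestamp_ns, kind, pid, container)
Event = tuple[int, str, int, str]


def run_queue_latency(events: Iterable[Event]) -> dict[str, list[int]]:
    """Return per-container run queue latencies in nanoseconds.

    A 'wakeup' records when a task was enqueued; the next 'switch_in' for the
    same pid closes the interval. A 'switch_in' with no pending wakeup is ignored.
    """
    enqueued_at: dict[int, int] = {}
    latencies: dict[str, list[int]] = defaultdict(list)
    # Sort by timestamp only; the stable sort keeps input order for ties.
    for ts, kind, pid, container in sorted(events, key=lambda e: e[0]):
        if kind == "wakeup":
            enqueued_at[pid] = ts
        elif kind == "switch_in" and pid in enqueued_at:
            latencies[container].append(ts - enqueued_at.pop(pid))
    return dict(latencies)


if __name__ == "__main__":
    sample = [
        (1_000, "wakeup", 42, "container-a"),
        (1_750, "switch_in", 42, "container-a"),
    ]
    print(run_queue_latency(sample))  # {'container-a': [750]}
```

A real tracer also has to account for tasks that are preempted while still runnable, which re-enter the run queue without a wakeup; this sketch ignores that case.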