Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

This shift is leading more organizations to hire site reliability engineers to guarantee the reliability and resiliency of their services. How site reliability engineering affects organizations’ bottom line: SRE applies the disciplines of software engineering to infrastructure management, both on-premises and in the cloud.
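
As a concrete illustration of applying software-engineering discipline to reliability work, the sketch below computes how much of an SLO's error budget remains over a window; the 99.9% target and the request counts are invented example values, not figures from the article.

```python
# Minimal error-budget sketch (invented numbers, illustration only):
# an availability SLO expressed as code.
SLO_TARGET = 0.999  # hypothetical 99.9% availability objective

def error_budget_remaining(total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget left for the current window."""
    allowed_failures = total_requests * (1 - SLO_TARGET)
    if allowed_failures == 0:
        return 1.0 if failed_requests == 0 else 0.0
    return max(0.0, 1 - failed_requests / allowed_failures)

if __name__ == "__main__":
    # Example window: 1,000,000 requests with 400 failures leaves 60% of the budget.
    print(f"{error_budget_remaining(1_000_000, 400):.0%} of the error budget remains")
```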

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

In that scenario, the system would need to deal with the data propagation latency directly, for example by using timeouts or client-originated update-tracking mechanisms. As traffic grew, a single leader node handling the entire request volume became overloaded. The cache is kept in sync with the current leader process.
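
The excerpt mentions client-originated update tracking as one way to cope with propagation latency between the leader and its local caches. Below is a minimal sketch of that idea, assuming a monotonically increasing version number; it is an illustration, not Titus Gateway code: a reader passes the version of its own last write and blocks until the cache has caught up, or times out.

```python
# Minimal sketch of client-originated update tracking (illustration only):
# readers wait until the locally cached state has caught up to the version
# produced by their own last write.
import threading

class VersionedCache:
    def __init__(self) -> None:
        self._data: dict[str, object] = {}
        self._version = 0                     # monotonically increasing version
        self._cond = threading.Condition()

    def apply_update(self, key: str, value: object, version: int) -> None:
        """Called by the replication stream from the current leader."""
        with self._cond:
            self._data[key] = value
            self._version = max(self._version, version)
            self._cond.notify_all()

    def get(self, key: str, min_version: int = 0, timeout: float = 1.0):
        """Read, waiting until the cache reflects at least `min_version`."""
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._version >= min_version, timeout=timeout
            )
            if not caught_up:
                raise TimeoutError("cache has not caught up to requested version")
            return self._data.get(key)
```

A client that just received version v back from the leader would call get(key, min_version=v), which surfaces the propagation delay explicitly instead of silently returning stale data.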

Automated observability, security, and reliability at scale

Dynatrace

To handle this challenge, enterprises need to automate and streamline the onboarding and lifecycle of tool configurations throughout the software development process, including aspects of observability, security, alerting, and remediation. Development teams must set up tailored configurations for each tool and component they’re responsible for.
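
One common way to streamline that onboarding is to treat tool configuration as code: a small generator stamps out a tailored configuration per component from shared defaults. The sketch below is a generic illustration with invented section and field names, not any specific vendor’s configuration format.

```python
# Generic configuration-as-code sketch (invented field names, no specific
# vendor format): stamp out a tailored monitoring config per component.
import copy

BASE_CONFIG = {
    "alerting": {"error_rate_threshold": 0.02, "latency_p99_ms": 500},
    "security": {"scan_on_deploy": True},
    "remediation": {"auto_rollback": True},
}

def config_for(component: str, owner_team: str, overrides: dict | None = None) -> dict:
    """Merge shared defaults with per-component overrides."""
    config = {"component": component, "owner": owner_team, **copy.deepcopy(BASE_CONFIG)}
    for section, values in (overrides or {}).items():
        config.setdefault(section, {}).update(values)
    return config

# Example: a latency-sensitive service tightens its alerting threshold.
checkout = config_for("checkout-service", "payments",
                      {"alerting": {"latency_p99_ms": 250}})
```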

Netflix at AWS re:Invent 2019

The Netflix TechBlog

Netflix shares how Amazon EC2 Auto Scaling allows its infrastructure to automatically adapt to changing traffic patterns in order to keep its audience entertained and its costs on target. Netflix runs dozens of stateful services on AWS under strict sub-millisecond tail-latency requirements, which brings unique challenges.
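
For a sense of what such a policy looks like in practice, the boto3 snippet below attaches a target-tracking scaling policy to an Auto Scaling group; the group name and the 50% CPU target are placeholders, not Netflix’s actual settings.

```python
# Attach a target-tracking scaling policy to an existing Auto Scaling group.
# The group name and the 50% CPU target are placeholders, not Netflix settings.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="example-service-asg",   # hypothetical group name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,                      # keep average CPU near 50%
    },
)
```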

Achieving observability in async workflows

The Netflix TechBlog

Prodicle Distribution: Our service is required to be elastic and handle bursty traffic. We are expected to process 1,000 watermarks for a single distribution within a minute, with non-linear latency growth as the number of watermarks increases. Things got hairy. We wanted a scalable service that was near real-time.
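
To make “elastic, bursty, and near real-time” concrete, here is a minimal asyncio sketch that drains a burst of 1,000 watermark jobs with bounded concurrency; the job body is a stand-in, not the Prodicle watermarking pipeline.

```python
# Minimal asyncio sketch: drain a burst of watermark jobs with bounded
# concurrency. process_watermark() is a stand-in, not the Prodicle pipeline.
import asyncio

CONCURRENCY = 50  # arbitrary cap on in-flight watermark jobs

async def process_watermark(doc_id: str) -> None:
    await asyncio.sleep(0.05)          # placeholder for the real watermarking call

async def worker(queue: asyncio.Queue) -> None:
    while True:
        doc_id = await queue.get()
        try:
            await process_watermark(doc_id)
        finally:
            queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    for i in range(1_000):             # a single distribution's burst
        queue.put_nowait(f"doc-{i}")
    workers = [asyncio.create_task(worker(queue)) for _ in range(CONCURRENCY)]
    await queue.join()                 # wait until every job has been processed
    for w in workers:
        w.cancel()

if __name__ == "__main__":
    asyncio.run(main())
```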

Curbing Connection Churn in Zuul

The Netflix TechBlog

That’s a significant amount, and certainly more than is necessary relative to the traffic on most clusters. More acutely, if a traffic spike occurs and Zuul instances scale up, the number of connections open to origins increases exponentially. There is effectively no churn of connections, even at peak traffic.
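
The remedy the excerpt alludes to is reusing connections to origins instead of repeatedly opening new ones. The sketch below shows the general shape of a bounded per-origin connection pool; it is a generic illustration, not Zuul’s actual Netty-based implementation.

```python
# Generic sketch of a bounded per-origin connection pool (illustration only):
# reuse connections to origins instead of opening a new one per request.
from collections import defaultdict, deque
from typing import Any, Callable

class OriginConnectionPool:
    def __init__(self, connect: Callable[[str], Any], max_idle_per_origin: int = 10):
        self._connect = connect            # user-supplied factory that opens a connection
        self._max_idle = max_idle_per_origin
        self._idle = defaultdict(deque)    # origin -> idle connections ready for reuse

    def acquire(self, origin: str):
        """Reuse an idle connection to `origin` if one exists, else open a new one."""
        idle = self._idle[origin]
        return idle.popleft() if idle else self._connect(origin)

    def release(self, origin: str, conn) -> None:
        """Return a connection for reuse, closing it if the idle set is full."""
        idle = self._idle[origin]
        if len(idle) < self._max_idle:
            idle.append(conn)
        else:
            conn.close()                   # assumes connections expose close()
```

Capping the idle set per origin keeps the total connection count roughly proportional to concurrency rather than to request rate, which is what curbs the churn.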
