Availability, Testing and Traffic - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime — Part 1

The Netflix TechBlog

MAY 4, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.

Traffic

Traffic Latency Tuning Systems

Migrating Critical Traffic At Scale with No Downtime — Part 2

The Netflix TechBlog

JUNE 13, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic

Traffic Metrics Systems Strategy

How To Design For High-Traffic Events And Prevent Your Website From Crashing

Smashing Magazine

JANUARY 7, 2025

By Saad Khan. This article is sponsored by Cloudways. Product launches and sales typically attract large volumes of traffic.

Traffic

Traffic Website Design Cache

Dynatrace Cost & Carbon Optimization certified for accuracy and transparency

Dynatrace

MARCH 5, 2025

Industry certification for Dynatrace Cost & Carbon Optimization: To enhance the trust our customers and partners have in our approach, we commissioned the Sustainable Digital Infrastructure Alliance (SDIA) to test and certify the Cost & Carbon Optimization app. The certification results are now publicly available.

Energy

Energy Analytics Traffic Cloud

Managing High Availability in PostgreSQL – Part III: Patroni

Scalegrid

AUGUST 22, 2019

In the final post of this series, we will review the last solution, Patroni by Zalando, and compare all three at the end so you can determine which high availability framework is best for your PostgreSQL hosting deployment. Managing High Availability in PostgreSQL – Part I: PostgreSQL Automatic Failover. Standby Server Tests.

Availability

Availability Servers Network Testing

Managing PostgreSQL® High Availability – Part I: PostgreSQL Automatic Failover

Scalegrid

SEPTEMBER 5, 2024

Managing High Availability (HA) in your PostgreSQL hosting is very important to ensuring your database deployment clusters maintain exceptional uptime and strong operational performance so your data is always available to your application. Effective management of failover and switchover operations is crucial for high availability.

Availability

Availability Servers Database Open Source

Ensuring the Successful Launch of Ads on Netflix

The Netflix TechBlog

JUNE 1, 2023

To do this, we devised a novel way to simulate the projected traffic weeks ahead of launch by building upon the traffic migration framework described here. New content or national events may drive brief spikes, but, by and large, traffic is usually smoothly increasing or decreasing.

Traffic

Traffic Best Practices Systems Testing

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

The three strategies we will discuss today are AB Testing, Replay Testing, and Sticky Canaries. To launch Phase 1 safely, we used AB Testing. To launch Phase 2 safely, we used Replay Testing and Sticky Canaries. We knew we could test the same query with the same inputs and consistently expect the same results.

Traffic

Traffic Latency Metrics Cache

What is synthetic testing?

Dynatrace

OCTOBER 16, 2023

Synthetic testing simulates real-user behaviors within an application or service to pinpoint potential problems. Here’s a look at why this testing matters, how it works, and what companies need to get the most from this approach. What is synthetic testing? RUM, meanwhile, requires actual users.

Testing

Testing Best Practices Testing Tools Monitoring

7 Best Performance Testing Tools to Look Out for in 2021

DZone

DECEMBER 28, 2020

The system could work efficiently with a specific number of concurrent users; however, it may become dysfunctional under extra load during peak traffic. Performance testing helps establish the scalability, stability, and speed of the software application. Confirming scalability, dependability, stability, and speed of the app is crucial.

Performance Testing

Performance Testing Testing Tools Testing Performance

OneAgent for Linux on IBM Z (General Availability)

Dynatrace

NOVEMBER 20, 2019

Having released this functionality in an Early Adopter Release with OneAgent version 1.173 and Dynatrace version 1.174 back in August 2019, we’re now happy to announce the General Availability of OneAgent full-stack monitoring for Linux on the IBM Z platform, sometimes informally referred to as Z/Linux. Release details.

Availability

Availability Hardware Java Tuning

PyMongo Tutorial: Testing MongoDB Failover in Your Python App

Scalegrid

MAY 2, 2019

When deploying in production, it’s highly recommended to set up a MongoDB replica set configuration so your data is geographically distributed for high availability. It is also recommended that SSL connections be enabled to encrypt the client-database traffic. Testing Failover Behavior.

Testing

Testing Network Database Servers

Keeping Netflix Reliable Using Prioritized Load Shedding

The Netflix TechBlog

NOVEMBER 2, 2020

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure. By Manuel Correa, Arthur Gonigberg, and Daniel West. Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Logs and background requests are examples of this type of traffic.

Traffic

Traffic Metrics Infrastructure Architecture

Automate CI/CD pipelines with Dynatrace: Part 2, Deploy stage

Dynatrace

NOVEMBER 28, 2023

This step is crucial as this environment is used for the final validation and testing phase before the code is released into production. This can lead to a lack of insight into how the code will behave when exposed to heavy traffic. Furthermore, augmenting test coverage to mirror the scenarios encountered in production is imperative.

Traffic

Traffic Best Practices Strategy Engineering

COVID-19 and Digital Services: An Action Plan for the Unexpected

Dynatrace

APRIL 22, 2020

While most government agencies and commercial enterprises have digital services in place, the current volume of usage — including traffic to critical employment, health and retail/eCommerce services — has reached levels that many organizations have never seen before or tested against. So how do you know what to prepare for?

Traffic

Traffic Ecommerce Retail Government

9 key DevOps metrics for success

Dynatrace

SEPTEMBER 28, 2021

Two important ways to improve this metric are to implement quality assurance testing throughout multiple development environments and to automate testing and DevOps processes. A change failure rate above 40% can indicate poor testing procedures, which means teams will need to make more changes than necessary, eroding efficiency.

DevOps

DevOps Metrics Traffic Efficiency

Architected for resiliency: How Dynatrace withstands data center outages

Dynatrace

JUNE 15, 2021

The subject line said: “Success Story: Major Issue in single AWS Frankfurt Availability Zone!” The problem started at 1:24PM PDT, with the services starting to become available again about 3 hours later. This number was low because the automatic traffic redirect was fast enough to keep the impact minimal.

AWS

AWS Traffic Architecture Azure

Helping your digital services run optimally for your customers and employees during COVID-19

Dynatrace

APRIL 16, 2020

With most employees now working from home and demand on e-commerce platforms hitting an all-time high, applications and infrastructure are under intense pressure with new usage patterns that have never been planned for or tested against. SaaS vendor RUM functionality is available for free for new users through September 19, 2020.

Traffic

Traffic Government Database Network

Measuring Network Performance in Mobile Safari

CSS Wizardry

FEBRUARY 25, 2021

Google has a pretty tight grip on the tech industry: it makes by far the most popular browser with the best DevTools, and the most popular search engine, which means that web developers spend most of their time in Chrome, most of their visitors are in Chrome, and a lot of their search traffic will be coming from Google. Why This Is a Problem.

Network

Network Mobile Performance Traffic

Service level objectives: 5 SLOs to get started

Dynatrace

JUNE 1, 2023

These organizations rely heavily on performance, availability, and user satisfaction to drive sales and retain customers. Availability: An availability SLO quantifies the expected level of service availability over a specific time period. Availability is typically expressed in 9’s, such as 99.9% or 99.99% of the time.

Latency

Latency Website Traffic DevOps

What is a service mesh?

Dynatrace

MAY 21, 2021

This becomes even more challenging when the application receives heavy traffic, because a single microservice might become overwhelmed if it receives too many requests too quickly. The Envoy proxies also collect and report telemetry on all traffic among the services in the mesh. Why do you need a service mesh?

Traffic

Traffic DevOps Infrastructure Network

OneAgent for Linux on IBM Z now available in Early Adopter Release

Dynatrace

AUGUST 8, 2019

We’re happy to announce the Early Adopter Release of OneAgent full-stack monitoring for Linux on the IBM Z platform, sometimes informally referred to as Z/Linux (available with OneAgent version 1.173 and Dynatrace version 1.174). For details on available metrics, see our help page on host performance monitoring. Dynatrace news.

Availability

Availability Hardware Java Tuning

The Ultimate Guide to Database High Availability

Percona

JUNE 22, 2023

To make data count and to ensure cloud computing is unabated, companies and organizations must have highly available databases. This guide provides an overview of what high availability means, the components involved, how to measure high availability, and how to achieve it. How does high availability work?

Availability

Availability Database Open Source Hardware

Kubernetes vs Docker: What’s the difference?

Dynatrace

SEPTEMBER 29, 2021

The time and effort saved with testing and deployment are a game-changer for DevOps. This opens the door to auto-scalable applications, which effortlessly match the demands of rapidly growing and varying user traffic. Containers can be replicated or deleted on the fly to meet varying end-user traffic. What is Docker?

Open Source

Open Source DevOps Traffic Cloud

Geek Reading - Week of June 5, 2013

DZone

OCTOBER 11, 2022

Making Google’s CalDAV and CardDAV APIs available for everyone (Google Developers Blog). Improving testing by using real traffic from production (Hacker News). SAP to acquire Hybris to jumpstart its presence in e-commerce (VentureBeat).

Java

Java Best Practices Google Analytics

Seeing through hardware counters: a journey to threefold performance increase

The Netflix TechBlog

NOVEMBER 9, 2022

At Netflix, we periodically reevaluate our workloads to optimize utilization of available capacity. A quick canary test was free of errors and showed lower latency, which is expected given that our standard canary setup routes an equal amount of traffic to both the baseline running on 4xl and the canary on 12xl.

Hardware

Hardware Cache Performance Latency

What are quality gates? How to use quality gates to deliver better software at speed and scale

Dynatrace

FEBRUARY 21, 2024

Quality gates after load/performance testing: Teams can use quality gates to evaluate performance metrics. Before a new version of the application is deployed, the software is subject to a series of load tests that evaluate capacity and performance under a series of simulated traffic and application demands.

Speed

Speed Software Software Latency

Efficient SLO event integration powers successful AIOps

Dynatrace

APRIL 5, 2024

When the SLO status converges to an optimal value of 100%, and there’s substantial traffic (calls/min), BurnRate becomes more relevant for anomaly detection. Let’s assume we created a service-availability SLO, monitoring the request failure count against the overall request counts. What characterizes a weak SLO?

Efficiency

Efficiency Traffic Tuning Metrics

What is cloud migration?

Dynatrace

SEPTEMBER 30, 2021

Cloud migration is the process of transferring some or all of your data, software, and operations to a cloud-based computing environment that offers unlimited scale and high availability. In case of a spike in traffic, you can automatically spin up more resources, often in a matter of seconds. Improved performance and availability.

Cloud

Cloud Traffic Best Practices Strategy

Real user monitoring vs. synthetic monitoring: Understanding best practices

Dynatrace

JUNE 27, 2022

These development and testing practices ensure the performance of critical applications and resources to deliver loyalty-building user experiences. RUM, however, has some limitations, including the following: RUM requires traffic to be useful. For example, in e-commerce, you can validate and test checking out a shopping cart.

Best Practices

Best Practices Monitoring Wireless Traffic

Ready-to-Use High Availability Architectures for MySQL and PostgreSQL

Percona

JUNE 12, 2023

When it comes to access to their applications, users demand instant, reliable, and secure interactions — and that means databases must be highly available. With database high availability (HA), services are largely uninterrupted, and end users are largely satisfied. The obvious answer is this: To achieve high availability.

Architecture

Architecture Availability Open Source Healthcare

Automated Change Impact Analysis with Site Reliability Guardian

Dynatrace

FEBRUARY 15, 2023

SREs use Service-Level Indicators (SLI) to see the complete picture of service availability, latency, performance, and capacity across various systems, especially revenue-critical systems. This is all available out-of-the-box with the default workflow template provided by Site Reliability Guardian.

DevOps

DevOps Latency Traffic Best Practices

Is working-from-home affecting productivity? Use Dynatrace to find out and optimize!

Dynatrace

MARCH 25, 2020

Thomas has set up Dynatrace Real User Monitoring in a way for it to monitor internal and external traffic separately. Splitting traffic into two separate applications also allows you to: Enforce different SLAs for internal vs external. Example #2 ensuring DevOps tool chain availability at Dynatrace.

DevOps

DevOps Traffic Monitoring Engineering

Bring Your Own Cloud (BYOC) vs. Dedicated Hosting at ScaleGrid

Scalegrid

APRIL 16, 2020

Each of these models is suitable for production deployments and high traffic applications, and is available for all of our supported databases, including MySQL, PostgreSQL, Redis™ and MongoDB® database (Greenplum® database coming soon). This can result in significant cost savings for high traffic applications. Expert Tip.

Cloud

Cloud Azure AWS Database

Implementing service-level objectives to improve software quality

Dynatrace

DECEMBER 27, 2022

Stable, well-calibrated SLOs pave the way for teams to automate additional processes and testing throughout the software delivery lifecycle. First, it helps to understand that applications and all the services and infrastructure that support them generate telemetry data based on traffic from real users. Availability.

Software

Software Software Benchmarking Latency

Types Of Performance Testing and When to Use Them

DZone

FEBRUARY 26, 2021

To ensure that users get high-performing software that works seamlessly under all load conditions, performance testing is necessary. This test helps to measure the speed, scalability, reliability, and stability of software under varying loads, and thus ensures stable performance. Today, let's learn more about this testing type in depth.

Performance Testing

Performance Testing Testing Performance Latency

Dynatrace Application Security detects and blocks attacks automatically in real-time

Dynatrace

FEBRUARY 10, 2022

Static Application Security Testing (SAST) solutions are a traditional way of addressing this. WAFs protect the network perimeter and monitor, filter, or block HTTP traffic. Compared to intrusion detection systems (IDS/IPS), WAFs are focused on the application traffic. Unfortunately, they also introduce risk. How to get started.

Traffic

Traffic Benchmarking Innovation Java

All of Netflix’s HDR video streaming is now dynamically optimized

The Netflix TechBlog

NOVEMBER 29, 2023

HDR was launched at Netflix in 2016 and the number of titles available in HDR has been growing ever since. A vital aspect of such development is subjective testing with HDR encodes in order to generate training data. The pandemic, however, posed unique challenges in conducting a conventional in-lab subjective test with HDR encodes.

Open Source

Open Source Software Engineering Internet Internet

Service level objective examples: 5 SLO examples for faster, more reliable apps

Dynatrace

JUNE 1, 2023

These organizations rely heavily on performance, availability, and user satisfaction to drive sales and retain customers. Availability: An availability SLO quantifies the expected level of service availability over a specific time period. Availability is typically expressed in 9’s, such as 99.9% or 99.99% of the time.

Traffic

Traffic Website Latency DevOps

Towards a Reliable Device Management Platform

The Netflix TechBlog

AUGUST 30, 2021

By Benson Ma, Alok Ahuja. Introduction: At Netflix, hundreds of different device types, from streaming sticks to smart TVs, are tested every day through automation to ensure that new software releases continue to deliver the quality of the Netflix experience that our customers enjoy. In this blog post, we will focus on the latter feature set.

Latency

Latency Traffic Transportation Cloud

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

In PACELC terms we choose PC/EC and have the same level of availability for writes as our previous system while improving our theoretical availability for reads. With traffic growth, a single leader node handling all request volume started becoming overloaded. A single request in the tests below consists of one query.

Cache

Cache Latency Traffic Systems

What is infrastructure as code? Discover the basics, benefits, and best practices

Dynatrace

JUNE 10, 2022

In large organizations, it’s not uncommon to have hundreds of applications — each with its own specific infrastructure requirements based on architecture, function, traffic, and more. Test, test, test. Because IAC is code, it needs regular testing. IAC solves the issue of complex infrastructure environments.

Best Practices

Best Practices Infrastructure Code Speed

Dynatrace ensures continuous software quality by combining synthetic monitoring and automatic release validation

Dynatrace

JUNE 28, 2022

With this approach, teams can scale testing for all environments, which reduces efforts in replicating, updating, and maintaining test scripts. The ability to scale testing as part of the software development lifecycle (SDLC) has proven difficult. No external tools or additional configurations are needed.

Monitoring

Monitoring Software Software DevOps

How to start with SLOs to align Business, DevOps, and SREs

Dynatrace

DECEMBER 16, 2021

If that service is slow, failing, or not available at all, it results in the frustration mentioned in some of the comments on social media and the app store. It’s the same concept as Test Driven Development (TDD), where you start with tests that will fail until you finish implementing the code so the tests will succeed.

DevOps

DevOps Social Media Mobile Metrics
