Servers, Systems and Traffic - Technology Performance Pulse

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience.

Traffic, Latency, Tuning, Systems

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

To this end, we developed a Rapid Event Notification System (RENO) to support use cases that require server initiated communication with devices in a scalable and extensible manner. In this blog post, we will give an overview of the Rapid Event Notification System at Netflix and share some of the learnings we gained along the way.

Systems, Traffic, Architecture, Mobile

Migrating Critical Traffic At Scale with No Downtime - Part 2

The Netflix TechBlog

JUNE 13, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. This is where large-scale system migrations come into play.

Traffic, Metrics, Systems, Strategy

How To Design For High-Traffic Events And Prevent Your Website From Crashing

Smashing Magazine

JANUARY 7, 2025

By Saad Khan. This article is sponsored by Cloudways. Product launches and sales typically attract large volumes of traffic.

Traffic, Website, Design, Cache

Optimizing Server Management With HAProxy’s Advanced Health Checks

DZone

DECEMBER 11, 2023

HAProxy is one of the cornerstones in complex distributed systems, essential for achieving efficient load balancing and high availability. This open-source software, lauded for its reliability and high performance, is a vital tool in the arsenal of network administrators, adept at managing web traffic across diverse server environments.

Servers, Traffic, Open Source, Games

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

Before GraphQL: Monolithic Falcor API implemented and maintained by the API Team. Before moving to GraphQL, our API layer consisted of a monolithic server built with Falcor. A single API team maintained both the Java implementation of the Falcor framework and the API Server. To launch Phase 1 safely, we used AB Testing.

Traffic, Latency, Metrics, Cache

A Dynatrace champion's guide to get ahead of digital marketing campaigns

Dynatrace

JULY 1, 2020

In my last blog, I provided an example of this happening, whereby the traffic spiked and quadrupled the usual incoming traffic. These are all interesting metrics from a marketing point of view, and also highly interesting to you as they allow you to engage with the teams that are driving the traffic against your IT system.

Traffic, Analytics, Metrics, Servers

Choosing the Appropriate AWS Load Balancer: ALB vs. NLB

DZone

SEPTEMBER 14, 2023

With the advent of cloud computing, managing network traffic and ensuring optimal performance have become critical aspects of system architecture. Amazon Web Services (AWS), a leading cloud service provider, offers a suite of load balancers to manage network traffic effectively for applications running on its platform.

AWS, Traffic, Network, Architecture

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profile's exposure. In this multi-part blog series, we take you behind the scenes of our system that processes billions of impressions daily.

Tuning, Latency, Efficiency, Storage

5 powerful use cases beyond debugging for Dynatrace Live Debugger

Dynatrace

MARCH 25, 2025

You can verify any system settings that might impact your tests and see them in action. Load generators simulate traffic. Maybe you want to monitor performance under different system loads. Or maybe you want to correlate an event with other events in your system. In many ways, it’s more of an art than a science.

Benchmarking, Code, Open Source, Engineering

Detecting RegreSSHion with Dynatrace (CVE-2024-6387)

Dynatrace

JULY 2, 2024

The Qualys Threat Research Unit (TRU) has discovered a Remote Unauthenticated Code Execution (RCE) vulnerability in OpenSSH server (sshd) in glibc-based Linux systems. This can result in a complete system takeover, malware installation, data manipulation, and the creation of backdoors for persistent access.

AWS, Network, Traffic, Servers

The new normal of digital experience delivery – lessons learned from monitoring mission-critical websites during COVID-19

Dynatrace

MAY 6, 2020

Over the last two months, we've monitored key sites and applications across industries that have been receiving surges in traffic, including government, health insurance, retail, banking, and media. The following day, a normally mundane Wednesday, traffic soared to 128,000 sessions.

Website, Monitoring, Retail, Media

COVID-19 and Digital Services: An Action Plan for the Unexpected

Dynatrace

APRIL 22, 2020

All of this puts a lot of pressure on IT systems and applications. Step 1: Understand Traffic Patterns and Potential Spikes; Remove Team Silos. The impact of traffic spikes is illustrated by the load that eCommerce web sites typically see during Black Friday. The next step is to understand when your system is going to break.

Traffic, Ecommerce, Retail, Government

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

Introduction to Message Brokers: Message brokers enable applications, services, and systems to communicate by acting as intermediaries between senders and receivers. This decoupling simplifies system architecture and supports scalability in distributed environments.
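The decoupling described in this excerpt can be illustrated with a minimal in-process sketch. It uses Python's standard `queue.Queue` as a stand-in for a broker; it is not RabbitMQ or Kafka, and the `run_demo` function and the `None` shutdown sentinel are assumptions made purely for illustration.

```python
import queue
import threading


def run_demo(messages):
    """Send messages through a queue so sender and receiver never call each other."""
    broker = queue.Queue()  # stand-in for a real message broker
    received = []

    def consumer():
        # The receiver only knows about the queue, not about the sender.
        while True:
            msg = broker.get()
            if msg is None:  # hypothetical sentinel meaning "no more messages"
                break
            received.append(msg.upper())

    worker = threading.Thread(target=consumer)
    worker.start()

    # The sender only knows about the queue, not about the receiver.
    for msg in messages:
        broker.put(msg)
    broker.put(None)

    worker.join()
    return received


if __name__ == "__main__":
    print(run_demo(["order created", "order shipped"]))
```

Because both sides depend only on the queue, either one could be replaced or scaled without changing the other, which is the property the excerpt attributes to brokers.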

Latency, Analytics, Architecture, Storage

Six causes of major software outages–And how to avoid them

Dynatrace

AUGUST 8, 2024

Possible scenarios: A Distributed Denial of Service (DDoS) attack overwhelms servers with traffic, making a website or service unavailable. Ransomware encrypts essential data, locking users out of systems and halting operations until a ransom is paid. Human error: Human error remains one of the leading causes of tech outages.

Software, Infrastructure, Network

Kubernetes vs Docker: What’s the difference?

Dynatrace

SEPTEMBER 29, 2021

Think of containers as the packaging for microservices that separate the content from its environment – the underlying operating system and infrastructure. A standard Docker container can run anywhere, on a personal computer (for example, PC, Mac, Linux), in the cloud, on local servers, and even on edge devices. What is Docker?

Open Source, DevOps, Traffic, Cloud

What is vulnerability management? And why runtime vulnerability detection makes the difference

Dynatrace

AUGUST 18, 2022

But managing the breadth of the vulnerabilities that can put your systems at risk is challenging. Security vulnerabilities are weaknesses in applications, operating systems, networks, and other IT services and infrastructure that would allow an attacker to compromise a system, steal data, or otherwise disrupt IT operations.

Traffic, Java, Network, DevOps

Managing PostgreSQL® High Availability – Part I: PostgreSQL Automatic Failover

Scalegrid

SEPTEMBER 5, 2024

If the primary server encounters issues, operations are smoothly transitioned to a standby server with minimal interruption. Key Takeaways: PostgreSQL automatic failover enhances high availability by seamlessly switching to standby servers during primary server failures, minimizing downtime, and maintaining business continuity.

Availability, Servers, Database, Open Source

Managing High Availability in PostgreSQL – Part III: Patroni

Scalegrid

AUGUST 22, 2019

In a distributed system, consensus plays an important role in determining consistency, and Patroni uses DCS to attain consensus. This way, at any point in time, there can only be one master running in the system. Standby Server Tests. Reboot the server. patronictl list did not display this server. Test Scenario.

Availability, Servers, Network, Testing

CrowdStrike update crisis: How Dynatrace helped customers recover in hours

Dynatrace

JULY 31, 2024

The resulting outages wreaked havoc on customer experiences and left IT professionals scrambling to quickly find and repair affected systems. Dynatrace offers various out-of-the-box features and applications to provide a high-density overview of system health for all hosts and related metrics in a single view.

Airlines, Monitoring, Healthcare, Traffic

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. This guide delves into how these systems work, the challenges they solve, and their essential role in businesses and technology.

Storage, Systems, Big Data, Azure

Build and operate multicloud FaaS with enhanced, intelligent end-to-end observability

Dynatrace

APRIL 25, 2023

For example, to handle traffic spikes and pay only for what they use. Observability is essential to ensure the reliability, security and quality of any software system. However, serverless applications have unique characteristics that make observability more difficult than in traditional server-based applications.

Serverless, Lambda, Azure, AWS

Observe syslog with Dynatrace ActiveGate, a secure, trusted edge component

Dynatrace

JULY 15, 2024

These include traditional on-premises network devices and servers for infrastructure applications like databases, websites, or email. One change to send syslog to Dynatrace You can now use the syslog ingestion endpoint on Dynatrace Environment ActiveGate for performant network and system monitoring.

Infrastructure, Network, Azure, Monitoring

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

message Item (Bytes key, Bytes value, Metadata metadata, Integer chunk). Database Agnostic Abstraction: The KV abstraction is designed to hide the implementation details of the underlying database, offering a consistent interface to application developers regardless of the optimal storage system for that use case.
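As a rough illustration of the idea in this excerpt, the sketch below mirrors the four fields of the `Item` message in a Python dataclass and puts a small interface in front of a storage backend. Only the field names come from the excerpt; the shape of `metadata`, the `KeyValueStore` interface, its `put`/`get` methods, and the `InMemoryStore` backend are all hypothetical and are not Netflix's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Item:
    key: bytes
    value: bytes
    metadata: Dict[str, str] = field(default_factory=dict)  # assumed shape
    chunk: int = 0


class KeyValueStore(Protocol):
    """Hypothetical interface; callers depend on this, not on a database."""

    def put(self, record_id: str, item: Item) -> None: ...

    def get(self, record_id: str) -> List[Item]: ...


class InMemoryStore:
    """One possible backend; a real one would wrap an actual database."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[bytes, Item]] = {}

    def put(self, record_id: str, item: Item) -> None:
        # Items are stored per record and overwritten when the key repeats.
        self._data.setdefault(record_id, {})[item.key] = item

    def get(self, record_id: str) -> List[Item]:
        # Return the record's items ordered by key; empty list if unknown.
        items = self._data.get(record_id, {}).values()
        return sorted(items, key=lambda item: item.key)


if __name__ == "__main__":
    store: KeyValueStore = InMemoryStore()
    store.put("profile-1", Item(key=b"title", value=b"example"))
    print(store.get("profile-1"))
```

Application code written against `KeyValueStore` would not need to change if `InMemoryStore` were swapped for a different backend, which is the kind of consistency the excerpt describes.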

Latency, Storage, Cache, Servers

Automatic intelligent observability into Envoy-proxied services of your Istio service mesh (GA)

Dynatrace

OCTOBER 13, 2021

The increasing number of smaller, decoupled services brings new challenges for controlling complexity within systems. Istio manages this with the help of Envoy, a lightweight remote configurable proxy server that can dynamically route traffic through a service mesh. Are you new to Dynatrace?

Traffic, Monitoring, Technology

What is cloud migration?

Dynatrace

SEPTEMBER 30, 2021

In case of a spike in traffic, you can automatically spin up more resources, often in a matter of seconds. Likewise, you can scale down when your application experiences decreased traffic. For example, as traffic increases, costs will too. Analyze your resource consumption and traffic patterns. Reduced cost.

Cloud, Traffic, Best Practices, Strategy

DevOps automation: From event-driven automation to answer-driven automation [with causal AI]

Dynatrace

JULY 24, 2023

Consider an event-driven automation system designed for incident management. When a server experiences an outage, the system promptly triggers an alert and initiates actions like restarting a server or redirecting traffic to a redundant server. But it doesn’t stop there.

DevOps, Traffic, Efficiency, Servers

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

As the number of Titus users increased over the years, the load and pressure on the system increased substantially. cell): Titus Job Coordinator is a leader elected process managing the active state of the system. For example, a batch workflow orchestration system may create multiple jobs which are part of a single workflow execution.

Cache, Latency, Traffic, Systems

The road to observability with OpenTelemetry demo part 1: Identifying metrics and traces

Dynatrace

MAY 17, 2023

Anyone who’s concerned with developing, delivering, and operating software knows the importance of making software and the systems it runs on observable. When software runs in a monolithic stack on on-site servers, observability is manageable enough. Why should I adopt observability?

Metrics, Open Source, Traffic, Cache

Dynatrace Managed turnkey Premium High Availability for globally distributed data centers (Early Adopter)

Dynatrace

JUNE 25, 2020

Minimized cross-data center network traffic. By utilizing embedded smart routing capabilities, Dynatrace minimizes cross-region network traffic—OneAgent traffic stays within the same network region. You can set up different proxy servers for the Mission Control uplink for each data center. Self-contained turnkey solution.

Availability, Hardware, Latency, Traffic

What is web application security? Everything you need to know.

Dynatrace

JUNE 9, 2021

A web application is any application that runs on a web server and is accessed by a user through a web browser. The Marriott data breach, in which one of its reservation systems had been compromised and hundreds of millions of customer records, including credit card and passport numbers, were stolen. What is web application security?

Open Source, Entertainment, Tuning, Internet

How to Optimize Digital Experience and Operations with Dynatrace

Dynatrace

AUGUST 30, 2019

She was speaking about how her team is providing Visibility as a Service (VaaS) in order to continuously monitor and optimize their systems running across private and public cloud environments. A big factor in good Digital Performance is the back-end system that powers your digitally offered use-cases.

Cache, Database, Architecture, Government

Achieving 100Gbps intrusion prevention on a single server

The Morning Paper

NOVEMBER 15, 2020

Achieving 100 Gbps intrusion prevention on a single server, Zhao et al. This stems from a combination of the Jevons paradox and the interconnectedness of systems: doing more in one area often leads to a need for more elsewhere too. Today's paper choice is a wonderful example of pushing the state of the art on a single server.

Servers, Hardware, Latency, Design

Simplified observability for your SNMP devices

Dynatrace

MARCH 22, 2021

To keep infrastructure and bare metal servers running smoothly, a long list of additional devices are used, such as UPS devices, rack cases that provide their own cooling, power sources, and other measures that are designed to prevent failures. But manual configuration of observability for systems like this is nearly impossible.

Metrics, Network, Infrastructure, Traffic

Optimizing Java XPath CPU and memory overhead by 98%

Dynatrace

MARCH 9, 2022

Therefore, it was unsurprising to see a huge spike in traffic for Family Visa enrollment via Metrash. The system saw up to 800 application requests per second – far more than anticipated. More worrisome was a spike in CPU usage, resulting in severe service disruption as backend processing systems crashed due to the spike in load.

Java, Traffic, Government, Code

Is working-from-home affecting productivity? Use Dynatrace to find out and optimize!

Dynatrace

MARCH 25, 2020

Are the systems we rely on every day as reliable via the home internet connection? Example #1 Order System: No change in user or buyers' behavior. Both types of users access the same system; the only difference is that employees access it via the internal network and external users through a publicly exposed URL.

DevOps, Traffic, Monitoring, Engineering

Introducing Netflix TimeSeries Data Abstraction Layer

The Netflix TechBlog

OCTOBER 8, 2024

Note: Contrary to what the name may suggest, this system is not built as a general-purpose time series database. Those use cases are well served by the Netflix Atlas telemetry system. Effectively managing this data at scale to extract valuable insights is crucial for ensuring optimal user experiences and system reliability.

Latency, Storage, Traffic, Tuning

The road to observability demo part 3: Collect, instrument, and analyze telemetry data automatically with Dynatrace

Dynatrace

MAY 17, 2023

Think about items such as general system metrics (for example, CPU utilization, free memory, number of services), the connectivity status, details of our web server, or even more granular in-application tasks like database queries. Let’s click “Apache Web Server apache” now.

Metrics, Database, Monitoring, Network

Lessons learned from enterprise service-level objective management

Dynatrace

MAY 19, 2022

Every organization’s goal is to keep its systems available and resilient to support business demands. Lastly, error budgets, as the difference between a current state and the target, represent the maximum amount of time a system can fail per the contractual agreement without repercussions. Dynatrace news. A world of misunderstandings.
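The error budget idea in this excerpt comes down to simple arithmetic, sketched below. The sketch assumes a time-based availability target measured over a fixed window; the function names, the 30-day default window, and the example figures are illustrative assumptions and do not come from the article.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Maximum downtime, in minutes, that an availability target allows."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be a fraction in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Budget left after observed downtime; negative means the target is missed."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After 30 minutes of downtime, roughly 13.2 minutes remain.
    print(round(budget_remaining(0.999, 30.0), 1))
```

A negative result from `budget_remaining` would indicate that the observed downtime has already exceeded what the target permits for the window.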

Automotive, Latency, Architecture, Azure

Python at Netflix

The Netflix TechBlog

APRIL 29, 2019

Content is placed on the network of servers in the Open Connect CDN as close to the end user as possible, improving the streaming experience for our customers and reducing costs for both Netflix and our Internet Service Provider (ISP) partners. We have created streams of events from a number of systems that get unified into a single tool.

Open Source, Network, Infrastructure, Big Data

Edgar: Solving Mysteries Faster with Observability

The Netflix TechBlog

SEPTEMBER 2, 2020

Edgar helps Netflix teams troubleshoot distributed systems efficiently with the help of a summarized presentation of request tracing, logs, analysis, and metadata. The more complex a system, the more places to look for clues. In an earlier blog post, we discussed Telltale, our health monitoring system. What is Edgar?

Latency, Transportation, Engineering, Traffic

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

Uptime Institute’s 2022 Outage Analysis report found that over 60% of system outages resulted in at least $100,000 in total losses, up from 39% in 2019. At the lowest level, SLIs provide a view of service availability, latency, performance, and capacity across systems. More than one in seven outages cost more than $1 million.

Best Practices, DevOps, Latency, Metrics

PyMongo Tutorial: Testing MongoDB Failover in Your Python App

Scalegrid

MAY 2, 2019

It is also recommended that SSL connections be enabled to encrypt the client-database traffic. With MongoDB deployments, failovers aren’t considered major events as they were with traditional database management systems. Testing Failover Behavior. Configuring the Network Timeout Values. during network issues and failovers.

Testing, Network, Database, Servers
