What’s the problem with Black Friday traffic? Delivering a smooth customer experience is difficult when Black Friday traffic brings overwhelming and unpredictable peak loads to retailer websites and exposes the weakest points in a company’s infrastructure, threatening application performance and user experience. Why Black Friday traffic threatens customer experience.
Migrating Critical Traffic At Scale with No Downtime — Part 1, by Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This approach has a handful of benefits.
It should also be possible to analyze data in context to proactively address events, optimize performance, and remediate issues in real time. This enables proactive changes such as resource autoscaling, traffic shifting, or preventative rollbacks of bad code deployments ahead of time.
For example, if you’re monitoring network traffic and the average over the past 7 days is 500 Mbps, the threshold will adapt to this baseline. An anomaly will be identified if traffic suddenly drops below 200 Mbps or above 800 Mbps, helping you identify unusual spikes or drops.
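A minimal sketch of this kind of baseline-relative anomaly check, assuming a rolling mean over the recent window and an illustrative ±60% tolerance band (the window size and tolerance are hypothetical values chosen to reproduce the 200/800 Mbps example, not any tool’s actual defaults):

```python
from collections import deque

class BaselineAnomalyDetector:
    """Flags samples that fall outside a tolerance band around a rolling baseline."""

    def __init__(self, window_size: int, tolerance: float = 0.6):
        self.samples = deque(maxlen=window_size)  # rolling window, e.g. 7 days of readings
        self.tolerance = tolerance                # allowed relative deviation from the baseline

    def observe(self, mbps: float) -> bool:
        """Record a traffic sample and return True if it is anomalous."""
        anomalous = False
        if self.samples:
            baseline = sum(self.samples) / len(self.samples)
            low, high = baseline * (1 - self.tolerance), baseline * (1 + self.tolerance)
            anomalous = not (low <= mbps <= high)
        self.samples.append(mbps)
        return anomalous

detector = BaselineAnomalyDetector(window_size=7 * 24)   # hourly samples over 7 days
for reading in [500, 510, 495, 505, 180]:                # 180 Mbps breaches the lower bound
    print(reading, "anomaly" if detector.observe(reading) else "ok")
```

With a baseline near 500 Mbps and a 60% band, the detector flags readings below roughly 200 Mbps or above roughly 800 Mbps, matching the example above.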
The first part of this blog post briefly explores the integration of SLO events with AI. Consequently, the AI is founded upon the related events, and an issue arose due to the detection parameters (threshold, period, analysis interval, detection frequency, and so on). By analogy, envision an apple tree from which an apple drops.
Part 3: System Strategies and Architecture. By Varun Khaitan, with special thanks to my stunning colleagues Mallika Rao, Esmir Mesic, and Hugo Marques. This blog post is a continuation of Part 2, where we cleared the ambiguity around title launch observability at Netflix. We call this capability TimeTravel.
Using the source of truth: Logs serve as a reliable source of truth by providing a comprehensive record of system events. To detect issues proactively, we need to simulate traffic and predict system behavior in advance. Once artificial traffic is generated, discarding the response object and relying solely on logs becomes inefficient.
We can experiment with different content placements or promotional strategies to boost visibility and engagement. Analyzing impression history, for example, might help determine how well a specific row on the home page is functioning or assess the effectiveness of a merchandising strategy.
Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. You’ll also learn strategies for maintaining data safety and managing node failures so your RabbitMQ setup is always up to the task. Think of it as a sophisticated traffic management system that ensures smooth flow and prevents congestion.
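As one hedged illustration of the data-safety side, the sketch below uses the Python pika client to declare a replicated quorum queue and publish persistent messages with publisher confirms; the host, queue name, and message payload are placeholders:

```python
import pika

# Connect to a RabbitMQ node (placeholder host).
connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq.example.internal"))
channel = connection.channel()

# Quorum queues replicate messages across nodes, so a single node failure does not lose data.
channel.queue_declare(queue="orders", durable=True, arguments={"x-queue-type": "quorum"})

# Publisher confirms make the broker acknowledge that each message was safely accepted.
channel.confirm_delivery()

channel.basic_publish(
    exchange="",
    routing_key="orders",
    body=b'{"order_id": 42}',
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message to disk
)

connection.close()
```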
To do this, we devised a novel way to simulate the projected traffic weeks ahead of launch by building upon the traffic migration framework described here. New content or national events may drive brief spikes, but, by and large, traffic is usually smoothly increasing or decreasing.
The three strategies we will discuss today are AB Testing, Replay Testing, and Sticky Canaries. Let’s discuss the three testing strategies in further detail. The control group’s traffic utilized the legacy Falcor stack, while the experiment population leveraged the new GraphQL client and was directed to the GraphQL Shim.
Even when the staging environment closely mirrors the production environment, achieving a complete replication of all potential scenarios, such as simulating extremely high traffic volumes to assess software performance, remains challenging. This can lead to a lack of insight into how the code will behave when exposed to heavy traffic.
As recent events have demonstrated, major software outages are an ever-present threat in our increasingly digital world. Possible scenarios A Distributed Denial of Service (DDoS) attack overwhelms servers with traffic, making a website or service unavailable.
While most government agencies and commercial enterprises have digital services in place, the current volume of usage — including traffic to critical employment, health and retail/eCommerce services — has reached levels that many organizations have never seen before or tested against. There are proven strategies for handling this.
RabbitMQ is designed for flexible routing and message reliability, while Kafka is optimized for high-throughput event streaming and real-time data processing, excelling in real-time analytics and large-scale data ingestion. What is Apache Kafka?
To address potentially high numbers of requests during online shopping events like Singles Day or Black Friday, it’s crucial that this online shop have a memory storage strategy that allows for speed, scaling, and resilience of all microservices, especially the shopping cart service.
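A minimal sketch of such an in-memory cart store, assuming Redis (via redis-py) as the memory storage layer; the hash layout, key names, and TTL are illustrative assumptions rather than a prescribed design:

```python
import redis

r = redis.Redis(host="cache.example.internal", port=6379, decode_responses=True)

def add_to_cart(user_id: str, sku: str, quantity: int) -> None:
    """Store cart lines in a Redis hash so reads and writes stay in memory."""
    key = f"cart:{user_id}"
    r.hincrby(key, sku, quantity)      # atomically bump the quantity for this SKU
    r.expire(key, 60 * 60 * 24)        # abandoned carts expire after 24 hours

def get_cart(user_id: str) -> dict:
    return r.hgetall(f"cart:{user_id}")

add_to_cart("user-123", "sku-987", 2)
print(get_cart("user-123"))
```

Because every cart operation touches only memory, the store scales horizontally (for example, by clustering or sharding on the cart key) and degrades gracefully under event-driven traffic spikes.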
In my last blog, I provided an example of this happening, whereby traffic spiked to four times the usual incoming volume. These are all interesting metrics from a marketing point of view, and also highly interesting to you, as they allow you to engage with the teams that are driving the traffic against your IT systems.
This means that Dynatrace alerts more quickly when an error spike occurs in a high-traffic service (compared to a low-traffic service where statistical confidence is lower). This setting distinguishes between slowdown alerts and error detection, so that you can choose an individual strategy for each.
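To illustrate why statistical confidence grows with traffic, here is a generic sketch of a z-test on error rates; it is not Dynatrace’s actual algorithm, and the baseline rate and threshold are invented for the example:

```python
import math

def error_spike_is_significant(errors: int, requests: int, baseline_rate: float, z_threshold: float = 3.0) -> bool:
    """Generic z-test sketch: is the observed error rate significantly above the baseline?

    With more traffic (larger `requests`) the standard error shrinks, so the same
    relative increase in errors crosses the threshold sooner.
    """
    if requests == 0:
        return False
    observed_rate = errors / requests
    std_error = math.sqrt(baseline_rate * (1 - baseline_rate) / requests)
    z = (observed_rate - baseline_rate) / std_error
    return z > z_threshold

# The same 2% error rate against a 1% baseline: significant only for the high-traffic service.
print(error_spike_is_significant(errors=200, requests=10_000, baseline_rate=0.01))  # True
print(error_spike_is_significant(errors=2, requests=100, baseline_rate=0.01))       # False
```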
Building on these foundational abstractions, we developed the TimeSeries Abstraction — a versatile and scalable solution designed to efficiently store and query large volumes of temporal event data with low millisecond latencies, all in a cost-effective manner across various use cases. Let’s dive into the various aspects of this abstraction.
Network traffic growth is the main reason for increasing spending, largely because of the adoption of hybrid and multi-cloud architectures. Unifying data, unifying observability: merging siloed data streams in a unified observability strategy requires a different approach.
Organizations that have transitioned to agile software development strategies (including the adoption of a DevOps culture and continuous delivery automation) enforce automated solutions for such decision making—or at the very least, use automation in the gathering of release-quality metrics. Events ingestion. Kubernetes metadata.
Let’s delve deeper into how these capabilities can transform your observability strategy, starting with our new syslog support. It also enhances syslog messages with additional context and optimizes network traffic, improving overall system resilience and security.
IT teams spend months preparing for the peak traffic they anticipate will arrive with holiday shopping. Decentralized last-mile delivery strategies such as micro-fulfillment centers complicate inventory management and order fulfillment oversight. (Though the three-second rule for page load time is often misinterpreted).
For example, an organization might use security analytics tools to monitor user behavior and network traffic. Improved compliance A better understanding of data security across multiple applications and environments provides a unified view of events and information. This offers two advantages for compliance.
During this event, we generate a timestamp and store it in an eBPF hash map using the process ID as the key. Each event includes a run queue latency sample with a cgroup ID, which we associate with running containers on the host. This spike would be noticeable in the userspace application if it were serving HTTP traffic.
In databases like MySQL and PostgreSQL, transaction logs are the source of CDC events. Some of DBLog’s features are: it processes captured log events in order; it interleaves log and dump events by taking dumps in chunks; and no locks on tables are ever acquired, which prevents impacting write traffic on the source database.
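A loose, hedged sketch of the interleaving idea (not DBLog’s actual implementation): a full-table dump is read in chunks, and pending log events are drained between chunks so neither stream starves the other. The `dump_chunks` and `poll_log_events` names are hypothetical stand-ins:

```python
from typing import Callable, Iterable, Iterator

def interleave_dump_with_log(
    dump_chunks: Iterable[list[dict]],
    poll_log_events: Callable[[], list[dict]],
) -> Iterator[dict]:
    """Yield dump rows chunk by chunk, draining pending log events between chunks."""
    for chunk in dump_chunks:
        # Emit any change events captured while the previous chunk was being processed.
        for event in poll_log_events():
            yield {"source": "log", **event}
        for row in chunk:
            yield {"source": "dump", **row}
    # Flush whatever arrived while the final chunk was emitted.
    for event in poll_log_events():
        yield {"source": "log", **event}

# Toy usage with in-memory stand-ins for the transaction log and table dump.
chunks = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
pending = iter([[{"op": "update", "id": 1}], [], []])
for record in interleave_dump_with_log(chunks, lambda: next(pending, [])):
    print(record)
```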
With traffic growth, a single leader node handling all request volume started becoming overloaded. The path over which data travels from Titus Job Coordinator to a Titus Gateway cache can be described as a sequence of event queues with different processing speeds: A message generated by the event source may be buffered at any stage.
Data collected on page load events, for example, can include navigation start (when performance begins to be measured), request start (right before the user makes a request from the server), and speed index metrics (measure page load speed). RUM, however, has some limitations, including the following: RUM requires traffic to be useful.
The good news: even for latecomers to the compliance party, compliance is perfectly doable within the timeframe given the right tools and strategies. Many organisations adopt an observability solution to analyse the significance of events to their operations, making it ideally suited to APRA CPS 230 compliance.
Streamline development and delivery processes Nowadays, digital transformation strategies are executed by almost every organization across all industries. Whether triggered by a test result or a new release deployment, detected events work as a trigger to check the defined objectives and derive an overall status automatically.
BT has modernized with AIOps at the center of its digital transformation strategy. For BT, AIOps and digital transformation are at the center of the company’s upcoming three-to-five-year digital transformation strategy. As Baldry noted, Davis not only identifies issues but can trigger a chain of events to aid with auto-remediation.
Not only that, teams struggle to correlate events and alerts from a wide range of security tools, to put them into context, and to infer their risk for the business. Falco is an open-source, cloud-native security tool that utilizes the Linux kernel technology eBPF to generate fine-grained networking, security, and observability events.
Implementation: We decided to implement the strategy through Linux cgroups, since they are fully supported by CFS, by modifying each container’s cpuset cgroup based on the desired mapping of containers to hyper-threads. We also want to leverage kernel PMC events to more directly optimize for minimal cache noise.
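A hedged sketch of the cgroup mechanics involved, assuming a cgroup v1 cpuset hierarchy; the container cgroup names and CPU assignments are invented for illustration:

```python
from pathlib import Path

CPUSET_ROOT = Path("/sys/fs/cgroup/cpuset")  # cgroup v1 cpuset hierarchy

def pin_container_to_cpus(container_cgroup: str, cpus: list[int]) -> None:
    """Restrict a container's cpuset cgroup to an explicit list of logical CPUs."""
    cgroup_dir = CPUSET_ROOT / container_cgroup
    cpu_list = ",".join(str(c) for c in cpus)          # e.g. "0,32" = both hyper-threads of core 0
    (cgroup_dir / "cpuset.cpus").write_text(cpu_list)  # CFS then schedules the container only on these CPUs

# Hypothetical mapping of containers to sibling hyper-threads on a 32-core host.
pin_container_to_cpus("docker/abc123", [0, 32])
pin_container_to_cpus("docker/def456", [1, 33])
```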
The key components of automatic failover include the primary server for write operations, standby servers for backup, and a monitor node for health checks and coordination of failover events. Automatic failover is a critical strategy to achieve this; in the event of a failure, the monitor informs Pacemaker.
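A simplified monitor-node loop, sketched under the assumption of an `is_healthy()` probe and a `promote()` hook on standbys; these names are hypothetical, and real deployments delegate this coordination to tools such as Pacemaker:

```python
import time

class Node:
    def __init__(self, name: str):
        self.name = name

    def is_healthy(self) -> bool:
        # Placeholder: a real probe would check connectivity, replication lag, disk, etc.
        return True

    def promote(self) -> None:
        print(f"promoting standby {self.name} to primary")

def monitor(primary: Node, standbys: list[Node], interval_seconds: float = 5.0) -> None:
    """Poll the primary; after repeated failures, promote the first healthy standby."""
    failures = 0
    while True:
        failures = 0 if primary.is_healthy() else failures + 1
        if failures >= 3:  # require consecutive failures to avoid flapping
            candidate = next((s for s in standbys if s.is_healthy()), None)
            if candidate:
                candidate.promote()
                primary, candidate = candidate, primary
                failures = 0
        time.sleep(interval_seconds)
```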
In the event of an isolated failure we first pre-scale microservices in the healthy regions after which we can shift traffic away from the failing one. In 2013 we first developed our multi-regional availability strategy in response to a catalyst that led us to re-architect the way our service operates.
In this post, we’ll walk you through the best way to host MongoDB on DigitalOcean, including the best instance types to use, disk types, replication strategy, and managed service providers. MongoDB Replication Strategies. DigitalOcean Advantages for MongoDB.
Change Data Capture (CDC) source connector reads from studio applications’ database transaction logs and emits the change events. The CDC events are passed on to the Data Mesh enrichment processor, which issues GraphQL queries to Studio Edge to enrich the data. CDC events can also be sent to Data Mesh via a Java Client Producer Library.
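To make the enrichment step concrete, here is a hedged sketch of a processor that receives a CDC event and issues a GraphQL query over HTTP; the endpoint, query, and field names are invented for illustration and are not Studio Edge’s actual schema:

```python
import requests

GRAPHQL_ENDPOINT = "https://studio-edge.example.internal/graphql"  # placeholder URL

ENRICH_QUERY = """
query EnrichMovie($movieId: ID!) {
  movie(id: $movieId) { id title productionStatus }
}
"""

def enrich_cdc_event(cdc_event: dict) -> dict:
    """Attach additional entity fields to a raw change-data-capture event."""
    response = requests.post(
        GRAPHQL_ENDPOINT,
        json={"query": ENRICH_QUERY, "variables": {"movieId": cdc_event["entity_id"]}},
        timeout=5,
    )
    response.raise_for_status()
    enriched = dict(cdc_event)
    enriched["entity"] = response.json()["data"]["movie"]
    return enriched

# Example CDC event emitted from a transaction-log connector (shape is hypothetical).
print(enrich_cdc_event({"entity_id": "movie-123", "op": "UPDATE"}))
```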
Log auditing is a cybersecurity practice that involves examining logs generated by various applications, computer systems, and network devices to identify and analyze security-related events. Log auditing is a crucial part of building a comprehensive security program.
Deployment Strategies: We are all familiar with the advantages of releasing frequently and in smaller chunks. The challenge for clients is that each instance of the application runs on a Netflix member’s device, and signals are derived from a firehose of events being sent by devices across the globe, which is not known in advance.
A blue-green deployment model is a software delivery release strategy based on maintaining two separate application environments. As part of testing and validation of the new version of the software, application traffic is gradually re-routed to the green environment. Why Is Blue-Green Deployment Useful?
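A hedged sketch of the gradual re-routing step: a weighted router that sends an increasing share of requests to the green environment. The environment names, starting share, and ramp step are illustrative, not a prescribed rollout policy:

```python
import random

class BlueGreenRouter:
    """Routes a configurable share of traffic to the green (new) environment."""

    def __init__(self, green_share: float = 0.0):
        self.green_share = green_share  # 0.0 = all blue, 1.0 = all green

    def choose_backend(self) -> str:
        return "green" if random.random() < self.green_share else "blue"

    def ramp_up(self, step: float = 0.1) -> None:
        """Shift more traffic to green, e.g. after validation checks pass."""
        self.green_share = min(1.0, self.green_share + step)

router = BlueGreenRouter(green_share=0.05)   # start with a small slice of live traffic
print(router.choose_backend())
router.ramp_up(0.25)                         # validation passed: move to 30%
print(router.green_share)
```

If validation fails at any step, traffic can be shifted back to blue simply by resetting the share to zero, which is the core rollback advantage of the model.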
Revised cleanup strategy for Elasticsearch snapshots: to address this storage issue for older snapshots, we’ve adapted our cleanup strategy so that monthly snapshots are now removed automatically after 6 months. Also in this release: Event Log now contains successful login entries when users logged in via SSO.
HashMap&lt;String, SortedMap&lt;Bytes, Bytes&gt;&gt;: for complex data models such as structured Records or time-ordered Events, this two-level approach handles hierarchical structures effectively, allowing related data to be retrieved together. This model supports both simple and complex data models, balancing flexibility and efficiency.
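An approximate Python analogue of that two-level shape, using `sortedcontainers.SortedDict` in place of Java’s `SortedMap`; the item key, byte-encoded sort keys, and payloads are illustrative:

```python
from sortedcontainers import SortedDict

# Outer map: item key -> inner sorted map of (sort key bytes -> value bytes).
record: dict[str, SortedDict] = {}

def put(item_key: str, sort_key: bytes, value: bytes) -> None:
    record.setdefault(item_key, SortedDict())[sort_key] = value

# Time-ordered events for one device, kept sorted by their timestamp key.
put("device:42", b"2024-01-01T00:00:00Z", b'{"event": "play"}')
put("device:42", b"2024-01-01T00:05:00Z", b'{"event": "pause"}')

# Related data can be retrieved together, in order.
for ts, payload in record["device:42"].items():
    print(ts, payload)
```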
Encryption Strategies for RabbitMQ RabbitMQ implements transport-level security using TLS/SSL encryption to safeguard data during transmission. When persistent messages in RabbitMQ are encrypted, it ensures that even in the event of unsanctioned access to storage hardware, confidential information stays protected and secure.
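A minimal pika sketch of enabling TLS on the client side, assuming the broker is already configured for TLS; the host, port, queue name, and CA bundle path are placeholders:

```python
import ssl
import pika

# Build an SSL context that verifies the broker's certificate against a CA bundle.
context = ssl.create_default_context(cafile="/etc/rabbitmq/ca_bundle.pem")

params = pika.ConnectionParameters(
    host="rabbitmq.example.internal",
    port=5671,                      # conventional AMQPS port
    ssl_options=pika.SSLOptions(context),
)

connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.queue_declare(queue="secure-events", durable=True)
connection.close()
```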