AWS, Event and Traffic - Technology Performance Pulse

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

To this end, we developed a Rapid Event Notification System (RENO) to support use cases that require server initiated communication with devices in a scalable and extensible manner. In this blog post, we will give an overview of the Rapid Event Notification System at Netflix and share some of the learnings we gained along the way.

Systems

Systems Traffic Architecture Mobile

From syslog to AWS Firehose: Dynatrace log management innovations that enhance observability

Dynatrace

SEPTEMBER 5, 2024

We’re excited to announce several log management innovations, including native support for Syslog messages, seamless integration with AWS Firehose, an agentless approach using Kubernetes Platform Monitoring solution with Fluent Bit, a new out-of-the-box ingest dashboard, and OpenPipeline ingest improvements.

Innovation

Innovation AWS Analytics Storage

AWS observability: AWS monitoring best practices for resiliency

Dynatrace

NOVEMBER 22, 2021

Visibility into system activity and behavior has become increasingly critical given organizations’ widespread use of Amazon Web Services (AWS) and other serverless platforms. These challenges make AWS observability a key practice for building and monitoring cloud-native applications. What is AWS observability? AWS Lambda.

Best Practices

Best Practices AWS Monitoring Serverless

Dynatrace adds support for AWS Transit Gateway with VPC Flow Logs

Dynatrace

JULY 25, 2022

Dynatrace has added support for the newly introduced Amazon Virtual Private Cloud (VPC) Flow Logs for AWS Transit Gateway. What is AWS Transit Gateway? AWS Transit Gateway is a service offering from Amazon Web Services that connects network resources via a centralized hub. Log Events. What is VPC Flow Logs? Log Metrics.

AWS

AWS Transportation Network Traffic

Ensuring the Successful Launch of Ads on Netflix

The Netflix TechBlog

JUNE 1, 2023

To do this, we devised a novel way to simulate the projected traffic weeks ahead of launch by building upon the traffic migration framework described here. New content or national events may drive brief spikes, but, by and large, traffic is usually smoothly increasing or decreasing.

Traffic

Traffic Best Practices Systems Testing

Using Dynatrace to master the 5 pillars of the AWS Well-Architected Framework (Part 1)

Dynatrace

NOVEMBER 24, 2020

As organizations plan, migrate, transform, and operate their workloads on AWS, it’s vital that they follow a consistent approach to evaluating both the on-premises architecture and the upcoming design for cloud-based architecture. AWS 5-pillars. Dynatrace and AWS. through our AWS integrations and monitoring support.

AWS

AWS Artificial Intelligence Best Practices Lambda

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

by Shefali Vyas Dalal

AWS re:Invent is a couple weeks away and our engineers & leaders are thrilled to be in attendance yet again this year! We’ve compiled our speaking events below so you know what we’ve been working on. Please stop by our “Living Room” for an opportunity to connect or reconnect with Netflixers. Wednesday - December

AWS

AWS Entertainment Open Source Benchmarking

Stream logs to Dynatrace with Amazon Data Firehose to boost your cloud-native journey

Dynatrace

MAY 3, 2024

Amazon CloudWatch is the most common method of collecting logs across your AWS footprint. As a native tool used by many enterprises, CloudWatch supports a wide range of AWS resources, applications, and services. These already provide a common integration with AWS log sources. Select ingest logs as the scope of the token.

Cloud

Cloud Lambda AWS Analytics

Seeing through hardware counters: a journey to threefold performance increase

The Netflix TechBlog

NOVEMBER 9, 2022

to a larger AWS instance size, from m5.4xl (16 vCPUs) to m5.12xl (48 vCPUs). A quick canary test was free of errors and showed lower latency, which is expected given that our standard canary setup routes an equal amount of traffic to both the baseline running on 4xl and the canary on 12xl. let’s call it GS2 - to
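The "lower latency is expected" reasoning comes down to per-vCPU arithmetic. The baseline and canary receive equal traffic, but the canary has three times as many vCPUs, so each vCPU does a third of the work. Below is a minimal Python sketch of that calculation. It assumes a hypothetical rate of 1,200 requests per second per instance, because the excerpt gives no figure. It also assumes load spreads perfectly evenly across vCPUs.

```python
# Hypothetical per-instance request rate: the excerpt does not state one.
REQUESTS_PER_SECOND = 1200

# vCPU counts from the excerpt: m5.4xl has 16 vCPUs, m5.12xl has 48.
VCPUS = {"baseline_m5_4xl": 16, "canary_m5_12xl": 48}


def per_vcpu_load(rps: float, vcpus: int) -> float:
    """Requests per second each vCPU absorbs if load spreads evenly across vCPUs."""
    return rps / vcpus


# The canary setup routes an equal amount of traffic to baseline and canary.
loads = {name: per_vcpu_load(REQUESTS_PER_SECOND, n) for name, n in VCPUS.items()}
for name, load in loads.items():
    print(f"{name}: {load:.1f} req/s per vCPU")
```

The canary's per-vCPU load is a third of the baseline's. Lower latency alone therefore says little about whether throughput scales with instance size. Answering that would take a test that gives the canary proportionally more traffic.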

Hardware

Hardware Cache Performance Latency

Build and operate multicloud FaaS with enhanced, intelligent end-to-end observability

Dynatrace

APRIL 25, 2023

For example, to handle traffic spikes and pay only for what they use. Serverless applications are composed of event-driven functions that run on demand in response to triggers from various sources, such as HTTP requests, messages, or timers. Scale automatically based on the demand and traffic patterns.

Serverless

Serverless Lambda Azure AWS

Detecting RegreSSHion with Dynatrace (CVE-2024-6387)

Dynatrace

JULY 2, 2024

Look for timeout events Exploitation attempts for this vulnerability can be identified by many lines of “Timeout before authentication” in the logs. Using the VPC flow log default pattern available in DPL Architect, we can extract the meaningful fields to see only the network traffic targeting the SSH port.
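Counting the timeout messages per client address is a quick way to surface likely exploitation attempts. The following is a minimal Python sketch, not Dynatrace DPL. It makes two assumptions:

- The sshd message reads "Timeout before authentication for <ip> port <port>". Line prefixes vary by distribution and syslog setup.
- The threshold is an arbitrary, hypothetical value that you would tune per environment.

```python
import re
from collections import Counter

# Assumed sshd message shape; syslog prefixes before it are ignored by search().
TIMEOUT_RE = re.compile(r"Timeout before authentication for (\S+) port (\d+)")


def count_auth_timeouts(lines):
    """Count 'Timeout before authentication' messages per remote address."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def suspicious_sources(lines, threshold):
    """Addresses whose timeout count meets or exceeds a (hypothetical) threshold."""
    return sorted(ip for ip, n in count_auth_timeouts(lines).items() if n >= threshold)


# Illustrative log lines using documentation-range addresses.
sample = [
    "sshd[101]: fatal: Timeout before authentication for 198.51.100.7 port 50122",
    "sshd[102]: fatal: Timeout before authentication for 198.51.100.7 port 50130",
    "sshd[103]: fatal: Timeout before authentication for 198.51.100.7 port 50141",
    "sshd[104]: Accepted publickey for alice from 203.0.113.9 port 40000 ssh2",
    "sshd[105]: fatal: Timeout before authentication for 203.0.113.20 port 41000",
]
counts = count_auth_timeouts(sample)
flagged = suspicious_sources(sample, threshold=3)
print(counts)
print(flagged)
```

A single timeout can simply be a client that went away. What the excerpt points to is many such lines, so the count per source is the signal.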

AWS

AWS Network Traffic Servers

Migrating Netflix to GraphQL Safely

The Netflix TechBlog

JUNE 14, 2023

The control group’s traffic utilized the legacy Falcor stack, while the experiment population leveraged the new GraphQL client and was directed to the GraphQL Shim. This helped us successfully migrate 100% of the traffic on the mobile homepage canvas to GraphQL in 6 months. After validating performance, we slowly built up scope.
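A common way to split users into a stable control group (Falcor) and experiment group (GraphQL) is deterministic hashing. Each user then stays in the same group across sessions. Below is a Python sketch of that generic technique. The salt, the 100 buckets and the function names are hypothetical, and the code does not represent Netflix's allocation system.

```python
import hashlib

# Hypothetical cohort labels for the two stacks named in the excerpt.
CONTROL = "falcor"
EXPERIMENT = "graphql-shim"


def bucket(user_id: str, salt: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) via a salted SHA-256 hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def assign(user_id: str, experiment_percent: int, salt: str = "homepage-graphql") -> str:
    """Send roughly experiment_percent of users to the new stack, the rest to control."""
    return EXPERIMENT if bucket(user_id, salt) < experiment_percent else CONTROL


users = [f"user-{i}" for i in range(10_000)]
share = sum(assign(u, 10) == EXPERIMENT for u in users) / len(users)
print(f"experiment share at 10%: {share:.3f}")
```

A user's bucket never changes, so raising `experiment_percent` only adds users to the experiment. Nobody already on GraphQL falls back to Falcor, which suits building up scope slowly.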

Traffic

Traffic Latency Metrics Cache

Easy SLA and SLO reporting for all your API endpoints with public synthetic HTTP monitors

Dynatrace

JUNE 26, 2020

Dynatrace Synthetic Monitoring helps you quickly verify if your application is delivering the expected end user experience by offering an outside-in view of all your applications and services, independent of real traffic. Virginia (AWS), California (AWS), San Jose (Azure), Texas (Azure), Ohio (AWS), Toronto (Azure),
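Once synthetic checks run on a schedule, SLA/SLO reporting reduces to ratios over their results. Below is a minimal Python sketch that computes availability and the remaining error budget. The `CheckResult` shape and the 99% target in the example are assumptions for illustration. They are not Dynatrace's data model or API.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One synthetic HTTP check: where it ran and whether it met its success criteria."""
    location: str
    ok: bool


def availability(results):
    """Fraction of successful checks, in [0.0, 1.0]."""
    if not results:
        raise ValueError("no check results")
    return sum(r.ok for r in results) / len(results)


def error_budget_remaining(results, slo_target):
    """Share of the error budget (1 - target) still unspent; negative means breached."""
    budget = 1.0 - slo_target
    spent = 1.0 - availability(results)
    return (budget - spent) / budget


# Example: 1,000 checks from one location, 2 of them failed, against a 99% SLO.
results = [CheckResult("Virginia (AWS)", ok=i >= 2) for i in range(1000)]
print(f"availability: {availability(results):.3%}")
print(f"error budget left: {error_budget_remaining(results, 0.99):.0%}")
```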

Monitoring

Monitoring Azure AWS Traffic

Dynatrace adds support for VPC Flow Logs to Kinesis Data Firehose

Dynatrace

SEPTEMBER 7, 2022

VPC Flow Logs is an Amazon service that enables IT pros to capture information about the IP traffic that traverses network interfaces in a virtual private cloud, or VPC. By default, each record captures a network internet protocol (IP), a destination, and the source of the traffic flow that occurs within your environment.
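The record layout the excerpt describes is easiest to see by parsing one record. Below is a minimal Python sketch for the default (version 2) format, whose 14 space-separated fields run from `version` to `log-status`. Custom formats can reorder or add fields, so check the field list against your own flow log configuration. The sample records use illustrative values, not real traffic.

```python
# Field order of the default (version 2) VPC Flow Logs record format.
DEFAULT_FIELDS = (
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
)
# Fields holding numbers: ports, IANA protocol number, counters, Unix timestamps.
INT_FIELDS = {"version", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end"}


def parse_record(line: str) -> dict:
    """Parse one default-format record; a '-' placeholder becomes None."""
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}")
    record = {}
    for name, raw in zip(DEFAULT_FIELDS, values):
        if raw == "-":
            record[name] = None
        elif name in INT_FIELDS:
            record[name] = int(raw)
        else:
            record[name] = raw
    return record


# Illustrative records: one accepted TCP flow to port 22, one interval with no data.
accepted = parse_record(
    "2 123456789010 eni-0a1b2c3d 172.31.16.139 172.31.16.21 20641 22 6 20 4249 "
    "1418530010 1418530070 ACCEPT OK"
)
no_data = parse_record("2 123456789010 eni-0a1b2c3d - - - - - - - 1431280876 1431280934 - NODATA")
print(accepted["srcaddr"], "->", accepted["dstaddr"], accepted["dstport"], accepted["action"])
print(no_data["log-status"], no_data["srcaddr"])
```

Protocol number 6 is TCP. A filter such as `r["protocol"] == 6 and r["dstport"] == 22` therefore keeps only SSH traffic.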

Traffic

Traffic AWS Network Cloud

AWS EKS Monitoring as a Self-Service with Dynatrace

Dynatrace

SEPTEMBER 17, 2019

AWS EKS for Integration and Production. When focusing on the LanguageController service we learn that it’s currently deployed in three pods across three EKS nodes across two AWS Availability Zones (AZ). 4 AWS EFS monitoring. Their technology stack looks like this: Spring Boot-based Microservices. NGINX as an API Gateway.

AWS

AWS Monitoring Ecommerce Lambda

Observe syslog with Dynatrace ActiveGate, a secure, trusted edge component

Dynatrace

JULY 15, 2024

You also might be required to capture syslog messages from cloud services on AWS, Azure, and Google Cloud related to resource provisioning, scaling, and security events. ActiveGate also optimizes traffic volume in your network and serves as a secure relay layer in protected networks and DMZs.

Infrastructure

Infrastructure Network Azure Monitoring

Leverage automated and intelligent observability for OpenTelemetry for Go with Dynatrace PurePath 4

Dynatrace

JANUARY 28, 2021

With Dynatrace OneAgent you also benefit from support for traffic routing and traffic control. OneAgent implements network zones to create traffic routing rules and limit cross-data-center traffic. We also recently announced our partnership with AWS to provide observability for custom OpenTelemetry metrics from AWS.

Traffic

Traffic Open Source Servers Cloud

How Dynatrace boosts production resilience with Site Reliability Guardian

Dynatrace

MAY 17, 2023

While the first guardian validates the traffic, the second guardian checks the business transactions generated during the observation period. The Workflows screenshot below shows that a task is triggered by a change event related to the application, execution of the guardians, and final aggregation of the results.

DevOps

DevOps Traffic Latency Best Practices

The Best Way to Host MongoDB on DigitalOcean

Scalegrid

DECEMBER 16, 2019

While you may assume a great majority of cloud database deployments run on AWS, Azure, or Google Cloud Platform, small to medium-sized businesses in particular are gravitating towards the developer-friendly cloud provider DigitalOcean for their MongoDB® hosting needs. DigitalOcean Advantages for MongoDB. DigitalOcean Droplets.

Azure

Azure AWS Database Latency

TTP-based threat hunting with Dynatrace Security Analytics and Falco Alerts solves alert noise

Dynatrace

AUGUST 9, 2023

Not only that, teams struggle to correlate events and alerts from a wide range of security tools, need to put them into context, and infer their risk for the business. Falco is an open-source, cloud-native security tool that utilizes the Linux kernel technology eBPF to generate fine-grained networking, security, and observability events.

Analytics

Analytics AWS Infrastructure Strategy

Telltale: Netflix Application Monitoring Simplified

The Netflix TechBlog

AUGUST 13, 2020

We also highlight interesting broader events such as regional traffic evacuations and nearby deployments, information that is vital to understanding health holistically. It usually has dependencies, talks to other services, and lives in different AWS regions. Regional traffic evacuations. Infrastructure change events.

Monitoring

Monitoring Tuning Traffic Metrics

Breaking data silos: Liquid Reply’s journey to custom API observability with OpenTelemetry and Dynatrace

Dynatrace

JUNE 20, 2023

Data is proliferating in separate silos from containers and Kubernetes to open source APIs and software to serverless compute services, such as AWS and Azure. The organization needed to ensure the correlation of all events in a complete end-to-end trace. However, they had numerous custom applications with separate APIs.

Open Source

Open Source Google Serverless Azure

Expanding the Cloud – An AWS Region is coming to Hong Kong

All Things Distributed

JUNE 20, 2017

Today, I am very excited to announce our plans to open a new AWS Region in Hong Kong! The new region will give Hong Kong-based businesses, government organizations, non-profits, and global companies with customers in Hong Kong, the ability to leverage AWS technologies from data centers in Hong Kong.

AWS

AWS Logistics Cloud Social Media

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

With traffic growth, a single leader node handling all request volume started becoming overloaded. The path over which data travels from Titus Job Coordinator to a Titus Gateway cache can be described as a sequence of event queues with different processing speeds: A message generated by the event source may be buffered at any stage.
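One generic way to get a consistent read from a cache at the end of such a chain is a barrier. You publish a marker at the source and serve the read only after the marker reaches the cache. Because the stages are FIFO, every earlier update must already have arrived. The Python sketch below simulates that idea with in-memory queues. The class and function names are hypothetical, and the code does not represent Titus Gateway's actual implementation.

```python
from collections import deque


class Pipeline:
    """A chain of FIFO stages between an event source and a cache.

    Stages may drain at different speeds, so events can sit buffered at any stage.
    """

    def __init__(self, stage_count: int):
        self.stages = [deque() for _ in range(stage_count)]
        self.cache = {}           # key -> latest value applied at the cache
        self.seen_markers = set() # markers that have reached the cache

    def publish(self, key, value):
        self.stages[0].append(("update", key, value))

    def publish_marker(self, marker_id):
        self.stages[0].append(("marker", marker_id, None))

    def step(self, stage_index: int, count: int = 1):
        """Move up to `count` events from one stage to the next (or into the cache)."""
        stage = self.stages[stage_index]
        for _ in range(min(count, len(stage))):
            event = stage.popleft()
            if stage_index + 1 < len(self.stages):
                self.stages[stage_index + 1].append(event)
            else:
                self._apply(event)

    def _apply(self, event):
        kind, key, value = event
        if kind == "update":
            self.cache[key] = value
        else:
            self.seen_markers.add(key)


def consistent_read(pipeline, key, marker_id, max_rounds=100):
    """Read `key` only after a marker published now has reached the cache."""
    pipeline.publish_marker(marker_id)
    for _ in range(max_rounds):
        if marker_id in pipeline.seen_markers:
            return pipeline.cache.get(key)
        # Drive the simulation: drain the last stage first so each event advances one hop.
        for i in reversed(range(len(pipeline.stages))):
            pipeline.step(i, count=1)
    raise TimeoutError("marker did not reach the cache")


p = Pipeline(stage_count=3)
p.publish("job-1", "Accepted")
for i in range(3):                 # push the first update through all three stages
    p.step(i)
p.publish("job-1", "Running")      # still buffered in stage 0
stale = p.cache["job-1"]
fresh = consistent_read(p, "job-1", marker_id="m1")
print(stale, fresh)
```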

Cache

Cache Latency Traffic Systems

DBLog: A Generic Change-Data-Capture Framework

The Netflix TechBlog

DECEMBER 17, 2019

In databases like MySQL and PostgreSQL, transaction logs are the source of CDC events. Some of DBLog’s features are: Processes captured log events in-order. Interleaves log with dump events, by taking dumps in chunks. No locks on tables are ever acquired, which prevents any impact on write traffic on the source database.
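The dump-and-log interleaving can be sketched in memory. DBLog writes a low watermark and a high watermark into the source database around each chunk SELECT. Rows that change between the two watermarks come through the log with fresher values. Only the untouched chunk rows are emitted, just before the high watermark.

Below is a simplified Python sketch of that merge step. The tuple shapes and `interleave_chunk` are hypothetical. The sketch leaves out pausing log processing and writing the watermark rows.

```python
def interleave_chunk(log_events, chunk, low_wm, high_wm):
    """Merge one dump chunk into an ordered change-log stream using two watermarks.

    log_events: ordered ("wm", id) or ("row", key, value) tuples from the log.
    chunk:      key -> value rows selected between writing low_wm and high_wm.
    """
    pending = dict(chunk)   # chunk rows still eligible for output
    output = []
    in_window = False
    for event in log_events:
        if event[0] == "wm":
            if event[1] == low_wm:
                in_window = True
            elif event[1] == high_wm:
                # Rows not superseded inside the window go out before later log events.
                output.extend(("dump", k, v) for k, v in pending.items())
                pending.clear()
                in_window = False
            continue
        _, key, value = event
        if in_window:
            # The log carries this key's newer state, so drop the chunk copy.
            pending.pop(key, None)
        output.append(("log", key, value))
    return output


# "b" changes inside the window, so its chunk copy (2) is dropped in favour of the log's 20.
events = [("row", "a", 1), ("wm", "low-1"), ("row", "b", 20), ("wm", "high-1"), ("row", "c", 30)]
chunk = {"a": 1, "b": 2, "d": 4}
merged = interleave_chunk(events, chunk, "low-1", "high-1")
for item in merged:
    print(item)
```

No table lock is needed. Any row that changes while the SELECT runs shows up in the log between the watermarks and wins over its chunk copy.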

Database

Database Traffic Transportation Open Source

Python at Netflix

The Netflix TechBlog

APRIL 29, 2019

takes place in Amazon Web Services (AWS), whereas everything that happens afterwards (i.e., Demand Engineering Demand Engineering is responsible for Regional Failovers, Traffic Distribution, Capacity Operations and Fleet Efficiency of the Netflix cloud. Are you logged in? What plan do you have? What do you want to watch?)

Open Source

Open Source Network Infrastructure Big Data

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

Service Segmentation: The ease of cloud deployments has led to the organic growth of multiple AWS accounts, deployment practices, interconnection practices, etc. VPC Flow Logs is an AWS feature that captures information about the IP traffic going to and from network interfaces in a VPC. 43416 5001 52.213.180.42
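Enrichment here means attaching ownership metadata to raw flow records, so that an IP pair becomes a service-to-service edge. Below is a minimal Python sketch using the standard `ipaddress` module. The CIDR-to-application table is hypothetical. The excerpt does not describe Netflix's real metadata sources.

```python
import ipaddress

# Hypothetical ownership data: which application owns which address range.
OWNERSHIP = {
    "10.0.1.0/24": "api-gateway",
    "10.0.2.0/24": "storage-tier",
    "10.0.2.15/32": "playback-service",
}
NETWORKS = [(ipaddress.ip_network(cidr), app) for cidr, app in OWNERSHIP.items()]


def owner_of(addr: str) -> str:
    """Owning application for an IP, preferring the most specific (longest) prefix."""
    ip = ipaddress.ip_address(addr)
    matches = [(net.prefixlen, app) for net, app in NETWORKS if ip in net]
    return max(matches)[1] if matches else "unknown"


def enrich(flow: dict) -> dict:
    """Return a copy of a parsed flow record with source/destination app names added."""
    return {**flow, "src_app": owner_of(flow["srcaddr"]), "dst_app": owner_of(flow["dstaddr"])}


flow = {"srcaddr": "10.0.1.42", "dstaddr": "10.0.2.15", "dstport": 443, "bytes": 5120}
enriched = enrich(flow)
print(enriched)
```

A linear scan is fine for a handful of prefixes. At the record volumes the title implies, a prefix trie or a sorted interval index would be the usual replacement.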

Network

Network Tuning AWS Traffic

Dynatrace Managed release notes version 1.216

Dynatrace

MAY 6, 2021

To improve management of node capabilities, we added Enable/disable Web UI traffic operation for cluster node in Cluster Mission Control UI. To increase security, sign-in cluster events are no longer reported to Cluster Mission Control. Removed AWS Amazon MQ Supporting Service EnqueueTime metrics with Multi aggregation.

Operating System

Operating System AWS Metrics Storage

What Adrian Did Next — Part 4 — how I helped Netflix launch on iPad and iPhone — 2007 to 2010

Adrian Cockcroft

JANUARY 27, 2025

I went to the launch event, got an iPhone on day 1, and when Apple finally shipped their SDK in March 2008 I was in the first wave of people who signed up as an iOS developer. In September 2008 Netflix ran an internal hack day event. We simply didn’t have enough capacity in our datacenter to run the traffic, so it had to work.

C++

C++ Mobile Hardware Java

Curbing Connection Churn in Zuul

The Netflix TechBlog

AUGUST 16, 2023

It’s built on top of Netty, using event loops for non-blocking execution of requests, one loop per core. To reduce contention among event loops, we created connection pools for each, keeping them completely independent. That’s a significant amount and certainly more than is necessary relative to the traffic on most clusters.
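Independent per-loop pools multiply quickly, because each event loop (one per core) opens its own connections to each origin server. Below is a minimal Python sketch of the count. The cluster sizes are hypothetical, since the excerpt gives none.

```python
def origin_connections(zuul_instances, pools_per_instance, origin_servers, conns_per_pool=1):
    """Total connections when every pool on every instance connects to every origin."""
    return zuul_instances * pools_per_instance * origin_servers * conns_per_pool


# Hypothetical sizes: 100 Zuul instances with 16 cores each, 800 origin servers.
per_loop_pools = origin_connections(100, 16, 800)       # one pool per event loop
per_instance_pools = origin_connections(100, 1, 800)    # one pool shared per instance
print(per_loop_pools, per_instance_pools)
```

The gap between the two numbers is exactly the core count (16x here). Per-loop independence avoids contention, but it can leave far more connections open than the traffic needs.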

Traffic

Traffic Servers Google Metrics

Five-nines availability: Always-on infrastructure delivers system availability during the holidays’ peak loads

Dynatrace

NOVEMBER 22, 2022

For retail organizations, peak traffic can be a mixed blessing. While high-volume traffic often boosts sales, it can also compromise uptimes. Include metrics, event logs, distributed traces, metadata, user experience data, and telemetry data from open source technologies and cloud platforms.

Infrastructure

Infrastructure Availability Systems Retail

Evolving Regional Evacuation

The Netflix TechBlog

SEPTEMBER 23, 2019

In order to achieve this level of availability, we leverage an N+1 architecture where we treat Amazon Web Services (AWS) regions as fault domains, allowing us to withstand single region failures. So, if we evacuate South American traffic to North America, demand for CE and Android DRM won’t grow uniformly.
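The non-uniform growth follows from regional device mixes. Below is a minimal Python sketch with hypothetical demand figures; only the idea that device mixes differ by region comes from the excerpt. The sketch moves all South American demand to North America and shows that each device family grows by a different factor.

```python
# Hypothetical steady-state demand (arbitrary units) by region and device family.
DEMAND = {
    "north_america": {"ce": 600, "android": 200},
    "south_america": {"ce": 100, "android": 300},
}


def evacuate(demand, source, target):
    """Move all of `source`'s demand onto `target`; return new demand and per-device growth."""
    after = {region: dict(devices) for region, devices in demand.items()}
    for device, load in demand[source].items():
        after[target][device] = after[target].get(device, 0) + load
        after[source][device] = 0
    # Growth factor per device family in the target region (zero baselines skipped).
    growth = {
        device: after[target][device] / base
        for device, base in demand[target].items()
        if base > 0
    }
    return after, growth


after, growth = evacuate(DEMAND, "south_america", "north_america")
print(growth)
```

With these numbers, CE demand in North America grows by about 17%, while Android grows 2.5x. Capacity planned with a single regional multiplier would under-provision the Android path.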

Traffic

Traffic Metrics Mobile Government

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

All these micro-services are currently operated in AWS cloud infrastructure. Security Events Platform: see open source projects such as StreamAlert and Siddhi to get some general ideas. Can we adjust our auto-scaling policies to be more efficient without risking our availability during traffic spikes?

Infrastructure

Infrastructure Cloud Scalability AWS

So many bad takes - What is there to learn from the Prime Video microservices to monolith story

Adrian Cockcroft

MAY 6, 2023

Then they tried to scale it to cope with high traffic and discovered that some of the state transitions in their Step Functions were too frequent, and they had some overly chatty calls between AWS Lambda functions and S3. They state in the blog that this was quick to build, which is the point.

Serverless

Serverless Lambda Best Practices Traffic

PyMongo Tutorial: Testing MongoDB Failover in Your Python App

Scalegrid

MAY 2, 2019

It is also recommended that SSL connections be enabled to encrypt the client-database traffic. With MongoDB deployments, failovers aren’t considered major events as they were with traditional database management systems. Testing Failover Behavior.
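Failover testing usually goes together with client-side retries, because a replica-set election briefly leaves the set without a primary. Below is a minimal, driver-agnostic Python sketch of retry with exponential backoff. With PyMongo you would pass an exception such as `pymongo.errors.AutoReconnect` as `retryable`. The `TransientError` and `flaky_insert` below are stand-ins, so the example runs without a database.

```python
import time


def with_failover_retry(operation, retryable, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run `operation`, retrying on `retryable` errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retryable:
            if attempt == attempts - 1:
                raise                                  # out of attempts: surface the error
            sleep(base_delay * (2 ** attempt))         # 0.5 s, 1 s, 2 s, ...


class TransientError(Exception):
    """Stand-in for a driver error raised while no primary is available."""


calls = {"n": 0}


def flaky_insert():
    """Fails twice (as during an election), then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("not primary")
    return "inserted"


delays = []
result = with_failover_retry(flaky_insert, TransientError, sleep=delays.append)
print(result, delays)
```

Only retry operations that are safe to repeat, because a write can succeed on the server even though the client saw the connection drop. Recent MongoDB drivers also offer a `retryWrites` connection option that covers many write operations. Check what your driver version supports before hand-rolling retries.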

Testing

Testing Network Database Servers

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Compute: Titus. Whereas open-source users of Metaflow rely on AWS Batch or Kubernetes as the compute backend, we rely on our centralized compute platform, Titus. The Explainer flow is event-triggered by an upstream flow, such as the Model A, B, and C flows in the illustration. ETL workflows), as well as downstream (e.g.

Systems

Systems Media Cache Open Source

Evolution of Netflix Conductor:

The Netflix TechBlog

JULY 30, 2019

Our cluster size ranges from 12 to 18 AWS EC2 m4.4xlarge instances, typically running at ~30% capacity. * - Cassandra. An event handler can be configured on these queues to trigger workflows or tasks to perform specific actions upon the terminal state of the workflow. Cassandra persistence module is a partial implementation.

Lambda

Lambda Media Open Source Metrics

MySQL High Availability Framework Explained – Part III: Failure Scenarios

Scalegrid

APRIL 16, 2019

Application traffic will be redirected to this new master MySQL node and the slave S2 will start replicating from the new master. Thus, we see that the MySQL HA framework handles split-brain scenarios effectively, ensuring both data consistency and availability in the event the network connectivity breaks between master and slave nodes.

Availability

Availability Network Azure AWS

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

We started seeing signs of scale issues, like: Slowness during peak traffic moments like 12 AM UTC, leading to increased operational burden. As the usage increased, we had to vertically scale the system to keep up and were approaching AWS instance type limits. Meson was based on a single leader architecture with high availability.

Java

Java Scalability Traffic Architecture

The road to observability demo part 3: Collect, instrument, and analyze telemetry data automatically with Dynatrace

Dynatrace

MAY 17, 2023

It is available for the major OS and cloud platforms (for example, Windows, Linux, Solaris, AWS, Azure, and more) and only requires the deployment of a single service to monitor its environment. Most importantly, this information covers not only the server side but, thanks to RUM, also the client side and events in the browser.

Metrics

Metrics Database Monitoring Network

New IP addresses for Dynatrace Synthetic improve safety and scalability

Dynatrace

OCTOBER 31, 2019

More space for redundancy and additional options for managing any potential cloud vendor issues, or issues caused by external events. Some organizations require adding IP addresses to security configurations to allow traffic from and to them. More resiliency and even safer public synthetic monitoring locations.

Scalability

Scalability Traffic Monitoring Benchmarking
