Availability, Event and Infrastructure - Technology Performance Pulse

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan Introduction In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

OpenPipeline: Simplify access to critical business data

Dynatrace

NOVEMBER 4, 2024

Business events: Delivering the best data It’s been two years since we introduced business events, a special class of events designed to support even the most demanding business use cases. Business event ingestion and analysis with log files. OpenPipeline: Simplify access and unify business events from anywhere.

Analytics

Analytics Airlines Metrics Monitoring

New integrations announced at AWS re:Invent enhance cloud performance, security, and automation

Dynatrace

DECEMBER 3, 2024

Streamlining observability with Dynatrace OneAgent on AWS Image Builder In our ongoing collaboration with AWS, we’re excited to make the Dynatrace OneAgent available as a first-class integration on AWS Image Builder via the AWS Marketplace.

AWS

AWS Cloud Performance Innovation

Davis AI now detects infrastructure availability issues as root cause

Dynatrace

JUNE 10, 2020

Having been named as a Leader in the 2020 Gartner APM Magic Quadrant for the 10th consecutive time proves that Dynatrace is the best-of-breed application performance monitoring tool available. But what happens if a service works perfectly but the underlying infrastructure, such as processes and hosts, experiences an outage?

Infrastructure

Infrastructure Availability Monitoring Processing

Five-nines availability: Always-on infrastructure delivers system availability during the holidays’ peak loads

Dynatrace

NOVEMBER 22, 2022

The nirvana state of system uptime at peak loads is known as “five-nines availability.” In its pursuit, IT teams hover over system performance dashboards hoping their preparations will deliver five nines—or even four nines—availability. But is five nines availability attainable? What is always-on infrastructure?

Infrastructure

Infrastructure Availability Systems Retail

Reliability indicators that matter to your business: SLOs for all data types

Dynatrace

OCTOBER 31, 2024

This lets you build your SLOs around the indicators that matter to you and your customers—critical metrics related to availability, failure rates, request response times, or select logs and business events. Are you experiencing an increase or degradation in certain events that indicate a rising problem?

Metrics

Metrics Availability Monitoring Scalability

Dynatrace joins the Microsoft Intelligent Security Association

Dynatrace

NOVEMBER 20, 2024

The Dynatrace platform has been recognized for seamlessly integrating with the Microsoft Sentinel cloud-native security information and event management (SIEM) solution. These reports are crucial for tracking changes, compliance, and security-relevant events.

Best Practices

Best Practices Innovation Azure Cloud

Gain better visibility into your infrastructure with Windows service availability monitoring

Dynatrace

AUGUST 3, 2020

At Dynatrace, we’re constantly striving to come up with solutions that help you better understand the health of your infrastructure. Windows-based infrastructure monitoring. For example: To provide support, you need a remote desktop service to be available. Easily create availability checks for your Windows services.

Infrastructure

Infrastructure Availability Monitoring Operating System

Monitoring of Kubernetes Infrastructure for day 2 operations

Dynatrace

JULY 8, 2020

One of the promises of container orchestration platforms is to make it easier for the developers to accelerate the deployment of their applications without having to worry about scalability and infrastructure dependencies. Kubernetes events are a type of object providing context on what’s happening inside a cluster.

Infrastructure

Infrastructure Monitoring Cloud Metrics

Flexible, scalable, self-service Kubernetes native observability now in General Availability

Dynatrace

MAY 17, 2022

To solve this problem, Dynatrace offers a fully automated approach to infrastructure and application observability including Kubernetes control plane, deployments, pods, nodes, and a wide array of cloud-native technologies. None of this complexity is exposed to application and infrastructure teams.

Availability

Availability Scalability Cloud Metrics

Accelerate resolution of network issues with AI-powered event reporting based on SNMP traps

Dynatrace

NOVEMBER 30, 2022

Complexity and data volume for IT infrastructure soar to new heights. The volume of data and events grows in tandem with the rising complexity of IT infrastructure. Monitoring modern IT infrastructure is difficult, sometimes impossible, without advanced network monitoring tools. How SNMP traps help detect problems.

Network

Network Infrastructure Metrics Monitoring

Dynatrace observability now available for Red Hat OpenShift on IBM Z and LinuxONE mainframes

Dynatrace

JULY 24, 2024

Dynatrace with Red Hat OpenShift monitoring stands out for the following reasons: With infrastructure health monitoring and optimization, you can assess the status of your infrastructure at a glance to understand resource consumption and thus optimize resource allocation for cost efficiency.

Availability

Availability Infrastructure Metrics Hardware

Infrastructure Monitoring tools: 3 steps to evolve ITOps into AIOps

Dynatrace

JUNE 28, 2021

Infrastructure monitoring is the process of collecting critical data about your IT environment, including information about availability, performance and resource efficiency. Many organizations respond by adding a proliferation of infrastructure monitoring tools, which in many cases, just adds to the noise. Dynatrace news.

Infrastructure

Infrastructure Monitoring Artificial Intelligence Open Source

Expand application and infrastructure observability with operational insights into Kubernetes pods

Dynatrace

JULY 7, 2020

In Kubernetes environments, operating and successfully running your production applications and microservices requires getting additional insights into your Kubernetes infrastructure including the cluster, nodes, and pods that encapsulate and run the apps. Filtering and alerting on Kubernetes events.

Infrastructure

Infrastructure Metrics Monitoring Cloud

AI-powered infrastructure monitoring for your SAP HANA database (Preview)

Dynatrace

DECEMBER 9, 2020

However, for non-SAP engineers, the amount of available information can be overwhelming—it’s not always clear where to look for answers when you have questions about the performance of your SAP HANA database. It’s easy to create custom events for alerting on any of the SAP HANA metrics that are provided by the extension.

Infrastructure

Infrastructure Database Monitoring Metrics

New event type helps avoid unnecessary alerts for planned host downscaling

Dynatrace

FEBRUARY 20, 2020

Modern service infrastructure depends heavily on IT’s ability to dynamically scale the number of hosts up or down, depending on the expected workload. As the expected behavior of spot instances is that they are shut down within 5 minutes of their creation, the traditional strategy of availability alerting isn’t viable.

AWS

AWS Azure Infrastructure Availability

Managing PostgreSQL® High Availability – Part I: PostgreSQL Automatic Failover

Scalegrid

SEPTEMBER 5, 2024

Managing High Availability (HA) in your PostgreSQL hosting is very important to ensuring your database deployment clusters maintain exceptional uptime and strong operational performance so your data is always available to your application. Effective management of failover and switchover operations is crucial for high availability.

Availability

Availability Servers Database Open Source

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. Kafka is optimized for high-throughput event streaming, excelling in real-time analytics and large-scale data ingestion. What is Apache Kafka?

Latency

Latency Analytics Architecture Storage

Trace, diagnose, resolve: Introducing the Infrastructure & Operations app for streamlined troubleshooting

Dynatrace

FEBRUARY 1, 2024

Infrastructure and operations teams must maintain infrastructure health for IT environments. Based on the topology model, detected dependencies, and thousands of events and metrics, Davis AI can pinpoint the origin of an issue. Start using the Infrastructure & Operations app now to assess the health of your system.

Infrastructure

Infrastructure Metrics Network Monitoring

AI-powered DNS request tracking extends infrastructure observability for high quality network traffic

Dynatrace

OCTOBER 1, 2020

The Dynatrace Software Intelligence Platform gives you a complete Infrastructure Monitoring solution for the monitoring of cloud platforms and virtual infrastructure, along with log monitoring and AIOps. Ensure high quality network traffic by tracking DNS requests out-of-the-box. Average query response time.

Traffic

Traffic Network Infrastructure Artificial Intelligence

Easily monitor your entire infrastructure with Dynatrace Synthetic monitors

Dynatrace

JULY 21, 2020

In those cases, what should you do if you want to be proactive and ensure that your infrastructure is always up and running? Heading up the Platform Extension Services team at Dynatrace, we’re the go-to team for anything that isn’t available out of the box. Easy and flexible infrastructure monitoring. Platform extensions.

Infrastructure

Infrastructure Monitoring Open Source Traffic

Don’t just react: How executives can predict and prevent outages to maximize availability

Dynatrace

OCTOBER 3, 2024

The end goal, of course, is to optimize the availability of organizations’ software. Dynatrace is widely recognized for its AI capabilities’ ability to predict and prevent issues, and automatically identify root causes, maximizing availability. Eventually, the goal is to arrive at self-healing through autonomous cloud operations.

Availability

Availability DevOps Analytics Cloud

Dynatrace Managed turnkey Premium High Availability for globally distributed data centers (Early Adopter)

Dynatrace

JUNE 25, 2020

Dynatrace Managed is intrinsically highly available as it stores three copies of all events, user sessions, and metrics across its cluster nodes. Our Premium High Availability comes with the following features: Active-active deployment model for optimum hardware utilization. Dynatrace news.

Availability

Availability Hardware Latency Traffic

Dynatrace Managed now available on all major cloud platforms

Dynatrace

JULY 31, 2020

Dynatrace Managed now available on the Google Cloud Platform. Some time ago we released a quick-start template for deploying Managed clusters on AWS infrastructure and Microsoft Azure is supported as well. You can now automatically leverage the benefits of the Dynatrace Managed offering with your GCP infrastructure.

Cloud

Cloud Availability Azure Google

Better dashboarding with Dynatrace Davis AI: Instant meaningful insights

Dynatrace

JANUARY 21, 2025

Ensuring smooth operations is no small feat, whether you’re in charge of application performance, IT infrastructure, or business processes. Activate Davis AI to analyze charts within seconds Davis AI can help you expand your dashboards and dive deeper into your available data to extract additional information.

Traffic

Traffic Metrics Analytics Monitoring

How Dynatrace and ServiceNow Event Management provide deep observability and rapid resolution

Dynatrace

FEBRUARY 14, 2020

Automatic detection of software service and application availability (including microservices and containers). Application and infrastructure data collection. Automatic detection of service health and performance incidences, which are synchronized into the Event Management Dashboard. Prioritize event entries.

Website

Website Cloud Infrastructure Metrics

Extend the AI and automation core of Dynatrace with host extensions to resolve infrastructure problems

Dynatrace

MAY 13, 2020

Host extensions enable you to add additional metrics, events, and properties to hosts. All the data bound to hosts is analyzed by the Davis AI causation engine and made available on custom dashboards and events pages. Looking for ways to solve some of your infrastructure-related problems? What’s next.

Infrastructure

Infrastructure Metrics Monitoring Software Engineering

Automate digital excellence with Dynatrace Synthetic Monitoring and Workflows

Dynatrace

JULY 18, 2024

Navigate digital infrastructure complexity In today’s rapidly evolving digital environment, organizations face increasing pressure from customers and competitors to deliver faster, more secure innovations. The effectiveness of this automation relies on the quality of the underlying data.

Monitoring

Monitoring DevOps Infrastructure Games

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency By: Di Lin, Girish Lingappa, Jitender Aswani Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard about to make a critical business decision but pausing to ask a question: “Can

Infrastructure

Infrastructure Big Data Transportation Architecture

Leverage the power of Davis AI with custom time-series events for your specific use cases (Preview)

Dynatrace

SEPTEMBER 4, 2019

Dynatrace Davis automatically analyzes abnormal situations within your IT infrastructure and reports all relevant impacts and root causes. There are two main sources for individual events in Dynatrace: (1) Events that are triggered by a series of measurements (i.e, Info-level events don’t trigger alerts.

Metrics

Metrics Network Monitoring Availability

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

To harness this data effectively, we employ a process of interaction tokenization, ensuring meaningful events are identified and redundancies are minimized. Even with such strategies, interaction histories from active users can span thousands of events, exceeding the capacity of transformer models with standard self attention layers.

Tuning

Tuning Efficiency Latency Strategy

Mastering Kubernetes with Dynatrace

Dynatrace

AUGUST 24, 2020

Logs represent event data in plain-text, structured or binary format. But there are other related components and processes (for example, cloud provider infrastructure) that can cause problems in applications running on Kubernetes. Traces help find the flow of a request through a distributed system. Monitoring your infrastructure.

Analytics

Analytics Infrastructure AWS Operating System

Next generation Dynatrace Davis AI becomes the default causation engine

Dynatrace

NOVEMBER 26, 2019

Define custom events that can either trigger deeper analysis or contribute additional contextual information to Davis. The improved configuration workflow for custom event alerting offers a lot of power in terms of defining additional metric-based events for your Dynatrace environment. We opened up the Davis 2.0

Engineering

Engineering Serverless Metrics Code

Dynatrace observability is now available for Red Hat OpenShift on the IBM® Power® architecture

Dynatrace

JULY 11, 2023

Red Hat OpenShift monitoring with Dynatrace stands out due to the following key aspects: Infrastructure health monitoring and optimization: Assess the status of your infrastructure at a glance, understand resource consumption, optimize resource allocation for cost-efficiency, and track software versions running within the Kubernetes environment.

Architecture

Architecture Availability Infrastructure Metrics

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

Challenges The cloud network infrastructure that Netflix utilizes today consists of AWS services such as VPC, DirectConnect, VPC Peering, Transit Gateways, NAT Gateways, etc., and Netflix-owned devices. These metrics are visualized using Lumen, a self-service dashboarding infrastructure.

Network

Network Transportation AWS Cloud

How Red Hat and Dynatrace intelligently automate your production environment

Dynatrace

MAY 6, 2024

Integration with Red Hat Event-Driven-Ansible will also leverage Red Hat’s flexible rulebook system to map event data, such as problem categories or vulnerability identification, to the correct job template. Dynatrace Davis AI identifies the problem and maps the configuration change event to the root cause and the correct entity.

DevOps

DevOps Software Engineering Games Infrastructure

Keeping an eye on your control plane is critical to ensuring the high availability and health of your self-managed OpenShift Container Platform

Dynatrace

SEPTEMBER 22, 2021

Famous for providing out-of-the-box solutions, automation, and smart context across the entire application infrastructure with our unique Davis AI, Dynatrace now delivers two new extensions to assist teams that face the challenges associated with operating self-managed OCP installations. Control plane.

Availability

Availability Best Practices Infrastructure Monitoring

Ensuring the Successful Launch of Ads on Netflix

The Netflix TechBlog

JUNE 1, 2023

This tier extended existing infrastructure by adding new backend components and a new remote call to our ads partner on the playback path. New content or national events may drive brief spikes, but, by and large, traffic is usually smoothly increasing or decreasing. We used Elasticsearch dashboards to analyze results.

Traffic

Traffic Best Practices Systems Testing

Running the Astronomy Shop OpenTelemetry demo application with Dynatrace

Dynatrace

MARCH 13, 2025

OpenTelemetry provides a common set of tools, APIs, and SDKs to help collect observability signals from applications and infrastructure endpoints. All the needed components are available out of the box in the OpenTelemetry collector contrib distribution, which is included in the demo application. metrics from span data.

Open Source

Open Source Metrics Architecture Infrastructure

From syslog to AWS Firehose: Dynatrace log management innovations that enhance observability

Dynatrace

SEPTEMBER 5, 2024

Native support for syslog messages extends our infrastructure log support to all Linux/Unix systems and network devices. Logs are immediately available for troubleshooting, security investigations, and auditing, becoming integral to the platform alongside traces and metrics.

Innovation

Innovation AWS Analytics Storage

Observe syslog with Dynatrace ActiveGate, a secure, trusted edge component

Dynatrace

JULY 15, 2024

These include traditional on-premises network devices and servers for infrastructure applications like databases, websites, or email. You also might be required to capture syslog messages from cloud services on AWS, Azure, and Google Cloud related to resource provisioning, scaling, and security events.

Infrastructure

Infrastructure Network Azure Monitoring

Transform your operations with Davis AI root cause analysis

Dynatrace

OCTOBER 8, 2024

Modern observability has evolved from simple metric telemetry monitoring to encompass a wide range of data, including logs, traces, events, alerts, and resource attributes. Transform your operations today with the new Problems app and stay ahead in the ever-evolving software and cloud infrastructure landscape.

Infrastructure

Infrastructure Cloud Speed Architecture

The Ultimate Guide to Database High Availability

Percona

JUNE 22, 2023

To make data count and to ensure cloud computing is unabated, companies and organizations must have highly available databases. This guide provides an overview of what high availability means, the components involved, how to measure high availability, and how to achieve it. How does high availability work?

Availability

Availability Database Open Source Hardware

How a data lakehouse brings data insights to life

Dynatrace

OCTOBER 4, 2022

For IT infrastructure managers and site reliability engineers, or SREs, logs provide a treasure trove of data. These traditional approaches to log monitoring and log analytics thwart IT teams’ goal to address infrastructure performance problems, security threats, and user experience issues. where an error occurred at the code level.

Analytics

Analytics Storage Infrastructure Metrics
