Availability, Scalability and Systems - Technology Performance Pulse

Flexible, scalable, self-service Kubernetes native observability now in General Availability

Dynatrace

MAY 17, 2022

For years, enterprises managed observability data on a team-by-team basis, using a combination of ticketing systems and configuration management tools. The application consists of several microservices that are available as pod-backed services. Information about each of these topics will be available in upcoming announcements.

Availability

Availability Scalability Cloud Metrics

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

To this end, we developed a Rapid Event Notification System (RENO) to support use cases that require server initiated communication with devices in a scalable and extensible manner. In this blog post, we will give an overview of the Rapid Event Notification System at Netflix and share some of the learnings we gained along the way.

Systems

Systems Traffic Architecture Mobile

Achieving High Availability in CI/CD With Observability

DZone

MARCH 5, 2024

Since most application releases depend on cloud infrastructure, having good continuous integration and continuous delivery (CI/CD) pipelines and end-to-end observability becomes essential for ensuring highly available systems.

Availability

Availability DevOps Infrastructure Scalability

Microsoft Ignite 2024 guide: Cloud observability for AI transformation

Dynatrace

NOVEMBER 18, 2024

The power of cloud observability: Modernizing legacy systems can be challenging, and it’s important to do so with purpose, not just to modernize for its own sake. By prioritizing observability, organizations can ensure the availability, performance, and security of business-critical applications.

Cloud

Cloud Azure Artificial Intelligence Innovation

Tailored access management, Part 3: Simplified setup for enterprise-scale access management

Dynatrace

OCTOBER 14, 2024

Manage the complexity of authorization systems: Most modern authorization systems provide access management using Attribute-Based Access Control (ABAC). It also supports scalability, making it suitable for organizations of all sizes. High flexibility, adapting to dynamic environments and diverse user needs.

Monitoring

Monitoring Metrics Systems Scalability

Reliability indicators that matter to your business: SLOs for all data types

Dynatrace

OCTOBER 31, 2024

This lets you build your SLOs around the indicators that matter to you and your customers—critical metrics related to availability, failure rates, request response times, or select logs and business events. While the SLO management web UI and API are already available, the dashboard tile will be released within the next weeks.

Metrics

Metrics Availability Monitoring Scalability

Dynatrace extends Synthetic Monitoring capabilities with Network Availability Monitors to validate the availability of infrastructure and services

Dynatrace

JULY 15, 2024

As HTTP and browser monitors cover the application level of the ISO/OSI model, successful executions of synthetic tests indicate that availability and performance meet the expected thresholds of your entire technological stack. Combined with Dynatrace OneAgent®, you gain a precise view of the status of your systems at a glance.

Availability

Availability Network Monitoring Infrastructure

How To Implement Specific Distributed System Patterns Using Spring Boot: Introduction

DZone

SEPTEMBER 11, 2024

Regarding contemporary software architecture, distributed systems have been widely recognized for quite some time as the foundation for applications with high availability, scalability, and reliability goals. Spring Boot Overview: One of the most popular Java EE frameworks for creating apps is Spring.

Systems

Systems Java Software Architecture Programming

Don’t just react: How executives can predict and prevent outages to maximize availability

Dynatrace

OCTOBER 3, 2024

The end goal, of course, is to optimize the availability of organizations’ software. Dynatrace is widely recognized for AI capabilities that predict and prevent issues and automatically identify root causes, maximizing availability.

Availability

Availability DevOps Analytics Cloud

Choreography Pattern: Optimizing Communication in Distributed Systems

DZone

SEPTEMBER 30, 2023

While this architectural approach offers scalability, reusability, and adaptability, it also presents a unique challenge: effectively managing communication between these microservices. There are two popular methodologies available to tackle this challenge. The first, Service Orchestration, was discussed in my previous article.

Systems

Systems Virtualization Architecture Scalability

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

By Ko-Jen Hsiao, Yesu Feng and Sudarshan Lamkhede. Motivation: Netflix’s personalized recommender system is a complex system, boasting a variety of specialized machine-learned models, each catering to distinct needs including Continue Watching and Today’s Top Picks for You. (Refer to our recent overview for more details.)

Tuning

Tuning Efficiency Latency Strategy

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

Both categories share common requirements, such as high throughput and high availability. Failures in a distributed system are a given, and having the ability to safely retry requests enhances the reliability of the service. The table below provides a detailed overview of the diverse requirements across these two categories.
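The point about safely retrying requests can be illustrated with a minimal sketch. This is not Netflix's implementation; the class name `IdempotentCounter` and the `token` parameter are hypothetical and only show the general idea that a retried request carrying the same idempotency token is applied once.

```python
class IdempotentCounter:
    """In-memory counter that ignores repeated deliveries of the same request.

    Hypothetical illustration only: a real service would persist tokens
    and expire them, which this sketch does not attempt.
    """

    def __init__(self):
        self.value = 0
        self._seen_tokens = set()

    def add(self, delta, token):
        # A retry reuses the token of the original request, so the
        # increment is applied at most once per token.
        if token in self._seen_tokens:
            return self.value
        self._seen_tokens.add(token)
        self.value += delta
        return self.value


counter = IdempotentCounter()
counter.add(5, token="req-1")
counter.add(5, token="req-1")  # client retry after a timeout, not re-applied
counter.add(2, token="req-2")
print(counter.value)
```

Because the duplicate `req-1` call is ignored, a client can retry after a timeout without double counting.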

Latency

Latency Cache Infrastructure Strategy

Managing PostgreSQL® High Availability – Part I: PostgreSQL Automatic Failover

Scalegrid

SEPTEMBER 5, 2024

Managing High Availability (HA) in your PostgreSQL hosting is very important to ensuring your database deployment clusters maintain exceptional uptime and strong operational performance so your data is always available to your application. Effective management of failover and switchover operations is crucial for high availability.

Availability

Availability Servers Database Open Source

Dynatrace delivers flexible and scalable Kubernetes native synthetic private locations

Dynatrace

MAY 24, 2023

Because it’s critical that operations teams ensure that all internal resources are available for their users, synthetic monitoring of those resources is important. Global corporations with offices in multiple countries need to ensure that their internal systems are accessible to all employees, regardless of their location.

Scalability

Scalability Virtualization Monitoring Open Source

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Key Takeaways: RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges.

Best Practices

Best Practices Traffic Strategy Scalability

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

Introduction to Message Brokers: Message brokers enable applications, services, and systems to communicate by acting as intermediaries between senders and receivers. This decoupling simplifies system architecture and supports scalability in distributed environments.

Latency

Latency Analytics Architecture Storage

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

Part 3: System Strategies and Architecture. By Varun Khaitan, with special thanks to my stunning colleagues Mallika Rao, Esmir Mesic, and Hugo Marques. This blog post is a continuation of Part 2, where we cleared the ambiguity around title launch observability at Netflix. The request schema for the observability endpoint.

Traffic

Traffic Strategy Entertainment Innovation

New analytics capabilities for messaging system-related anomalies

Dynatrace

JANUARY 12, 2022

Messaging systems can significantly improve the reliability, performance, and scalability of the communication processes between applications and services. In serverless and microservices architectures, messaging systems are often used to build asynchronous service-to-service communication. Dynatrace news. This is great!

Analytics

Analytics Systems DevOps Healthcare

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow, an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems.

Systems

Systems Media Cache Open Source

Better dashboarding with Dynatrace Davis AI: Instant meaningful insights

Dynatrace

JANUARY 21, 2025

Activate Davis AI to analyze charts within seconds: Davis AI can help you expand your dashboards and dive deeper into your available data to extract additional information. Forecasting can identify potential anomalies in node performance, helping to prevent issues before they impact the system.

Traffic

Traffic Metrics Analytics Monitoring

How to observe logs with Journald and Dynatrace

Dynatrace

APRIL 4, 2025

Journald provides unified structured logging for systems, services, and applications, eliminating the need for custom parsing for severity or details. For forensic log analytics use cases, the Security Investigator app benefits from the scalability and analytics power of Dynatrace Grail.

Analytics

Analytics Operating System Scalability Infrastructure

Simplify log onboarding: From zero to observability in minutes

Dynatrace

MARCH 5, 2025

The log ingestion wizard offers support for all log ingestion methods available in Dynatrace Hub. Get started with Logs: The OneAgent advantage. For most scenarios, Dynatrace OneAgent is your best friend for getting started with Dynatrace log ingestion. Different log ingestion methods are available to address various needs.

Open Source

Open Source IoT Cloud Azure

Globalizing Productions with Netflix’s Media Production Suite

The Netflix TechBlog

MARCH 31, 2025

As file sizes grow and workflows become more complex, these issues are magnified, leading to inefficiencies that slow down post-production and reduce the available time spent on creative work. Depending on the market, or production budget, cutting-edge technology might not be available or affordable. So what is it?

Media

Media Logistics Innovation Cloud

What is log management? How to tame distributed cloud system complexities

Dynatrace

SEPTEMBER 8, 2022

Log management is an organization’s rules and policies for managing and enabling the creation, transmission, analysis, storage, and other tasks related to IT systems’ and applications’ log data. Distributed cloud systems are complex, dynamic, and difficult to manage without the proper tools. What is log management?

Cloud

Cloud Systems Analytics DevOps

Grafana Loki Fundamentals and Architecture

DZone

FEBRUARY 28, 2025

Grafana Loki is a horizontally scalable, highly available log aggregation system. Created by Grafana Labs in 2018, Loki has rapidly emerged as a compelling alternative to traditional logging systems, particularly for cloud-native and Kubernetes environments. It is designed for simplicity and cost-efficiency.

Architecture

Architecture Scalability Efficiency Cloud

Implementing a Self-Healing Infrastructure With Kubernetes and Prometheus

DZone

MAY 3, 2023

In today's world, the need for highly available and fault-tolerant systems is more important than ever. Kubernetes provides a highly scalable and flexible platform for managing containerized applications. Kubernetes provides two types of self-healing mechanisms: liveness probes and readiness probes.

Infrastructure

Infrastructure Open Source Scalability Monitoring

Engineering dependability and fault tolerance in a distributed system

High Scalability

FEBRUARY 19, 2021

Availability and Reliability are forms of dependability. Availability: The degree to which a product or service is available for use when required. This means a system that is not merely available but is also engineered with extensive redundant measures to continue to work as its users expect.
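The definition of availability above is commonly quantified as the fraction of time a service is usable. The sketch below assumes that convention; the function names `availability` and `allowed_downtime_minutes` are hypothetical helpers, not taken from the article.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of the observed period during which the service was usable."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observed period must be positive")
    return uptime_hours / total


def allowed_downtime_minutes(target, period_days=30):
    """Downtime budget, in minutes, implied by an availability target."""
    if not 0 <= target <= 1:
        raise ValueError("target must be between 0 and 1")
    return (1 - target) * period_days * 24 * 60


# One hour of downtime in a 720-hour (30-day) month.
print(f"{availability(719, 1):.4%}")
# Budget for a "three nines" target over 30 days.
print(f"{allowed_downtime_minutes(0.999):.1f} minutes")
```

Under these assumptions, a 99.9% target over 30 days leaves a budget of roughly 43 minutes of downtime.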

Engineering

Engineering Systems Availability Scalability

Storage Types Used on Cloud Computing Platforms

DZone

JANUARY 24, 2024

Because of the emergence of cloud services, a broad range of storage choices are now easily available to fulfill the different demands of both organizations and people. These storage alternatives have been designed to meet a range of requirements, including performance, scalability, durability, and price.

Storage

Storage Cloud Scalability Design

PostgreSQL vs. Oracle: Difference in Costs, Ease of Use & Functionality

Scalegrid

JULY 13, 2020

Oracle Database is a commercial, proprietary multi-model database management system produced by Oracle Corporation, and the largest relational database management system (RDBMS) in the world. Compare PostgreSQL vs. Oracle functionality across available tools, capabilities and services.

Open Source

Open Source Tuning C++ Database

Celebrating innovation: Top Custom Solutions from the 2024 Dynatrace Partner App Competition

Dynatrace

FEBRUARY 14, 2025

By providing accessible telemetry data and scalable analytics, MS Teams Observability empowers helpdesk and operations teams to efficiently manage and resolve MS Teams performance issues and restore normal operations. Spica Solution’s CMDB app secured second place because it effectively addresses a significant business need.

Innovation

Innovation Government Operating System Efficiency

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. This guide delves into how these systems work, the challenges they solve, and their essential role in businesses and technology.

Storage

Storage Systems Big Data Azure

The Ultimate Guide to Database High Availability

Percona

JUNE 22, 2023

To make data count and to ensure cloud computing is unabated, companies and organizations must have highly available databases. This guide provides an overview of what high availability means, the components involved, how to measure high availability, and how to achieve it. Some disruption might occur, but it will be minimal.

Availability

Availability Database Open Source Hardware

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum uses an MPP database design that can help you develop a scalable, high performance deployment. The MPP system leverages a shared-nothing architecture to handle multiple operations in parallel. Typically an MPP system has one leader node and one or many compute nodes. At a glance – TLDR. Greenplum Advantages.

Big Data

Big Data Database Artificial Intelligence Open Source

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience. These backend systems are consistently being evolved and optimized to meet and exceed customer and product expectations. It provides a good read on the availability and latency ranges under different production conditions.

Traffic

Traffic Latency Tuning Systems

Dynatrace supports Amazon Linux 2023 as an AWS launch partner

Dynatrace

MARCH 14, 2023

The Dynatrace Software Intelligence Platform accelerates cloud operations, helping organizations achieve service-level objectives (SLOs) with automated intelligence and unmatched scalability. AL2023 is supported by Dynatrace on day one and has been thoroughly tested by our installations team.

AWS

AWS Lambda Serverless Virtualization

Part 1: A Survey of Analytics Engineering Work at Netflix

The Netflix TechBlog

DECEMBER 17, 2024

Analytics Engineers deliver these insights by establishing deep business and product partnerships; translating business challenges into solutions that unblock critical decisions; and designing, building, and maintaining end-to-end analytical systems. DJ has a strong pedigree: there are several prior semantic layers in the industry (e.g.

Analytics

Analytics Engineering Entertainment Metrics

AWS serverless services: Exploring your options

Dynatrace

OCTOBER 7, 2021

This means you no longer have to provision, scale, and maintain servers to run your applications, databases, and storage systems. Scalability. Finally, there’s scalability. Serverless architecture shifts application hosting functions away from local servers onto those managed by providers.

Serverless

Serverless AWS Lambda Storage

What Is Cloud Testing: Everything You Need To Know

DZone

AUGUST 6, 2021

It involved sharing computing resources on different platforms, acted as a tool to improve scalability, and enabled effective IT administration and cost reduction. This primarily helps the QA teams to deal with the challenges like limited availability of devices, browsers, and operating systems.

Cloud

Cloud Testing Internet

Best PostgreSQL GUI [2024]

Scalegrid

OCTOBER 18, 2024

Due to its versatility for storing information in both structured and unstructured formats, PostgreSQL is the fourth most used standard in modern database management systems (DBMS) worldwide. Offering comprehensive access to files, software features, and the operating system in a more user-friendly manner to ensure control.

Open Source

Open Source Database Cloud Operating System

Distributed tracing with Dynatrace just got even better

Dynatrace

MARCH 11, 2025

The Dynatrace platform now enables comprehensive data exploration and interactive analytics across data sets (trace, logs, events, and metrics), empowering you to solve complex use cases, handle any observability scenario, and gain unprecedented visibility into your systems.

Games

Games Analytics Innovation Metrics

Ready-to-Use High Availability Architectures for MySQL and PostgreSQL

Percona

JUNE 12, 2023

When it comes to access to their applications, users demand instant, reliable, and secure interactions — and that means databases must be highly available. With database high availability (HA), services are largely uninterrupted, and end users are largely satisfied. The obvious answer is this: To achieve high availability.

Architecture

Architecture Availability Open Source Hardware

How To Design For High-Traffic Events And Prevent Your Website From Crashing

Smashing Magazine

JANUARY 7, 2025

The good news is that you can maximize availability and prevent website crashes by designing websites specifically for these events. For example, you can switch to a scalable cloud-based web host, or compress/optimize images to save bandwidth. You can often do this using built-in apps on your operating system.

Traffic

Traffic Website Design Cache

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

Network Availability: The expected continued growth of our ecosystem makes it difficult to understand our network bottlenecks and potential limits we may be reaching. availability, performance, and security), to ensure applications can effectively deliver their data payload across a globally dispersed cloud-based ecosystem.

Network

Network Transportation AWS Cloud
