Availability, Design and Systems - Technology Performance Pulse

Build resilient IT systems and manage regulatory requirements with compliance and resilience capabilities from Dynatrace

Dynatrace

JANUARY 15, 2025

They offer a comprehensive end-to-end solution to these challenges, providing functionalities designed to enhance compliance and resilience in IT environments. Understand the complexity of IT systems in real time: Dynatrace helps you comprehensively map the entire IT environment in real time.

Systems

Systems DevOps Monitoring Analytics

Real-Time Presence Platform System Design

DZone

MAY 12, 2023

The system design of the Presence Platform depends on the design of the Real-Time Platform. I highly recommend reading the related article to improve your system design skills. The presence status represents the availability of the client for communication on a chat application or a social network.

Design

Design Systems Network Website

Hawkins: Diving into the Reasoning Behind our Design System

The Netflix TechBlog

FEBRUARY 10, 2021

Stranger Things imagery showcasing the inspiration for the Hawkins Design System. By Hawkins team member Joshua Godi, with art contributions by Wiki Chaves. Hawkins may be the name of a fictional town in Indiana, most widely known as the backdrop for one of Netflix’s most popular TV series, “Stranger Things,” but the name is so much more.

Design

Design Systems Engineering Entertainment

Rapid Event Notification System at Netflix

The Netflix TechBlog

FEBRUARY 18, 2022

To this end, we developed a Rapid Event Notification System (RENO) to support use cases that require server initiated communication with devices in a scalable and extensible manner. In this blog post, we will give an overview of the Rapid Event Notification System at Netflix and share some of the learnings we gained along the way.

Systems

Systems Traffic Architecture Mobile

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

OpenPipeline: Simplify access to critical business data

Dynatrace

NOVEMBER 4, 2024

There’s a goldmine of business data traversing your IT systems, yet most of it remains untapped. Business events: Delivering the best data. It’s been two years since we introduced business events, a special class of events designed to support even the most demanding business use cases. Easy to access.

Analytics

Analytics Airlines Metrics Monitoring

Dynatrace EdgeConnect securely connects your local systems to Dynatrace SaaS

Dynatrace

OCTOBER 10, 2023

EdgeConnect provides a secure bridge for SaaS-heavy companies like Dynatrace, which hosts numerous systems and data behind VPNs. In this hybrid world, IT and business processes often span across a blend of on-premises and SaaS systems, making standardization and automation necessary for efficiency.

Systems

Systems Efficiency Internet

Tailored access management, Part 3: Simplified setup for enterprise-scale access management

Dynatrace

OCTOBER 14, 2024

Manage the complexity of authorization systems Most modern authorization systems provide access management using Attribute-Based Access Control (ABAC). The system demands significant effort to design, manage, and maintain, especially as an organization’s needs evolve.

Monitoring

Monitoring Metrics Systems Scalability

Managing PostgreSQL® High Availability – Part I: PostgreSQL Automatic Failover

Scalegrid

SEPTEMBER 5, 2024

Managing High Availability (HA) in your PostgreSQL hosting is very important to ensuring your database deployment clusters maintain exceptional uptime and strong operational performance so your data is always available to your application. Effective management of failover and switchover operations is crucial for high availability.

Availability

Availability Servers Database Open Source

How To Implement Specific Distributed System Patterns Using Spring Boot: Introduction

DZone

SEPTEMBER 11, 2024

In contemporary software architecture, distributed systems have long been recognized as the foundation for applications with high availability, scalability, and reliability goals. Spring Boot's defaults and annotation-based setup reduce the time it takes to design an application.

Systems

Systems Java Software Architecture Programming

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

Part 3: System Strategies and Architecture. By: Varun Khaitan. With special thanks to my stunning colleagues: Mallika Rao, Esmir Mesic, Hugo Marques. This blog post is a continuation of Part 2, where we cleared the ambiguity around title launch observability at Netflix. The request schema for the observability endpoint.

Traffic

Traffic Strategy Entertainment Innovation

Choreography Pattern: Optimizing Communication in Distributed Systems

DZone

SEPTEMBER 30, 2023

Successfully coordinating messages among these services is a fundamental aspect of their design. There are two popular methodologies available to tackle this challenge. The first, Service Orchestration, was discussed in my previous article. In this article, we will dig into the second methodology: Choreography.

Systems

Systems Virtualization Architecture Scalability

Dynatrace observability now available for Red Hat OpenShift on IBM Z and LinuxONE mainframes

Dynatrace

JULY 24, 2024

IBM Z and LinuxONE mainframes running the Linux operating system enable you to respond faster to business demands, protect data from core to cloud, and streamline insights and automation. Dynatrace is designed to scale easily across the entire Kubernetes stack. Dynatrace observability is available for Red Hat OpenShift on IBM Power.

Availability

Availability Infrastructure Metrics Monitoring

Dynatrace joins the Microsoft Intelligent Security Association

Dynatrace

NOVEMBER 20, 2024

This rising risk amplifies the need for reliable security solutions that integrate with existing systems. Dynatrace, available as an Azure-native service, has a longstanding partnership with Microsoft, deeply rooted in a strong “build with” approach to deliver seamless user experience.

Best Practices

Best Practices Innovation Azure Cloud

How To Design For High-Traffic Events And Prevent Your Website From Crashing

Smashing Magazine

JANUARY 7, 2025

By Saad Khan. This article is sponsored by Cloudways. Product launches and sales typically attract large volumes of traffic.

Traffic

Traffic Website Design Cache

Dare to debug production with Dynatrace Live Debugger

Dynatrace

FEBRUARY 4, 2025

Figures: Live snapshot in the Visual Studio Code IDE; live snapshot in the IntelliJ IDE. Get even more context and telemetry from Dynatrace into your IDE: Having real-time code-level data available within the IDE is only the beginning. Dynatrace Live Debugger is currently in preview and will be generally available within the next 90 days.

Open Source

Open Source Code Engineering Best Practices

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

By Ko-Jen Hsiao, Yesu Feng and Sudarshan Lamkhede. Motivation: Netflix’s personalized recommender system is a complex system, boasting a variety of specialized machine-learned models, each catering to distinct needs, including Continue Watching and Today’s Top Picks for You. (Refer to our recent overview for more details.)

Tuning

Tuning Efficiency Latency Strategy

Dynatrace Managed turnkey Premium High Availability for globally distributed data centers (Early Adopter)

Dynatrace

JUNE 25, 2020

Dynatrace Managed is intrinsically highly available as it stores three copies of all events, user sessions, and metrics across its cluster nodes. Our Premium High Availability comes with the following features: Active-active deployment model for optimum hardware utilization. Minimized cross-data center network traffic.

Availability

Availability Hardware Latency Traffic

Data privacy by design: How an observability platform protects data security

Dynatrace

APRIL 19, 2023

Creating an ecosystem that facilitates data security and data privacy by design can be difficult, but it’s critical to securing information. When organizations focus on data privacy by design, they build security considerations into cloud systems upfront rather than as a bolt-on consideration. API access management.

Design

Design Storage Programming Analytics

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow, an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems. ETL workflows), as well as downstream (e.g.

Systems

Systems Media Cache Open Source

RabbitMQ vs. Kafka: Key Differences

Scalegrid

FEBRUARY 6, 2025

RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. Introduction to Message Brokers Message brokers enable applications, services, and systems to communicate by acting as intermediaries between senders and receivers.

Latency

Latency Analytics Architecture Storage

Simplify log onboarding: From zero to observability in minutes

Dynatrace

MARCH 5, 2025

The log ingestion wizard offers support for all log ingestion methods available in Dynatrace Hub. Get started with Logs: The OneAgent advantage. For most scenarios, Dynatrace OneAgent is your best friend for getting started with Dynatrace log ingestion. Different log ingestion methods are available to address various needs.

Open Source

Open Source IoT Cloud Azure

Build automated self-healing systems with xMatters and Dynatrace (Part 1 of 3)

Dynatrace

JULY 25, 2019

Flow Designer for more consistency in the delivery cycle. At this year’s Google Cloud Next conference, xMatters introduced Flow Designer, a visual designer that enables users to resolve issues without writing a single line of code. Flow Designer then connects the tools for you. How is this done? Slow microservices.

Systems

Systems DevOps Design Google

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Key Takeaways RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges.

Best Practices

Best Practices Traffic Strategy Efficiency

Power Dashboarding, Part I: Start your exploration journey with Dashboards

Dynatrace

FEBRUARY 6, 2025

Whether you’re a seasoned IT expert or a marketing professional looking to improve business performance, understanding the data available to you is essential. As you went through these steps, you likely noticed some of the chart options available. Welcome, data enthusiasts! That’s where Dynatrace dashboards come in.

Metrics

Metrics Infrastructure Monitoring Best Practices

Dynatrace Cost & Carbon Optimization certified for accuracy and transparency

Dynatrace

MARCH 5, 2025

Integration with existing systems and processes: Integration with existing IT infrastructure, observability solutions, and workflows often requires significant investment and customization. The certification results are now publicly available.

Energy

Energy Analytics Traffic Cloud

Grafana Loki Fundamentals and Architecture

DZone

FEBRUARY 28, 2025

Grafana Loki is a horizontally scalable, highly available log aggregation system. It is designed for simplicity and cost-efficiency. Created by Grafana Labs in 2018, Loki has rapidly emerged as a compelling alternative to traditional logging systems, particularly for cloud-native and Kubernetes environments.

Architecture

Architecture Scalability Efficiency Cloud

Dynatrace Observability for Developers saves time with real-time data

Dynatrace

FEBRUARY 4, 2025

Building the dream package Observability for Developers, the newly introduced offering from Dynatrace, is designed to cater to developers’ specific needs and challenges. In a single view, developers get an instant overview of application performance, system health, logs, problems, deployment status, user interactions, and much more.

Development

Development Analytics Code Architecture

Best PostgreSQL GUI [2024]

Scalegrid

OCTOBER 18, 2024

Due to its versatility for storing information in both structured and unstructured formats, PostgreSQL is the fourth most used standard in modern database management systems (DBMS) worldwide. Offering comprehensive access to files, software features, and the operating system in a more user-friendly manner to ensure control.

Open Source

Open Source Database Cloud Operating System

My code::dive talk video is available: New Q&A

Sutter's Mill

DECEMBER 11, 2024

This is the year we put erroneous behavior [in] and eliminated uninitialized locals in the standard, this is the year that we design-approved reflection for the standard both for C++26 and hopefully they’ll both get in. I use this world’s banking system. I rely on this world’s hospital system.

Code

Code C++ Availability Energy

Part 2: A Survey of Analytics Engineering Work at Netflix

The Netflix TechBlog

JANUARY 2, 2025

To better guide the design and budgeting of future campaigns, we are developing an Incremental Return on Investment model. Ideally, we would have causal estimates from an A/B test to use for validation, but since that is not available, we use another causal inference design as one of our ensemble of validation approaches.

Analytics

Analytics Engineering Games Entertainment

Introducing Configurable Metaflow

The Netflix TechBlog

DECEMBER 19, 2024

Many of these projects are under constant development by dedicated teams with their own business goals and development best practices, such as the system that supports our content decision makers, or the system that ranks which language subtitles are most valuable for a specific piece of content.

Best Practices

Best Practices Cache Metrics Code

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Its architecture was specially designed to manage large-scale data warehouses and business intelligence workloads by giving you the ability to spread your data out across a multitude of servers. Greenplum uses an MPP database design that can help you develop a scalable, high performance deployment. Greenplum Architectural Design.

Big Data

Big Data Database Artificial Intelligence Open Source

The Ultimate Guide to Database High Availability

Percona

JUNE 22, 2023

To make data count and to ensure cloud computing is unabated, companies and organizations must have highly available databases. This guide provides an overview of what high availability means, the components involved, how to measure high availability, and how to achieve it. Some disruption might occur, but it will be minimal.

Availability

Availability Database Open Source Hardware

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. This guide delves into how these systems work, the challenges they solve, and their essential role in businesses and technology.

Storage

Storage Systems Big Data Azure

Building a Media Understanding Platform for ML Innovations

The Netflix TechBlog

MARCH 14, 2023

We must quickly surface the most stand-out highlights from the titles available on our service in the form of images and videos in the member experience. We implemented a batch processing system for users to submit their requests and wait for the system to generate the output. Maintaining disparate systems posed a challenge.

Media

Media Innovation Energy Architecture

Building Resiliency With Effective Error Management

DZone

JANUARY 23, 2022

Building resilient systems requires comprehensive error management. Errors can occur in any part of the system or its ecosystem, and there are different ways of handling them, e.g. Datacenter: data center failure, where the whole DC could become unavailable due to power failure, network connectivity failure, environmental catastrophe, etc.

Hardware

Hardware DevOps Network Storage

Storage Types Used on Cloud Computing Platforms

DZone

JANUARY 24, 2024

Because of the emergence of cloud services, a broad range of storage choices are now easily available to fulfill the different demands of both organizations and people. These storage alternatives have been designed to meet a range of requirements, including performance, scalability, durability, and price.

Storage

Storage Cloud Scalability Design

Resilience Pattern: Circuit Breaker

DZone

NOVEMBER 16, 2023

In this article, we will explore one of the most common and useful resilience patterns in distributed systems: the circuit breaker. The circuit breaker is a design pattern that prevents cascading failures and improves the overall availability and performance of a system. What Is a Circuit Breaker?
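
As a rough illustration of the pattern this excerpt describes, here is a minimal sketch in Python. It is not taken from the linked article: the class name `CircuitBreaker`, the exception `CircuitOpenError`, and the parameters `failure_threshold`, `reset_timeout`, and `clock` are all hypothetical names chosen for this example. The idea is that after a run of consecutive failures the breaker "opens" and rejects calls immediately, which is how it avoids piling more load onto a failing backend; after a timeout it lets one trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative breaker with closed, open, and half-open states."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock                         # injectable so the sketch is testable
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        # While open, fail fast instead of calling the struggling backend.
        if self.state == "open":
            raise CircuitOpenError("backend call skipped: circuit is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each backend request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a placeholder for the real request function, and treat `CircuitOpenError` as a signal to serve a fallback. Production libraries typically add thread safety and metrics, which this sketch omits.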

Latency

Latency Network Database Monitoring

Introduction to the Circuit Breaker Pattern

DZone

JANUARY 10, 2021

If the backend service is not available for some time, then what kind of fail-proof system should you implement? This is where the Circuit Breaker design pattern comes in. Consider that you’re running a web service that requires input and delivers it to another backend service.

Design

Design Availability Systems Performance

The Dynatrace Platform Subscription model enables broad Infrastructure Monitoring

Dynatrace

JUNE 27, 2023

This subscription model offers the flexibility to deploy Dynatrace even more broadly to gain greater visibility into system performance, improve the ability to detect and prevent bottlenecks, and quickly detect and diagnose problems. With DPS, metrics are available as a pool per tenant.

Infrastructure

Infrastructure Monitoring Metrics Network

Simplify complex cloud-native environments with AI-driven observability

Dynatrace

OCTOBER 3, 2024

Monitor your cloud: OpenPipeline™ is the Dynatrace platform data-handling solution designed to seamlessly ingest and process data from any source, regardless of scale or format. Furthermore, OpenPipeline is designed to collect and process data securely and in compliance with industry standards.

Cloud

Cloud Lambda AWS Analytics

Getting Started With Prometheus Workshop: Exploring Basic Queries

DZone

APRIL 14, 2023

This workshop is for you, designed to expand your knowledge and understanding of open-source observability tooling that is available to you today. Prometheus is an open-source systems monitoring and alerting tool kit that enables you to hit the ground running with discovering, collecting, and querying your observability today.

Open Source

Open Source Metrics Monitoring Design

Mastering Kubernetes with Dynatrace

Dynatrace

AUGUST 24, 2020

To make this possible, the application code should be instrumented with telemetry data for deep insights, including: Metrics to find out how the behavior of a system has changed over time. Traces help find the flow of a request through a distributed system. Logs represent event data in plain-text, structured or binary format.

Analytics

Analytics Infrastructure AWS Operating System
