Systems - Technology Performance Pulse

Network Guardians: Crafting a Spring Boot-Driven Anomaly Detection System

DZone

OCTOBER 11, 2024

This three-part article series will take you through the process of developing a network anomaly detection system using the Spring Boot framework in a robust manner. The series is organized as follows: Part 1: We’ll concentrate on the foundation and basic structure of our detection system, which has to be created.

Network

Network Systems Monitoring Technology

How OpenAI’s Downtime Incident Teaches Us to Build More Resilient Systems

DZone

DECEMBER 25, 2024

In this article, I will describe the technical aspects of the incident, break down the root causes, and explore key lessons that developers and organizations managing distributed systems can take away from this event.

Systems

Systems Efficiency Development

Build resilient IT systems and manage regulatory requirements with compliance and resilience capabilities from Dynatrace

Dynatrace

JANUARY 15, 2025

Here’s how Dynatrace can help automate up to 80% of technical tasks required to manage compliance and resilience: Understand the complexity of IT systems in real time Proactively prevent, prioritize, and efficiently manage performance and security incidents Automate manual and routine tasks to increase your productivity 1.

Systems

Systems DevOps Analytics Monitoring

System Design of an Audio Streaming Service

DZone

FEBRUARY 11, 2025

The system design of an audio streaming app is unique in how it deals with idiosyncratic business needs. Typically, audio streaming requires a large amount of data to be transferred within the limited bandwidth of the network communication channel.

Design

Design Systems Network

Challenges and Solutions in Developing Real-Time Messaging Systems

DZone

DECEMBER 24, 2024

Yet, building a real-time messaging system is anything but simple. Real-time interactions accelerate growth and foster user engagement, making messaging features pivotal for any business to succeed online.

Social Media

Social Media Systems Development Media

A Step-by-Step Guide to Write a System Design Document

DZone

FEBRUARY 26, 2025

Have you ever wondered how large-scale systems handle millions of requests seamlessly while ensuring speed, reliability, and scalability? Behind every high-performing application, whether it's a search engine, an e-commerce platform, or a real-time messaging service, lies a well-thought-out system design.

Design

Design Systems Scalability Speed

Evolution of Recommendation Systems: From Legacy Rules Engines to Machine Learning

DZone

JANUARY 20, 2025

One of the most visible implementations of personalization is through recommendation systems, which provide users with tailored content, products, or experiences based on their interactions and preferences. This article explores how legacy rules-based systems operate, their limitations, and how machine learning has disrupted this space.

Systems

Systems Engineering Technology

Backpressure in Distributed Systems

DZone

SEPTEMBER 26, 2024

Learn more about backpressure, a technique in distributed systems to prevent overload and cascading failures by controlling the flow of requests.

Systems

Decompose Legacy System Into Microservices: Part 2

DZone

NOVEMBER 29, 2023

This is particularly relevant in the domain of reimbursement calculation systems. The Monolithic Challenge Imagine a scenario where you have a large-scale, monolithic system - possibly a bulky C# console application or an extensive SQL Server stored procedure.

Systems

Systems C++ Scalability Architecture

Architecting for Resilience: Strategies for Fault-Tolerant Systems

DZone

DECEMBER 14, 2023

That means it's important that software systems are dependable, robust, and resilient. Resilient systems can withstand failures or errors without completely crashing. It lets systems keep working properly even when problems occur. We'll also discuss core principles and strategies for building fault-tolerant systems.

Strategy

Strategy Systems Serverless Cloud

A Look Into Netflix System Architecture

DZone

JULY 1, 2024

Netflix's system architecture emphasizes how important it is to determine how content is shaped in the future. Ever wondered how Netflix keeps you glued to your screen with uninterrupted streaming bliss? Netflix Architecture is responsible for the smooth streaming experience that attracts viewers worldwide behind the scenes.

Architecture

Architecture Systems Entertainment

Congestion Control in Cloud Scale Distributed Systems

DZone

DECEMBER 19, 2023

Distributed systems are composed of multiple systems that are wired together to provide a specific functionality. Systems that operate at a cloud scale can get expected or unexpected surges of traffic from one or multiple callers and are expected to perform in a predictable manner.

Systems

Systems Cloud Traffic Performance

Strategies for Building Self-Healing Software Systems

DZone

JUNE 20, 2024

In the vast realm of software development, there's a pursuit for software systems that are not only robust and efficient but can also "heal" themselves. Self-healing software systems represent a significant stride towards automation and resilience. 4 Key Strategies for Building Self-Healing Software Systems 1.

Strategy

Strategy Systems Software

Overcoming the Retry Dilemma in Distributed Systems

DZone

AUGUST 27, 2024

This was manifested in systems designs as well where we pushed these biases when designing systems. “Insanity is doing the same thing over and over again, but expecting different results” - Source unknown As you can see in the quote above, humans have this tendency to retry things even when results are not going to change.

Systems

Systems Design

Elevating System Management: The Role of Monitoring and Observability in DevOps

DZone

JUNE 21, 2023

In the ever-evolving world of DevOps, the ability to gain deep insights into system behavior, diagnose issues, and improve overall performance is one of the top priorities. Monitoring and observability are two key concepts that facilitate this process, offering valuable visibility into the health and performance of systems.

DevOps

DevOps Systems Monitoring Metrics

Build systems more reliably with Dynatrace: Chaos Engineering

Dynatrace

AUGUST 21, 2024

These releases often assumed ideal conditions such as zero latency, infinite bandwidth, and no network loss, as highlighted in Peter Deutsch’s eight fallacies of distributed systems. With Dynatrace, teams can seamlessly monitor the entire system, including network switches, database storage, and third-party dependencies.

Engineering

Engineering Systems Latency Metrics

A Comprehensive Guide to Database Sharding: Building Scalable Systems

DZone

OCTOBER 2, 2024

By the end of this guide, you’ll have a comprehensive understanding of database sharding, enabling you to implement it effectively in your systems. This section will provide insights into the architecture and strategies to ensure efficient query processing in a sharded environment.

Database

Database Systems Scalability Traffic

How To Implement Specific Distributed System Patterns Using Spring Boot: Introduction

DZone

SEPTEMBER 11, 2024

Regarding contemporary software architecture, distributed systems have been widely recognized for quite some time as the foundation for applications with high availability, scalability, and reliability goals. Spring Boot Overview One of the most popular Java EE frameworks for creating apps is Spring.

Systems

Systems Java Software Architecture Programming

Distributed Cloud Architecture for Resilient Systems

DZone

NOVEMBER 20, 2023

This is an article from DZone's 2023 Observability and Application Performance Trend Report. For more: Read the Report Employing cloud services can incur a great deal of risk if not planned and designed correctly. In fact, this is really no different than the challenges that are inherent within a single on-premises data center implementation.

Cloud

Cloud Architecture Systems Network

Data Integration in Real-Time Systems

DZone

NOVEMBER 7, 2023

In the rapidly evolving digital landscape, the role of data has shifted from being merely a byproduct of business to becoming its lifeblood. With businesses constantly in the race to stay ahead, the process of integrating this data becomes crucial. However, it's no longer enough to assimilate data in isolated, batch-oriented processes.

Systems

Systems Analytics Architecture Engineering

Catching up with OpenTelemetry in 2025

Dynatrace

FEBRUARY 27, 2025

In fact, observability is essential for shaping how we design smarter, more resilient systems for the future. As an open-source project, OpenTelemetry sets standards for telemetry data sets and works with a wide range of systems and platforms to collect and export telemetry data to backend systems. OpenTelemetry Collector 1.0

Tuning

Tuning Open Source Innovation Monitoring

Article: Transforming Legacy Healthcare Systems: A Journey to Cloud-Native Architecture

InfoQ

NOVEMBER 18, 2024

Discover how Livi navigated the complexities of transitioning MJog, a legacy healthcare system, to a cloud-native architecture, sharing valuable insights for successful tech modernization. Our experience illustrates that transitioning from legacy systems to cloud-based microservices is not a one-time project but an ongoing journey.

Healthcare

Healthcare Architecture Cloud Systems

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

Part 3: System Strategies and Architecture By: Varun Khaitan With special thanks to my stunning colleagues: Mallika Rao, Esmir Mesic, Hugo Marques This blog post is a continuation of Part 2, where we cleared the ambiguity around title launch observability at Netflix. The request schema for the observability endpoint.

Traffic

Traffic Strategy Entertainment Innovation

OpenPipeline: Simplify access to critical business data

Dynatrace

NOVEMBER 4, 2024

There’s a goldmine of business data traversing your IT systems, yet most of it remains untapped. Other data sources, including APIs and log files, are used to expand access, often to external or proprietary systems. In fact, it’s likely that some of your critical business systems already write business data to log files.

Analytics

Analytics Airlines Metrics Monitoring

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

Failures in a distributed system are a given, and having the ability to safely retry requests enhances the reliability of the service. Implementing idempotency would likely require using an external system for such keys, which can further degrade performance or cause race conditions.

Latency

Latency Cache Infrastructure Strategy

Low-Maintenance Backend Architectures for Scalable Applications

DZone

JANUARY 10, 2025

My own journey of redesigning numerous systems and optimizing their performance has taught me time and again that creating a truly low-maintenance backend is an art that goes far beyond simple technical implementation. Developers could understand and manage the entire system's intricacies.

Architecture

Architecture Scalability Software Engineering Cloud

Tailored access management, Part 3: Simplified setup for enterprise-scale access management

Dynatrace

OCTOBER 14, 2024

Manage the complexity of authorization systems Most modern authorization systems provide access management using Attribute-Based Access Control (ABAC). The system demands significant effort to design, manage, and maintain, especially as an organization’s needs evolve.

Monitoring

Monitoring Metrics Systems Scalability

Title Launch Observability at Netflix Scale

The Netflix TechBlog

JANUARY 6, 2025

In this case, the main stakeholders are: - Title Launch Operators Role: Responsible for setting up the title and its metadata into our systems. In this context, we're focused on developing systems that ensure successful title launches, build trust between content creators and our brand, and reduce engineering operational overhead.

Scalability

Scalability Cache Engineering Systems

Optimizing SQL Server Performance With AI: Automating Query Optimization and Predictive Maintenance

DZone

JANUARY 10, 2025

SQL Server is a powerful relational database management system (RDBMS), but as datasets grow in size and complexity, optimizing their performance becomes critical. Leveraging AI can revolutionize query optimization and predictive maintenance, ensuring the database remains efficient, secure, and responsive.

Servers

Servers Performance Database Efficiency

How to Implement Client-Side Load Balancing With Spring Cloud

DZone

OCTOBER 21, 2024

It is common for microservice systems to run more than one instance of each service. This is needed to enforce resiliency. It is therefore important to distribute the load between those instances. The component that does this is the load balancer. Spring provides a Spring Cloud Load Balancer library.

Cloud

Cloud Servers Systems

Automating Twilio Recording Exports for Quality Purposes: Python Implementation Guidelines

DZone

JANUARY 7, 2025

Twilio is a call management system that provides excellent call recording capabilities, but often organizations are in need of automatically downloading and storing these recordings locally or in their preferred cloud storage. Use Cases When working with call management systems like Twilio, we might need to:

Storage

Storage Efficiency Cloud Systems

Best Practices for Designing Resilient APIs for Scalability and Reliability

DZone

JANUARY 8, 2025

API resilience is about creating systems that can recover gracefully from disruptions, such as network outages or sudden traffic spikes, ensuring they remain reliable and secure. This has become critical since APIs serve as the backbone of today's interconnected systems.

Best Practices

Best Practices Design Scalability Architecture

Kubernetes in the Cloud: A Guide to Observability

DZone

JANUARY 3, 2025

This is where observability comes into play, offering critical insights into how your system is performing and why issues occur. But the way containers are continuously created and destroyed can sometimes present challenges with monitoring.

Cloud

Cloud Monitoring Systems Performance

Optimizing Database Performance in Middleware Applications

DZone

FEBRUARY 14, 2025

In the realm of modern software architecture, middleware plays a pivotal role in connecting various components of distributed systems. Efficient database operations in middleware can dramatically improve overall system performance, reduce latency, and enhance user experience.

Database

Database Performance Software Architecture Latency

Designing and Maintaining Event-Driven Architectures

DZone

MARCH 19, 2025

Event-driven architecture (EDA) gives your system the ability to receive and respond to changes in real time, making it easier to scale. This approach makes systems reactive, scalable, and resilient to failures. This design keeps the components independent of each other, making the system easier to scale and maintain.

Architecture

Architecture Design Scalability Monitoring

The keys to selecting a platform for end-to-end observability

Dynatrace

DECEMBER 2, 2024

Clearly, continuing to depend on siloed systems, disjointed monitoring tools, and manual analytics is no longer sustainable. AI systems that can explain the reasons for their recommendations grounded in causal AI can go a long way in resolving general distrust of AI models.

Artificial Intelligence

Artificial Intelligence DevOps Architecture Cloud

Helping customers unlock the Power of Possible

Dynatrace

OCTOBER 29, 2024

By automating root-cause analysis, TD Bank reduced incidents, speeding up resolution times and maintaining system reliability. To improve this, they turned to Dynatrace for AI-driven automation to accelerate problem detection and resolution. The result? This ability to innovate faster has given TD Bank a competitive edge in a complex market.

Innovation

Innovation Cloud Strategy AWS

Three Habits of Highly Effective Observability Teams

DZone

OCTOBER 17, 2024

It makes sense: in a world where developers – rather than operations teams – are keeping applications up and running, and where systems are highly distributed, ephemeral, and interconnected, how can you take the same approach you have in the past?

Open Source

Open Source Architecture Technology

VMware Security Advisory VMSA-2025-0004: Quickly find, remediate, and automate

Dynatrace

MARCH 19, 2025

Here’s more about the VMware security advisory and how you can quickly find affected systems using Dynatrace so you can automate remediation efforts. With a TOCTOU vulnerability, an attacker can manipulate a system between the time a resource’s state is checked and when it’s used, also known as a race condition.

Virtualization

Virtualization Database Systems Operating System

Implementing LSM Trees in Golang: A Comprehensive Guide

DZone

OCTOBER 30, 2024

In this guide, we’ll walk through the implementation of an LSM tree in Golang, discuss features such as Write-Ahead Logging (WAL), block compression, and BloomFilters, and compare it with more traditional key-value storage systems and indexing strategies.

Strategy

Strategy Storage Efficiency Database

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

To achieve this, we are committed to building robust systems that deliver comprehensive observability, enabling us to take full accountability for every title on our service. Each title represents countless hours of effort and creativity, and our systems need to honor that uniqueness. Yet, these pages couldn't be more different.

Traffic

Traffic Scalability Strategy Monitoring

Monitoring Kubernetes Service Topology Changes in Real-Time

DZone

NOVEMBER 1, 2024

When deployed on bare-metal clusters or cloud VMs, database administrators are responsible for adding and removing nodes in a clustered system, planning the changes at times of low load to minimize disruption to production workloads.

Monitoring

Monitoring Scalability Database Cloud

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

Scaling RabbitMQ ensures your system can handle growing traffic and maintain high performance. Key Takeaways RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges.

Best Practices

Best Practices Traffic Strategy Efficiency

Understanding the Two Schools of Unit Testing

DZone

JANUARY 30, 2025

Unit tests help to check the correctness of newly written logic as well as prevent a system from regression by testing old logic every time (preferably with every build). Unit testing is an essential part of software development. However, there are two different approaches (or schools) to writing unit tests: Classical (a.k.a

Testing

Testing Software Systems
