Systems - Technology Performance Pulse

Backpressure in Distributed Systems

DZone

SEPTEMBER 26, 2024

Learn more about backpressure, a technique in distributed systems to prevent overload and cascading failures by controlling the flow of requests.

Systems

Network Guardians: Crafting a Spring Boot-Driven Anomaly Detection System

DZone

OCTOBER 11, 2024

This three-part series walks through building a robust network anomaly detection system with the Spring Boot framework. The series is organized as follows: Part 1: We’ll concentrate on the foundation and basic structure of our detection system.

Network

Network Systems Monitoring Technology

Decompose Legacy System Into Microservices: Part 2

DZone

NOVEMBER 29, 2023

This is particularly relevant in the domain of reimbursement calculation systems. The Monolithic Challenge Imagine a scenario where you have a large-scale, monolithic system - possibly a bulky C# console application or an extensive SQL Server stored procedure.

Systems

Systems C++ Scalability Architecture

Architecting for Resilience: Strategies for Fault-Tolerant Systems

DZone

DECEMBER 14, 2023

That means it's important that software systems are dependable, robust, and resilient. Resilient systems can withstand failures or errors without completely crashing. Resilience lets systems keep working properly even when problems occur. We'll also discuss core principles and strategies for building fault-tolerant systems.

Strategy

Strategy Systems Serverless Cloud

Congestion Control in Cloud Scale Distributed Systems

DZone

DECEMBER 19, 2023

Distributed systems are composed of multiple systems that are wired together to provide a specific functionality. Systems that operate at a cloud scale can get expected or unexpected surges of traffic from one or multiple callers and are expected to perform in a predictable manner.

Systems

Systems Cloud Traffic Performance

Strategies for Building Self-Healing Software Systems

DZone

JUNE 20, 2024

In the vast realm of software development, there's a pursuit for software systems that are not only robust and efficient but can also "heal" themselves. Self-healing software systems represent a significant stride towards automation and resilience. 4 Key Strategies for Building Self-Healing Software Systems 1.

Strategy

Strategy Systems Software

Overcoming the Retry Dilemma in Distributed Systems

DZone

AUGUST 27, 2024

This was manifested in system designs as well, where we carried these biases into the systems we designed. “Insanity is doing the same thing over and over again, but expecting different results” - Source unknown. As you can see in the quote above, humans have this tendency to retry things even when results are not going to change.

Systems

Systems Design

Elevating System Management: The Role of Monitoring and Observability in DevOps

DZone

JUNE 21, 2023

In the ever-evolving world of DevOps, the ability to gain deep insights into system behavior, diagnose issues, and improve overall performance is one of the top priorities. Monitoring and observability are two key concepts that facilitate this process, offering valuable visibility into the health and performance of systems.

DevOps

DevOps Systems Monitoring Metrics

A Look Into Netflix System Architecture

DZone

JULY 1, 2024

Netflix's system architecture emphasizes how important it is to determine how content is shaped in the future. Ever wondered how Netflix keeps you glued to your screen with uninterrupted streaming bliss? Behind the scenes, Netflix's architecture is responsible for the smooth streaming experience that attracts viewers worldwide.

Architecture

Architecture Systems Entertainment

A Comprehensive Guide to Database Sharding: Building Scalable Systems

DZone

OCTOBER 2, 2024

By the end of this guide, you’ll have a comprehensive understanding of database sharding, enabling you to implement it effectively in your systems. This section will provide insights into the architecture and strategies to ensure efficient query processing in a sharded environment.

Database

Database Systems Scalability Traffic

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow, an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems. ETL workflows), as well as downstream (e.g.

Systems

Systems Media Cache Open Source

Designing a Scalable and Fault-Tolerant Messaging System for Distributed Applications

DZone

JANUARY 26, 2024

Building a strong messaging system is critical in the world of distributed systems for seamless communication between multiple components. A messaging system serves as a backbone, allowing information transmission between different services or modules in a distributed architecture.

Scalability

Scalability Design Systems Architecture

How OpenAI’s Downtime Incident Teaches Us to Build More Resilient Systems

DZone

DECEMBER 25, 2024

In this article, I will describe the technical aspects of the incident, break down the root causes, and explore key lessons that developers and organizations managing distributed systems can take away from this event.

Systems

Systems Efficiency Development

Distributed Cloud Architecture for Resilient Systems

DZone

NOVEMBER 20, 2023

This is an article from DZone's 2023 Observability and Application Performance Trend Report. Employing cloud services can incur a great deal of risk if not planned and designed correctly. In fact, this is really no different than the challenges that are inherent within a single on-premises data center implementation.

Cloud

Cloud Architecture Systems Network

How To Implement Specific Distributed System Patterns Using Spring Boot: Introduction

DZone

SEPTEMBER 11, 2024

In contemporary software architecture, distributed systems have long been recognized as the foundation for applications with high availability, scalability, and reliability goals. Spring Boot Overview: One of the most popular Java frameworks for creating apps is Spring.

Systems

Systems Java Software Architecture Programming

Zabbix as Universal Monitoring System for IT Company: Tips for Effective DevOps Monitoring

DZone

OCTOBER 26, 2023

My first encounter with this monitoring system was in 2014 when I joined a project where Zabbix was already in use for monitoring network devices (routers, switches). Over the course of five years, while working on the project, we went through several system upgrades until we finally transitioned to Zabbix 4.0

Monitoring

Monitoring Systems DevOps Virtualization

Data Integration in Real-Time Systems

DZone

NOVEMBER 7, 2023

In the rapidly evolving digital landscape, the role of data has shifted from being merely a byproduct of business to becoming its lifeblood. With businesses constantly in the race to stay ahead, the process of integrating this data becomes crucial. However, it's no longer enough to assimilate data in isolated, batch-oriented processes.

Systems

Systems Analytics Architecture Engineering

Choreography Pattern: Optimizing Communication in Distributed Systems

DZone

SEPTEMBER 30, 2023

In today's rapidly evolving technology landscape, it's common for applications to migrate to the cloud to embrace the microservice architecture. While this architectural approach offers scalability, reusability, and adaptability, it also presents a unique challenge: effectively managing communication between these microservices.

Systems

Systems Virtualization Architecture Scalability

Build systems more reliably with Dynatrace: Chaos Engineering

Dynatrace

AUGUST 21, 2024

These releases often assumed ideal conditions such as zero latency, infinite bandwidth, and no network loss, as highlighted in Peter Deutsch’s eight fallacies of distributed systems. With Dynatrace, teams can seamlessly monitor the entire system, including network switches, database storage, and third-party dependencies.

Engineering

Engineering Systems Latency Metrics

Build resilient IT systems and manage regulatory requirements with compliance and resilience capabilities from Dynatrace

Dynatrace

JANUARY 15, 2025

Here’s how Dynatrace can help automate up to 80% of technical tasks required to manage compliance and resilience: understand the complexity of IT systems in real time; proactively prevent, prioritize, and efficiently manage performance and security incidents; automate manual and routine tasks to increase your productivity. 1.

Systems

Systems DevOps Analytics Monitoring

Challenges and Solutions in Developing Real-Time Messaging Systems

DZone

DECEMBER 24, 2024

Yet, building a real-time messaging system is anything but simple. Real-time interactions accelerate growth and foster user engagement, making messaging features pivotal for any business to succeed online.

Social Media

Social Media Systems Development Media

Evolution of Recommendation Systems: From Legacy Rules Engines to Machine Learning

DZone

JANUARY 20, 2025

One of the most visible implementations of personalization is through recommendation systems, which provide users with tailored content, products, or experiences based on their interactions and preferences. This article explores how legacy rules-based systems operate, their limitations, and how machine learning has disrupted this space.

Engineering

Engineering Systems Technology

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience. These backend systems are consistently being evolved and optimized to meet and exceed customer and product expectations. This technique facilitates validation on multiple fronts.

Traffic

Traffic Latency Tuning Systems

Ensuring the Successful Launch of Ads on Netflix

The Netflix TechBlog

JUNE 1, 2023

In this blog post, we’ll discuss the methods we used to ensure a successful launch, including how we tested the system, the Netflix technologies involved, and the best practices we developed. Realistic Test Traffic: Netflix traffic ebbs and flows throughout the day in a sinusoidal pattern. Basic with ads was launched worldwide on November 3rd.

Traffic

Traffic Best Practices Systems Testing

Migrating Critical Traffic At Scale with No Downtime - Part 2

The Netflix TechBlog

JUNE 13, 2023

This is where large-scale system migrations come into play. By tracking metrics only at the level of service being updated, we might miss capturing deviations in broader end-to-end system functionality. Canaries and sticky canaries are valuable tools in the system migration process.

Traffic

Traffic Metrics Systems Strategy

Building a Media Understanding Platform for ML Innovations

The Netflix TechBlog

MARCH 14, 2023

We implemented a batch processing system for users to submit their requests and wait for the system to generate the output. This limited pilot system greatly reduced the time spent by our users to manually analyze the content. Maintaining disparate systems posed a challenge. Processing took several hours to complete.

Media

Media Innovation Energy Architecture

Scaling Media Machine Learning at Netflix

The Netflix TechBlog

FEBRUARY 13, 2023

This feature store is equipped with a data replication system that enables copying data to different storage solutions depending on the required access patterns. Training Performance Media model training poses multiple system challenges in storage, network, and GPUs.

Media

Media Storage Infrastructure Systems

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

As Kubernetes adoption increases and it continues to advance technologically, Kubernetes has emerged as the “operating system” of the cloud. Kubernetes moved to the cloud in 2022.

Open Source

Open Source Java Operating System Programming

Implementing a Self-Healing Infrastructure With Kubernetes and Prometheus

DZone

MAY 3, 2023

In today's world, the need for highly available and fault-tolerant systems is more important than ever. It includes features such as automatic scaling, rolling updates, and self-healing, making it an ideal choice for building highly available systems.

Infrastructure

Infrastructure Open Source Scalability Monitoring

5 DNS Troubleshooting Tips for Network Teams

DZone

APRIL 16, 2023

“Set it and forget it” is the approach that most network teams follow with their authoritative Domain Name System (DNS). If the system is working and end-users find network connections to revenue-generating applications, services, and content, then administrators will generally say that you shouldn’t mess with success.

Network

Network Strategy Systems Performance

Low-Maintenance Backend Architectures for Scalable Applications

DZone

JANUARY 10, 2025

My own journey of redesigning numerous systems and optimizing their performance has taught me time and again that creating a truly low-maintenance backend is an art that goes far beyond simple technical implementation. Developers could understand and manage the entire system's intricacies.

Architecture

Architecture Scalability Software Engineering Cloud

Effective Communication Strategies Between Microservices: Techniques and Real-World Examples

DZone

MARCH 5, 2024

Building scalable systems using microservices architecture is a strategic approach to developing complex applications. This step-by-step guide outlines the process of creating a microservices-based system, complete with detailed examples.

Strategy

Strategy Scalability Architecture Systems

Achieving High Availability in CI/CD With Observability

DZone

MARCH 5, 2024

Since most application releases depend on cloud infrastructure, having good continuous integration and continuous delivery (CI/CD) pipelines and end-to-end observability becomes essential for ensuring highly available systems.

Availability

Availability DevOps Infrastructure Scalability

Redefining Artifact Storage: Preparing for Tomorrow's Binary Management Needs

DZone

SEPTEMBER 23, 2024

As software pipelines evolve, so do the demands on binary and artifact storage systems. The Current Landscape: Artifact and Package Manager Solutions There are several leading artifact and package management systems today, each with its own strengths and limitations. Let’s explore the key players:

Storage

Storage Innovation Scalability Infrastructure

11 Observability Tools You Should Know

DZone

MARCH 7, 2023

When organizations move toward the cloud, their systems also lean toward distributed architectures. You need to find the right tools to monitor, track and trace these systems by analyzing outputs through metrics, logs, and traces. One of the most common examples is the adoption of microservices.

Architecture

Architecture Metrics Cloud Monitoring

CrowdStrike BSOD: Quickly find machines impacted by the CrowdStrike issue

Dynatrace

JULY 19, 2024

For the CrowdStrike issue, one can use both monitored Windows System logs and the Dynatrace entity model to find out what servers are impacted. The following is an example of a query using the Dynatrace Query Language (DQL) to find out when BSOD issues are being written to Windows System logs.

Airlines

Airlines Servers Retail Monitoring

The Guide to SRE Principles

DZone

NOVEMBER 30, 2023

Site reliability engineering (SRE) is a discipline in which automated software systems are built to manage the development operations (DevOps) of a product or service. In other words, SRE automates the functions of an operations team via software systems.

DevOps

DevOps Engineering Software

Optimizing SQL Server Performance With AI: Automating Query Optimization and Predictive Maintenance

DZone

JANUARY 10, 2025

SQL Server is a powerful relational database management system (RDBMS), but as datasets grow in size and complexity, optimizing their performance becomes critical. Leveraging AI can revolutionize query optimization and predictive maintenance, ensuring the database remains efficient, secure, and responsive.

Servers

Servers Performance Database Efficiency

Visual Network Mapping Your K8s Clusters To Assess Performance

DZone

JANUARY 17, 2023

Building performant services and systems is at the core of every business. Growing organizations, in the process of upscaling their services, unintentionally introduce complexities into the system. Tons of technologies emerge daily, promising capabilities that help you surpass your performance benchmarks.

Network

Network Benchmarking Performance Infrastructure

Title Launch Observability at Netflix Scale

The Netflix TechBlog

JANUARY 6, 2025

In this case, the main stakeholders are: - Title Launch Operators Role: Responsible for setting up the title and its metadata into our systems. In this context, we're focused on developing systems that ensure successful title launches, build trust between content creators and our brand, and reduce engineering operational overhead.

Scalability

Scalability Engineering Cache Systems

Microservices vs. Monolith at a Startup: Making the Choice

DZone

JANUARY 31, 2024

The appeal of building a system that's inherently designed to grow and adapt as the startup evolves is undeniable. This approach offers many advantages, particularly in enabling teams to update and deploy individual components without disrupting the entire system.

Architecture

Architecture Scalability Design Engineering

Automating Twilio Recording Exports for Quality Purposes: Python Implementation Guidelines

DZone

JANUARY 7, 2025

Twilio is a call management system that provides excellent call recording capabilities, but organizations often need to automatically download and store these recordings locally or in their preferred cloud storage. Use Cases: When working with call management systems like Twilio, we might need to:

Storage

Storage Efficiency Cloud Systems

Kubernetes in the Cloud: A Guide to Observability

DZone

JANUARY 3, 2025

This is where observability comes into play, offering critical insights into how your system is performing and why issues occur. But the way containers are continuously created and destroyed can sometimes present challenges with monitoring.

Cloud

Cloud Monitoring Systems Performance

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

Failures in a distributed system are a given, and having the ability to safely retry requests enhances the reliability of the service. Implementing idempotency would likely require using an external system for such keys, which can further degrade performance or cause race conditions.

Latency

Latency Cache Infrastructure Strategy
