Design, Efficiency and Systems - Technology Performance Pulse

Designing a Scalable and Fault-Tolerant Messaging System for Distributed Applications

DZone

JANUARY 26, 2024

Building a strong messaging system is critical in the world of distributed systems for seamless communication between multiple components. A messaging system serves as a backbone, allowing information transmission between different services or modules in a distributed architecture.

Scalability

Scalability Design Systems Architecture

Hawkins: Diving into the Reasoning Behind our Design System

The Netflix TechBlog

FEBRUARY 10, 2021

Stranger Things imagery showcasing the inspiration for the Hawkins Design System. By Hawkins team member Joshua Godi, with art contributions by Wiki Chaves. Hawkins may be the name of a fictional town in Indiana, most widely known as the backdrop for one of Netflix’s most popular TV series “Stranger Things,” but the name is so much more.

Design

Design Systems Engineering Entertainment

Energy Efficient Distributed Systems

DZone

DECEMBER 18, 2023

Energy efficiency has become a paramount concern in the design and operation of distributed systems due to the increasing demand for sustainable and environmentally friendly computing solutions.

Energy

Energy Efficiency Systems IoT

Strategies for Building Self-Healing Software Systems

DZone

JUNE 20, 2024

In the vast realm of software development, there's a pursuit for software systems that are not only robust and efficient but can also "heal" themselves. Self-healing software systems represent a significant stride towards automation and resilience. 4 Key Strategies for Building Self-Healing Software Systems 1.

Strategy

Strategy Systems Software Software

Efficient Message Distribution Using AWS SNS Fanout

DZone

FEBRUARY 29, 2024

In the world of cloud computing and event-driven applications, efficiency and flexibility are absolute necessities. A smooth flow of messages in an event-driven application is the key to its performance and efficiency. A critical component of such an application is message distribution.

Efficiency

Efficiency AWS Scalability Architecture

Security by design enhanced by unified observability and security

Dynatrace

OCTOBER 23, 2023

At financial services company Soldo, efficiency and security by design are paramount goals. Since 2015, the Soldo business spend management platform has provided companies with a simple and efficient way to better spend and control company money. What is security by design?

Design

Design Innovation DevOps Open Source

Dynatrace EdgeConnect securely connects your local systems to Dynatrace SaaS

Dynatrace

OCTOBER 10, 2023

EdgeConnect provides a secure bridge for SaaS-heavy companies like Dynatrace, which hosts numerous systems and data behind VPNs. EdgeConnect facilitates seamless interaction, ensuring data security and operational efficiency. Efficiency and control: EdgeConnect boasts a range of features designed for efficiency and control.

Systems

Systems Efficiency Internet Internet

Architectural Insights: Designing Efficient Multi-Layered Caching With Instagram Example

DZone

FEBRUARY 27, 2024

This article will explore the concept of multi-layered caching from both architectural and development perspectives, focusing on real-world applications like Instagram, and provide insights into designing and implementing an efficient multi-layered cache system.

Cache

Cache Efficiency Architecture Design

A Step-by-Step Guide to Write a System Design Document

DZone

FEBRUARY 26, 2025

Have you ever wondered how large-scale systems handle millions of requests seamlessly while ensuring speed, reliability, and scalability? Behind every high-performing application, whether it's a search engine, an e-commerce platform, or a real-time messaging service, lies a well-thought-out system design.

Design

Design Systems Scalability Speed

Efficient SLO event integration powers successful AIOps

Dynatrace

APRIL 5, 2024

Details of the root cause: The developer deems it appropriate to either exclude or designate this error as acceptable during the patch release to prevent being overwhelmed with false positive alerts. Conclusion: An effective Service Level Objective (SLO) holds more value than numerous alerts, reducing unnecessary noise in monitoring systems.

Efficiency

Efficiency Traffic Tuning Metrics

Real-Time Operating Systems (RTOS) in Embedded Systems

DZone

FEBRUARY 26, 2024

Embedded systems have become an integral part of our daily lives, from smartphones and home appliances to medical devices and industrial machinery. These systems are designed to perform specific tasks efficiently, often in real-time, without the complexities of a general-purpose computer.

Operating System

Operating System Systems Automotive Design

API Design Principles for Optimal Performance and Scalability

DZone

JUNE 22, 2023

API performance optimization is the process of improving the speed, scalability, and reliability of APIs. It involves a combination of techniques and best practices aimed at reducing latency, improving user experience, and increasing the overall efficiency of the system.

Scalability

Scalability Design Best Practices Performance

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow, an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems. ETL workflows), as well as downstream (e.g.

Systems

Systems Media Cache Open Source

Build resilient IT systems and manage regulatory requirements with compliance and resilience capabilities from Dynatrace

Dynatrace

JANUARY 15, 2025

They offer a comprehensive end-to-end solution to these challenges, providing functionalities designed to enhance compliance and resilience in IT environments. Understand the complexity of IT systems in real time: Dynatrace helps you comprehensively map the entire IT environment in real time.

Systems

Systems DevOps Analytics Monitoring

Catching up with OpenTelemetry in 2025

Dynatrace

FEBRUARY 27, 2025

In fact, observability is essential for shaping how we design smarter, more resilient systems for the future. As an open-source project, OpenTelemetry sets standards for telemetry data sets and works with a wide range of systems and platforms to collect and export telemetry data to backend systems.

Tuning

Tuning Open Source Innovation Monitoring

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support…

The Netflix TechBlog

SEPTEMBER 29, 2022

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support for Non-Parallelizable Workloads, by Kostas Christidis. Introduction: Timestone is a high-throughput, low-latency priority queueing system we built in-house to support the needs of Cosmos, our media encoding platform. Over the past 2.5

Latency

Latency Systems Serverless Media

Key Advantages of DBMS for Efficient Data Management

Scalegrid

JANUARY 5, 2024

Enhanced data security, better data integrity, and efficient access to information. If you’re considering a database management system, understanding these benefits is crucial. Understanding Database Management Systems (DBMS): A Database Management System (DBMS) assists users in creating and managing databases.

Efficiency

Efficiency Storage Database Scalability

IoT Device Management: Streamlining Connectivity in a Connected World

DZone

OCTOBER 10, 2023

Enter IoT device management — the suite of tools and practices designed to monitor, maintain, and update these interconnected devices. Efficient device management allows organizations to handle this vast network without hitches. As these devices multiply, so does the complexity of managing them.

IoT

IoT Internet Internet Network

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Its architecture was specially designed to manage large-scale data warehouses and business intelligence workloads by giving you the ability to spread your data out across a multitude of servers. Greenplum uses an MPP database design that can help you develop a scalable, high-performance deployment. Greenplum Architectural Design.

Big Data

Big Data Database Artificial Intelligence Open Source

What is DevSecOps?

Dynatrace

JANUARY 26, 2021

This includes custom, built-in-house apps designed for a single, specific purpose, API-driven connections that bridge the gap between legacy systems and new services, and innovative apps that leverage open-source code to streamline processes. Each has its own role to play in successfully implementing this tactical trifecta at scale.

DevOps

DevOps Best Practices Open Source Tuning

What is a Distributed Storage System

Scalegrid

FEBRUARY 8, 2024

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. This guide delves into how these systems work, the challenges they solve, and their essential role in businesses and technology.

Storage

Storage Systems Big Data Azure

Google Cloud Next 2024: AI innovation for Google Cloud

Dynatrace

MARCH 29, 2024

AI innovation elevates efficiency and performance of Google Cloud. AI adoption is increasingly critical for any organization. Learn to boost system reliability through proactive issue detection. To compete, it’s critical for organizations to streamline operations, minimize downtime, and drive continuous innovation in the cloud.

Google

Google Innovation Cloud Analytics

Top Ten Lightweight Linux Distributions

DZone

AUGUST 7, 2023

Linux is a popular open-source operating system that offers various distributions to suit every need. Lightweight Linux distributions are specifically designed to be low-resource, efficient, and quick, offering users a smooth and responsive user experience.

Hardware

Hardware Open Source Operating System Efficiency

Engineering dependability and fault tolerance in a distributed system

High Scalability

FEBRUARY 19, 2021

In this article, we discuss the concepts of dependability and fault tolerance in detail and explain how the Ably platform is designed with fault tolerant approaches to uphold its dependability guarantees. Fault tolerant design approaches address these shortfalls to provide continuity both to business and to the user experience.

Engineering

Engineering Systems Availability Scalability

Building Resiliency With Effective Error Management

DZone

JANUARY 23, 2022

Building resilient systems requires comprehensive error management. Errors can occur in any part of the system or its ecosystem, and there are different ways of handling them, e.g. Datacenter: a data center failure where the whole DC could become unavailable due to power failure, network connectivity failure, environmental catastrophe, etc.

Hardware

Hardware DevOps Network Storage

Optimizing InfiniBand Bandwidth Utilization for NVIDIA DGX Systems Using Software RAID Solutions

DZone

JUNE 24, 2024

Traditional enterprise storage or HPC-focused parallel file systems are costly and challenging to manage for AI-scale deployments. High-performance storage systems can significantly reduce AI model training time. Delays in data access can also impact AI model accuracy, highlighting the critical role of storage performance.

Systems

Systems Software Software Storage

Introducing Netflix’s Key-Value Data Abstraction Layer

The Netflix TechBlog

SEPTEMBER 18, 2024

In this post, we dive deep into how Netflix’s KV abstraction works, the architectural principles guiding its design, the challenges we faced in scaling diverse use cases, and the technical innovations that have allowed us to achieve the performance and reliability required by Netflix’s global operations.

Latency

Latency Storage Cache Efficiency

Building Scalable Software Solutions for Display Manufacturing Automation

DZone

JUNE 25, 2024

Scalable software architectures are the backbone of efficient and flexible production lines, enabling manufacturers to meet the increasing demands for innovative display technologies. As display manufacturing continues to evolve, the demand for scalable software solutions to support automation has become more critical than ever.

Scalability

Scalability Software Software Software Architecture

Apollo Router Performance Monitoring with OpenTelemetry and Splunk APM

DZone

APRIL 13, 2023

The Apollo router is a powerful routing solution designed to replace the GraphQL Gateway. This self-hosted graph routing solution is highly configurable, making it an ideal choice for developers who require a high-performance routing system.

Monitoring

Monitoring Performance Traffic Efficiency

Implementing AWS well-architected pillars with automated workflows

Dynatrace

SEPTEMBER 13, 2023

This is a set of best practices and guidelines that help you design and operate reliable, secure, efficient, cost-effective, and sustainable systems in the cloud. The framework comprises six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability.

AWS

AWS Efficiency Azure Cloud

Netflix’s Distributed Counter Abstraction

The Netflix TechBlog

NOVEMBER 12, 2024

By: Rajiv Shringi, Oleksii Tkachuk, Kartik Sathyanarayanan. Introduction: In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.

Latency

Latency Cache Infrastructure Strategy

OpenPipeline: Simplify access to critical business data

Dynatrace

NOVEMBER 4, 2024

There’s a goldmine of business data traversing your IT systems, yet most of it remains untapped. Business events: Delivering the best data. It’s been two years since we introduced business events, a special class of events designed to support even the most demanding business use cases. Easy to access.

Analytics

Analytics Airlines Metrics Monitoring

Dynatrace observability now available for Red Hat OpenShift on IBM Z and LinuxONE mainframes

Dynatrace

JULY 24, 2024

IBM Z and LinuxONE mainframes running the Linux operating system enable you to respond faster to business demands, protect data from core to cloud, and streamline insights and automation. Dynatrace is designed to scale easily across the entire Kubernetes stack.

Availability

Availability Infrastructure Metrics Hardware

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency. By: Di Lin, Girish Lingappa, Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard about to make a critical business decision but pausing to ask a question: “Can

Infrastructure

Infrastructure Big Data Transportation Architecture

From syslog to AWS Firehose: Dynatrace log management innovations that enhance observability

Dynatrace

SEPTEMBER 5, 2024

These developments open up new use cases, allowing Dynatrace customers to harness even more data for comprehensive AI-driven insights, faster troubleshooting, and improved operational efficiency. Native support for syslog messages extends our infrastructure log support to all Linux/Unix systems and network devices.

Innovation

Innovation AWS Analytics Storage

Seven benefits of AIOps to transform your business operations

Dynatrace

JULY 5, 2022

In fact, according to a Forrester Consulting report, implementing an AIOps approach that provides proactive visibility helped companies improve operational efficiency and reduce false-positive alerts by 95%. Like the development and design phases, these applications generate massive data volumes that offer relevant and actionable insights.

Artificial Intelligence

Artificial Intelligence Cloud Innovation Strategy

AWS serverless services: Exploring your options

Dynatrace

OCTOBER 7, 2021

This means you no longer have to provision, scale, and maintain servers to run your applications, databases, and storage systems. Instead of worrying about infrastructure management functions, such as capacity provisioning and hardware maintenance, teams can focus on application design, deployment, and delivery. Compute services.

Serverless

Serverless AWS Lambda Storage

Six causes of major software outages–And how to avoid them

Dynatrace

AUGUST 8, 2024

Ransomware encrypts essential data, locking users out of systems and halting operations until a ransom is paid. Remote code execution (RCE) vulnerabilities, such as the Log4Shell incident in 2021, allow attackers to run malicious code on a remote system without requiring authentication or user interaction.

Software

Software Software Infrastructure Network

Black Hat 2024: Observability for DevSecOps and scaled security posture management

Dynatrace

JULY 29, 2024

AI is also crucial for securing data privacy, as it can more efficiently detect patterns, anomalies, and indicators of compromise. From the Log4Shell attack in 2021 to the recent OpenSSH vulnerability in July, organizations have been struggling to maintain secure, compliant systems amidst a broadened attack surface.

Analytics

Analytics Government DevOps Efficiency

Privacy Spotlight: Easily comply with data subject rights in Dynatrace

Dynatrace

MAY 2, 2024

Rising consumer expectations for transparency and control over their data, combined with increasing data volumes, contribute to the importance of swift and efficient management of privacy rights requests. 2] — Nader Henein, VP Analyst, Gartner The Privacy Rights app is designed to streamline this process in Dynatrace.

Tuning

Tuning Scalability Efficiency Processing

7 Best Performance Testing Tools to Look Out for in 2021

DZone

DECEMBER 28, 2020

The system could work efficiently with a specific number of concurrent users; however, it may get dysfunctional with extra loads during peak traffic. It is almost a part of the wider performance engineering portrait, concentrating on performance glitches in the architecture and design of any software.

Performance Testing

Performance Testing Testing Tools Testing Performance

Service Mesh and Management Practices in Microservices

DZone

OCTOBER 27, 2023

In the dynamic world of microservices architecture, efficient service communication is the linchpin that keeps the system running smoothly. This dedicated infrastructure layer is designed to cater to service-to-service communication, offering essential features like load balancing, security, monitoring, and resilience.

Traffic

Traffic Best Practices Architecture Network

Dynatrace adds monitoring support for Microsoft Azure Kubernetes Service deployments using Azure Linux container host

Dynatrace

MAY 24, 2023

Dynatrace is proud to provide deep monitoring support for Azure Linux as a container host operating system (OS) platform for Azure Kubernetes Services (AKS) to enable customers to operate efficiently and innovate faster. Microsoft initially designed the OS for internal use to develop and manage Azure services.

Azure

Azure Monitoring Operating System Virtualization

What is observability? Not just logs, metrics and traces

Dynatrace

OCTOBER 1, 2021

As dynamic systems architectures increase in complexity and scale, IT teams face mounting pressure to track and respond to conditions and issues across their multi-cloud environments. An advanced observability solution can also be used to automate more processes, increasing efficiency and innovation among Ops and Apps teams.

Metrics

Metrics Open Source Monitoring Cloud
