Today, users' expectations of seamless performance mean the system cannot afford downtime or disruption that might turn into losses in revenue and reputation. Therefore, no one can underestimate the role of stress testing in ensuring that the systems are resilient against unfortunate events and failures.
How to achieve sustainable IT practices: Use observability tools. The first step in driving improvements is to obtain a comprehensive view of your IT infrastructure’s climate impact. Scale to zero. Scaling systems to match current demand prevents underutilized machines from consuming significant energy while idling.
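A toy illustration of the scale-to-zero idea, assuming a queue-based workload and a hypothetical per-replica throughput; when no work is pending, no replicas idle:

```python
import math

def desired_replicas(queue_depth: int, per_replica_throughput: int = 100) -> int:
    """Scale replicas to current demand; zero pending work means zero idle machines."""
    if queue_depth == 0:
        return 0  # scale to zero: no idling consumers burning energy
    return math.ceil(queue_depth / per_replica_throughput)

assert desired_replicas(0) == 0
assert desired_replicas(250) == 3
```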
These releases often assumed ideal conditions such as zero latency, infinite bandwidth, and no network loss, as highlighted in Peter Deutsch’s eight fallacies of distributed computing. Chaos engineering is a practice that extends beyond traditional failure testing by identifying unpredictable issues.
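A minimal sketch of the fault-injection idea behind chaos engineering, deliberately violating the zero-latency and no-loss assumptions; the decorator and rates are illustrative, not any particular chaos tool:

```python
import functools
import random
import time

def inject_faults(latency_s: float = 0.5, error_rate: float = 0.1):
    """Decorator that randomly adds latency or raises, mimicking an unreliable network."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            if random.random() < error_rate:
                raise ConnectionError("injected fault: simulated network failure")
            time.sleep(random.uniform(0, latency_s))  # latency is never zero in production
            return fn(*args, **kwargs)
        return inner
    return wrap

@inject_faults(latency_s=0.2, error_rate=0.05)
def call_downstream_service() -> str:
    return "ok"
```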
The nirvana state of system uptime at peak loads is known as “five-nines availability.” In its pursuit, IT teams hover over system performance dashboards hoping their preparations will deliver five nines—or even four nines—availability. How can IT teams deliver system availability under peak loads that will satisfy customers?
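The arithmetic behind those nines is unforgiving: each additional nine cuts the allowed downtime tenfold. A quick calculation:

```python
# Allowed downtime per year for N nines of availability.
MINUTES_PER_YEAR = 365 * 24 * 60

for nines in (3, 4, 5):
    availability = 1 - 10 ** -nines
    downtime_min = MINUTES_PER_YEAR * (1 - availability)
    print(f"{availability:.3%} -> {downtime_min:.2f} minutes of downtime per year")
# Five nines allows roughly 5.26 minutes of downtime in an entire year.
```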
a Netflix member via Twitter. This is an example of a question our on-call engineers need to answer to help resolve a member issue, which is difficult when troubleshooting distributed systems. Now let’s look at how we designed the tracing infrastructure that powers Edgar.
On one hand, they enable our engineers to get their latest enhancements deployed into production. It was on August 25th at 14:00 when Davis initially alerted on a disk write latency issue to Elastic File System (EFS) on one of our EC2 instances in AWS’s Sydney data center. Sydney, we have a disk write latency problem!
On average, organizations use 10 different tools to monitor applications, infrastructure, and user experiences across these environments. Such fragmented approaches fall short of giving teams the insights they need to run IT and site reliability engineering operations effectively.
Platform engineering is on the rise. According to leading analyst firm Gartner, by 2026 “80% of software engineering organizations will establish platform teams as internal providers of reusable services, components, and tools for application delivery…” Automation, automation, automation.
DevOps and platform engineering are essential disciplines that provide immense value in the realm of cloud-native technology and software delivery. Observability of applications and infrastructure serves as a critical foundation for DevOps and platform engineering, offering a comprehensive view into system performance and behavior.
In fact, 76% of technology leaders say the dynamic nature of Kubernetes makes it more difficult to maintain visibility of their infrastructure compared with traditional technology stacks. “Our development teams relied heavily on logs to understand what was going on with our systems,” he said.
Infrastructure monitoring is the process of collecting critical data about your IT environment, including information about availability, performance, and resource efficiency. Many organizations respond by adding a proliferation of infrastructure monitoring tools, which, in many cases, just adds to the noise.
What is site reliability engineering? Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. SRE bridges the gap between Dev and Ops teams.
If you’re running SAP, you’re likely already familiar with the HANA relational database management system. However, if you’re an operations engineer who’s been tasked with migrating to HANA from a legacy database system, you’ll need to get up to speed quickly.
Site Reliability Engineering (SRE) is a systematic and data-driven approach to improving the reliability, scalability, and efficiency of systems. It combines principles of software engineering, operations, and quality assurance to ensure that systems meet performance goals and business objectives.
This rising risk amplifies the need for reliable security solutions that integrate with existing systems. This latest integration with Microsoft Sentinel expands our partnership, providing joint customers with a holistic view of their entire cloud environment, from application to infrastructure, data, and security.
Protecting IT infrastructure, applications, and data requires that you understand security weaknesses attackers can exploit. Vulnerability assessment is the process of identifying, quantifying, and prioritizing the cybersecurity vulnerabilities in a given IT system. Identify vulnerabilities. Assess risk.
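A minimal sketch of the quantify-and-prioritize step: weight raw severity by exposure and sort. The findings below are illustrative stand-ins for a real scanner feed (CVE-2021-44228 is Log4Shell, CVSS 10.0; the others are placeholders):

```python
# Illustrative findings; in practice these come from a vulnerability scanner.
findings = [
    {"cve": "CVE-2021-44228", "cvss": 10.0, "asset_exposed": True},   # Log4Shell
    {"cve": "CVE-XXXX-0001", "cvss": 5.3, "asset_exposed": False},    # placeholder
    {"cve": "CVE-XXXX-0002", "cvss": 7.5, "asset_exposed": True},     # placeholder
]

def risk_score(finding: dict) -> float:
    """Quantify: weight raw CVSS severity by whether the asset is internet-exposed."""
    return finding["cvss"] * (2.0 if finding["asset_exposed"] else 1.0)

# Prioritize: remediate the highest-risk findings first.
for f in sorted(findings, key=risk_score, reverse=True):
    print(f["cve"], risk_score(f))
```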
As cloud-native, distributed architectures proliferate, the need for DevOps technologies and DevOps platform engineers has increased as well. DevOps engineer tools can help ease the pressure as environment complexity grows. What does a DevOps platform engineer do? What are DevOps engineer tools and platforms?
Berg, Romain Cledat, Kayla Seeley, Shashank Srikanth, Chaoying Wang, Darin Yu. Netflix uses data science and machine learning across all facets of the company, powering a wide range of business applications from our internal infrastructure and content demand modeling to media understanding.
Site reliability engineering first emerged to address cloud computing’s new performance needs. Today, the platform engineer role is gaining momentum as the newest byproduct of scaling DevOps in the emerging but complex cloud-native world. Understanding the platform engineer role: DevOps is a constantly evolving discipline.
More than 90% of enterprises now rely on a hybrid cloud infrastructure to deliver innovative digital services and capture new markets. That’s because cloud platforms offer flexibility and extensibility for an organization’s existing infrastructure. With public clouds, multiple organizations share resources.
Sure, cloud infrastructure requires comprehensive performance visibility, as Dynatrace provides, but the services that leverage cloud infrastructures also require close attention. Extend infrastructure observability to WSO2 API Manager. Cloud-based application architectures commonly leverage microservices. What’s next?
With Dashboards, you can monitor business performance, user interactions, security vulnerabilities, IT infrastructure health, and so much more, all in real time. Even if infrastructure metrics aren’t your thing, you’re welcome to join us on this creative journey; simply swap out the suggested metrics for ones that interest you.
Stream processing. One approach to such a challenging scenario is stream processing: a computing paradigm and software architectural style for data-intensive software systems that emerged to cope with requirements for near-real-time processing of massive amounts of data. We designed experimental scenarios inspired by chaos engineering.
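A minimal sketch of the stream-processing style, aggregating an ordered event stream over fixed (tumbling) windows; the event shape and 60-second window are assumptions for illustration:

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_s: float = 60.0
) -> Iterator[tuple]:
    """Group (timestamp, key) events into fixed windows and emit counts per key."""
    current_window, counts = None, defaultdict(int)
    for ts, key in events:  # events assumed ordered by timestamp
        window = int(ts // window_s)
        if current_window is not None and window != current_window:
            yield (current_window, dict(counts))  # close the finished window
            counts.clear()
        current_window = window
        counts[key] += 1
    if current_window is not None:
        yield (current_window, dict(counts))

events = [(0.5, "login"), (10.2, "login"), (61.0, "checkout")]
print(list(tumbling_window_counts(events)))
# [(0, {'login': 2}), (1, {'checkout': 1})]
```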
Infrastructure as code is a way to automate infrastructure provisioning and management. In this blog, I explore how Dynatrace has made cloud automation attainable—and repeatable—at scale by embracing the principles of infrastructure as code. But how does it work in practice?
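Infrastructure as code in miniature: a hedged sketch using Pulumi’s Python SDK (an assumption here, since the post does not name a tool). The desired state is declared in code, and running `pulumi up` reconciles the cloud toward it; reruns converge rather than duplicate:

```python
"""Declare infrastructure in code; the IaC tool reconciles the cloud to match."""
import pulumi
import pulumi_aws as aws

# The bucket is declared, not created imperatively; repeated runs converge
# to this state instead of provisioning a second bucket.
logs_bucket = aws.s3.Bucket("app-logs", tags={"team": "platform"})

pulumi.export("bucket_name", logs_bucket.id)
```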
By Karen Casella, Director of Engineering, Access & Identity Management. Have you ever experienced one of the following scenarios while looking for your next role? If so, we invite you to begin the interview process. Most backend engineering teams follow a process very similar to what is shown below.
As organizations continue to modernize their technology stacks, many turn to Kubernetes , an open source container orchestration system for automating software deployment, scaling, and management. Five of the most common include cluster instability, resource and cost management, security, observability, and stress on engineering teams.
Messaging systems can significantly improve the reliability, performance, and scalability of the communication processes between applications and services. In serverless and microservices architectures, messaging systems are often used to build asynchronous service-to-service communication. This is great!
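A self-contained sketch of the asynchronous messaging pattern, with an in-process queue standing in for a real broker such as Kafka or SQS; the producer fires and forgets while the consumer works at its own pace:

```python
import queue
import threading
import time

orders = queue.Queue()  # stands in for a message broker (Kafka, SQS, etc.)

def producer() -> None:
    for order_id in range(3):
        orders.put({"order_id": order_id})  # fire and forget; no waiting on the consumer
    orders.put(None)  # sentinel: no more work

def consumer() -> None:
    while (msg := orders.get()) is not None:
        print("processing", msg)
        time.sleep(0.1)  # consumer drains the queue at its own pace

threading.Thread(target=producer).start()
consumer()
```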
Combined with Dynatrace OneAgent®, you gain a precise view of the status of your systems at a glance, whether you need it for deep root-cause analysis of user-facing issues that impact your business or you’re an engineer responsible for the infrastructure hosting your applications and network paths.
There’s a goldmine of business data traversing your IT systems, yet most of it remains untapped. Other data sources, including APIs and log files, are used to expand access, often to external or proprietary systems. In fact, it’s likely that some of your critical business systems already write business data to log files.
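A small sketch of tapping that goldmine: pulling a business metric out of structured log lines. The log schema here is hypothetical, chosen only to show the idea:

```python
import json

# Hypothetical structured log lines as a service might emit them.
log_lines = [
    '{"level":"INFO","event":"order_placed","order_value":129.99,"currency":"USD"}',
    '{"level":"DEBUG","event":"cache_miss"}',
    '{"level":"INFO","event":"order_placed","order_value":54.50,"currency":"USD"}',
]

# Business data (revenue) recovered from operational logs.
revenue = sum(
    rec.get("order_value", 0.0)
    for line in log_lines
    if (rec := json.loads(line)).get("event") == "order_placed"
)
print(f"revenue captured from logs: ${revenue:.2f}")  # $184.49
```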
In the dynamic world of online services, the concept of site reliability engineering (SRE) has risen as a pivotal discipline, ensuring that large-scale systems maintain their performance and reliability.
By Alex Hutter, Falguni Jhaveri, and Senthil Sayeebaba. Over the past few years, Content Engineering at Netflix has been transitioning many of its services to use a federated GraphQL platform. It began to power a significant portion of the user experience for many applications within Content Engineering.
The system is inconsistent, slow, hallucinating, and that amazing demo starts collecting digital dust. Two big things: they bring the messiness of the real world into your system through unstructured data. When your system is both ingesting messy real-world data AND producing nondeterministic outputs, you need a different approach.
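One concrete form that different approach can take is refusing to trust a single sample: validate nondeterministic output against an expected structure and retry on failure. This sketch uses a hypothetical `call_model` stub in place of a real model API:

```python
import json

def call_model(prompt: str) -> str:
    """Placeholder for a nondeterministic model call (hypothetical stub)."""
    return '{"sentiment": "positive", "confidence": 0.93}'

def structured_call(
    prompt: str, required_keys=("sentiment", "confidence"), retries: int = 3
) -> dict:
    """Validate model output against an expected shape; retry rather than trust one sample."""
    for _ in range(retries):
        raw = call_model(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: try again instead of crashing downstream
        if all(k in parsed for k in required_keys):
            return parsed
    raise ValueError("model never produced valid structured output")

print(structured_call("Classify: 'The new release is great!'"))
```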
It doesn’t matter whether you need commonly used failure-rate or response-time metrics to ensure your system’s availability and performance, or whether you need to rely on abnormal log drops to gain insights into emerging problems: SLOs leveraged with Grail provide all the information you need.
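For intuition on how a failure-rate SLO turns into an actionable number, here is the error-budget arithmetic (not Grail’s implementation, just the underlying calculation):

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget left in the current SLO window."""
    budget = total_requests * (1 - slo_target)  # failures the SLO allows
    return 1 - failed_requests / budget if budget else 0.0

# A 99.9% SLO over 1,000,000 requests allows 1,000 failures; 250 seen so far.
print(f"{error_budget_remaining(0.999, 1_000_000, 250):.1%} of error budget left")  # 75.0%
```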
Many of these projects are under constant development by dedicated teams with their own business goals and development best practices, such as the system that supports our content decision makers, or the system that ranks which language subtitles are most valuable for a specific piece of content.
To achieve this, we are committed to building robust systems that deliver comprehensive observability, enabling us to take full accountability for every title on our service. Each title represents countless hours of effort and creativity, and our systems need to honor that uniqueness. Yet, these pages couldn’t be more different.
For busy site reliability engineers, ensuring system reliability, scalability, and overall health is an imperative that’s getting harder to achieve in ever-expanding, cloud-native, container-based environments. Because of its adaptability, Prometheus has become an essential tool for observability engineering.
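A minimal example of instrumenting a Python service with the official prometheus_client library, exposing a request counter and a latency histogram for Prometheus to scrape; the metric names and endpoint label are illustrative:

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled", ["endpoint"])
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

def handle_request() -> None:
    with LATENCY.time():  # observe how long the handler takes
        REQUESTS.labels(endpoint="/checkout").inc()
        time.sleep(random.uniform(0.01, 0.1))  # simulated work

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics
    while True:
        handle_request()
```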
Over the past decade, DevOps has emerged as a new tech culture and career that marries the rapid iteration desired by software development with the rock-solid stability of the infrastructure operations team. As of August 2019, there were over 50,000 LinkedIn DevOps job listings in the United States alone.
Forbes estimates that cloud budgets will break all previous records as businesses will spend over $1 trillion on cloud computing infrastructure in 2024. By integrating observability tools in CI/CD pipelines, organizations can increase deployment frequency, minimize risks, and build highly available systems.
Findings provide insights into Kubernetes practitioners’ infrastructure preferences and how they use advanced Kubernetes platform technologies. As Kubernetes adoption increases and it continues to advance technologically, Kubernetes has emerged as the “operating system” of the cloud. Kubernetes moved to the cloud in 2022.
This tier extended existing infrastructure by adding new backend components and a new remote call to our ads partner on the playback path. Replay traffic enabled us to test our new systems and algorithms at scale before launch, while also making the traffic as realistic as possible.
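The replay idea in miniature: mirror each request to the candidate path, always serve the legacy result, and log divergences. The handlers below are hypothetical stand-ins, not Netflix’s implementation:

```python
import concurrent.futures

def shadow_compare(request, legacy_handler, candidate_handler):
    """Send the same request down both paths; serve legacy, log any divergence."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        legacy = pool.submit(legacy_handler, request)
        candidate = pool.submit(candidate_handler, request)
        legacy_result = legacy.result()
        try:
            if candidate.result() != legacy_result:
                print(f"divergence on {request!r}")
        except Exception as exc:  # candidate failures must never affect users
            print(f"candidate error on {request!r}: {exc}")
    return legacy_result

# Hypothetical handlers standing in for the old and new playback backends.
shadow_compare({"title_id": 7}, lambda r: {"ads": []}, lambda r: {"ads": []})
```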
Ensuring smooth operations is no small feat, whether you’re in charge of application performance, IT infrastructure, or business processes. Forecasting can identify potential anomalies in node performance, helping to prevent issues before they impact the system. This ensures optimal resource utilization and cost efficiency.
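A toy version of forecasting-based anomaly detection on node metrics, flagging a reading that falls far outside the recent distribution; the history window and the threshold `k` are assumptions:

```python
import statistics

def is_anomalous(history: list[float], latest: float, k: float = 3.0) -> bool:
    """Flag `latest` if it falls outside k standard deviations of the recent mean."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return abs(latest - mean) > k * stdev

cpu_history = [41.0, 43.5, 40.2, 42.8, 41.9, 43.1]  # recent CPU % samples
print(is_anomalous(cpu_history, 78.0))  # True: investigate before users notice
```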
Navigate digital infrastructure complexity In today’s rapidly evolving digital environment, organizations face increasing pressure from customers and competitors to deliver faster, more secure innovations. The effectiveness of this automation relies on the quality of the underlying data.
To make this possible, the application code should be instrumented with telemetry data for deep insights, including: metrics, to find out how the behavior of a system has changed over time; traces, to follow the flow of a request through a distributed system; and logs, which represent event data in plain-text, structured, or binary format.
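A minimal sketch of such instrumentation using the OpenTelemetry Python SDK, emitting spans to the console; in a real deployment an exporter to a tracing backend would replace ConsoleSpanExporter, and the span names here are illustrative:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Wire up a tracer that prints finished spans to stdout.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

def fetch_user(user_id: str) -> dict:
    # Each unit of work becomes a span; nested spans form the request trace.
    with tracer.start_as_current_span("fetch_user") as span:
        span.set_attribute("user.id", user_id)
        return {"id": user_id}

with tracer.start_as_current_span("handle_request"):
    fetch_user("42")
```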