Design, Engineering and Infrastructure - Technology Performance Pulse

Supercharge your end-to-end infrastructure and operations observability experience

Dynatrace

OCTOBER 17, 2024

Dynatrace introduced numerous powerful features to its Infrastructure & Operations app, addressing the emerging requirement for enhanced end-to-end infrastructure observability. These enhancements are designed to empower IT operations and SRE teams with more comprehensive visibility and increased efficiency at any time.

Infrastructure

Infrastructure Network DevOps Metrics

Part 1: A Survey of Analytics Engineering Work at Netflix

The Netflix TechBlog

DECEMBER 17, 2024

This article is the first in a multi-part series sharing a breadth of Analytics Engineering work at Netflix, recently presented as part of our annual internal Analytics Engineering conference. Subsequent posts will detail examples of exciting analytics engineering domain applications and aspects of the technical craft.

Analytics

Analytics Engineering Entertainment Metrics

Behind the Streams: Live at Netflix. Part 1

The Netflix TechBlog

JULY 15, 2025

What began with an engineering plan to pave the path towards our first Live comedy special, Chris Rock: Selective Outrage, has since led to hundreds of Live events, ranging from the biggest comedy shows and NFL Christmas Games to record-breaking boxing fights, and to Netflix becoming the home of WWE.

Entertainment

Entertainment Traffic AWS Latency

Sustainability: Thoughts from a software engineer

Dynatrace

MARCH 17, 2025

How to achieve sustainable IT practices: Use observability tools. The first step in driving improvements is to obtain a comprehensive view of your IT infrastructure’s climate impact. Platform engineers can set defaults for development teams, such as the number of replicas a service should have or whether it scales automatically.

Software Engineering

Software Engineering Engineering Software Software

Netflix Tudum Architecture: from CQRS with Kafka to CQRS with RAW Hollow

The Netflix TechBlog

JULY 10, 2025

Attracting over 20 million members each month, Tudum is designed to enrich the viewing experience by offering additional context and insights into the content available on Netflix. Client applications on web, mobile, and TV devices act as rendering engines for server-driven UI (SDUI) data.

Architecture

Architecture Cache Latency Database

Analyze query performance: The next level of database performance optimization

Dynatrace

NOVEMBER 4, 2024

While infrastructure-level monitoring provides valuable insights, it might not reveal the root causes of database-related slowdowns. Query execution plans offer deeper insight: they provide a roadmap for how the database engine executes queries.

Database

Database Performance Metrics Monitoring
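The execution-plan excerpt above names plans without showing one. As a minimal sketch, assuming SQLite (bundled with Python's standard library) as a stand-in for the production databases the article targets, the snippet below compares the plan for one query before and after adding an index. The `orders` table, its columns, and the index name are hypothetical. Plan wording varies across SQLite versions, so the detail strings are printed rather than matched exactly.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row describes one step of the plan.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"
before = query_plan(conn, sql, (42,))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))   # with the index: expect an index search

print("before:", before)
print("after: ", after)
```

Other engines expose the same idea through their own commands (for example, PostgreSQL's `EXPLAIN` and `EXPLAIN ANALYZE`); the point is that the plan, not host-level metrics, shows whether a query scans or seeks.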

Dynatrace joins the Microsoft Intelligent Security Association

Dynatrace

NOVEMBER 20, 2024

This latest integration with Microsoft Sentinel expands our partnership, providing joint customers with a holistic view of their entire cloud environment: from application to infrastructure, data, and security. The Davis AI engine automatically and continuously delivers actionable insights based on an environment’s current state.

Best Practices

Best Practices Innovation Azure Cloud

Introducing Configurable Metaflow

The Netflix TechBlog

DECEMBER 19, 2024

This has been a guiding design principle with Metaflow since its inception. Subsequent versions of the model will result from experimenting with hyperparameters, tweaking feature engineering, or conducting feature diets.
demo.branch_demox.demo_features_f workflows/demo.main.sch.yaml (binding=default): cluster=sandbox, workflow.id=demo.branch_demox.main

Best Practices

Best Practices Cache Metrics Code

Dynatrace delivers Full-Stack Observability for AI with NVIDIA Blackwell and NVIDIA NIM

Dynatrace

MAY 18, 2025

NVIDIA Blackwell systems provide high-performance infrastructure for enterprise AI, and now, thanks to the Dynatrace integration with the NVIDIA Enterprise AI Factory reference design, enterprises can add Dynatrace Full-Stack Observability to NVIDIA Blackwell infrastructure.

Healthcare

Healthcare Monitoring Infrastructure Metrics

Power Dashboarding, Part I: Start your exploration journey with Dashboards

Dynatrace

FEBRUARY 6, 2025

With Dashboards, you can monitor business performance, user interactions, security vulnerabilities, IT infrastructure health, and much more, all in real time. Even if infrastructure metrics aren’t your thing, you’re welcome to join us on this creative journey; simply swap out the suggested metrics for ones that interest you.

Metrics

Metrics Infrastructure Monitoring Best Practices

OpenPipeline: Simplify access to critical business data

Dynatrace

NOVEMBER 4, 2024

Business events: Delivering the best data. It’s been two years since we introduced business events, a special class of events designed to support even the most demanding business use cases. Our Business Analytics solution is a prominent beneficiary of this commitment. Business process monitoring and optimization.

Analytics

Analytics Airlines Metrics Monitoring

Foundation Model for Personalized Recommendation

The Netflix TechBlog

MARCH 28, 2025

Key insights from this shift include: A data-centric approach: shifting focus from model-centric strategies, which heavily rely on feature engineering, to a data-centric one. This approach prioritizes the accumulation of large-scale, high-quality data and, where feasible, aims for end-to-end learning.

Tuning

Tuning Efficiency Latency Strategy

Driving Content Delivery Efficiency Through Classifying Cache Misses

The Netflix TechBlog

JULY 2, 2025

The inherent latencies of data traveling across physical links, compounded by Internet infrastructure components like routers and network stacks, can disrupt a seamless viewing experience. Our custom-built servers, known as Open Connect Appliances (OCAs), are designed for both efficiency and cost-effectiveness.

Cache

Cache Efficiency Traffic Latency

Scaling Systems for Travel Tuesday: Surviving Billion-Event Spikes

DZone

JULY 28, 2025

Handling this surge is like facing a self-induced DDoS attack, and the question is: can your infrastructure handle the stampede or will it buckle under pressure? As a seasoned engineer might say, these mega-sale events are the ultimate scalability test. Smart design can ensure your system handles sudden load gracefully:

Systems

Systems Logistics Architecture Strategy

Why Core Web Vitals are crucial for optimizing digital experience

Dynatrace

MAY 22, 2025

A slow-loading page, unexpected layout shifts, or unresponsive interactions can frustrate potential customers, causing higher bounce rates, abandoned carts, and lower search engine rankings. To combat these issues, Google introduced Core Web Vitals (CWVs): a set of metrics designed to measure and improve the user experience of websites.

Google

Google Website Metrics Monitoring
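Core Web Vitals, described in the excerpt above, are rated against published thresholds, and Google assesses the 75th percentile of real-user page loads. The sketch below shows that rating logic, assuming Google's published boundaries for Largest Contentful Paint (LCP), Interaction to Next Paint (INP), and Cumulative Layout Shift (CLS); the helper names and sample data are hypothetical, and this is not Dynatrace's implementation.

```python
from statistics import quantiles

# Published "good" and "poor" boundaries per metric: (good_max, needs_improvement_max).
# LCP and INP are in milliseconds; CLS is unitless.
THRESHOLDS = {
    "LCP": (2500, 4000),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}


def p75(samples):
    """75th percentile of field samples (third quartile, inclusive method)."""
    return quantiles(samples, n=4, method="inclusive")[2]


def rate(metric, value):
    """Classify a metric value as good, needs improvement, or poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


lcp_ms = [1800, 2100, 2400, 2600, 3900]  # hypothetical real-user LCP samples
value = p75(lcp_ms)
print(value, rate("LCP", value))  # 2600.0 needs improvement
```

Rating the 75th percentile rather than the mean keeps a handful of very fast loads from hiding a slow tail that a quarter of users actually experience.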

Title Launch Observability at Netflix Scale

The Netflix TechBlog

DECEMBER 17, 2024

The Challenge of Title Launch Observability: As engineers, we’re wired to track system metrics like error rates, latencies, and CPU utilization, but what about metrics that matter to a title’s success? How can we design systems that recognize these nuances and empower every title to shine and bring joy to our members?

Traffic

Traffic Scalability Strategy Monitoring

Turning my MCP content into a blog post on Platform Engineering

Adrian Cockcroft

JUNE 13, 2025

There are some crude keyword tags and summaries in the MCP server, and eventually I asked Cursor/Claude to “Summarize the content related to platform engineering, including links to videos, medium posts etc.” The Evolution of Platform Engineering: The journey from traditional operations to modern platform engineering has been remarkable.

Engineering

Engineering DevOps Serverless AWS

Citus for PostgreSQL: How to Scale Your Database Horizontally

Scalegrid

JULY 25, 2025

Its PostgreSQL-native design ensures compatibility with popular extensions like PostGIS, making it highly versatile for modern use cases. It introduces a comprehensive set of features designed to empower high-performance distributed PostgreSQL: Real-time Sharding: Allows users to shard existing tables with live data and minimal downtime.

Database

Database Azure Analytics Open Source
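The Citus excerpt above relies on hash-distributing a table by a distribution column so that rows sharing a key land on the same shard. In Citus this is done in SQL, for example `SELECT create_distributed_table('events', 'tenant_id');`. The Python below is only a toy model of that routing idea under stated assumptions: it uses SHA-256 instead of PostgreSQL's hash functions and Citus's hash-range metadata, and the `events` rows, `tenant_id` column, and helper names are hypothetical. The shard count of 32 mirrors Citus's default `citus.shard_count`.

```python
import hashlib
from collections import defaultdict

SHARD_COUNT = 32  # illustrative; mirrors Citus's default citus.shard_count


def shard_for(distribution_value: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard index with a stable hash.

    Toy stand-in: Citus uses PostgreSQL hash functions and hash ranges,
    but the property is the same: equal keys always map to the same shard.
    """
    digest = hashlib.sha256(distribution_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % shard_count


def distribute(rows, key):
    """Group rows by the shard their distribution-column value hashes to."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(str(row[key]))].append(row)
    return shards


tenants = ["acme", "globex", "acme", "initech", "acme"]
events = [{"tenant_id": t, "event_id": i} for i, t in enumerate(tenants)]
shards = distribute(events, "tenant_id")
for shard_id, rows in sorted(shards.items()):
    print(shard_id, [r["event_id"] for r in rows])
```

Keeping a tenant's rows on one shard is what lets single-tenant queries and joins on the distribution column stay on a single node.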

Pivot the perspective of your investigative queries with Security Investigator

Dynatrace

JUNE 19, 2025

The pivoting queries concept allows engineers to quickly change the investigation context by switching the scope of a query using available pivoting dimensions. It’s designed for evidence-driven security use cases based on the logs, metrics, and traces ingested into the Dynatrace Grail® data lakehouse.

Latency

Latency Speed Cloud Infrastructure

Unlocking sovereignty with Dynatrace and Deloitte

Dynatrace

JUNE 4, 2025

The concept of cloud sovereignty (minimizing external dependencies and exerting full control over data, applications, and infrastructure) has become a critical business imperative. Embrace sovereignty by design, powered by observability. PurePath is at the heart of the Dynatrace platform’s DNA.

Healthcare

Healthcare Strategy Cloud Best Practices

Simplify SAP Monitoring with PowerConnect and Dynatrace

Dynatrace

MAY 27, 2025

Seamlessly monitor SAP systems with Dynatrace: The latest version of the Dynatrace PowerConnect app for SAP Monitoring gives you a streamlined, intuitive experience that’s designed to make SAP observability faster to provision and easier to use. This is achieved by leveraging standard capabilities of the Dynatrace platform to observe SAP.

Monitoring

Monitoring Analytics Metrics Cloud

Presentation: Beyond Durability: Database Resilience and Entropy Reduction with Write-Ahead Logging at Netflix

InfoQ

JUNE 26, 2025

They discuss how WAL addresses critical challenges like data loss, corruption, multi-partition mutations, and replication, showcasing its architecture and the strategic trade-offs considered for a resilient data infrastructure at Netflix's immense scale. By Prudhviraj Karumanchi, Vidhya Arvind

Database

Database Architecture Infrastructure Systems
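The core write-ahead rule behind the WAL talk above is to make the record of a mutation durable before applying it, so state can be rebuilt by replaying the log. Netflix's WAL, as presented, is a distributed platform; the single-process key-value store below is only a hypothetical illustration of that rule (the class and method names are invented). It assumes a newline-terminated JSON log and discards a torn final record on recovery; production logs typically add checksums and sequence numbers as well.

```python
import json
import os
import tempfile


class TinyWALStore:
    """Key-value store that logs each mutation durably before applying it."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self) -> None:
        # Rebuild in-memory state from the log after a restart or crash.
        if not os.path.exists(self.log_path):
            return
        good_bytes = 0
        with open(self.log_path, "rb") as log:
            for raw in log:
                if not raw.endswith(b"\n"):
                    break  # torn final write: record never fully reached disk
                try:
                    entry = json.loads(raw)
                except ValueError:
                    break  # corrupt record: stop replay here
                self._apply(entry)
                good_bytes += len(raw)
        # Drop the torn tail so later appends start on a clean record boundary.
        with open(self.log_path, "r+b") as log:
            log.truncate(good_bytes)

    def _apply(self, entry: dict) -> None:
        if entry["op"] == "put":
            self.data[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            self.data.pop(entry["key"], None)

    def _log(self, entry: dict) -> None:
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")
            log.flush()
            os.fsync(log.fileno())  # durable before the write is acknowledged

    def put(self, key: str, value: str) -> None:
        entry = {"op": "put", "key": key, "value": value}
        self._log(entry)    # 1. write ahead
        self._apply(entry)  # 2. then mutate in-memory state

    def delete(self, key: str) -> None:
        entry = {"op": "delete", "key": key}
        self._log(entry)
        self._apply(entry)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.wal")
    store = TinyWALStore(path)
    store.put("user:1", "alice")
    store.put("user:2", "bob")
    store.delete("user:2")
    # Simulate a crash mid-write: a truncated final record.
    with open(path, "a", encoding="utf-8") as log:
        log.write('{"op": "put", "key": "user:3"')
    recovered = TinyWALStore(path)
    print(recovered.data)  # {'user:1': 'alice'}
```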

Best Practices for Scaling RabbitMQ

Scalegrid

FEBRUARY 24, 2025

The architecture of RabbitMQ is meticulously designed for complex message routing, enabling dynamic and flexible interactions between producers and consumers. Configuring quorum queues: Quorum queues in RabbitMQ are designed to maintain functionality as long as a majority of replicas are operational.

Best Practices

Best Practices Traffic Strategy Scalability
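RabbitMQ quorum queues, described in the excerpt above, are built on the Raft consensus algorithm, so availability follows a simple majority rule. The arithmetic below makes that concrete; the function names are hypothetical helpers, not RabbitMQ APIs.

```python
def quorum_size(replicas: int) -> int:
    """Smallest majority of a replica group."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority remains online."""
    return replicas - quorum_size(replicas)


def is_available(replicas: int, online: int) -> bool:
    """A quorum queue keeps working only while a majority is online."""
    return online >= quorum_size(replicas)


for n in (1, 3, 4, 5, 7):
    print(f"{n} replicas: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

The output shows why odd replica counts are the usual choice: four replicas need a quorum of three and still tolerate only one failure, the same as three replicas, while adding another copy of every message to replicate.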

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly

MARCH 25, 2025

This creates a whole new set of challenges that traditional software development approaches simply weren’t designed to handle. With the advent of generative AI, there’ll be significant opportunities for product managers, designers, executives, and more traditional software engineers to contribute to and build AI-powered software.

Systems

Systems Development Tuning Software

QCon London 2025: Insights from 20+ Years in Mission-Critical Infrastructure

InfoQ

APRIL 10, 2025

Matthew Liste, Head of Infrastructure at American Express, shared insights at QCon London 2025 on building robust cloud platforms in financial services. By Steef-Jan Wiggers

Infrastructure

Infrastructure Scalability Cloud Engineering

Tech services firms are aggressively applying AI in delivery. They aren’t ready for the consequences of cannibalizing their business model.

The Agile Manager

MARCH 31, 2025

Tech services firms are instead applying AI to their own offerings to do things like accelerate the reverse-engineering of existing code and expedite forward engineering of new solutions. McKinsey last week posted a blog touting their proprietary AI platform for rejuvenating legacy infrastructure. Not the customer’s business.

Technology

Technology Technology Design Engineering

Best Free and Paid MySQL Monitoring Tools [2025]

Scalegrid

APRIL 28, 2025

Best for on-premises monitoring: Zabbix for MySQL (agent-based tracking, configurable alerts, strong monitoring for in-house infrastructure). Tools like Paessler PRTG Network Monitor provide comprehensive solutions by combining database monitoring with broader IT infrastructure oversight.

Monitoring

Monitoring Open Source Analytics Metrics

A new chapter, and thoughts on a pivotal year for C++

Sutter's Mill

NOVEMBER 11, 2024

I’ve known folks at CitSec for many years now (including some who participate in WG 21) and have long known it to be a great organization with some of the brightest minds in engineering and beyond. To all of those experts: Again, thank you!

C++

C++ Lambda Retail Code

AI’s Future: Not Always Bigger

O'Reilly

MARCH 11, 2025

Nothing is more discouraging than the idea that it will take tens of millions of dollars to train a model and billions of dollars to build the infrastructure necessary to operate it. What about computing infrastructure? Jevons paradox has a big impact on what kind of data infrastructure is needed to support the growing AI industry.

Artificial Intelligence

Artificial Intelligence Infrastructure Government Hardware

Extension management made simple with the new Dynatrace Extensions

Dynatrace

FEBRUARY 13, 2025

Refreshed look and feel: The Dynatrace Community has long wished for a modern design, and thanks to the Dynatrace platform, it’s finally here. Analyzing status changes over time can help rule out temporary infrastructure issues or reveal patterns. Once again, Extensions comes in handy. There is no easier way to enter it than by the front door.

Transportation

Transportation Monitoring Database Metrics

What Comes After the LLM: Human-Centered AI, Spatial Intelligence, and the Future of Practice

O'Reilly

JUNE 6, 2025

This has serious implications for how we design, deploy, and govern AI systems across institutions, economies, and everyday life. Even with well-intentioned design, these systems can easily cross into overreach if they’re not built with human experience in mind. Fei-Fei doesn’t describe AI as a feature or even an industry.

Education

Education Government Healthcare Transportation

Monitoring Distributed Systems

Dotcom-Monitor

MARCH 3, 2025

Distributed systems are designed to improve performance, provide redundancy, and support scalability. Despite their complexity, they aim for transparency, presenting a unified interface to users without exposing the underlying intricacies. Automated alerts and failure recovery mechanisms are essential.

Systems

Systems Monitoring Latency Blockchain

MCP: What It Is and Why It Matters—Part 3

O'Reilly

JUNE 5, 2025

Essentially you’re designing the interface that the AI will see. By documenting it, you also help AI prompt engineers know how to prompt the model. There’s active research and engineering going into making AI agents more reliable (techniques like better prompt chaining, feedback loops, or fine-tuning on tool use).

Servers

Servers Tuning Transportation Latency

How Synthetic Monitoring Can Warm Up Your CDN (and Why It Matters)

Dotcom-Monitor

JULY 26, 2025

For organizations operating at global scale, Content Delivery Networks (CDNs) have become indispensable infrastructure for delivering fast, reliable user experiences. During high-traffic periods or after cache purges, this can create significant load on origin infrastructure, potentially leading to cascading performance issues or even outages.

Monitoring

Monitoring Cache Strategy Metrics
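The warm-up idea in the CDN excerpt above can be illustrated with a small, hypothetical simulation: an LRU edge cache that counts origin fetches, and a synthetic pass that requests popular URLs before real traffic arrives. It is sequential, so it cannot model the concurrent miss storm the excerpt warns about; it only shows warm-up moving the misses out of the user-facing window. The URLs, capacity, traffic shape, and class and function names are all assumptions, not Dotcom-Monitor's tooling.

```python
from collections import OrderedDict


class EdgeCache:
    """Tiny LRU edge cache that counts how often it must go to the origin."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()
        self.origin_fetches = 0

    def get(self, url: str) -> str:
        if url in self.store:
            self.store.move_to_end(url)  # refresh LRU position on a hit
            return self.store[url]
        self.origin_fetches += 1  # cache miss: the origin serves this request
        body = f"content of {url}"
        self.store[url] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
        return body


def synthetic_warm_up(cache: EdgeCache, hot_urls) -> None:
    """Request each popular URL once, before real users arrive."""
    for url in hot_urls:
        cache.get(url)


hot = [f"/video/{i}.m3u8" for i in range(10)]
traffic = hot * 100  # 1,000 user requests concentrated on popular objects

cold = EdgeCache(capacity=50)  # e.g. right after a cache purge
for url in traffic:
    cold.get(url)

warm = EdgeCache(capacity=50)
synthetic_warm_up(warm, hot)
misses_before_users = warm.origin_fetches
for url in traffic:
    warm.get(url)

print("user-facing origin fetches, cold:", cold.origin_fetches)  # 10
print("user-facing origin fetches, warmed:", warm.origin_fetches - misses_before_users)  # 0
```

In a real CDN, each edge location and cache tier fills independently, so a warm-up job would need to reach every location that will receive the traffic.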

Investigation of a Workbench UI Latency Issue

The Netflix TechBlog

OCTOBER 14, 2024

Symptom: Machine Learning engineer Luca Pozzi reported to our Data Platform team that their JupyterLab UI on their workbench becomes slow and unresponsive when running some of their notebooks.

Latency

Latency Virtualization Traffic Processing

Vibe Coding is so “Last Month…” — My First Agent Swarm Experience with claude-flow

Adrian Cockcroft

JUNE 26, 2025

This is the kind of thing the terminal shows as it was coding the consciousness engine:
Let me update my todo and spawn the agents:
● Update Todos
⎿ ☒ Read consciousness-engine-guide.md to understand requirements
☒ Analyze current project structure and dependencies
☐ Execute: work through plans/consciousness-engine-guide.md

Code

Code IoT Engineering Testing

Fireside Chat with my Persona at Soopra 2.0 Launch

Adrian Cockcroft

MAY 20, 2025

New forms of interaction will take a long time to develop, and the improvements in infrastructure are dwarfed by the improvements in model efficiency. Persona answer: As I see it, the next major leap in AI will likely be driven by a combination of better models, new forms of human-AI interaction, and infrastructure innovation.

AWS

AWS Education Serverless Innovation

Rift Between Junior and Senior Developers

O'Reilly

OCTOBER 22, 2024

We no longer need to spend loads of time training developers; we can train them to be “prompt engineers” (which makes me think of developers who arrive on time), and they will ask the AI for the code, and it will deliver. As AI improves, it will probably even give you an answer that works. This is great!

Development

Development Code Google Engineering

A Field Guide to Rapidly Improving AI Products

O'Reilly

APRIL 15, 2025

Teams with thoughtfully designed data viewers iterate 10x faster than those without them. Their product manager, a learning design expert, would create detailed PowerPoint decks explaining pedagogical principles and example dialogues. She’d present these to the engineering team, who would then translate her expertise into prompts.

Metrics

Metrics Testing Infrastructure Systems

“Death by 1000 Pilots”

O'Reilly

APRIL 29, 2025

Our work focuses on the challenges that come with bringing PoCs to production, such as scaling AI infrastructure, improving AI system reliability, and producing business value. As Steve Yegge says, you have to demand that the AI writes code that meets your quality standards as an engineer.

Google

Google Software Architecture Hardware Programming

Designing Instagram

High Scalability

JANUARY 11, 2022

Machine Learning Engineer at Amazon and has led several machine-learning initiatives across the Amazon ecosystem. Design a photo-sharing platform similar to Instagram where users can upload their photos and share them with their followers. High Level Design. Component Design. API Design. Problem Statement.

Design

Design Media Storage Logistics

What is platform engineering?

Dynatrace

NOVEMBER 3, 2023

In response to this shift, platform engineering is growing in popularity. The practice of platform engineering has evolved alongside the increasing complexity of cloud environments. A platform encompasses a set of tools, services, and infrastructure that enables developers to build, test, and deploy software applications.

Engineering

Engineering DevOps Software Engineering Scalability

Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

OCTOBER 19, 2020

(Question from a Netflix member via Twitter.) This is an example of a question our on-call engineers need to answer to help resolve a member issue. Now let’s look at how we designed the tracing infrastructure that powers Edgar.

Infrastructure

Infrastructure Transportation Storage Open Source

Unlock the Power of DevSecOps with Newly Released Kubernetes Experience for Platform Engineering

Dynatrace

NOVEMBER 7, 2023

Platform engineering is on the rise. According to leading analyst firm Gartner, “80% of software engineering organizations will establish platform teams as internal providers of reusable services, components, and tools for application delivery…” by 2026. Automation, automation, automation.

Engineering

Engineering DevOps Best Practices Infrastructure
