Availability and Engineering - Technology Performance Pulse

Part 2: A Survey of Analytics Engineering Work at Netflix

The Netflix TechBlog

JANUARY 2, 2025

This article is the second in a multi-part series sharing a breadth of Analytics Engineering work at Netflix, recently presented as part of our annual internal Analytics Engineering conference. Need to catch up? Check out Part 1. Because games differ from series/films, it’s crucial to validate this estimation method for games.

Analytics

Analytics Engineering Games Entertainment

Chaos Engineering With Litmus: A CNCF Incubating Project

DZone

FEBRUARY 6, 2025

Degraded service availability and the resulting revenue impact stem mainly from Kubernetes pod crashes, resource exhaustion, and network disruptions that hit during peak shopping seasons.

Engineering

Engineering Traffic Architecture Network

Next generation Dynatrace Davis AI becomes the default causation engine

Dynatrace

NOVEMBER 26, 2019

Back during Perform 2019, we introduced the next generation of the Dynatrace AI causation engine, also known as Davis. … becomes the default causation engine and will replace the previous version as the default for all new environments. … All existing Davis 1.0 …

Engineering

Engineering Serverless Metrics Code

What is platform engineering?

Dynatrace

NOVEMBER 3, 2023

In response to this shift, platform engineering is growing in popularity. The practice of platform engineering has evolved alongside the increasing complexity of cloud environments. Platform engineers design and implement these platforms, as well as ensure their security, scalability, and reliability.

Engineering

Engineering DevOps Software Engineering Scalability

Achieving High Availability in CI/CD With Observability

DZone

MARCH 5, 2024

Since most application releases depend on cloud infrastructure, having good continuous integration and continuous delivery (CI/CD) pipelines and end-to-end observability becomes essential for ensuring highly available systems.

Availability

Availability DevOps Infrastructure Scalability

Life of a Netflix Partner Engineer - The case of extra 40 ms

The Netflix TechBlog

DECEMBER 14, 2020

Life of a Netflix Partner Engineer - The case of the extra 40 ms. By: John Blair, Netflix Partner Engineering. The Netflix application runs on hundreds of smart TVs, streaming sticks and pay TV set top boxes. The role of a Partner Engineer at Netflix is to help device manufacturers launch the Netflix application on their devices.

Engineering

Engineering Code Open Source Hardware

Unlock the Power of DevSecOps with Newly Released Kubernetes Experience for Platform Engineering

Dynatrace

NOVEMBER 7, 2023

Platform engineering is on the rise. According to leading analyst firm Gartner, “80% of software engineering organizations will establish platform teams as internal providers of reusable services, components, and tools for application delivery…” by 2026.

Engineering

Engineering DevOps Best Practices Infrastructure

Build systems more reliably with Dynatrace: Chaos Engineering

Dynatrace

AUGUST 21, 2024

To enhance reliability, testing the software under these conditions is crucial to prepare for potential issues by leveraging chaos engineering or similar tools. Chaos engineering is a practice that extends beyond traditional failure testing by identifying unpredictable issues. It forms the cornerstone of chaos engineering experiments.

Engineering

Engineering Systems Latency Metrics

A Five-Step Methodology for Maximizing Efficiency in Software Engineering Meetings

DZone

JANUARY 6, 2024

Meetings are a crucial aspect of software engineering, serving as a collaboration, communication, and decision-making platform. In this article, we will delve deeper into the issues associated with meetings in software engineering and explore the available data.

Software Engineering

Software Engineering Efficiency Engineering Software

Dynatrace extends Synthetic Monitoring capabilities with Network Availability Monitors to validate the availability of infrastructure and services

Dynatrace

JULY 15, 2024

As HTTP and browser monitors cover the application level of the ISO/OSI model, successful executions of synthetic tests indicate that availability and performance meet the expected thresholds of your entire technological stack. Our script, available on GitHub, provides details. … into NAM test definitions.

Availability

Availability Network Monitoring Infrastructure

Flexible, scalable, self-service Kubernetes native observability now in General Availability

Dynatrace

MAY 17, 2022

The application consists of several microservices that are available as pod-backed services. Only Dynatrace provides this level of depth and breadth across Kubernetes clusters, from infrastructure level information needed by operations teams, all the way down to code-level inefficiencies that are best handled by application engineers.

Availability

Availability Scalability Cloud Metrics

Five-nines availability: Always-on infrastructure delivers system availability during the holidays’ peak loads

Dynatrace

NOVEMBER 22, 2022

The nirvana state of system uptime at peak loads is known as “five-nines availability.” In its pursuit, IT teams hover over system performance dashboards hoping their preparations will deliver five nines—or even four nines—availability. But is five nines availability attainable? Downtime per year. 90% (one nine).
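The "downtime per year" figures this teaser alludes to follow from simple arithmetic: an availability target of N nines leaves a fraction 10^-N of the year as allowed downtime. A minimal sketch, assuming a 365-day year; the function name and constant are illustrative and do not come from the linked article:

```python
# Allowed downtime per year for an availability target of N "nines".
# Assumption: a non-leap year of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_per_year_seconds(nines: int) -> float:
    """Return the allowed downtime per year, in seconds.

    `nines` is the number of nines in the target:
    1 means 90%, 2 means 99%, ..., 5 means 99.999%.
    """
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    unavailability = 10.0 ** (-nines)  # fraction of time the system may be down
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(1, 6):
        seconds = downtime_per_year_seconds(n)
        print(f"{n} nine(s): {seconds / 3600:.2f} hours of downtime per year")
```

Under that assumption, one nine (90%) allows 36.5 days of downtime per year, while five nines allows about 315 seconds, a little over five minutes.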

Infrastructure

Infrastructure Availability Systems Retail

Dynatrace observability now available for Red Hat OpenShift on IBM Z and LinuxONE mainframes

Dynatrace

JULY 24, 2024

Dynatrace full stack observability for Red Hat OpenShift: Dynatrace enhances software quality and operational efficiency, which drives innovation by unifying application, operation, and platform engineering teams on a single platform. Learn more about the new Kubernetes Experience for Platform Engineering.

Availability

Availability Infrastructure Metrics Hardware

Dare to debug production with Dynatrace Live Debugger

Dynatrace

FEBRUARY 4, 2025

At Dynatrace, we understand your challenges when dealing with external packages, whether you’re hustling with reverse engineering, automatically fetching open source code, or playing the guessing game. Source code is loaded only on an engineer’s workstation, using the engineer’s privileges.

Open Source

Open Source Code Engineering Best Practices

The platform engineer role: A game-changer or just hype?

Dynatrace

SEPTEMBER 21, 2023

Site reliability engineering first emerged to address cloud computing’s new performance needs. Today, the platform engineer role is gaining speed as the newest byproduct of scaling DevOps in the emerging but complex cloud-native world. Understanding the platform engineer role: DevOps is a constantly evolving discipline.

Games

Games Engineering DevOps Education

Don’t just react: How executives can predict and prevent outages to maximize availability

Dynatrace

OCTOBER 3, 2024

The end goal, of course, is to optimize the availability of organizations’ software. Dynatrace is widely recognized for its AI capabilities’ ability to predict and prevent issues, and automatically identify root causes, maximizing availability. Eventually, the goal is to arrive at self-healing through autonomous cloud operations.

Availability

Availability DevOps Analytics Cloud

How platform engineering and IDP observability can accelerate developer velocity

Dynatrace

MARCH 6, 2024

As organizations look to expand DevOps maturity, improve operational efficiency, and increase developer velocity, they are embracing platform engineering as a key driver. Platform engineering: Build for self-service. Self-service deployment is a key attribute of platform engineering. “It makes them more productive.”

Engineering

Engineering Development DevOps Infrastructure

HTTP monitors on the latest Dynatrace platform extend insights into the health of your API endpoints and simplify test management

Dynatrace

DECEMBER 18, 2024

Thanks to the power of Grail, those details are available for all executions stored for the entire retention period during which synthetic results are kept. It now fully supports not only Network Availability Monitors but also HTTP synthetic monitors. Details of requests sent during each monitor execution are also available.

Monitoring

Monitoring Testing Metrics Analytics

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

Stream processing enables software engineers to model their applications’ business logic as high-level representations in a directed acyclic graph without explicitly defining a physical execution plan. We designed experimental scenarios inspired by chaos engineering. Chaos scenario: Random pods executing worker instances are deleted.

Engineering

Engineering Tuning Latency Open Source

DevOps engineer tools: Deploy, test, evaluate, repeat

Dynatrace

DECEMBER 8, 2022

As cloud-native, distributed architectures proliferate, the need for DevOps technologies and DevOps platform engineers has increased as well. DevOps engineer tools can help ease the pressure as environment complexity grows. What does a DevOps platform engineer do? What are DevOps engineer tools and platforms?

DevOps

DevOps Engineering Testing Open Source

Reliability indicators that matter to your business: SLOs for all data types

Dynatrace

OCTOBER 31, 2024

This lets you build your SLOs around the indicators that matter to you and your customers—critical metrics related to availability, failure rates, request response times, or select logs and business events. While the SLO management web UI and API are already available, the dashboard tile will be released within the next weeks.

Metrics

Metrics Availability Monitoring Scalability

Transform data into insights with Dynatrace Dashboards and Notebooks

Dynatrace

OCTOBER 16, 2024

Kickstart your creation journey using ready-made dashboards and notebooks: Creating dashboards and notebooks from scratch can take time, particularly when figuring out available data and how to best use it. This feature lets you explore any available metric and add it to Notebooks or Dashboards.

Social Media

Social Media Metrics Network Analytics

Automating Success: Building a better developer experience with platform engineering

Dynatrace

FEBRUARY 12, 2024

When it comes to platform engineering, not only does observability play a vital role in the success of organizations’ transformation journeys—it’s key to successful platform engineering initiatives. The various presenters in this session aligned platform engineering use cases with the software development lifecycle.

Engineering

Engineering Development Infrastructure Cloud

How Netflix Content Engineering makes a federated graph searchable

The Netflix TechBlog

APRIL 12, 2022

By Alex Hutter, Falguni Jhaveri and Senthil Sayeebaba. Over the past few years Content Engineering at Netflix has been transitioning many of its services to use a federated GraphQL platform. … it began to power a significant portion of the user experience for many applications within Content Engineering.

Engineering

Engineering Architecture Java Infrastructure

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty, Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. Our audits would detect this and alert the on-call data engineer (DE).

Data Engineering

Data Engineering Engineering Processing Games

Title Launch Observability at Netflix Scale

The Netflix TechBlog

MARCH 4, 2025

This standardization enhances adoption within the personalization stack, simplifies the system, and improves understanding and debuggability for engineers. They must also provide enough information for partner engineers to identify the problem with the underlying service in cases of system-level issues.

Traffic

Traffic Strategy Entertainment Innovation

HDR10+ Now Streaming on Netflix

The Netflix TechBlog

MARCH 24, 2025

AV1 is one of the most efficient codecs available today. Title must be available in HDR10+ format. We would like to extend our thanks to the following teams for their crucial roles in this launch: The various Client and Partner Engineering teams at Netflix that manage the Netflix experience across different device platforms.

Innovation

Innovation Mobile Media Efficiency

Dynatrace joins the Microsoft Intelligent Security Association

Dynatrace

NOVEMBER 20, 2024

Dynatrace, available as an Azure-native service, has a longstanding partnership with Microsoft, deeply rooted in a strong “build with” approach to deliver seamless user experience. The Davis AI engine automatically and continuously delivers actionable insights based on an environment’s current state.

Best Practices

Best Practices Innovation Azure Cloud

Dynatrace Observability for Developers saves time with real-time data

Dynatrace

FEBRUARY 4, 2025

Enterprise adoption with self-service: To facilitate enterprise adoption while minimizing tool sprawl and data silos, Dynatrace allows observability teams and platform engineers to implement a self-service model for developers.

Development

Development Analytics Code Architecture

SRE Best Practices for Java Applications

DZone

MARCH 12, 2025

Site reliability engineering (SRE) plays a vital role in ensuring Java applications' high availability, performance, and scalability. This discipline merges software engineering and operations, aiming to create a robust infrastructure that supports seamless user experiences.

Best Practices

Best Practices Java Software Engineering Scalability

Observability engineering: Getting Prometheus metrics right for Kubernetes with Dynatrace and Kepler

Dynatrace

DECEMBER 18, 2023

For busy site reliability engineers, ensuring system reliability, scalability, and overall health is an imperative that’s getting harder to achieve in ever-expanding, cloud-native, container-based environments. Because of its adaptability, Prometheus has become an essential tool for observability engineering. Jolly good!

Metrics

Metrics Engineering Energy Tuning

New Distributed Tracing app provides effortless trace insights

Dynatrace

OCTOBER 23, 2024

Automatic data capture and display: More data, including span attributes, is available for out-of-the-box analysis, with no additional configuration necessary. As soon as the new Distributed Tracing Experience is available for your environment, you’ll see a teaser banner in your classic Distributed Traces app.

Tuning

Tuning Website Availability Performance

Unmatched scalability and security of Dynatrace extensions now available for all supported technologies: 7 reasons to migrate your JMX and Python plugins

Dynatrace

NOVEMBER 3, 2023

… address these limitations and brings new monitoring and analytical capabilities that weren’t available to Extensions 1.0. What’s available now and what’s coming later: We’ve already started to migrate Dynatrace-developed Extensions 1.0 … available, and more are in the pipeline. … Extensions 2.0 to the Extension Framework 2.0.

Technology

Technology Technology Scalability Availability

Reduce incident response time with case templates

Dynatrace

MARCH 7, 2025

Repetitive tasks in incident response waste time: When investigating incidents in production, engineers typically start each investigation with similar queries to understand what happened and where to look next, though the specifics can vary. Case templates provide engineers with a boilerplate for their investigation.

Speed

Speed Engineering Availability

Build resilient IT systems and manage regulatory requirements with compliance and resilience capabilities from Dynatrace

Dynatrace

JANUARY 15, 2025

Site Reliability Guardian provides an automated change impact analysis to validate service availability, performance, and capacity objectives across various systems. Leveraging code-level insights and transaction analysis, Dynatrace Runtime Application Protection automatically detects attacks on applications in your environment.

Systems

Systems DevOps Analytics Monitoring

5 powerful use cases beyond debugging for Dynatrace Live Debugger

Dynatrace

MARCH 25, 2025

Following are some of the coolest things we’ve seen engineers do with Live Debugger. Performance benchmarking: Performance benchmarking is one of the unresolved mysteries of software engineering. White box testing: The nicest thing about deploying UI changes to production is that you can immediately see the changes in action.

Benchmarking

Benchmarking Code Open Source Engineering

OpenPipeline: Simplify access to critical business data

Dynatrace

NOVEMBER 4, 2024

Most of the use cases in these two broad categories benefit from the flexibility that comes from multiple available sources of business data. Log data is then processed accordingly, stored in Dynatrace Grail™ causational data lakehouse, and available for your Business Analytics use cases.

Analytics

Analytics Airlines Metrics Monitoring

Demo: Transform OpenTelemetry data into actionable insights with the Dynatrace Distributed Tracing app

Dynatrace

OCTOBER 29, 2024

Note that the developers of the respective services need to make these metrics available by exposing them via, for example, a Prometheus endpoint that can be used by the OpenTelemetry collector to ingest them and forward them to your Dynatrace tenant.

Metrics

Metrics Tuning Monitoring Availability

Core Web Vitals for Search Engine Optimisation: What Do We Need to Know?

CSS Wizardry

JULY 23, 2023

I am available to help you find and fix your site-speed issues through performance audits, training and workshops, consultancy, and more. But for many queries, there is lots of helpful content available. I’m available for hire to help you out with workshops, consultancy, advice, and development. You should get in touch.

Engineering

Engineering Google Speed Mobile

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

Every image you hover over isn’t just a visual placeholder; it’s a critical data point that fuels our sophisticated personalization engine. This dual availability ensures immediate processing capabilities alongside comprehensive long-term data retention.

Tuning

Tuning Latency Efficiency Storage

Kubernetes health at a glance: One experience to rule it all

Dynatrace

FEBRUARY 1, 2024

The complexity and numerous moving parts of Kubernetes multicloud clusters mean that when monitoring the health of these clusters—which is critical for ensuring reliable and efficient operation of the application—platform engineers often find themselves without an easy and efficient solution.

Engineering

Engineering Efficiency Azure Monitoring

SLOs for Kubernetes clusters: Optimize resource utilization of Kubernetes clusters with service-level objectives

Dynatrace

NOVEMBER 11, 2024

A Kubernetes SLO that continuously evaluates CPU, memory usage, and capacity and compares these available resources to the requested and utilized memory of Kubernetes workloads makes potential resource waste visible, revealing opportunities for countermeasures.

Efficiency

Efficiency Best Practices Monitoring Cloud

The Guide to SRE Principles

DZone

NOVEMBER 30, 2023

Site reliability engineering (SRE) is a discipline in which automated software systems are built to manage the development operations (DevOps) of a product or service. In other words, SRE automates the functions of an operations team via software systems.

DevOps

DevOps Engineering Software Software

Power Dashboarding, Part I: Start your exploration journey with Dashboards

Dynatrace

FEBRUARY 6, 2025

Whether you’re a seasoned IT expert or a marketing professional looking to improve business performance, understanding the data available to you is essential. As you went through these steps, you likely noticed some of the chart options available. Also, explore additional dashboards available on the Dynatrace Playground.

Metrics

Metrics Infrastructure Monitoring Best Practices
