“Start With Why” Queues are a built-in mechanism everywhere in today's software. If you aren't familiar with the basics of queuing theory, you'll struggle to understand the relationship between latency and throughput, high-level capacity estimation, and workload optimization. Queues Are Everywhere!
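To make the latency/throughput relationship concrete, here is a minimal sketch of Little's Law (L = λW); the rates and latencies below are made-up numbers for illustration, not from any real system.

```python
# Little's Law: L = lambda * W
# L      = average number of requests in the system (queued + in service)
# lambda = average arrival rate, which equals throughput at steady state
# W      = average time a request spends in the system (latency)

arrival_rate = 200.0  # requests per second (illustrative)
avg_latency = 0.05    # seconds per request (50 ms end to end)

in_flight = arrival_rate * avg_latency
print(f"average requests in the system: {in_flight:.0f}")  # -> 10

# Rearranged: if the system can only hold 8 requests in flight,
# sustained throughput is capped at 8 / 0.05 = 160 requests/second.
max_in_flight = 8
print(f"throughput ceiling: {max_in_flight / avg_latency:.0f} req/s")
```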
Stream processing One approach to such a challenging scenario is stream processing, a computing paradigm and software architectural style for data-intensive software systems that emerged to cope with requirements for near real-time processing of massive amounts of data. This significantly increases event latency.
Cloud-native environments bring speed and agility to software development and operations (DevOps) practices. DevOps is focused on optimizing software development and delivery, and SRE is focused on operations processes. DevOps is best thought of as a practical approach to speeding up new software development and delivery.
Software engineering for machine learning: a case study, Amershi et al., ICSE'19. More specifically, we'll be looking at the results of an internal study with over 500 participants, designed to figure out how product development and software engineering are changing at Microsoft with the rise of AI and ML.
Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. According to Google, “SRE is what you get when you treat operations as a software problem.”
This shift is leading more organizations to hire site reliability engineers to guarantee the reliability and resiliency of their services. How site reliability engineering affects organizations' bottom line: SRE applies the disciplines of software engineering to infrastructure management, both on-premises and in the cloud.
When a user requests a feed, two parallel threads are involved in fetching the user's feed, to optimize for latency. FUN FACT: In this talk, Dikang Gu, a software engineer on Instagram's core infra team, mentions how they use Cassandra to serve critical use cases with high scalability requirements, and some pain points.
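As a rough illustration of that two-thread pattern (the function names, timings, and data sources below are hypothetical, not Instagram's actual code), the two fetches can be issued concurrently so the feed's end-to-end latency approaches the slower of the two calls rather than their sum:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_followed_posts(user_id):
    # Placeholder for a backend call (e.g. a Cassandra read).
    time.sleep(0.05)
    return [f"followed-post-for-{user_id}"]

def fetch_recommended_posts(user_id):
    # Placeholder for a second, independent backend call.
    time.sleep(0.05)
    return [f"recommended-post-for-{user_id}"]

def get_feed(user_id):
    # Run both fetches concurrently: total latency is roughly
    # max(t1, t2) instead of t1 + t2.
    with ThreadPoolExecutor(max_workers=2) as pool:
        followed = pool.submit(fetch_followed_posts, user_id)
        recommended = pool.submit(fetch_recommended_posts, user_id)
        return followed.result() + recommended.result()

print(get_feed(42))
```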
Dynatrace Configuration as Code enables complete automation of the Dynatrace platform’s configuration, ensuring that software is secure and reliable. As software development grows more complex, managing components using an automated onboarding process becomes increasingly important.
In a recent webinar, Dynatrace DevOps activist Andi Grabner and senior software engineer Yarden Laifenfeld explored developer observability. Dynatrace enables teams to specify SLOs, such as latency, uptime, availability, and more. “I think Dynatrace and Rookout together are going to enable this future.”
Wednesday, December, 4:45pm-5:45pm, NFX 209. File system as a service at Netflix. Kishore Kasi, Senior Software Engineer. Abstract: As Netflix grows in original content creation, its need for storage is also increasing at a rapid pace. Technology advancements in content creation and consumption have also increased its data footprint.
In that scenario, the system would need to deal with the data propagation latency directly, for example by using timeouts or client-originated update tracking mechanisms. We started seeing increased response latencies and leader servers running at dangerously high utilization. The query rate in this test is set to 1K requests/second.
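A minimal sketch of the client-originated update tracking idea, assuming a hypothetical store whose get_with_version() returns a version number alongside each value: the writer keeps the version token from its write, and readers retry with a timeout until a replica has caught up.

```python
import time

class StaleReadError(Exception):
    pass

def read_at_least(store, key, min_version, timeout_s=2.0, poll_s=0.05):
    # store.get_with_version() is a hypothetical API returning
    # (value, version); min_version is the token the writer recorded.
    deadline = time.monotonic() + timeout_s
    while True:
        value, version = store.get_with_version(key)
        if version >= min_version:
            return value  # replica has caught up with our write
        if time.monotonic() >= deadline:
            raise StaleReadError(f"{key!r} still behind version {min_version}")
        time.sleep(poll_s)  # propagation latency: wait and retry
```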
According to recent Dynatrace research, organizations expect to make software updates 58% more frequently in the coming year. DevOps and DevSecOps practices help organizations release software faster and more frequently, paving the way for digital transformation. Site reliability engineers, or SREs, lead these efforts.
Most teams approach this like traditional software development but quickly discover it's a fundamentally different beast. Check out the graph below: see how excitement for traditional software builds steadily, while GenAI starts with a flashy demo and then hits a wall of challenges? What's worse: inputs are rarely exactly the same.
By offloading token processing from these systems to the central Edge Authentication Services, downstream systems saw significant gains in CPU, request latency, and garbage collection metrics, all of which help reduce cluster footprint and cloud costs. And, we're hiring Senior Software Engineers!
It's been clear for a while that software designed explicitly for the data center environment will increasingly want or need to make different design trade-offs from, e.g., general-purpose systems software that you might install on your own machines. The desire for CPU efficiency and lower latencies is easy to understand. Enter Google!
Low-latency queries: To avoid downloading all of the fact data from S3 in a Spark executor and then dropping it, we analyzed our query patterns and figured out that there is a way to access only the data we are interested in… and we had no additional information to optimize our queries.
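One common way to "access only the data you are interested in" is partition and column pruning. A hedged PySpark sketch of that pattern; the bucket path, partition column, and field names are invented for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pruned-read").getOrCreate()

# Hypothetical layout: fact data stored as Parquet under an S3 prefix,
# partitioned by a date column "ds".
facts = spark.read.parquet("s3://my-bucket/facts/")  # path is illustrative

# Filtering on the partition column and selecting only the needed columns
# lets Spark prune partitions and Parquet column chunks, so executors
# never download the rows and columns they would otherwise drop.
slice_of_interest = (
    facts
    .filter(F.col("ds") == "2024-01-01")
    .select("item_id", "play_count")
)
slice_of_interest.show()
```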
We are expected to process 1,000 watermarks for a single distribution in a minute, with non-linear latency growth as the number of watermarks increases. The goal is to process these documents as fast as possible and reliably deliver them to recipients while offering strong observability to both our users and internal teams.
As our business scales globally, the demand for data is growing, and the need for scalable, low-latency incremental processing has begun to emerge. It serves thousands of users, including data scientists, data engineers, machine learning engineers, software engineers, content producers, and business analysts, across various use cases.
Server-generated assets, since client-side generation would require the retrieval of many individual images, which would increase latency and time-to-render. To reduce latency, assets should be generated in an offline fashion and not in real time. Different assets for different device types and screen sizes.
This data-propagation latency was unacceptable. The Tangible Result: with the data-propagation latency issue solved, we were able to re-implement the Gatekeeper system to eliminate all I/O boundaries. Traditional Hollow usage: the problem with this total-source-of-truth iteration model is that it can take a long time.
We suspect this points to a general drift toward software teams taking more responsibility for infrastructure, increasingly enabled by serverless options. As noted earlier, the majority of survey respondents are software engineers. (latency, startup, mocking, etc.) Industries of survey respondents.
The core algorithms (chain replication, Paxos-based consensus) aren't the stars of the show here; instead, the paper focuses on how these algorithms are deployed, and on the software engineering practices behind the creation of a mission-critical production system employing them. A guiding principle. Cells have seven nodes.
A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. The term site reliability engineering first came into existence at Google in 2003, when a site reliability team was created. At that time, the team was made up of software engineers.
System Metrics: Given the significant reduction in connections, we saw reduced CPU utilization (~4%), heap usage (~15%), and latency (~3%) on Zuul as well. In this case, we went from a subset size of 100 for 400 servers (dividing the fleet into 4 subsets) to 50 (a division into 8).
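The subsetting arithmetic is easy to check: with a fixed fleet, halving the subset size doubles the number of subsets and halves the connections each client holds. A quick sanity check, with the fleet size taken from the excerpt:

```python
servers = 400
for subset_size in (100, 50):
    groups = servers // subset_size
    print(f"subset size {subset_size}: fleet divides into {groups} subsets; "
          f"each client connects to {subset_size} of {servers} servers")
# subset size 100 -> 4 subsets (the "division of 4")
# subset size 50  -> 8 subsets (the "division of 8")
```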
A good SRE engineer will tell you your service is never down. A great SRE engineer will tell you that’s not what you should be measuring. In fact, they’ll tell you their job is customer service.
Two failure modes we focus on are a service becoming slower (an increase in response latency) or a service failing outright (returning errors). The criticality score is combined with a safety score and an experiment weight (failure experiments first, then latency, then failure-inducing latency) to produce the final prioritization score.
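The excerpt doesn't give the actual formula, but a toy sketch of combining the three signals might look like the following; the weights and the multiplicative form are assumptions, purely for illustration:

```python
# Hypothetical experiment-type weights reflecting the stated ordering:
# failure experiments first, then latency, then failure-inducing latency.
EXPERIMENT_WEIGHT = {
    "failure": 3.0,
    "latency": 2.0,
    "failure_inducing_latency": 1.0,
}

def prioritization_score(criticality: float, safety: float, kind: str) -> float:
    # More critical and safer experiments run first; the experiment
    # type acts as a coarse multiplier on top of the two scores.
    return criticality * safety * EXPERIMENT_WEIGHT[kind]

print(prioritization_score(0.9, 0.8, "failure"))  # -> 2.16
```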
It also means fewer engineering teams are required to support initiatives in this space. Lower latency as a result of fewer service calls, which means fewer errors for our visitors. Configuration instead of code for updating SKU data, which improves innovation velocity. The world is constantly changing.
You will find that the paradigms you choose for other parties won’t align with the expectations for children, and modifying your software to accommodate children is difficult or impossible. Most software is built to work for as many people as possible; this is called generalization. Norms stand in the way of generalization.
As vendors and CSPs are faced with building these virtualized systems, it’s imperative to look at the softwareengineering methodologies that the IT industry has successfully applied to challenges at comparable scale. It’s accepted that new tooling is required, but there’s no consensus yet on standards.
And FMA instructions often have lower latency than a multiply followed by an add instruction. On the Xbox 360 CPU, the latency and throughput of FMA were the same as for fmul or fadd, so using an FMA instead of an fmul followed by a dependent fadd would halve the latency. Emulating FMA. Hypothetically.
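Besides latency, the fused operation rounds only once, which is precisely what makes emulating FMA with separate multiply and add instructions tricky. A small demonstration, assuming Python 3.13+ where math.fma is available:

```python
import math

a = 1e16
b = 1.0 + 2.0 ** -52   # 1 plus one ulp
c = -1e16

unfused = a * b + c        # two roundings: a*b rounds away the tiny term
fused = math.fma(a, b, c)  # one rounding: the exact product is kept

print(unfused)  # -> 2.0 (the low-order bits of a*b were lost)
print(fused)    # -> ~2.22, the tiny term survives the fused operation
```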
Join Lee Packham, AWS Solutions Architect, and Enrico Huijbers, AWS Software Development Engineer, to find out how easy it is. OPN304: Learnings from migrating a service from JDK 8 to JDK 11. AWS Lambda improved latency by migrating to JDK 11 with Amazon Corretto.
If the server-side responder is also being replayed, then Reverb inserts a new request into the server-side log… When the response is generated, Reverb buffers it and uses a model of network latency to determine where to inject the response into the client-side log.
In addition, traditional CMS solutions lack integration with the modern software stack, cloud services, and software delivery pipelines. By using a CDN for the whole website, you can offload most of the website traffic to your CDN, which will not only handle large traffic spikes but also reduce the latency of content delivery.
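What "offloading to the CDN" looks like in practice is mostly a matter of cache headers on origin responses. A minimal WSGI sketch; the header values are illustrative, and real TTLs depend on your content:

```python
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # s-maxage governs shared caches (the CDN); max-age governs browsers.
    # A long s-maxage lets the CDN absorb traffic spikes and serve pages
    # from edge locations near users, cutting content-delivery latency.
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("Cache-Control", "public, max-age=60, s-maxage=86400"),
    ])
    return [b"<html><body>hello</body></html>"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```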