Latency and Software Engineering - Technology Performance Pulse

Queuing Theory for Software Engineers

DZone

JUNE 4, 2024

Not being familiar with the basics of queuing theory will prevent you from understanding the relations between latency and throughput, high-level capacity estimations, and workload optimization. In this article, I'll sum up the essence of what's required for a software engineer to be more effective in their field.

Software Engineering

Software Engineering Engineering Software Software

Why applying chaos engineering to data-intensive applications matters

Dynatrace

MAY 23, 2024

Stream processing: One approach to such a challenging scenario is stream processing, a computing paradigm and software architectural style for data-intensive software systems that emerged to cope with requirements for near real-time processing of massive amounts of data. This significantly increases event latency.

Engineering

Engineering Tuning Latency Open Source

SRE vs DevOps: What you need to know

Dynatrace

FEBRUARY 24, 2021

SRE is the transformation of traditional operations practices by using software engineering and DevOps principles to improve the availability, performance, and scalability of releases by building resiliency into apps and infrastructure. Reduced latency. Investing in automation and tooling to avoid toil. SRE vs DevOps?

DevOps

DevOps Software Engineering Speed Google

Software engineering for machine learning: a case study

The Morning Paper

JULY 7, 2019

Software engineering for machine learning: a case study, Amershi et al., ICSE’19. More specifically, we’ll be looking at the results of an internal study with over 500 participants designed to figure out how product development and software engineering are changing at Microsoft with the rise of AI and ML.

Software Engineering

Software Engineering Engineering Software Software

Designing Instagram

High Scalability

JANUARY 11, 2022

When a user requests their feed, two parallel threads are involved in fetching the user's feed to optimize for latency. FUN FACT: In this talk, Dikang Gu, a software engineer on the Instagram core infra team, describes how they use Cassandra to serve critical use cases, high scalability requirements, and some pain points.

Design

Design Media Storage Logistics

Site reliability engineering: 5 things you need to know

Dynatrace

FEBRUARY 4, 2021

Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. For more about this ongoing conversation, see A guide to event-driven SRE-inspired DevOps.

Engineering

Engineering DevOps Government Latency

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

This shift is leading more organizations to hire site reliability engineers to guarantee the reliability and resiliency of their services. How site reliability engineering affects organizations’ bottom line SRE applies the disciplines of software engineering to infrastructure management, both on-premises and in the cloud.

Best Practices

Best Practices DevOps Latency Metrics

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

In that scenario, the system would need to deal with the data propagation latency directly, for example, by use of timeouts or client-originated update tracking mechanisms. We started seeing increased response latencies and leader servers running at dangerously high utilization. The query rate in this test is set to 1K requests/second.

Cache

Cache Latency Traffic Systems

Application observability meets developer observability: Unlock a 360º view of your environment

Dynatrace

NOVEMBER 6, 2023

In a recent webinar, Dynatrace DevOps activist Andi Grabner and senior software engineer Yarden Laifenfeld explored developer observability. Dynatrace enables teams to specify SLOs, such as latency, uptime, availability, and more. How do you know if this problem has business impact?

Development

Development DevOps Programming Cloud

Automated observability, security, and reliability at scale

Dynatrace

JULY 18, 2023

To handle this challenge, enterprises need to automate and streamline the onboarding and lifecycle of tool configurations in the software development processes, including aspects of observability, security, alerting, and remediation. Development teams must set up tailored configurations for each tool and component they’re responsible for.

Best Practices

Best Practices Code Infrastructure Latency

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm NFX 209 File system as a service at Netflix Kishore Kasi, Senior Software Engineer Abstract: As Netflix grows in original content creation, its need for storage is also increasing at a rapid pace. Technology advancements in content creation and consumption have also increased its data footprint. Wednesday - December

AWS

AWS Entertainment Open Source Benchmarking

Edge Authentication and Token-Agnostic Identity Propagation

The Netflix TechBlog

FEBRUARY 9, 2021

By offloading token processing from these systems to the central Edge Authentication Services, downstream systems saw significant gains in CPU, request latency, and garbage collection metrics, all of which help reduce cluster footprint and cloud costs. And, we’re hiring Senior Software Engineers !

Architecture

Architecture Latency Servers Website

DevOps observability: A guide for DevOps and DevSecOps teams

Dynatrace

JANUARY 18, 2023

Site reliability engineering (SRE) is a software operations methodology that enables organizations to create highly reliable and scalable applications. SRE applies software engineering principles to operations and infrastructure processes. Site reliability engineers, or SREs, lead these efforts.

DevOps

DevOps Best Practices Innovation Strategy

Achieving observability in async workflows

The Netflix TechBlog

MAY 14, 2021

We are expected to process 1,000 watermarks for a single distribution in a minute, with non-linear latency growth as the number of watermarks increases. The goal is to process these documents as fast as possible and reliably deliver them to recipients while offering strong observability to both our users and internal teams.

Traffic

Traffic Java Latency Google

Evolution of ML Fact Store

The Netflix TechBlog

APRIL 26, 2022

Low-latency Queries To avoid downloading all of the fact data from S3 in a Spark executor and then dropping it, we analyzed our query patterns and figured out that there is a way to only access the data that we are interested in. and we had no additional information to optimize our queries.

Storage

Storage Design Scalability Latency

Snap: a microkernel approach to host networking

The Morning Paper

NOVEMBER 10, 2019

It’s been clear for a while that software designed explicitly for the data center environment will increasingly want/need to make different design trade-offs to e.g. general-purpose systems software that you might install on your own machines. The desire for CPU efficiency and lower latencies is easy to understand. Enter Google!

Network

Network Transportation Latency Entertainment

Growth Engineering at Netflix - Automated Imagery Generation

The Netflix TechBlog

FEBRUARY 9, 2021

Server-generated assets, since client-side generation would require the retrieval of many individual images, which would increase latency and time-to-render. To reduce latency, assets should be generated in an offline fashion and not in real time. Different assets for different device types and screen sizes.

Engineering

Engineering Storage Latency Entertainment

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

As our business scales globally, the demand for data is growing and the needs for scalable low latency incremental processing begin to emerge. It serves thousands of users, including data scientists, data engineers, machine learning engineers, software engineers, content producers, and business analysts, in various use cases.

Processing

Processing Big Data Efficiency Engineering

Re-Architecting the Video Gatekeeper

The Netflix TechBlog

JULY 12, 2019

This data-propagation latency was unacceptable - we The Tangible Result With the data propagation latency issue solved, we were able to re-implement the Gatekeeper system to eliminate all I/O boundaries. Traditional Hollow usage The problem with this total-source-of-truth iteration model is that it can take a long time.

Cache

Cache Architecture Engineering Latency

Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

O'Reilly

MARCH 25, 2025

We’re also betting that this will be a time of software development flourishing. With the advent of generative AI, there’ll be significant opportunities for product managers, designers, executives, and more traditional software engineers to contribute to and build AI-powered software. We tested both retrieval quality (e.g.,

Systems

Systems Development Tuning Monitoring

Millions of tiny databases

The Morning Paper

MARCH 3, 2020

The core algorithms (chain-replication, Paxos-based consensus) aren’t the stars of the show here, instead the paper focuses on how these algorithms are deployed, and the software engineering practices behind the creation of a mission-critical production system employing them. A guiding principle. Cells have seven nodes.

Database

Database AWS Network Design

Automating chaos experiments in production

The Morning Paper

JULY 4, 2019

Two failure modes we focus on are a service becoming slower (increase in response latency) or a service failing outright (returning errors). The criticality score is combined with a safety score and experiment weight (failure experiments, then latency, then failure-inducing latency) to produce the final prioritization score.

Latency

Latency Engineering Metrics Traffic

O’Reilly serverless survey 2019: Concerns, what works, and what to expect

O'Reilly

NOVEMBER 12, 2019

More than a fifth of the respondents work in the software industry—skewing results toward the concerns of software companies, and helping explain the preponderance of those with software engineering roles. As noted earlier, the majority of survey respondents are software engineers.

Serverless

Serverless Architecture FinTech Infrastructure

Curbing Connection Churn in Zuul

The Netflix TechBlog

AUGUST 16, 2023

System Metrics Given the significant reduction in connections, we saw reduced CPU utilization (~4%), heap usage (~15%), and latency (~3%) on Zuul, as well. In this case, we went from a subset size of 100 for 400 servers (a division of 4) to 50 (a division of 8).

Traffic

Traffic Servers Google Metrics

Starting an SRE Team? Stay Away From Uptime.

DZone

DECEMBER 8, 2021

A good SRE engineer will tell you your service is never down. A great SRE engineer will tell you that’s not what you should be measuring. In fact, they’ll tell you their job is customer service.

Engineering

Engineering Scalability Systems Traffic

What is a Site Reliability Engineer (SRE)?

Dotcom-Monitor

OCTOBER 6, 2021

A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. The term site reliability engineering first came into existence at Google in 2003 when a site reliability team was created. At that time, the team was made up of software engineers.

Engineering

Engineering DevOps Monitoring Google

Growth Engineering at Netflix- Creating a Scalable Offers Platform

The Netflix TechBlog

FEBRUARY 9, 2021

It also means fewer engineering teams are required to support initiatives in this space. Lower latency as a result of fewer service calls, which means fewer errors for our visitors. Configuration instead of code for updating SKU data, which improves innovation velocity. The world is constantly changing.

Engineering

Engineering Scalability Architecture Innovation

Microservices – What CSPs can Learn From IT

VoltDB

DECEMBER 8, 2017

As vendors and CSPs are faced with building these virtualized systems, it’s imperative to look at the software engineering methodologies that the IT industry has successfully applied to challenges at comparable scale. It’s accepted that new tooling is required, but there’s no consensus yet on standards.

Latency

Latency Virtualization Cloud Software Engineering

Exercises in Emulation: Xbox 360’s FMA Instruction

Random ASCII

MARCH 20, 2019

And, FMA instructions often have lower latency than a multiply followed by an add instruction. On the Xbox 360 CPU the latency and throughput of FMA was the same as for fmul or fadd so using an FMA instead of an fmul followed by a dependent fadd would halve the latency. Emulating FMA. Hypothetically.

Games

Games Latency Software Engineering Programming

Reverb: speculative debugging for web applications

The Morning Paper

JANUARY 26, 2020

If the server-side responder is also being replayed, then Reverb inserts a new request into the server-side log… When the response is generated, Reverb buffers it and uses a model of network latency to determine where to inject the response into the client-side log.

Programming

Programming Servers Network Latency

Communal Computing’s Many Problems

O'Reilly

JULY 20, 2021

While techniques like federated learning are on the horizon, to avoid latency issues and mass data collection, it remains to be seen whether those techniques are satisfactory for companies that collect data. Is there a benefit to both organizations and their customers to limit or obfuscate the transmission of data away from the device?

Google

Google Games Technology Technology

Open Source at AWS re:Invent

Adrian Cockcroft

NOVEMBER 18, 2019

OPN304 Learnings from migrating a service from JDK 8 to JDK 11 AWS Lambda improved latency by migrating to JDK 11 with Amazon Corretto. Software Engineer, AWS Serverless Applications, and Yishai Galatzer, Senior Manager Software Development.

Open Source

Open Source AWS Lambda Serverless

Content Management Systems of the Future: Headless, JAMstack, ADN and Functions at the Edge

Abhishek Tiwari

NOVEMBER 3, 2018

By using a CDN for the whole website, you can offload most of the website traffic to your CDN, which will not only handle large traffic spikes but also reduce the latency of content delivery. Secondly, having a CDN in front of the origin (static site or APIs) reduces the global and regional latency.

Systems

Systems Cache Website Network
