As modern applications evolve to serve growing needs for real-time data processing and retrieval, the demand for scalability grows with them. One such open-source, distributed search and analytics engine is Elasticsearch, which is very efficient at handling large data sets and high-velocity queries.
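As a concrete illustration of the kind of workload described above, here is a minimal sketch using the official `elasticsearch` Python client. The endpoint, index name, and document fields are assumptions for the example, not details from the excerpt.

```python
# Minimal sketch: index a document and run a full-text query with the
# official elasticsearch Python client (8.x API). Endpoint, index name,
# and fields are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index a single event document (the index is created on demand).
es.index(
    index="events",
    document={"service": "checkout", "latency_ms": 42, "message": "order placed"},
)

# Refresh so the document is searchable immediately (demo only; in
# production you normally rely on the periodic refresh interval).
es.indices.refresh(index="events")

# Full-text search over the message field.
resp = es.search(index="events", query={"match": {"message": "order"}})
for hit in resp["hits"]["hits"]:
    print(hit["_source"])
```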
Scalable Annotation Service — Marken, by Varun Sekhri and Meenakshi Jindal. Introduction: at Netflix, we have hundreds of microservices, each with its own data models or entities. The service should be able to serve real-time (UI) applications, so CRUD and search operations must be achieved with low latency.
Stream processing enables software engineers to model their applications' business logic as a high-level representation in a directed acyclic graph, without explicitly defining a physical execution plan. We designed experimental scenarios inspired by chaos engineering; the injected faults significantly increase event latency.
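To make the "logical DAG, no physical plan" idea concrete, here is a hypothetical mini-API sketch; every name in it is invented for illustration, and a real engine's runtime (not shown) would map such a graph onto a physical execution plan.

```python
# Hypothetical sketch: the engineer declares a logical DAG of operators;
# a runtime would later decide parallelism, placement, and scheduling.
from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    fn: callable
    downstream: list = field(default_factory=list)

    def then(self, other):
        # Add a DAG edge; returning `other` lets calls chain into a pipeline.
        self.downstream.append(other)
        return other

source = Operator("source", lambda e: e)
parse  = Operator("parse",  lambda e: e.strip().lower())
count  = Operator("count",  lambda e: (e, 1))
source.then(parse).then(count)  # logical DAG: source -> parse -> count
```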
What is site reliability engineering? Site reliability engineering (SRE) is the practice of applying software engineering principles to operations and infrastructure processes to help organizations create highly reliable and scalable software systems. SRE focuses on automation.
SRE builds highly reliable and scalable software systems by applying software engineering principles to operations and infrastructure; organizations can then integrate these skilled engineers at key points in the DevOps life cycle.
The Challenge of Title Launch Observability: as engineers, we're wired to track system metrics like error rates, latencies, and CPU utilization, but what about the metrics that matter to a title's success? The complexity of these operational demands underscored the urgent need for a scalable solution.
Yet many are confined to a brief temporal window due to constraints in serving latency or training costs. Key insights from this shift include a data-centric approach: shifting focus from model-centric strategies, which rely heavily on feature engineering, to a data-centric one.
Werner Vogels' weblog on building scalable and robust distributed systems: a fast and scalable NoSQL database service designed for Internet-scale applications. The original Dynamo design was based on a core set of strong distributed-systems principles, resulting in an ultra-scalable and highly reliable database system.
Key takeaways: RabbitMQ improves scalability and fault tolerance in distributed systems by decoupling applications, enabling reliable message exchanges. This decoupling is crucial in modern architectures where scalability and fault tolerance are paramount. Keeping queues short maintains a responsive and efficient RabbitMQ setup.
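A minimal sketch of the decoupling pattern described above, using the `pika` client; the queue name, message body, and localhost broker are assumptions for the example.

```python
# Producer and consumer decoupled through a durable RabbitMQ queue.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()

# Durable queue so messages survive a broker restart.
channel.queue_declare(queue="task_queue", durable=True)

# Producer side: publish a persistent message; the producer needs no
# knowledge of which consumer (if any) will process it.
channel.basic_publish(
    exchange="",
    routing_key="task_queue",
    body=b"resize-image:42",
    properties=pika.BasicProperties(delivery_mode=2),  # persistent
)

# Consumer side: prefetch=1 keeps the per-consumer backlog short, in the
# spirit of the "keep queues short" advice above.
channel.basic_qos(prefetch_count=1)

def handle(ch, method, properties, body):
    print("processing", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="task_queue", on_message_callback=handle)
channel.start_consuming()
```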
The Growth Engineering team is responsible for executing growth initiatives that help us anticipate and adapt to change. For more background on Growth Engineering and the signup funnel, please have a look at our previous blog post, which covers the basics. We need to be constantly adapting and innovating as a result.
Cold starts can cause latency outliers and a poor end-user experience for latency-sensitive applications. The new Amazon capability enables customers to improve the startup latency of their functions from several seconds to as low as sub-second (up to 10 times faster) at P99, the 99th latency percentile.
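The excerpt doesn't name the capability, but the description matches AWS Lambda SnapStart. Assuming that, here is a hedged sketch of enabling it with boto3; the function name is hypothetical.

```python
# Hedged sketch: enable Lambda SnapStart on a (hypothetical) function,
# assuming the capability described above is SnapStart.
import boto3

lam = boto3.client("lambda")

# SnapStart applies to published versions: a snapshot of the initialized
# execution environment is taken at publish time and restored on invoke.
lam.update_function_configuration(
    FunctionName="my-java-service",          # hypothetical name
    SnapStart={"ApplyOn": "PublishedVersions"},
)
lam.publish_version(FunctionName="my-java-service")
```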
It was made possible by a low latency of 0.1 seconds; the lower the latency, the more responsive the robot.
This is a guest post by Ankit Sirmorya, a Machine Learning Engineer at Amazon who has led several machine-learning initiatives across the Amazon ecosystem. FUN FACT: in this talk, Rodrigo Schmidt, director of engineering at Instagram, discusses the different challenges they have faced in scaling Instagram's data infrastructure.
Here's some fancy FCC reverse-engineering magic: Delay is Not an Option: Low Latency Routing in Space (Murat).
Central to this infrastructure is our use of multiple online distributed databases, such as Apache Cassandra, a NoSQL database known for its high availability and scalability. It also serves as central configuration for access patterns such as consistency or latency targets, and it is useful for keeping the "n-newest" records or for prefix-path deletion.
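As a sketch of what a per-request consistency target looks like in practice, here is a minimal example with the DataStax Python driver; the keyspace, table, and contact point are hypothetical.

```python
# Minimal sketch: choose a consistency level per statement with the
# cassandra-driver. LOCAL_QUORUM trades a little latency for stronger
# consistency; ConsistencyLevel.ONE would favor latency instead.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("media")  # hypothetical keyspace

stmt = SimpleStatement(
    "SELECT * FROM annotations WHERE id = %s",  # hypothetical table
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
row = session.execute(stmt, ("abc-123",)).one()
print(row)
```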
Customers can use AWS Lambda Response Streaming to improve performance for latency-sensitive applications and return larger payload sizes. Streaming raises the default 6 MB hard limit to a 20 MB soft limit, adding greater scalability and flexibility to their applications. What is a Lambda serverless function?
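On the client side, streamed responses arrive as a sequence of chunks rather than one buffer, which is what lifts the old whole-payload ceiling. A hedged sketch using boto3's `invoke_with_response_stream`; the function name is hypothetical, and the function itself must be configured for response streaming.

```python
# Hedged sketch: consume a streamed Lambda response chunk by chunk.
import boto3

lam = boto3.client("lambda")
resp = lam.invoke_with_response_stream(FunctionName="report-generator")

for event in resp["EventStream"]:
    if "PayloadChunk" in event:
        # Each chunk is a slice of the payload; process it as it arrives
        # instead of waiting for the full response.
        print(event["PayloadChunk"]["Payload"].decode(), end="")
```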
It supports both high-throughput services that consume hundreds of thousands of CPUs at a time, and latency-sensitive workloads where humans are waiting for the results of a computation. The third generation, called Reloaded, has been online for about seven years and has proven to be stable and massively scalable.
That's because it does not require any pre-prepared schemas, and access to cold/hot storage is fully automatic, with zero latency. The principle of "keep it simple, stupid" is more important than ever, translating into consolidating tools and making processes more consistent at higher grades of scalability and automation.
MongoDB offers several storage engines that cater to various use cases. The default storage engine in earlier versions was MMAPv1, which utilized memory-mapped files and collection-level locking. The newer, pluggable storage engine, WiredTiger, addresses this with document-level locking, prefix compression, and row-based storage.
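A minimal sketch of checking which storage engine a deployment is actually running, using `pymongo` against a local server; the connection string is an assumption.

```python
# Minimal sketch: report the active MongoDB storage engine.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

# serverStatus reports the active engine; modern deployments show
# "wiredTiger", while very old ones may show "mmapv1".
status = client.admin.command("serverStatus")
print(status["storageEngine"]["name"])
```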
This means a system that is not merely available but is also engineered with extensive redundant measures so that it continues to work as its users expect. It also covers high-reliability situations, where continuity of service is essential and redundant elements are continuously in service, as with airplane engines. This redundancy ensures reliability.
Because of its scalability and distributed architecture, thousands of companies trust it to run their cloud and hybrid workloads at high availability without compromising performance. With the Dynatrace Data Explorer, you can easily analyze metrics such as client read/write latency by Cassandra node and disk-space usage by keyspace.
It's HighScalability time: have a very scalable Xmas, everyone! See you in the New Year. Tim Bray: How to talk about [Serverless Latency]: to start with, don't just say "I need 120ms."
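Tim Bray's point is that latency should be specified as a distribution, not a single number. A minimal stdlib-only sketch of reporting percentiles; the sample data is made up.

```python
# Describe latency as a distribution: report p50/p90/p99/p99.9 rather
# than a single figure. Samples are synthetic (log-normal, in ms).
import random

samples_ms = sorted(random.lognormvariate(4.0, 0.6) for _ in range(10_000))

def pct(p):
    # Nearest-rank percentile over the sorted samples.
    return samples_ms[min(len(samples_ms) - 1, int(p / 100 * len(samples_ms)))]

for p in (50, 90, 99, 99.9):
    print(f"p{p}: {pct(p):.1f} ms")
```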
SRE is the transformation of traditional operations practices through software engineering and DevOps principles, improving the availability, performance, and scalability of releases by building resiliency into apps and infrastructure. Benefits include reduced latency and investment in automation and tooling to avoid toil. SRE vs. DevOps?
— a Netflix member, via Twitter. This is an example of a question our on-call engineers need to answer to help resolve a member issue, and it is why we needed to increase engineering productivity via distributed request tracing. That was the first question our engineering teams asked us when integrating the tracer library.
Many organizations today rely on cloud-native applications for their scalability and agility, among other benefits. Serverless benefits include dynamic scalability and reduced latency: by using cloud providers with multiple server sites, organizations can reduce function latency for end users.
A good SRE engineer will tell you your service is never down. A great SRE engineer will tell you that’s not what you should be measuring. In fact, they’ll tell you their job is customer service.
Engineers want their alerting system to be real-time, reliable, and actionable. A few years ago, we were paged by our SRE team because our Metrics Alerting System had fallen behind: critical application-health alerts were reaching engineers 45 minutes late! It also opens doors to more exciting use cases. The results?
By Rajiv Shringi, Vinay Chella, Kaidan Fullerton, Oleksii Tkachuk, and Joey Lynch. Introduction: as Netflix continues to expand and diversify into various sectors like Video on Demand and Gaming, the ability to ingest and store vast amounts of temporal data — often reaching petabytes — with millisecond access latency has become increasingly vital.
Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The data warehouse is not designed to serve point requests from microservices with low latency. Moving data with Bulldozer at Netflix.
This architecture shift greatly reduced processing latency and increased system resiliency. We expanded pipeline support to serve our studio/content-development use cases, which had different latency and resiliency requirements than the traditional streaming use case. 1. Divide the input video into small chunks; 2.
In that scenario, the system would need to deal with the data propagation latency directly, for example, by use of timeouts or client-originated update tracking mechanisms. We started seeing increased response latencies and leader servers running at dangerously high utilization.
To this end, we developed a Rapid Event Notification System (RENO) to support use cases that require server-initiated communication with devices in a scalable and extensible manner. Personalized Experience Refresh: the Netflix recommendation engine continuously refreshes recommendations for every member.
Lambda's toolbox of automated processes helps developers streamline building fast, robust, and scalable applications on accelerated timelines, and it helps SRE teams automate responses. AWS continues to improve how it handles latency issues. As The New Stack reports, developers spend only 32% of their time at work actually coding.
By Shefali Vyas Dalal. AWS re:Invent is a couple of weeks away, and our engineers and leaders are thrilled to be in attendance yet again this year! Over the years, this platform took on support for both elastic online services and fully featured batch workloads, supporting use cases across Netflix engineering.
Without these integrations, projects would be stuck at the prototyping stage, or they would have to be maintained as outliers outside the systems maintained by our engineering teams, incurring unsustainable operational overhead. Importantly, all the use cases were engineered by practitioners themselves.
For example, when monitoring a database, you'll want to know about any latency when writing data to disk, or the average query response time. DevOps practitioners struggle to maintain highly available and scalable applications, and experienced database administrators learn to spot patterns that can lead to common problems.
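A minimal sketch of capturing the kind of query-response-time signal described above; the query function is a stand-in for a real database call.

```python
# Time any callable and report elapsed milliseconds.
import time

def timed(fn, *args, **kwargs):
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{fn.__name__} took {elapsed_ms:.2f} ms")
    return result

def run_query(sql):
    # Stand-in for a real database call; sleeps to simulate I/O latency.
    time.sleep(0.05)
    return []

rows = timed(run_query, "SELECT 1")
```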
From site reliability engineering to service-level objectives and DevSecOps, these resources focus on how organizations use these best practices to innovate at speed without sacrificing quality, reliability, or security. SRE applies software engineering principles to operations and infrastructure processes.
@coryodaniel: Rewrote an #AWS APIGateway & #lambda service that was costing us about $16,000/month in #elixir: 12 million requests/hour with sub-second latency, ~300 GB of throughput/day. allspaw: engineer: "Unless you're familiar with Lamport, Brewer, Fox, Armstrong, Stonebraker, Parker, Shapiro… (and
Efficient environment configuration at scale: one of software engineers' most significant challenges is managing the numerous tools and technologies required for the software product lifecycle. Development teams must set up tailored configurations for each tool and component they're responsible for.
By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Challenges include increased latency during peak loads and the high cost of training and retaining talent; introducing scalable microservices architectures helps distribute computational loads efficiently.
In reality, only highly scalable RUM solutions can collect data on all user actions, while less scalable tools must sample user actions and make inferences from partial data, given the characteristics (connectivity, access, user count, latency) of geographic regions. Use data from one engine to facilitate testing for the other.
As I have talked about before, one of the reasons we built Amazon DynamoDB was that Amazon was pushing the limits of what was then a leading commercial database, and we were unable to sustain the availability, scalability, and performance that our growing Amazon.com business demanded. The opposite is true.
To be robust and scalable, this key/value store needs to be distributed for durability and availability, protecting against network partitions and hardware failures. This architecture affords Amazon ECS high availability, low latency, and high throughput because the data store is never pessimistically locked.
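Avoiding pessimistic locks generally means optimistic concurrency: writers carry the version they read and retry on conflict instead of holding locks. A sketch of that protocol under stated assumptions; the in-memory dict is a stand-in for a distributed store, and all names are invented.

```python
# Optimistic compare-and-set over a key/value store (sketch).
store = {}  # key -> (version, value); stand-in for a distributed store

def read(key):
    return store.get(key, (0, None))

def compare_and_set(key, expected_version, value):
    version, _ = store.get(key, (0, None))
    if version != expected_version:
        return False              # someone else wrote first; caller retries
    store[key] = (version + 1, value)
    return True

# Caller retries on conflict rather than blocking on a lock.
v, _ = read("cluster/task-42")
while not compare_and_set("cluster/task-42", v, {"state": "RUNNING"}):
    v, _ = read("cluster/task-42")
```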
Three years ago, as part of our AWS Fast Data journey, we introduced Amazon ElastiCache for Redis, a fully managed in-memory data store that operates at sub-millisecond latency. Amazon's enhancements address many day-to-day challenges of running Redis and allow for faster failover times while minimizing latency.
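A minimal sketch of the sub-millisecond access pattern described above, using `redis-py`; the host, port, and key are assumptions for the example.

```python
# Round-trip a key through Redis and time the GET.
import time
import redis

r = redis.Redis(host="localhost", port=6379)

r.set("session:42", "active", ex=300)   # 5-minute TTL

start = time.perf_counter()
value = r.get("session:42")
elapsed_ms = (time.perf_counter() - start) * 1000
print(value, f"GET took {elapsed_ms:.3f} ms")  # typically sub-millisecond locally
```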