Design a photo-sharing platform similar to Instagram where users can upload their photos and share them with their followers. The article covers the problem statement, high-level design, component design, API design, architecture, and fetching the user feed, and provides the API design for posting an image on Instagram.
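The excerpt doesn't include the article's actual API spec; below is a rough, hypothetical sketch of what a "post a photo" endpoint could look like, using Flask purely for illustration (endpoint path, field names, and the commented-out helpers are all assumptions, not the article's design).

```python
# Minimal sketch of a photo-upload endpoint (hypothetical names and paths).
import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/v1/photos", methods=["POST"])
def post_photo():
    # In a real design the image bytes would go to object storage
    # and only metadata would be written to the database.
    image = request.files.get("image")
    caption = request.form.get("caption", "")
    if image is None:
        return jsonify({"error": "image file is required"}), 400

    photo_id = str(uuid.uuid4())
    # store_image(photo_id, image.read())        # object storage write (placeholder)
    # save_metadata(photo_id, caption)           # metadata DB write (placeholder)
    # enqueue_fanout(photo_id)                   # push to followers' feeds (placeholder)
    return jsonify({"photo_id": photo_id, "caption": caption}), 201

if __name__ == "__main__":
    app.run(port=8080)
```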
By Rajiv Shringi, Oleksii Tkachuk, and Kartik Sathyanarayanan. In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.
As an example, cloud-based post-production editing and collaboration pipelines demand a complex set of functionalities, including the generation and hosting of high-quality proxy content. Uploading and downloading data always come with a penalty, namely latency. Supporting those workflows poses new challenges to our packaging service.
RabbitMQ is designed for flexible routing and message reliability, while Kafka handles high-throughput event streaming and real-time data processing. Kafka’s design prioritizes high availability and efficient data transfer with minimal overhead, making it a practical choice for handling real-time data pipelines and distributed event processing.
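A minimal sketch of that contrast, assuming local RabbitMQ and Kafka brokers and the pika and confluent-kafka client libraries; the exchange, topic, and routing-key names are made up.

```python
# RabbitMQ routes by exchange + routing key; Kafka appends to a partitioned topic log.
import json
import pika                       # RabbitMQ client
from confluent_kafka import Producer

# RabbitMQ: publish through a topic exchange; consumers bind queues by pattern.
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.exchange_declare(exchange="orders", exchange_type="topic", durable=True)
channel.basic_publish(
    exchange="orders",
    routing_key="order.created.eu",   # routing key decides which queues receive it
    body=json.dumps({"order_id": 42}),
)
conn.close()

# Kafka: produce to a topic; the key controls partition placement for ordering.
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("order-events", key="42", value=json.dumps({"order_id": 42}))
producer.flush()
```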
More organizations than ever are undertaking cloud migration as digital transformation continues to gain momentum across every industry in every region. But what does it take to migrate your existing applications to the cloud? What is cloud migration? Most often it means moving applications and data from on-premises infrastructure to a cloud provider; however, it can also mean migrating from one cloud to another.
How To Design For High-Traffic Events And Prevent Your Website From Crashing, by Saad Khan (2025-01-07). This article is sponsored by Cloudways. Product launches and sales typically attract large volumes of traffic.
With the rise of microservices architecture, there has been a rapid acceleration in the modernization of legacy platforms, leveraging cloud infrastructure to deliver highly scalable, low-latency, and more responsive services. Why Use Spring WebFlux?
Microsoft Azure is one of the most popular cloud providers in the world, and a natural fit for database hosting for applications leveraging Microsoft across their infrastructure. Below we compare MySQL hosting options on Azure, including ScaleGrid MySQL on Azure, so you can see which provider offers the best throughput and latency performance. We measure latency as the 95th-percentile response time in milliseconds.
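For reference, a 95th-percentile figure like this can be computed with a simple nearest-rank calculation over the measured per-query latencies; the sample values below are made up.

```python
# Sort the per-query latencies and take the value at the 95th percentile (nearest rank).
def p95(latencies_ms):
    ordered = sorted(latencies_ms)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

samples_ms = [12.4, 9.8, 15.1, 11.0, 48.7, 10.2, 13.9, 9.5, 14.3, 52.6]
print(f"p95 latency: {p95(samples_ms):.1f} ms")   # dominated by the slow outliers
```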
A quick canary test was free of errors and showed lower latency, which is expected given that our standard canary setup routes an equal amount of traffic to both the baseline running on 4xl and the canary on 12xl. What’s worse, average latency degraded by more than 50%, with both CPU and latency patterns becoming more “choppy.”
The architecture of RabbitMQ is meticulously designed for complex message routing, enabling dynamic and flexible interactions between producers and consumers. While clustering across wide-area networks (WANs) is discouraged due to latency issues, leased links can mitigate some connectivity challenges.
These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination. The abstraction also serves as a central point for configuring access patterns such as consistency or latency targets.
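The excerpt doesn't show what that central configuration looks like; the sketch below is purely hypothetical, illustrating the idea of declaring per-namespace consistency and latency targets in one place instead of hard-coding them in every client.

```python
# Hypothetical per-namespace access-pattern declarations (not a real API).
ACCESS_PATTERNS = {
    "user-feed": {
        "consistency": "eventual",      # reads may lag writes slightly
        "read_latency_target_ms": 10,   # latency budget for reads
        "write_latency_target_ms": 20,
        "pagination": {"page_size": 100},
    },
    "billing-ledger": {
        "consistency": "strong",        # read-your-writes required
        "read_latency_target_ms": 50,
        "write_latency_target_ms": 100,
        "idempotency": True,            # retries must not double-apply
    },
}

def targets_for(namespace: str) -> dict:
    """Look up the declared access pattern for a namespace."""
    return ACCESS_PATTERNS[namespace]

print(targets_for("user-feed")["consistency"])  # -> "eventual"
```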
The 2014 launch of AWS Lambda marked a milestone in how organizations use cloud services to deliver their applications more efficiently, by running functions at the edge of the cloud without the cost and operational overhead of on-premises servers. AWS continues to improve how it handles latency issues.
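For illustration, a minimal Python Lambda handler looks like this; the platform invokes the handler per event, so there is no server to provision. The event payload shape depends on whatever triggers the function.

```python
import json

def lambda_handler(event, context):
    # 'event' carries the trigger payload (API Gateway request, S3 event, ...).
    name = (event or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local smoke test; in AWS the runtime supplies event and context.
if __name__ == "__main__":
    print(lambda_handler({"name": "lambda"}, None))
```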
As dynamic systems architectures increase in complexity and scale, IT teams face mounting pressure to track and respond to conditions and issues across their multi-cloud environments. Observability relies on telemetry derived from instrumentation that comes from the endpoints and services in your multi-cloud computing environments.
A typical design pattern is the use of a semantic search over a domain-specific knowledge base, like internal documentation, to provide the required context in the prompt. With these latency, reliability, and cost measurements in place, your operations team can now define their own OpenAI dashboards and SLOs.
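A bare-bones sketch of the semantic-search-for-context pattern described above, assuming the query and documents have already been run through an embedding model (not shown); the document texts and vectors here are toy values.

```python
# Rank pre-embedded internal docs by cosine similarity to the query embedding,
# then inline the top hits into the prompt as context.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, doc_index, k=3):
    """doc_index: list of (doc_text, doc_vec) pairs with precomputed embeddings."""
    ranked = sorted(doc_index, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec, doc_index):
    context = "\n---\n".join(retrieve(query_vec, doc_index))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

docs = [("VPN setup guide", [0.9, 0.1]), ("Expense policy", [0.1, 0.9])]
print(build_prompt("How do I set up the VPN?", [0.8, 0.2], docs))
```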
The events of 2020 accelerated the trend of organizations shifting to cloud-native technologies in response to the dramatic increase in demand for online services. Cloud-native environments bring speed and agility to software development and operations (DevOps) practices, along with benefits such as reduced latency and greater efficiency.
Growing AI adoption brings rising cloud costs. There are three key reasons that AI costs can spiral out of control; the first is that AI consumes additional resources: running artificial intelligence models and querying data requires massive amounts of computational resources in the cloud, which results in higher cloud costs.
If you use AWS cloud services to build and run your applications, you may be familiar with the AWS Well-Architected framework. This is a set of best practices and guidelines that help you design and operate reliable, secure, efficient, cost-effective, and sustainable systems in the cloud.
Exploring artificial intelligence in cloud computing reveals a game-changing synergy. This article delves into the specifics of how AI optimizes cloud efficiency, ensures scalability, and reinforces security, providing a glimpse at its transformative role without giving away extensive details.
Many organizations face significant challenges in pursuing their cloud migration initiatives, which often accompany or precede AI initiatives. Worse, the costs associated with GenAI aren’t straightforward, are often multi-layered, and can be five times higher than traditional cloud services. Service reliability.
As more organizations adopt cloud-native technologies, traditional approaches to IT operations have been evolving. Complex cloud computing environments are increasingly replacing traditional data centers. The importance of ITOps cannot be overstated, especially as organizations adopt more cloud-native technologies.
Our approach to NN-based video downscaling: The deep downscaler is a neural network architecture designed to improve the end-to-end video quality by learning a higher-quality video downscaler. We employed an adaptive network design that is applicable to the wide variety of resolutions we use for encoding.
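This is not Netflix's actual deep downscaler; the PyTorch toy below only sketches the general idea of a learned convolutional 2x downscaler that could be trained end-to-end against a quality metric.

```python
import torch
import torch.nn as nn

class TinyDownscaler(nn.Module):
    """Toy learned downscaler: a few conv layers, then a stride-2 conv halves the resolution."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # the stride-2 convolution performs the 2x downscale itself
        self.down = nn.Conv2d(channels, 3, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.down(self.features(x))

frame = torch.rand(1, 3, 216, 384)          # N, C, H, W
half = TinyDownscaler()(frame)
print(half.shape)                            # torch.Size([1, 3, 108, 192])
```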
Mastering Hybrid Cloud Strategy: Are you looking to leverage the best of the private and public cloud worlds to propel your business forward? A hybrid cloud strategy could be your answer. This approach allows companies to combine the security and control of private clouds with public clouds’ scalability and innovation potential.
When you’re running in the cloud your containers are in a shared space; in particular they share the CPU’s memory hierarchy of the host instance. These applications range from critical low-latency services powering our customer-facing video streaming service, to batch jobs for encoding or machine learning.
Amazon DynamoDB: a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications. Today is a very exciting day as we release Amazon DynamoDB, a fast, highly reliable and cost-effective NoSQL database service designed for internet-scale applications. Amazon DynamoDB offers low, predictable latencies at any scale.
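A quick boto3 sketch of the key-value access model; it assumes a hypothetical "photos" table with partition key "photo_id" already exists and AWS credentials are configured.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("photos")

# Reads and writes are addressed by primary key, which keeps latency predictable.
table.put_item(Item={"photo_id": "p-123", "owner": "alice", "likes": 0})
resp = table.get_item(Key={"photo_id": "p-123"})
print(resp.get("Item"))
```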
To support this growth, we’ve revisited Pushy’s past assumptions and design decisions with an eye towards both Pushy’s future role and future stability. In our case, we value low latency — the faster we can read from KeyValue, the faster these messages can get delivered.
Real user monitoring provides insight into service latency to help developers identify poorly performing code. UX designers can use that data to better understand how users interact with an application and how developers can streamline the interface. There are also some limitations of real user monitoring.
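As a rough server-side analogue of that idea (real RUM instruments the browser or mobile client, not just the server), a timing decorator makes slow code paths visible with the latency they add; the handler below is hypothetical.

```python
import functools
import time

def timed(fn):
    """Log how long the wrapped function takes, in milliseconds."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"{fn.__name__} took {elapsed_ms:.1f} ms")
    return wrapper

@timed
def render_profile_page(user_id: str) -> str:
    time.sleep(0.05)               # stand-in for template rendering + DB calls
    return f"<html>profile for {user_id}</html>"

render_profile_page("alice")
```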
Thinking about going multi-cloud? A well-planned multi-cloud strategy can seriously upgrade your business’s tech game, making you more agile. Key Takeaways: Multi-cloud strategies have become increasingly popular due to the need for flexibility, innovation, and the avoidance of vendor lock-in.
4:45pm-5:45pm NFX 202: A day in the life of a Netflix Engineer. Dave Hahn, SRE Engineering Manager. Abstract: Netflix is a large, ever-changing ecosystem serving millions of customers across the globe through cloud-based systems and a globally distributed CDN. We explore all the systems necessary to make and stream content from Netflix.
How we migrated our Android endpoints out of a monolith into a new microservice by Rohan Dhruva , Ed Ballot As Android developers, we usually have the luxury of treating our backends as magic boxes running in the cloud, faithfully returning us JSON.
While you may assume the great majority of cloud database deployments run on AWS, Azure, or Google Cloud Platform, small to medium-sized businesses in particular are gravitating towards the developer-friendly cloud provider, DigitalOcean, for their MongoDB® hosting needs. DigitalOcean Droplets.
Complementing the hardware is the software on the RAE and in the cloud, and bridging the software on both ends is a bi-directional control plane. Since Kafka is a supported messaging platform at Netflix, a bridge is established between the two protocols to allow cloud-side services to communicate with the control plane.
It supports both high throughput services that consume hundreds of thousands of CPUs at a time, and latency-sensitive workloads where humans are waiting for the results of a computation. The subsystems all communicate with each other asynchronously via Timestone, a high-scale, low-latency priority queuing system.
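Timestone itself isn't shown in the excerpt; the toy sketch below only illustrates the general idea of priority queuing, where latency-sensitive work is dequeued ahead of batch work.

```python
import heapq
import itertools

_counter = itertools.count()      # tie-breaker keeps FIFO order within a priority
queue = []

def enqueue(priority: int, job: str) -> None:
    # lower number = more urgent
    heapq.heappush(queue, (priority, next(_counter), job))

def dequeue() -> str:
    _, _, job = heapq.heappop(queue)
    return job

enqueue(10, "batch: re-encode back-catalog title")
enqueue(1, "interactive: render shot for an artist waiting at a workstation")
enqueue(10, "batch: ML feature extraction")

print(dequeue())   # the interactive, latency-sensitive job comes out first
```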
A brief history of IPC at Netflix Netflix was early to the cloud, particularly for large-scale companies: we began the migration in 2008, and by 2010, Netflix streaming was fully run on AWS. Today we have a wealth of tools, both OSS and commercial, all designed for cloud-native environments.
Scaling Policies: To address the thundering herd problem and to keep latencies under acceptable thresholds, the cluster scale-up policies are configured to be more aggressive than the scale-down policies. This approach enables the computing power to catch up quickly when the queues grow.
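A toy sketch of such asymmetric policies; the thresholds and headroom factors are invented for illustration, not the actual configuration described in the article.

```python
# Scale up aggressively on queue growth, scale down slowly, so capacity catches
# up fast during a thundering herd but does not flap afterwards.
def desired_instances(current: int, queue_depth: int, per_instance_capacity: int = 100) -> int:
    backlog_instances = -(-queue_depth // per_instance_capacity)   # ceiling division

    if backlog_instances > current:
        # scale UP: jump straight to what the backlog needs, plus headroom
        return int(backlog_instances * 1.5)
    if backlog_instances < current * 0.5:
        # scale DOWN: shed at most 10% of the fleet per evaluation
        return max(backlog_instances, int(current * 0.9))
    return current

print(desired_instances(current=10, queue_depth=5000))   # -> 75 (aggressive up)
print(desired_instances(current=75, queue_depth=200))    # -> 67 (gentle down)
```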
By Tomasz Bak and Fabio Kung. Titus is the Netflix cloud container runtime that runs and manages containers at scale. In that scenario, the system would need to deal with the data propagation latency directly, for example, by use of timeouts or client-originated update tracking mechanisms.
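One hedged sketch of "dealing with propagation latency directly" via timeouts: after a write, poll the read path until the change becomes visible or a deadline passes. The function names are placeholders, not Titus APIs.

```python
import time

class StaleReadTimeout(Exception):
    pass

def wait_until_visible(read_fn, expected_version: int,
                       timeout_s: float = 5.0, interval_s: float = 0.2):
    """Poll read_fn() until it reflects expected_version or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        record = read_fn()
        if record is not None and record.get("version", -1) >= expected_version:
            return record
        time.sleep(interval_s)
    raise StaleReadTimeout(f"version {expected_version} not visible after {timeout_s}s")

# Usage sketch (placeholder helpers):
# new_version = write_container_record(...)           # returns the version written
# wait_until_visible(lambda: read_container_record(), new_version)
```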
Expanding the Cloud: Amazon SWF makes it very easy for developers to architect and implement these tasks, run them in the cloud or on premises, and coordinate their flow. By designing autonomous distributed components, developers get the flexibility to deploy and scale out parts of the application independently as load increases.
What is workload in cloud computing? Simply put, it’s the set of computational tasks that cloud systems perform, such as hosting databases, enabling collaboration tools, or running compute-intensive algorithms. The environments, which were previously isolated, are now working seamlessly under central control.
RISELabs, those wonderfully innovative folks over at Berkeley, have uplifted their Anna database, a shared-nothing, thread-per-core architecture that achieves lightning-fast speeds by avoiding all coordination mechanisms, to become cloud-aware. Anna paper: Eliminating Boundaries in Cloud Storage with Anna. What's changed?
The AWS GovCloud (US-East) Region is our second AWS GovCloud (US) Region, joining AWS GovCloud (US-West) to further help US government agencies, the contractors that serve them, and organizations in highly regulated industries move more of their workloads to the AWS Cloud by implementing a number of US government-specific regulatory requirements.
Helios: hyperscale indexing for the cloud & edge, Potharaju et al., PVLDB’20. Cloud-native systems represent by far the largest, most distributed computing systems in our history. And the established cloud-native architectural principles behind them aren’t changing here. (Emphasis mine.)
AWS offers a broad set of global, cloud-based services including computing, storage, networking, Internet of Things (IoT), and many others. Therefore, the ability to view a real-time map of your applications, services, and cloud resources is key to your success. You can also create custom charts.
OneAgent & cloud metrics. Virtualization can be a key player in your process’ performance, and Dynatrace has built-in integrations to bring metrics about the Cloud Infrastructure into your Dynatrace environment. Dynatrace provides out-of-the-box support for VMware, AWS, Azure, Pivotal Cloud Foundry, and Kubernetes.
Expanding the Cloud - Cluster Compute Instances for Amazon EC2. Today, Amazon Web Services took a very important step in unlocking the advantages of cloud computing for a very important application area. Cluster Compute Instances for Amazon EC2 are a new instance type specifically designed for High Performance Computing applications.
About 5 years ago, I introduced you to AWS Availability Zones, which are distinct locations within a Region that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same region.