AWS, Big Data and Tuning - Technology Performance Pulse

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

Challenges: The cloud network infrastructure that Netflix utilizes today consists of AWS services such as VPC, DirectConnect, VPC Peering, Transit Gateways, NAT Gateways, etc., and Netflix-owned devices. After several iterations of the architecture and some tuning, the solution has proven to be able to scale.

Network

Network Transportation AWS Cloud

Auto-Diagnosis and Remediation in Netflix Data Platform

The Netflix TechBlog

JANUARY 13, 2022

This blog will explore these two systems and how they perform auto-diagnosis and remediation across our Big Data Platform and Real-time infrastructure. Since the data platform manages keystone pipelines, users expect platform issues to be detected and remediated by the Keystone team without any involvement from their end.

Big Data

Big Data Infrastructure Metrics Games

Bringing the Magic of Amazon AI and Alexa to Apps on AWS.

All Things Distributed

NOVEMBER 30, 2016

Last week, I wrote a blog about helping the machine learning scientist community select the right deep learning framework from among many we support on AWS such as MXNet, TensorFlow, Caffe, etc. Developers can build, test, and deploy chatbots directly from the AWS Management Console.

AWS

AWS Lambda Artificial Intelligence Mobile

Python at Netflix

The Netflix TechBlog

APRIL 29, 2019

… takes place in Amazon Web Services (AWS), whereas everything that happens afterwards (i.e., … The service that orchestrates failover uses numpy and scipy to perform numerical analysis, boto3 to make changes to our AWS infrastructure, rq to run asynchronous workloads, and we wrap it all up in a thin layer of Flask APIs.

Open Source

Open Source Network Infrastructure Big Data

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

Service Segmentation: The ease of the cloud deployments has led to the organic growth of multiple AWS accounts, deployment practices, interconnection practices, etc. VPC Flow Logs: VPC Flow Logs is an AWS feature that captures information about the IP traffic going to and from network interfaces in a VPC. We named this library Sqooby.

Network

Network Tuning AWS Traffic

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

Usually, data scientists and engineers write Extract-Transform-Load (ETL) jobs and pipelines using big data compute technologies, like Spark or Presto, to process this data and periodically compute key information for a member or a video. The processed data is typically stored as data warehouse tables in AWS S3.

Latency

Latency Storage Big Data Tuning

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Netflix’s diverse data landscape made it challenging to capture all the right data and conform it to a common data model. Spark is the primary big-data compute engine at Netflix, and with pretty much every upgrade in Spark, the Spark plan changed as well, springing continuous and unexpected surprises on us.

Infrastructure

Infrastructure Big Data Transportation Architecture

Music to my Ears - All Things Distributed

All Things Distributed

MARCH 28, 2011

We see that with our Amazon customers; when they hear a great tune on a radio they may identify it using the Shazam or Soundhound apps on their mobile phone and buy that song instantly from the Amazon MP3 store.

AWS

AWS Cloud Storage Internet

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

By Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, Pawan Dixit. At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations.

Java

Java Scalability Traffic Architecture

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of petabytes of data stored in AWS S3, and each day we ingest and create additional petabytes. Orient: Gather tuning parameters for a particular table that changed. Also, respond to ad-hoc requests created manually by end-users.

Storage

Storage Latency Efficiency Data Engineering

Delta: A Data Synchronization and Enrichment Platform

The Netflix TechBlog

OCTOBER 15, 2019

High availability, via standby instances across AWS Availability Zones. We currently support MySQL and Postgres, including when deployed in AWS RDS and its Aurora flavor. Please stay tuned. No need to acquire locks on tables, which is essential to ensure that the write traffic on the database is never blocked by our service.

Transportation

Transportation Architecture Processing Storage

Will AWS Have Anything New To Say About Sustainability at re:Invent 2024?

Adrian Cockcroft

NOVEMBER 18, 2024

Photo by Adrian of my father’s “round tuit” which I’m hoping will inspire AWS to do something… There’s an old saying that any headline that ends in a question mark can be answered with a “no”. Learn from Nasdaq, whose AI-powered environmental, social, and governance (ESG) platform uses Amazon Bedrock and AWS Lambda.

AWS

AWS Energy Lambda Government
