by Shefali Vyas Dalal. AWS re:Invent is a couple weeks away, and our engineers & leaders are thrilled to be in attendance yet again this year! Technology advancements in content creation and consumption have also increased the data footprint they generate. We’ve compiled our speaking events below so you know what we’ve been working on.
This is a recording of a breakout session from AWS Heroes at re:Invent 2022, presented by AWS Hero Zainab Maleki. In software engineering, we've learned that building robust and stable applications has a direct correlation with overall organizational performance. Posted with permission.
This post focuses on elevating your data engineering game, streamlining your data workflows, and significantly cutting computing costs. Optimizing offline data pipelines has become a necessity with the growing complexity and scale of modern data pipelines.
All these microservices currently run on AWS cloud infrastructure. As a microservice owner, a Netflix engineer is responsible for its innovation as well as its operation, which includes making sure the service is reliable, secure, efficient, and performant. Give us a holler if you are interested in a thought exchange.
The results for data-related topics are both predictable and (there’s no other way to put it) confusing. Start with data engineering, the backbone of all data work (the category includes titles covering data management: relational databases, Spark, Hadoop, SQL, NoSQL, and so on). This follows a 3% drop in 2018.
We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” We are loading the lineage data into a graph database to enable seamless integration with a REST data lineage service that addresses business use cases.
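As a rough illustration of what loading lineage into a graph database can look like, here is a minimal sketch using Neo4j's official Python driver. The node labels, relationship name, and `edges` payload are illustrative assumptions, not the actual lineage schema described in the post.

```python
# A minimal sketch of loading lineage edges into a graph database.
# Assumes a local Neo4j instance; labels and the `edges` payload are
# hypothetical, not the post's actual schema.
from neo4j import GraphDatabase

edges = [
    {"src": "raw.playback_events", "dst": "agg.daily_playback"},
    {"src": "agg.daily_playback", "dst": "report.engagement"},
]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
with driver.session() as session:
    for e in edges:
        # MERGE keeps the load idempotent: re-running it won't duplicate nodes.
        session.run(
            "MERGE (a:Dataset {name: $src}) "
            "MERGE (b:Dataset {name: $dst}) "
            "MERGE (a)-[:FEEDS]->(b)",
            src=e["src"], dst=e["dst"],
        )
driver.close()
```

A REST lineage service can then answer upstream/downstream questions with simple graph traversals over these `FEEDS` edges.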
Finally, when all matching is done and the data is written, the new table is committed so it can be read by other jobs. Compute: Titus. Whereas open-source users of Metaflow rely on AWS Batch or Kubernetes as the compute backend, we rely on our centralized compute platform, Titus.
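For readers on the open-source side, here is a minimal sketch of pinning a Metaflow step to a remote compute backend via the `@batch` decorator. The flow name and step bodies are placeholder assumptions; Titus itself is Netflix-internal and not shown.

```python
# A minimal sketch of dispatching one Metaflow step to AWS Batch.
# Open-source Metaflow also supports @kubernetes for the same purpose.
from metaflow import FlowSpec, step, batch

class MatchFlow(FlowSpec):

    @step
    def start(self):
        self.records = ["a", "b", "c"]  # placeholder input
        self.next(self.match)

    @batch(cpu=2, memory=8000)  # this step runs remotely on AWS Batch
    @step
    def match(self):
        self.matched = [r.upper() for r in self.records]
        self.next(self.end)

    @step
    def end(self):
        print("matched:", self.matched)

if __name__ == "__main__":
    MatchFlow()
```

Run it with `python match_flow.py run`; only the decorated step is shipped to the remote backend, while the rest executes locally.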
By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of petabytes of data stored in AWS S3, and each day we ingest and create additional petabytes. Some of the optimizations are prerequisites for a high-performance data warehouse.
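One classic optimization of this kind is compacting many small Parquet files into fewer large ones so readers open a handful of files instead of thousands. Here is a minimal sketch with pyarrow; the paths are local placeholders (against S3 you would pass an `s3://` URI and a pyarrow S3 filesystem), and the row cap is an arbitrary assumption.

```python
# A minimal sketch of small-file compaction for a Parquet table.
import pyarrow.dataset as ds

# Read the fragmented table as one logical dataset...
small_files = ds.dataset("warehouse/events_raw", format="parquet")

# ...and rewrite it with a cap on rows per output file, turning
# thousands of tiny files into a few large, scan-friendly ones.
ds.write_dataset(
    small_files,
    "warehouse/events_compacted",
    format="parquet",
    max_rows_per_file=5_000_000,  # illustrative threshold
)
```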
Sustainability at AWS re:Invent 2022 -All the talks and videos I could find… Las Vegas MSG Sphere under construction next door to the Venetian Sands Expo Center — Photo by Adrian This blog post is long overdue — I spent too long trying to find time to watch all the videos, and finally gave up and listed a few below that I haven’t seen.
Service Segmentation: The ease of cloud deployments has led to the organic growth of multiple AWS accounts, deployment practices, interconnection practices, etc. VPC Flow Logs is an AWS feature that captures information about the IP traffic going to and from network interfaces in a VPC.
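Enabling flow logs is a one-call affair with boto3, sketched below. The VPC ID and bucket ARN are placeholders; in practice you would also tune `LogFormat` to capture exactly the fields your traffic analysis needs.

```python
# A minimal sketch of enabling VPC Flow Logs, delivered to S3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],  # hypothetical VPC
    ResourceType="VPC",
    TrafficType="ALL",                      # ACCEPT, REJECT, or ALL
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::example-flow-logs-bucket",
)
print(response["FlowLogIds"])
```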
As I mentioned, we live in a world where massive volumes of data are being generated, every day, from connected devices, websites, mobile apps, and customer applications running on top of AWS infrastructure. Auto-discovery : One of the challenges with BI is discovering and accessing the data.
AWS recently announced the general availability (GA) of Amazon EC2 P5 instances powered by the latest NVIDIA H100 Tensor Core GPUs, suitable for users who require high performance and scalability in AI/ML and HPC workloads. The GA follows the earlier announcement that the underlying infrastructure was under development. By Steef-Jan Wiggers.
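Requesting one of these instances is the same `run_instances` call as any other EC2 launch, as this sketch shows. The AMI ID is a placeholder; `p5.48xlarge` was the size available at GA, and it is only offered in select regions, so availability should be checked first.

```python
# A minimal sketch of launching a P5 instance with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical Deep Learning AMI
    InstanceType="p5.48xlarge",       # 8x NVIDIA H100 GPUs
    MinCount=1,
    MaxCount=1,
)
```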
However, the data infrastructure to collect, store, and process data is geared toward developers. In AWS’ quest to enable the best data storage options for engineers, we have built several innovative database solutions like Amazon RDS, Amazon RDS for Aurora, Amazon DynamoDB, and Amazon Redshift.
Learn the stuff they don't teach you in the AWS docs. Filter out the distracting hype, and focus on the parts of AWS that you'd be foolish not to use. Learn the Good Parts of AWS. Created by former senior-level AWS engineers with 15 years of experience.
Cloud certifications, specifically in AWS and Microsoft Azure, were most strongly associated with salary increases. As we’ll see later, cloud certifications (specifically in AWS and Microsoft Azure) were the most popular and appeared to have the largest effect on salaries. The top certification was for AWS (3.9%).
The AWS team this week launched Amazon Glacier, a cold storage archive service, at the very low price point of $0.01. Which makes this week a good moment to read up on some of the historical work around the costs of data engineering. I am in the midst of my South America tour in the beautiful but very cold Santiago, Chile.
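For context on what cold-storage archiving looks like in code: at launch Glacier was a standalone vault/archive API, but these days the simplest route is writing an S3 object with the `GLACIER` storage class, as in this sketch. The bucket, key, and file names are placeholders.

```python
# A minimal sketch of archiving data to Glacier via an S3 storage class.
import boto3

s3 = boto3.client("s3")
with open("events.tar.gz", "rb") as archive:
    s3.put_object(
        Bucket="example-archive-bucket",
        Key="backups/2012-08-events.tar.gz",
        Body=archive,
        StorageClass="GLACIER",  # cheap to store, slow and costly to retrieve
    )
```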
Canva evaluated different messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, Amazon MSK, and Amazon KDS, and eventually chose the latter, primarily based on its much lower costs. The company compared many aspects of these solutions, like performance, maintenance effort, and cost.
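To make the chosen option concrete, here is a minimal sketch of producing analytics events to Kinesis Data Streams with boto3. The stream name and payload shape are illustrative assumptions, not Canva's actual schema.

```python
# A minimal sketch of a Kinesis Data Streams producer.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
event = {"user_id": "u-42", "action": "design_opened"}
kinesis.put_record(
    StreamName="product-analytics-events",   # hypothetical stream
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],           # keeps one user's events ordered within a shard
)
```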
As the usage increased, we had to vertically scale the system to keep up and were approaching AWS instance type limits. With usage increasing at more than 100% a year, the need for a scalable data workflow orchestrator has become paramount for Netflix’s business needs. Meson was based on a single-leader architecture with high availability.
Where AWS ends and the internet begins is an exercise left to the reader. The folks on the Cloud Data Engineering (CDE) team, the ones building the paved path for internal data at Netflix, graciously helped us scale it up and make adjustments, but it ended up being an involved process as we kept growing.
I was fortunate to be both presenting a two-day workshop (on AWS Serverless Architectures and Continuous Deployment) and hosting a full-day Serverless track of talks. This has proved especially true in the last couple of months, as we helped a company update its entire AWS infrastructure in a number of critical ways. Great stuff!
Airflow provides rich scheduling and execution semantics, enabling data engineers to easily define complex pipelines that run at regular intervals. In reality, a DAG lacks the necessary workflow context, and relying solely on it can result in incomplete solutions and missed opportunities.
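For readers unfamiliar with those semantics, here is a minimal sketch of a daily two-task Airflow DAG. The DAG ID and task bodies are placeholder assumptions.

```python
# A minimal sketch of Airflow's scheduling semantics: a two-task DAG
# that runs once per day, with an explicit dependency between tasks.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling yesterday's partition")

def load():
    print("writing to the warehouse")

with DAG(
    dag_id="daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # run once per day
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task    # load runs only after extract succeeds
```

Note that the DAG itself only encodes task order and cadence, which is exactly the "missing workflow context" the excerpt warns about.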
Zendesk reduced its data storage costs by over 80% by migrating from DynamoDB to a tiered storage solution using MySQL and S3. The company considered different storage technologies and decided to combine the relational database and the object store to strike a balance between queryability and scalability while keeping costs down.
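A tiered design like this implies a two-step read path: check the hot relational tier first, then fall back to the object store. Here is a minimal sketch under assumed names; the table, bucket, and key layout are hypothetical, not Zendesk's actual schema.

```python
# A minimal sketch of a MySQL-then-S3 tiered read path.
import json
import boto3
import pymysql

s3 = boto3.client("s3")
db = pymysql.connect(host="localhost", user="app", password="secret", database="logs")

def get_record(record_id: str):
    # Hot tier: relational store for recent, frequently queried rows.
    with db.cursor() as cur:
        cur.execute("SELECT payload FROM events WHERE id = %s", (record_id,))
        row = cur.fetchone()
    if row:
        return json.loads(row[0])
    # Cold tier: older rows were tiered out to cheap object storage.
    obj = s3.get_object(Bucket="example-events-archive", Key=f"events/{record_id}.json")
    return json.load(obj["Body"])
```

The design choice mirrors the excerpt: MySQL keeps recent data queryable, while S3 absorbs the long tail at a fraction of DynamoDB's cost.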
By J Han and Pallavi Phadnis. Context: At Netflix, we use Amazon Web Services (AWS) for our cloud infrastructure needs, such as compute, storage, and networking, to build and run the streaming platform that we love. In turn, our self-serve platforms allow teams to create and deploy, sometimes custom, workloads more efficiently.
Entirely new paradigms rise quickly: cloud computing, data engineering, machine learning engineering, mobile development, and large language models. It’s less risky to hire adjunct professors with industry experience to fill teaching roles that have a vocational focus: mobile development, data engineering, and cloud computing.