Data, Data Engineering and Infrastructure - Technology Performance Pulse

Data Engineers of Netflix - Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Data Engineers of Netflix - Interview with Pallavi Phadnis. This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

Leveraging Infrastructure as Code for Data Engineering Projects: A Comprehensive Guide

DZone

JULY 3, 2023

Data engineering projects often require the setup and management of complex infrastructures that support data processing, storage, and analysis. In this article, we will explore the benefits of leveraging IaC for data engineering projects and provide detailed implementation steps to get started.

Data Engineering

Data Engineering Infrastructure Engineering Code

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency. By: Di Lin, Girish Lingappa, Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard about to make a critical business decision but pausing to ask a question: “Can

Infrastructure

Infrastructure Big Data Transportation Architecture

Overcoming Challenges and Best Practices for Data Migration From On-Premise to Cloud

DZone

MARCH 29, 2023

Data migration is the process of moving data from one location to another, which is an essential aspect of cloud migration. Data migration involves transferring data from on-premise storage to the cloud. With the rapid adoption of cloud computing, businesses are moving their IT infrastructure to the cloud.

Best Practices

Best Practices Cloud Data Engineering Storage

Bringing Software Engineering Rigor to Data

DZone

FEBRUARY 20, 2023

In software engineering, we've learned that building robust and stable applications has a direct correlation with overall organization performance. The data community is striving to incorporate the core concepts of engineering rigor found in software communities but still has further to go.

Software Engineering

Software Engineering Engineering Software Software

Ready-to-go sample data pipelines with Dataflow

The Netflix TechBlog

DECEMBER 3, 2022

By Jasmine Omeke, Obi-Ike Nwoke, Olek Gorajek. Intro: This post is for all data practitioners who are interested in learning about bootstrapping, standardization and automation of batch data pipelines at Netflix. You may remember Dataflow from the post we wrote last year titled Data pipeline asset management with Dataflow.

Best Practices

Best Practices Code Testing Data Engineering

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

Central engineering teams enable this operational model by reducing the cognitive burden on innovation teams through solutions related to securing, scaling and strengthening (resilience) the infrastructure. All these micro-services are currently operated in AWS cloud infrastructure. Canaries ), detection and improved KPIs.

Infrastructure

Infrastructure Cloud Scalability AWS

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3, and each day we ingest and create additional Petabytes. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.

Storage

Storage Latency Efficiency Data Engineering

Experimentation is a major focus of Data Science across Netflix

The Netflix TechBlog

JANUARY 11, 2022

Here we describe the role of Experimentation and A/B testing within the larger Data Science and Engineering organization at Netflix, including how our platform investments support running tests at scale while enabling innovation. Curious to learn more about other Data Science and Engineering functions at Netflix?

Innovation

Innovation Metrics Engineering Testing

Data pipeline asset management with Dataflow

The Netflix TechBlog

FEBRUARY 9, 2022

JAR) form to be executed as part of the user defined data pipeline. data pipeline - a DAG) for the purpose of transforming data using some business logic. Netflix homegrown CLI tool for data pipeline management. task, an atomic unit of data transformation logic, a non-separable execution block in the workflow chain.

Storage

Storage Data Engineering Testing Code

Reimagining Experimentation Analysis at Netflix

The Netflix TechBlog

SEPTEMBER 10, 2019

After recreating the dataset, you can plot the raw numbers and perform custom analyses to understand the distribution of the data across test cells. Our A/B tests range across UI, algorithms, messaging, marketing, operations, and infrastructure changes. Our data scientists faced numerous challenges in our previous infrastructure.

Metrics

Metrics Architecture Infrastructure Innovation

What is IT automation?

Dynatrace

JULY 6, 2022

Scripts and procedures usually focus on a particular task, such as deploying a new microservice to a Kubernetes cluster, implementing data retention policies on archived files in the cloud, or running a vulnerability scanner over code before it’s deployed. The range of use cases for automating IT is as broad as IT itself.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Berg, Romain Cledat, Kayla Seeley, Shashank Srikanth, Chaoying Wang, Darin Yu. Netflix uses data science and machine learning across all facets of the company, powering a wide range of business applications from our internal infrastructure and content demand modeling to media understanding.

Systems

Systems Media Cache Open Source

5 key areas for tech leaders to watch in 2020

O'Reilly

FEBRUARY 18, 2020

It’s also the data source for our annual usage study, which examines the most-used topics and the top search terms. [1]. This year’s growth in Python usage was buoyed by its increasing popularity among data scientists and machine learning (ML) and artificial intelligence (AI) engineers.

Software Architecture

Software Architecture DevOps Data Engineering Architecture

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

Netflix shares how Amazon EC2 Auto Scaling allows its infrastructure to automatically adapt to changing traffic patterns in order to keep its audience entertained and its costs on target. Technology advancements in content creation and consumption have also increased its data footprint. Wednesday - December

AWS

AWS Entertainment Open Source Benchmarking

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

The Netflix TechBlog

MARCH 2, 2021

Stephanie Lane, Wenjing Zheng, Mihir Tendulkar. Source credit: Netflix. Within the rapid expansion of data-related roles in the last decade, the title Data Scientist has emerged as an umbrella term for myriad skills and areas of business focus. Learning through data is in Netflix’s DNA. It can be hard to know from the outside.

Analytics

Analytics C++ Innovation Engineering

SIEM Volume Spike Alerts Using ML

DZone

JANUARY 31, 2024

SIEM platforms offer centralized management of security operations, making it easier for organizations to monitor, manage, and secure their IT infrastructure. SIEM systems enable early detection of security threats and suspicious activities by analyzing vast amounts of log data in real time.

Data Engineering

Data Engineering Storage Network Infrastructure

Scaling Appsec at Netflix (Part 2)

The Netflix TechBlog

JUNE 6, 2022

This approach has also allowed us to build strong relationships with central engineering teams at Netflix (Data Platform, Developer Tools, Cloud Infrastructure, IAM Product Engineering) that will continue to serve as central points of leverage for security in the long term.

Software Engineering

Software Engineering Scalability Education Engineering

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

The Netflix TechBlog

MAY 26, 2020

Cloud Network Insight is a suite of solutions that provides both operational and analytical insight into the Cloud Network Infrastructure to address the identified problems. At Netflix we publish the Flow Log data to Amazon S3. And in order to gain visibility into these logs, we need to somehow ingest and enrich this data.

Network

Network Tuning AWS Traffic

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

This allows data to be sent to the device from backend services on demand, without the need for continually polling requests from the device. This question has been the driving force behind nearly all of the recent features built on top of Pushy, and it’s an exciting question to ask, particularly as an infrastructure team.

Latency

Latency Cache Tuning Efficiency

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Uber Engineering

AUGUST 14, 2019

Maintaining Uber’s large-scale data warehouse comes with an operational cost in terms of ETL functions and storage. Once identified, … The post Less is More: Engineering Data Warehouse Efficiency with Minimalist Design appeared first on Uber Engineering Blog.

Efficiency

Efficiency Engineering Design Storage

Spice up your Analytics: Amazon QuickSight Now Generally Available in N. Virginia, Oregon, and Ireland.

All Things Distributed

NOVEMBER 15, 2016

Previously, I wrote about Amazon QuickSight , a new service targeted at business users that aims to simplify the process of deriving insights from a wide variety of data sources quickly, easily, and at a low cost. Put simply, data is not always readily available and accessible to organizational end users.

Analytics

Analytics Availability Media Social Media

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

We live in a world where massive volumes of data are generated from websites, connected devices and mobile apps. In such a data intensive environment, making key business decisions such as running marketing and sales campaigns, logistic planning, financial analysis and ad targeting requires deriving insights from these data.

Cloud

Cloud Big Data AWS Analytics

Sponsored Post: Fauna, Sisu, Educative, PA File Sight, Etleap, PerfOps, Triplebyte, Stream

High Scalability

NOVEMBER 12, 2019

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Who's Hiring? Apply here. Check out the job opening on AngelList.

Education

Education Java Software Engineering Engineering

Friends don't let friends build data pipelines

Abhishek Tiwari

JULY 12, 2018

Building data pipelines can offer strategic advantages to the business. Often companies underestimate the necessary effort and cost involved to build and maintain data pipelines. Data pipeline initiatives are generally unfinished projects. In this post, we will discuss why you should avoid building data pipelines in the first place.

Latency

Latency Analytics Scalability Engineering

Sponsored Post: Fauna, Sisu, Educative, PA File Sight, Etleap, PerfOps, Triplebyte, Stream

High Scalability

OCTOBER 29, 2019

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Who's Hiring? Apply here. Check out the job opening on AngelList.

Education

Education Java Software Engineering Engineering

Sponsored Post: Fauna, Sisu, Educative, PA File Sight, Etleap, PerfOps, Triplebyte, Stream

High Scalability

JANUARY 7, 2020

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Who's Hiring? Apply here. Check out the job opening on AngelList.

Education

Education Java Software Engineering Engineering

Sponsored Post: Fauna, Sisu, Educative, PA File Sight, Etleap, PerfOps, Triplebyte, Stream

High Scalability

DECEMBER 12, 2019

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Who's Hiring? Apply here. Check out the job opening on AngelList.

Education

Education Java Software Engineering Engineering

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 3, 2020

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Learn how world-class tech companies crush the hiring game! Apply here.

Education

Education Engineering Games Java

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 18, 2020

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Learn how world-class tech companies crush the hiring game! Apply here.

Education

Education Engineering Games Java

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 17, 2020

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Who's Hiring? Apply here. Check out the job opening on AngelList.

Education

Education Engineering Java Servers

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 9, 2020

Sisu Data is looking for machine learning engineers who are eager to deliver their features end-to-end, from Jupyter notebook to production, and provide actionable insights to businesses based on their first-party, streaming, and structured relational data. Learn how world-class tech companies crush the hiring game! Apply here.

Education

Education Engineering Games Java

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 24, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc.

Education

Education Software Engineering Engineering Big Data

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc.

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc.

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 30, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc.

Education

Education Software Engineering Engineering Big Data

Presentation: Modern Compute Stack for Scaling Large AI/ML/LLM Workloads

InfoQ

MAY 8, 2024

Jules Damji discusses which infrastructure should be used for distributed fine-tuning and training, how to scale ML workloads, how to accommodate large models, and how CPUs and GPUs can be utilized. By Jules Damji

Tuning

Tuning Infrastructure Artificial Intelligence Data Engineering

AWS Launches General Availability of Amazon EC2 P5 Instances for AI/ML and HPC Workloads

InfoQ

AUGUST 3, 2023

The GA is a follow-up to the earlier announcement of the development of the infrastructure. AWS recently announced the general availability (GA) of Amazon EC2 P5 instances powered by the latest NVIDIA H100 Tensor Core GPUs suitable for users that require high performance and scalability in AI/ML and HPC workloads. By Steef-Jan Wiggers

AWS

AWS Availability Scalability Infrastructure
