Data Engineering and Processing - Technology Performance Pulse

Financial Data Engineering in SAS

DZone

JANUARY 8, 2024

Financial data engineering in SAS involves the management, processing, and analysis of financial data using the various tools and techniques provided by the SAS software suite. Here are some key aspects of financial data engineering in SAS: 1.

Data Engineering

Data Engineering Engineering Database Software

Our First Netflix Data Engineering Summit

The Netflix TechBlog

DECEMBER 14, 2023

Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines. The result was a series of talks which we are now sharing with the rest of the Data Engineering community! In this video, Sr.

Data Engineering

Data Engineering Engineering Software Engineering Best Practices

1. Streamlining Membership Data Engineering at Netflix with Psyberg

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty , Bharath Mummadisetty At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. What is late-arriving data? Let’s dive in!

Data Engineering

Data Engineering Engineering Processing Games

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024 The Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Data Engineers of Netflix?—?Interview with Kevin Wylie

The Netflix TechBlog

JULY 15, 2021

Data Engineers of Netflix?—?Interview Interview with Kevin Wylie This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin, what drew you to data engineering?

Data Engineering

Data Engineering Engineering Entertainment Big Data

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Data Engineers of Netflix?—?Interview Interview with Pallavi Phadnis This post is part of our “ Data Engineers of Netflix ” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

2. Diving Deeper into Psyberg: Stateless vs Stateful Data Processing

The Netflix TechBlog

NOVEMBER 14, 2023

By Abhinaya Shetty , Bharath Mummadisetty In the inaugural blog post of this series, we introduced you to the state of our pipelines before Psyberg and the challenges with incremental processing that led us to create the Psyberg framework within Netflix’s Membership and Finance data engineering team.

Processing

Processing Data Engineering Efficiency Analytics

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

by Jun He , Yingyi Zhang , and Pawan Dixit Incremental processing is an approach to process new or changed data in workflows. The key advantage is that it only incrementally processes data that are newly added or updated to a dataset, instead of re-processing the complete dataset.

Processing

Processing Big Data Efficiency Engineering

Leveraging Infrastructure as Code for Data Engineering Projects: A Comprehensive Guide

DZone

JULY 3, 2023

Data engineering projects often require the setup and management of complex infrastructures that support data processing, storage, and analysis. Traditionally, this process involved manual configuration, leading to potential inconsistencies, human errors, and time-consuming deployments.

Data Engineering

Data Engineering Infrastructure Engineering Code

Data Engineers of Netflix?—?Interview with Dhevi Rajendran

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Dhevi Rajendran Dhevi Rajendran This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix.

Data Engineering

Data Engineering Engineering Software Engineering Big Data

Data Engineers of Netflix?—?Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Samuel Setegne Samuel Setegne This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

How TripleLift Built an Adtech Data Pipeline Processing Billions of Events Per Day

High Scalability

JUNE 15, 2020

This is a guest post by Eunice Do , Data Engineer at TripleLift , a technology company leading the next generation of programmatic advertising. The system is the data pipeline at TripleLift. TripleLift is an adtech company, and like most companies in this industry, we deal with high volumes of data on a daily basis.

Processing

Processing Data Engineering Engineering Efficiency

Secrets Detection: Optimizing Filter Processes

DZone

FEBRUARY 8, 2022

While increasing both the precision and the recall of our secrets detection engine, we felt the need to keep a close eye on speed. So it wasn’t a surprise to find that our engine had the same problem: more power, less speed. In a gearbox, if you want to increase torque, you need to decrease speed.

Processing

Processing Benchmarking Speed Engineering

Automated Testing in Data Engineering: An Imperative for Quality and Efficiency

DZone

JANUARY 9, 2024

This holds true for the critical field of data engineering as well. As organizations gather and process astronomical volumes of data, manual testing is no longer feasible or reliable. This comprehensive guide takes an in-depth look at automated testing in the data engineering domain.

Data Engineering

Data Engineering Efficiency Engineering Testing

Overcoming Challenges and Best Practices for Data Migration From On-Premise to Cloud

DZone

MARCH 29, 2023

Data migration is the process of moving data from one location to another, which is an essential aspect of cloud migration. Data migration involves transferring data from on-premise storage to the cloud. With the rapid adoption of cloud computing , businesses are moving their IT infrastructure to the cloud.

Best Practices

Best Practices Cloud Data Engineering Storage

Introducing Impressions at Netflix

The Netflix TechBlog

FEBRUARY 14, 2025

It requires a state-of-the-art system that can track and process these impressions while maintaining a detailed history of each profiles exposure. This nuanced integration of data and technology empowers us to offer bespoke content recommendations. This queue ensures we are consistently capturing raw events from our global userbase.

Tuning

Tuning Latency Efficiency Storage

Ready-to-go sample data pipelines with Dataflow

The Netflix TechBlog

DECEMBER 3, 2022

Obviously not all tools are made with the same use case in mind, so we are planning to add more code samples for other (than classical batch ETL) data processing purposes, e.g. Machine Learning model building and scoring. This allows other processes, consuming our table, to be notified and start their processing.

Best Practices

Best Practices Code Testing Data Engineering

What is IT automation?

Dynatrace

JULY 6, 2022

At its most basic, automating IT processes works by executing scripts or procedures either on a schedule or in response to particular events, such as checking a file into a code repository. Adding AIOps to automation processes makes the volume of data that applications and multicloud environments generate much less overwhelming.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

3. Psyberg: Automated end to end catch up

The Netflix TechBlog

NOVEMBER 14, 2023

In the previous installments of this series, we introduced Psyberg and delved into its core operational modes: Stateless and Stateful Data Processing. Pipelines After Psyberg Let’s explore how different modes of Psyberg could help with a multistep data pipeline. In this case, the minimum hour to process the data is hour 2.

Processing

Processing Tuning C++ Efficiency

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

There are several benefits of such optimizations like saving on storage, faster query time, cheaper downstream processing, and an increase in developer productivity by removing additional ETLs written only for query performance improvement. Some of the optimizations are prerequisites for a high-performance data warehouse.

Storage

Storage Latency Efficiency Data Engineering

Analytics at Netflix: Who we are and what we do

The Netflix TechBlog

SEPTEMBER 18, 2020

The Engineer enjoys making data available by piping it in from new sources in optimal ways, building robust data models, prototyping systems, and doing project-specific engineering.

Analytics

Analytics Engineering Data Engineering Tuning

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

DZone

DECEMBER 27, 2023

Welcome to the first post in our exciting series on mastering offline data pipeline's best practices, focusing on the potent combination of Apache Airflow and data processing engines like Hive and Spark. Working together, they form the backbone of many modern data engineering solutions.

Best Practices

Best Practices Data Engineering Big Data Games

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” We are loading the lineage data to a graph database to enable seamless integration with a REST data lineage service to address business use cases.

Infrastructure

Infrastructure Big Data Transportation Architecture

How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions

InfoQ

NOVEMBER 29, 2023

HubSpot adopted routing messages over multiple Kafka topics (called swimlanes) for the same producer to avoid the build-up in the consumer group lag and prioritize the processing of real-time traffic.

Processing

Processing Traffic Data Engineering Scalability

Choosing an OLAP Engine for Financial Risk Management: What To Consider?

DZone

AUGUST 16, 2023

From a data engineer's point of view, financial risk management is a series of data analysis activities on financial data. The financial sector imposes its unique requirements on data engineering. Before they adopted an OLAP engine, they were using Kettle to collect data.

FinTech

FinTech Engineering Data Engineering Latency

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

The Netflix TechBlog

MARCH 5, 2019

This operational component places some cognitive load on our engineers, requiring them to develop deep understanding of telemetry and alerting systems, capacity provisioning process, security and reliability best practices, and a vast amount of informal knowledge about the cloud infrastructure.

Infrastructure

Infrastructure Cloud Scalability AWS

SIEM Volume Spike Alerts Using ML

DZone

JANUARY 31, 2024

SIEM platforms streamline incident response processes, allowing security teams to respond quickly and effectively to security incidents. Problem Statement In Data Engineering , the data/log collection is a challenging task for high-volume sources.

Data Engineering

Data Engineering Storage Network Infrastructure

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

DECEMBER 11, 2024

Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Leverage tiered storage systems that dynamically offload data based on priority.

IoT

IoT Energy Logistics Latency

Data pipeline asset management with Dataflow

The Netflix TechBlog

FEBRUARY 9, 2022

Let’s define some requirements that we are interested in delivering to the Netflix data engineers or anyone who would like to schedule a workflow with some external assets in it. By the end of the migration process our Jenkins configuration went from: Figure 4. The slightly improved approach is shown on the diagram below.

Storage

Storage Data Engineering Testing Code

5 key areas for tech leaders to watch in 2020

O'Reilly

FEBRUARY 18, 2020

Usage specific to Python as a programming language grew by just 4% in 2019; by contrast, usage that had to do with Python and ML—be it in the context of AI, deep learning, and natural language processing, or in combination with any of several popular ML/AI frameworks—grew by 9%. In aggregate, data engineering usage declined 8% in 2019.

Software Architecture

Software Architecture DevOps Data Engineering Architecture

Optimizing dbt and Google’s BigQuery

DZone

DECEMBER 21, 2020

Still, it is one of many that need to be taken before you can generate value from the data you gather. An important step in that chain of the process is data modeling and transformation. This is where data is extracted, transformed, and loaded (ETL) or extracted, loaded, and transformed (ELT).

Big Data

Big Data Google Scalability Processing

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

MARCH 7, 2024

Data: Fast Data Our main data lake is hosted on S3, organized as Apache Iceberg tables. For ETL and other heavy lifting of data, we mainly rely on Apache Spark. In addition to Spark, we want to support last-mile data processing in Python, addressing use cases such as feature transformations, batch inference, and training.

Systems

Systems Media Cache Open Source

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

JULY 21, 2022

Since memory management is not something one usually associates with classification problems, this blog focuses on formulating the problem as an ML problem and the data engineering that goes along with it. We now explore each of these components individually, while highlighting the nuances of the data pipeline and pre-processing.

Big Data

Big Data Cache Engineering Data Engineering

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

The Netflix TechBlog

MARCH 2, 2021

At Netflix, our data scientists span many areas of technical specialization, including experimentation, causal inference, machine learning, NLP, modeling, and optimization. Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group.

Analytics

Analytics C++ Innovation Engineering

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

The Netflix TechBlog

SEPTEMBER 10, 2024

The voice service then constructs a message for the device and places it on the message queue, which is then processed and sent to Pushy to deliver to the device. The previous version of the message processor was a Mantis stream-processing job that processed messages from the message queue.

Latency

Latency Cache Tuning Efficiency

Stream Processing: How it Works, Use Cases & Popular Frameworks

Simform

FEBRUARY 14, 2023

Stream processing has become a core part of enterprise data architecture today due to the explosive growth of data from sources such as IoT sensors, security logs, and web applications. This blog discusses the topic of stream processing in detail to help you navigate its landscape with ease.

Processing

Processing IoT Architecture Data Engineering

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

In the data domain, it is common to have a super large number of jobs within a single workflow. For example, a workflow to backfill hourly data for the past five years can lead to 43800 jobs (24 * 365 * 5), each of which processes data for an hour. All the requests are processed via distributed queues for message passing.

Java

Java Scalability Traffic Architecture

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

This entertaining romp through the tech stack serves as an introduction to how we think about and design systems, the Netflix approach to operational challenges, and how other organizations can apply our thought processes and technologies.

AWS

AWS Entertainment Open Source Benchmarking

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 3, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Games Java

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 18, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Games Java

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 17, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Java Servers

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

FEBRUARY 9, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Engineering Games Java

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

MARCH 24, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Software Engineering Engineering Big Data

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Etleap is analyst-friendly , enterprise-grade ETL-as-a-service , built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.

Education

Education Software Engineering Scalability Engineering

Financial Data Engineering in SAS

Our First Netflix Data Engineering Summit

Trending Sources

1. Streamlining Membership Data Engineering at Netflix with Psyberg

A Recap of the Data Engineering Open Forum at Netflix

Data Engineers of Netflix?—?Interview with Kevin Wylie

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

2. Diving Deeper into Psyberg: Stateless vs Stateful Data Processing

Incremental Processing using Netflix Maestro and Apache Iceberg

Leveraging Infrastructure as Code for Data Engineering Projects: A Comprehensive Guide

Data Engineers of Netflix?—?Interview with Dhevi Rajendran

Data Engineers of Netflix?—?Interview with Samuel Setegne

How TripleLift Built an Adtech Data Pipeline Processing Billions of Events Per Day

Secrets Detection: Optimizing Filter Processes

Automated Testing in Data Engineering: An Imperative for Quality and Efficiency

Overcoming Challenges and Best Practices for Data Migration From On-Premise to Cloud

Introducing Impressions at Netflix

Ready-to-go sample data pipelines with Dataflow

What is IT automation?

3. Psyberg: Automated end to end catch up

Optimizing data warehouse storage

Analytics at Netflix: Who we are and what we do

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions

Choosing an OLAP Engine for Financial Risk Management: What To Consider?

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

SIEM Volume Spike Alerts Using ML

These 7 Edge Data Challenges Will Test Companies the Most in 2025

Data pipeline asset management with Dataflow

5 key areas for tech leaders to watch in 2020

Optimizing dbt and Google’s BigQuery

Supporting Diverse ML Systems at Netflix

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

Stream Processing: How it Works, Use Cases & Popular Frameworks

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

Netflix at AWS re:Invent 2019

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Essilen Research, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

Stay Connected