Financial data engineering in SAS involves the management, processing, and analysis of financial data using the various tools and techniques provided by the SAS software suite. Here are some key aspects of financial data engineering in SAS.
Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines. The result was a series of talks which we are now sharing with the rest of the data engineering community! In this video, Sr.
This article sets out to explore some of the essential tools required by organizations in the domain of data engineering to efficiently improve data quality and triage/analyze data for effective business-centric machine learning analytics, reporting, and anomaly detection.
By Abhinaya Shetty, Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions.
A summary of sessions at the first Data Engineering Open Forum at Netflix, held on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.
Data Engineers of Netflix: Interview with Kevin Wylie. This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to data engineering @ Netflix. Kevin, what drew you to data engineering?
Data Engineers of Netflix: Interview with Pallavi Phadnis. This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to data engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.
Data Engineers of Netflix: Interview with Dhevi Rajendran. This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to data engineering @ Netflix.
Data Engineers of Netflix: Interview with Samuel Setegne. This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to data engineering @ Netflix. What drew you to Netflix?
Data engineering projects often require the setup and management of complex infrastructures that support data processing, storage, and analysis. In this article, we will explore the benefits of leveraging IaC for data engineering projects and provide detailed implementation steps to get started.
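To make the idea concrete, here is a minimal sketch of Infrastructure as Code for a data pipeline, assuming Pulumi's Python SDK with AWS as the target; the tool choice, resource names, and settings are illustrative assumptions, not prescribed by the article.

```python
# Minimal IaC sketch (Pulumi + AWS, Python). Resource names and options are
# illustrative assumptions; swap in whatever your pipeline actually needs.
import pulumi
import pulumi_aws as aws

# Object storage for the raw landing zone of the pipeline.
raw_bucket = aws.s3.Bucket(
    "raw-landing-zone",
    force_destroy=True,  # convenient for dev stacks; avoid in production
)

# A Glue catalog database so downstream jobs can discover the ingested data.
catalog = aws.glue.CatalogDatabase(
    "analytics-catalog",
    name="analytics_catalog",
)

# Expose the bucket name so other stacks (or CI) can reference it.
pulumi.export("raw_bucket_name", raw_bucket.bucket)
```

Running this through your IaC tool in CI gives reviewable, repeatable provisioning instead of hand-built infrastructure.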
This holds true for the critical field of data engineering as well. As organizations gather and process astronomical volumes of data, manual testing is no longer feasible or reliable. This comprehensive guide takes an in-depth look at automated testing in the data engineering domain.
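As a flavor of what automated testing can look like in a data pipeline, here is a small pytest sketch of data-quality checks; the table, columns, and rules are hypothetical stand-ins for whatever contract your pipeline promises.

```python
# A hedged pytest sketch of automated data-quality checks. The "orders" frame
# stands in for a real extract or test partition of a pipeline's output.
import pandas as pd
import pytest


@pytest.fixture
def orders() -> pd.DataFrame:
    # In a real suite this would load a sample extract or a staging partition.
    return pd.DataFrame(
        {
            "order_id": [1, 2, 3],
            "amount": [19.99, 5.00, 42.50],
            "created_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02"]),
        }
    )


def test_primary_key_is_unique(orders):
    assert orders["order_id"].is_unique


def test_amounts_are_non_negative(orders):
    assert (orders["amount"] >= 0).all()


def test_no_future_timestamps(orders):
    assert (orders["created_at"] <= pd.Timestamp.now()).all()
```

Checks like these run on every pipeline change, catching contract violations before they reach downstream consumers.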
It's midnight in the dim and cluttered office of The New York Times, currently serving as the "situation room." A powerful surge of traffic is inevitable. During every major election, the wave would crest and crash against our overwhelmed systems before receding, allowing us to assess the damage.
As data engineers, we are tasked with implementing these sophisticated solutions, ensuring organizations can derive actionable insights from vast datasets. This article explores the intricacies of vector search using Elasticsearch, focusing on effective techniques and best practices to optimize performance.
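For orientation, the sketch below shows what an approximate kNN query looks like with the official Elasticsearch Python client (8.x); the index name, dense_vector field, and the stand-in query vector are assumptions for illustration.

```python
# Hedged sketch of an approximate kNN vector search with the Elasticsearch 8.x
# Python client. Index, field names, and the query vector are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Stand-in for a real embedding of the user's query (e.g., from a sentence encoder).
query_vector = [0.1] * 384

resp = es.search(
    index="products",
    knn={
        "field": "title_embedding",   # dense_vector field defined in the mapping
        "query_vector": query_vector,
        "k": 10,                      # number of nearest neighbors to return
        "num_candidates": 100,        # candidates per shard: higher = better recall, slower
    },
    source=["title", "price"],
)

for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```

Tuning num_candidates against latency is usually the first performance lever to reach for.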
This article discusses the challenges and best practices of data migration when transferring on-premise data to the cloud. The article will also explore the role of data engineering in ensuring successful data transfer and integration and different approaches to data migration.
The data community is striving to incorporate the core concepts of engineering rigor found in software communities but still has further to go. This is achieved through practices like Infrastructure as Code for deployments, automated testing, application observability, and end-to-end application lifecycle ownership.
This is a guest post by Eunice Do, Data Engineer at TripleLift, a technology company leading the next generation of programmatic advertising. The system is the data pipeline at TripleLift. TripleLift is an adtech company, and like most companies in this industry, we deal with high volumes of data on a daily basis.
The Engineer enjoys making data available by piping it in from new sources in optimal ways, building robust data models, prototyping systems, and doing project-specific engineering.
Data lineage, an automated visualization of the relationships for how data flows across tables and other data assets, is a must-have in the data engineering toolbox.
Welcome to the first post in our exciting series on mastering offline data pipeline best practices, focusing on the potent combination of Apache Airflow and data processing engines like Hive and Spark. Working together, they form the backbone of many modern data engineering solutions.
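As a rough illustration of that combination, here is a minimal DAG for a recent Airflow 2.x release that registers a Hive partition and then runs a Spark job; the connection IDs, HQL, and application path are assumptions, not taken from the series.

```python
# Minimal Airflow 2.x DAG sketch chaining a Hive step with a Spark job.
# Connection IDs, HQL, and the application path are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_events_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Make sure the day's raw partition is registered in the Hive metastore.
    stage_partition = HiveOperator(
        task_id="stage_partition",
        hql="ALTER TABLE raw.events ADD IF NOT EXISTS PARTITION (ds='{{ ds }}')",
        hive_cli_conn_id="hive_default",
    )

    # Aggregate the day's partition with a Spark batch job.
    aggregate = SparkSubmitOperator(
        task_id="aggregate_events",
        application="/opt/jobs/aggregate_events.py",
        conn_id="spark_default",
        application_args=["--ds", "{{ ds }}"],
    )

    stage_partition >> aggregate
```

Airflow handles scheduling, retries, and dependencies, while Hive and Spark do the heavy lifting on the data itself.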
By Abhinaya Shetty, Bharath Mummadisetty. In the inaugural blog post of this series, we introduced you to the state of our pipelines before Psyberg and the challenges with incremental processing that led us to create the Psyberg framework within Netflix’s Membership and Finance data engineering team.
The results for data-related topics are both predictable and—there’s no other way to put it—confusing. Starting with data engineering, the backbone of all data work (the category includes titles covering data management, i.e., relational databases, Spark, Hadoop, SQL, NoSQL, etc.). This follows a 3% drop in 2018.
Let’s define some requirements that we are interested in delivering to the Netflix data engineers or anyone who would like to schedule a workflow with some external assets in it. Conclusions: This new method available to Netflix data engineers makes workflow management easier, more transparent, and more reliable.
From a data engineer's point of view, financial risk management is a series of data analysis activities on financial data. The financial sector imposes its unique requirements on data engineering.
Some of the optimizations are prerequisites for a high-performance data warehouse. Sometimes data engineers write downstream ETLs on ingested data to optimize the data/metadata layouts to make other ETL processes cheaper and faster.
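A typical example of such a layout-optimizing ETL, sketched here in PySpark under assumed paths and column names, is compacting many small files and sorting within partitions so later scans prune and skip more data.

```python
# Hedged PySpark sketch of a downstream "layout" ETL: compact small files and
# sort within partitions so later ETLs scan fewer, better-organized files.
# Paths, partition counts, and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact_events").getOrCreate()

raw = spark.read.parquet("s3://warehouse/raw/events/ds=2024-01-01/")

(
    raw.repartition(32, "account_id")       # fewer, larger files; co-locate each account
       .sortWithinPartitions("event_time")  # improves min/max pruning on event_time
       .write
       .mode("overwrite")
       .parquet("s3://warehouse/optimized/events/ds=2024-01-01/")
)
```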
While our engineering teams have built and continue to build solutions to lighten this cognitive load (better guardrails, improved tooling, …), data and its derived products are critical elements to understanding, optimizing, and abstracting our infrastructure. Give us a holler if you are interested in a thought exchange.
We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” Please share your experience by adding your comments below, and stay tuned for more on data lineage at Netflix in the follow-up blog posts.
This requires significant data engineering efforts, as well as work to build machine-learning models. While automating IT processes without integrated AIOps can create challenges, the approach to artificial intelligence itself can also introduce potential issues. AI that is based on machine learning needs to be trained.
This helps overwrite data only when required and minimizes unnecessary reprocessing. As seen above, by chaining these Psyberg workflows, we could automate the catchup for late-arriving data from hours 2 and 6. The data engineer does not need to perform any manual intervention in this case and can thus focus on more important things!
Problem statement: In data engineering, data/log collection is a challenging task for high-volume sources. Compliance Reporting: SIEM solutions help organizations meet regulatory compliance requirements by providing reporting and audit trail capabilities.
These challenges are currently addressed in suboptimal, less cost-efficient ways by individual local teams. One such approach is Lookback: a generic and simple approach that data engineers use to solve the data accuracy problem, in which users configure the workflow to re-read the data in a trailing window, as sketched below.
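A minimal sketch of that lookback pattern, with a hypothetical window size, partition layout, and schema, might look like this: the job re-reads a trailing set of hourly partitions on every run so late-arriving rows are folded back in.

```python
# Generic "lookback" sketch: re-read a trailing window of hourly partitions on
# every run so late-arriving data is picked up. Window size, paths, and the
# schema are hypothetical.
from datetime import datetime, timedelta

import pandas as pd

LOOKBACK_HOURS = 6


def hours_to_process(run_time: datetime) -> list[str]:
    """Hourly partition keys covered by the lookback window."""
    return [
        (run_time - timedelta(hours=h)).strftime("%Y-%m-%d-%H")
        for h in range(LOOKBACK_HOURS)
    ]


def run(run_time: datetime) -> pd.DataFrame:
    # Re-read and re-aggregate every hour in the window, overwriting prior output;
    # this trades extra compute for correctness on late data.
    frames = [
        pd.read_parquet(f"/data/events/hour={hour}")
        for hour in hours_to_process(run_time)
    ]
    events = pd.concat(frames, ignore_index=True)
    return events.groupby("account_id", as_index=False)["amount"].sum()
```

The simplicity is the appeal; the cost is recomputing the whole window even when little or no late data arrived.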
At Netflix, our data scientists span many areas of technical specialization, including experimentation, causal inference, machine learning, NLP, modeling, and optimization. Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group.
Every image you hover over isn't just a visual placeholder; it's a critical data point that fuels our sophisticated personalization engine. Part 1: Creating the Source of Truth for Impressions. By Tulika Bhatt. Imagine scrolling through Netflix, where each movie poster or promotional banner competes for your attention.
Since memory management is not something one usually associates with classification problems, this blog focuses on formulating the problem as an ML problem and the data engineering that goes along with it. Some nuances while creating this dataset come from the on-field domain knowledge of our engineers.
Talent and Expertise Shortages: The rapid evolution of edge computing technologies has outpaced the availability of skilled professionals who can design, implement, and manage these systems effectively. Key issues include a shortage of edge-native data engineers and architects, and the high costs of training and retaining talent.
Our focus has been on improving overall security assurance as opposed to just vulnerability prevention. We are now expanding this approach to more parts of our ecosystem.
1:45pm-2:45pm, NFX 201: More Data Science with Less Engineering: ML Infrastructure. Ville Tuulos, Machine Learning Infrastructure Engineering Manager. Abstract: Netflix is known for its unique culture that gives an extraordinary amount of freedom to individual engineers and data scientists.
Etleap is analyst-friendly, enterprise-grade ETL-as-a-service, built for Redshift and Snowflake data warehouses and S3/Glue data lakes. Our intuitive software allows data engineers to maintain pipelines without writing code, and lets analysts gain access to data in minutes instead of months.