Query optimization. When handling large amounts of complex data, or big data, chances are that your main machine might start getting crushed by all of the data it has to process in order to produce your analytics results. Greenplum features a cost-based query optimizer for large-scale big data workloads.
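As a hedged illustration of what a cost-based optimizer exposes, the sketch below asks a Greenplum cluster (which speaks the PostgreSQL wire protocol, so psycopg2 works) to EXPLAIN a query; the connection details and the sales table are assumptions for the example, not anything from the article.

```python
# Illustrative only: inspect the optimizer's chosen plan and cost estimates
# without executing the query. Host, credentials, and table are hypothetical.
import psycopg2

conn = psycopg2.connect(host="gp-master.example.com", dbname="analytics",
                        user="gpadmin", password="secret")
with conn.cursor() as cur:
    cur.execute("""
        EXPLAIN
        SELECT region, SUM(amount)
        FROM sales
        WHERE sale_date >= '2024-01-01'
        GROUP BY region;
    """)
    for (line,) in cur.fetchall():  # each row is one line of the plan
        print(line)
conn.close()
```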
Until recently, improvements in data center power efficiency compensated almost entirely for the increasing demand for computing resources. However, this trend is now reversing: the rise of big data, cryptocurrencies, and AI means the IT sector now contributes significantly to global greenhouse gas emissions.
The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the big data community quite a long time ago. The design of the in-stream processing engine itself was driven by the following requirements: SQL-like functionality, and strict fault tolerance, which is a principal requirement for the engine.
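To make the fault-tolerance requirement concrete, here is a minimal sketch, assuming nothing about the engine from the article: a streaming counter that checkpoints its state periodically, so a restart can resume from the last snapshot rather than reprocessing the whole stream.

```python
# Minimal checkpointing sketch; the checkpoint path and event shape are made up.
import json, os

CHECKPOINT = "counts.ckpt"

def load_state():
    """Resume from the last checkpoint if one exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {}

def process(events):
    counts = load_state()
    for i, event in enumerate(events, 1):
        key = event["user"]
        counts[key] = counts.get(key, 0) + 1
        if i % 1000 == 0:  # snapshot state every 1,000 events
            with open(CHECKPOINT, "w") as f:
                json.dump(counts, f)
    return counts
```

A real engine would checkpoint atomically and track stream offsets alongside state, but the recovery principle is the same.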
A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.
Data Engineers of Netflix: Interview with Pallavi Phadnis. This post is part of our "Data Engineers of Netflix" series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.
Maintaining Uber's large-scale data warehouse comes with an operational cost in terms of ETL functions and storage. In our experience, optimizing for operational efficiency requires answering one key question: for which tables does the maintenance cost outweigh the utility?
In addition to improved IT operational efficiency at a lower cost, ITOA also enhances digital experience monitoring for increased customer engagement and satisfaction. Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information.
Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability and Efficiency. By Di Lin, Girish Lingappa, and Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker, staring at a metric on a dashboard, about to make a critical business decision but pausing to ask a question: "Can...
Software analytics offers the ability to gain and share insights from data emitted by software systems and related operational processes to develop higher-quality software faster while operating it efficiently and securely. This involves big data analytics and applying advanced AI and machine learning techniques, such as causal AI.
As organizations continue to adopt multicloud strategies, the complexity of these environments grows, increasing the need to automate cloud engineering operations so they can enforce their policies and architecture principles. This requires significant data engineering efforts, as well as work to build machine-learning models.
By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers who generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.
Data Engineers of Netflix: Interview with Samuel Setegne. This post is part of our "Data Engineers of Netflix" interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?
Operational automation, including but not limited to auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing, is key to the success of modern data platforms. These are often multi-objective optimizations, balancing goals such as the retry success probability and compute cost efficiency.
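As a toy illustration of such a multi-objective trade-off (not Netflix's actual policy), the sketch below decides whether to auto-retry a failed job by weighing the estimated retry success probability against the compute cost of rerunning it; all names and numbers are invented.

```python
# Hypothetical retry policy balancing success probability against cost.

def should_retry(success_prob: float, retry_cost: float,
                 value_of_success: float, weight: float = 1.0) -> bool:
    """Retry when the expected value of success outweighs the weighted cost."""
    expected_gain = success_prob * value_of_success
    return expected_gain > weight * retry_cost

# A 60%-likely retry costing 10 units that saves 30 units of rework
# is worth taking: 0.6 * 30 = 18 > 10.
print(should_retry(0.6, 10.0, 30.0))  # True
```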
With more automated approaches to log monitoring and log analysis, however, organizations can gain visibility into their applications and infrastructure efficiently and with greater precision—even as cloud environments grow. “The weakness of a data lake is they fail when you need to access them fast,” Pawlowski said.
Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud. However, this cannot be done without efficient, scalable data analytics.
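For a sense of how this looks today, here is a hedged sketch using boto3's EMR API (the announcement itself predates boto3); the release label, instance types, counts, and IAM role names are illustrative assumptions.

```python
# Illustrative: an EMR cluster whose core nodes run on cheaper Spot capacity.
import boto3

emr = boto3.client("emr", region_name="us-east-1")
emr.run_job_flow(
    Name="spot-analytics",
    ReleaseLabel="emr-6.15.0",
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "Market": "ON_DEMAND",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            # The cost-saving part: core nodes run on spare EC2 capacity.
            {"InstanceRole": "CORE", "Market": "SPOT",
             "InstanceType": "m5.xlarge", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```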
Python has long been a popular programming language in the networking space because it's an intuitive language that allows engineers to quickly solve networking problems. Demand Engineering is responsible for Regional Failovers, Traffic Distribution, Capacity Operations, and Fleet Efficiency of the Netflix cloud.
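To illustrate the opening point, here is a small self-contained example of the kind of quick networking task Python makes easy, using only the standard library; the host and port are arbitrary.

```python
# Check whether a TCP port is reachable.
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("example.com", 443))  # True if HTTPS is reachable
```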
Several pain points have made it difficult for organizations to manage their data efficiently and create actual value. Limited data availability constrains value creation. Traditional solutions and approaches are inefficient given the number of manual tasks that are required for effective log data ingest.
Our customers have frequently requested support for this first new batch of services, which cover databases, big data, networks, and computing. See the health of your big data resources at a glance. Azure HDInsight supports a broad range of use cases including data warehousing, machine learning, and IoT analytics.
...and what the role entails, by Julie Beckley & Chris Pham. This Q&A provides insights into the diverse set of skills, projects, and culture within Data Science and Engineering (DSE) at Netflix through the eyes of two team members: Chris Pham and Julie Beckley. Photo from a team curling offsite. There's us to the right!
It also improves engineering productivity by simplifying the existing pipelines and unlocking new patterns. We will show how we are building a clean and efficient incremental processing solution (IPS) by using Netflix Maestro and Apache Iceberg. Backfill: backfilling datasets is a common operation in big data processing.
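The incremental idea can be sketched without any of the IPS machinery: track a watermark and process only partitions that arrived after it, instead of re-reading the whole table. Everything below is a hypothetical illustration, not Netflix's implementation.

```python
# Watermark-based incremental processing sketch; helpers are made up.

def process_partition(rows):
    print(f"processing {len(rows)} rows")

def incremental_run(partitions: dict, watermark: str) -> str:
    """partitions maps an ISO date string to that day's rows."""
    new_keys = sorted(k for k in partitions if k > watermark)
    for key in new_keys:
        process_partition(partitions[key])
    # Advance the watermark only after all new partitions are processed.
    return new_keys[-1] if new_keys else watermark

data = {"2024-01-01": [1, 2], "2024-01-02": [3], "2024-01-03": [4, 5, 6]}
print(incremental_run(data, watermark="2024-01-01"))  # handles 01-02 and 01-03
```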
Organizations adopt DevOps, where developers and operations work together in a continuous loop, so they can develop software and resolve issues efficiently before they affect users. Davis®, the Dynatrace purpose-built AI engine, further uses event proximity and baselined data for automatic correlation and noise suppression.
Container technology enables organizations to efficiently develop cloud-native applications or to modernize legacy applications to take advantage of cloud services. First introduced by Docker in 2014, Docker Swarm is an orchestration engine that popularized the use of containers with developers.
The following figure depicts an imaginary "evolution" of the major NoSQL system families, namely Key-Value stores, BigTable-style databases, Document databases (e.g., MongoDB, CouchDB), Full Text Search Engines (e.g., Apache Lucene, Apache Solr), and Graph databases. (Figure: NoSQL Data Models.)
With our AI engine, Davis, at the core, Dynatrace provides precise answers in real time. Trying to manually keep up with configuring, scripting, and sourcing data is beyond human capabilities; today everything must be automated and continuous. Some customers even say having Davis is like having a whole team of engineers on their side.
Meeting of the Minds: Performance Engineering, a Panel Discussion. Performance engineering as it is done at Alibaba, which is emerging as a major cloud provider, is clearly a hot topic, and the most interesting point here would be how it is changing performance engineering. You can't always get what you want.
An overview of end-to-end entity resolution for big data, Christophides et al., ACM Computing Surveys, Dec. It's an important part of many modern data workflows, and an area I've been wrestling with in one of my own projects. Dynamic approaches schedule block processing on the fly to maximise efficiency.
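The blocking idea the survey covers is easy to show in miniature: group records by a cheap key so that expensive pairwise comparisons happen only within blocks, not across the whole dataset. The records and key choice below are toy assumptions.

```python
# Toy blocking step for entity resolution.
from collections import defaultdict
from itertools import combinations

records = [
    {"id": 1, "name": "Acme Corp", "zip": "10001"},
    {"id": 2, "name": "ACME Corporation", "zip": "10001"},
    {"id": 3, "name": "Globex", "zip": "94105"},
]

blocks = defaultdict(list)
for r in records:
    # Blocking key: zip code plus the first letter of the name.
    blocks[(r["zip"], r["name"][0].upper())].append(r)

for key, block in blocks.items():
    for a, b in combinations(block, 2):  # compare only within a block
        print("candidate pair:", a["id"], b["id"])
```

With n records spread over k roughly equal blocks, this cuts comparisons from O(n²) toward O(n²/k); dynamic approaches go further by reordering and pruning blocks at runtime.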
Membership Engineering at Netflix is responsible for the plan and pricing configurations for every market worldwide. Operational Efficiency: The majority of the changes require metadata configuration files and library code changes, usually taking days of testing and service release to adopt the updates.
What is BPF? Berkeley Packet Filter (BPF) is an in-kernel execution engine that processes a virtual instruction set, and has been extended as eBPF to provide a safe way to extend kernel functionality. The data is also used by security and other partner teams for insight and incident analysis.
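For flavor, here is the canonical "hello world" from the bcc toolkit (not code from the article): it compiles a tiny BPF program, attaches it to the clone() syscall, and prints a trace line whenever a process is created. It requires root and an installed bcc.

```python
# Canonical bcc example: trace process creation via a kprobe on clone().
from bcc import BPF

prog = r"""
int hello(void *ctx) {
    bpf_trace_printk("clone() called\n");
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
b.trace_print()  # stream trace output until interrupted
```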
Welcome to the first post in our exciting series on mastering offline data pipeline best practices, focusing on the potent combination of Apache Airflow and data processing engines like Hive and Spark. Working together, they form the backbone of many modern data engineering solutions.
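As a hedged sketch of that combination (assuming Airflow 2.4+ with the apache-airflow-providers-apache-spark package; the DAG id, schedule, and application path are invented), Airflow owns scheduling and retries while Spark does the heavy lifting:

```python
# Illustrative Airflow DAG that submits a Spark job once a day.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import (
    SparkSubmitOperator,
)

with DAG(
    dag_id="daily_offline_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    transform = SparkSubmitOperator(
        task_id="transform_events",
        application="/opt/jobs/transform_events.py",  # hypothetical job
        conn_id="spark_default",
    )
```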
Adding application security to development and operations workflows increases efficiency. AIOps (artificial intelligence for IT operations) combines big data, AI algorithms, and machine learning for actionable, real-time insights that help ITOps continuously improve operations.
By Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, and Pawan Dixit. At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central to the business, representing diverse use cases that go beyond recommendations, predictions, and data transformations.
The healthcare industry is embracing cloud technology to improve the efficiency, quality, and security of patient care, a theme on display at this year's HIMSS Conference in Orlando, Fla. AIOps (or "AI for IT operations") uses artificial intelligence so that big data can help IT teams work faster and more effectively.
We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing cost and performance benefits. This article will list some of the use cases of AutoOptimize, discuss the design principles that help enhance efficiency, and present the high-level architecture.
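One common layout optimization in systems like this is small-file compaction. The sketch below is a guess at the shape of such a heuristic, not AutoOptimize's actual logic: flag a partition for compaction when it has many files and most sit far below a target size.

```python
# Hypothetical compaction heuristic; thresholds are invented.
TARGET_FILE_MB = 128

def needs_compaction(file_sizes_mb, min_files=10, small_ratio=0.5):
    """Compact when most of a partition's many files are well under target."""
    if len(file_sizes_mb) < min_files:
        return False
    small = sum(1 for s in file_sizes_mb if s < TARGET_FILE_MB / 2)
    return small / len(file_sizes_mb) >= small_ratio

print(needs_compaction([4, 8, 12, 6, 9, 3, 7, 11, 5, 2]))  # True
```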
Artificial intelligence for IT operations, or AIOps, combines big data and machine learning to provide actionable insight for IT teams to shape and automate their operational strategy. The four stages of data processing. Alert fatigue and chasing false positives are not only efficiency problems.
Gartner defines AIOps as the combination of "big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination." But what is AIOps, exactly? And how can it support your organization?
At Netflix, our data scientists span many areas of technical specialization, including experimentation, causal inference, machine learning, NLP, modeling, and optimization. Together with data analytics and data engineering, we comprise the larger, centralized Data Science and Engineering group.
On the surface this is a paper about fast data ingestion from high-volume streams, with indexing to support efficient querying. Helios (PVLDB'20) also serves as a reference architecture for how Microsoft envisions its next generation of distributed big data processing systems being built.
Who's Hiring? Scrapinghub is hiring a Senior Software Engineer (Big Data/AI); this is going to be a challenging journey for any backend engineer! Please apply here. How to screen candidates efficiently, effectively, and without bias.
A DBMS provides a systematic way to store, retrieve, and manage data, ensuring it remains organized and controlled. These systems are crucial for handling large volumes of data efficiently, enabling businesses and applications to perform complex queries, maintain data integrity, and ensure security.
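Those guarantees are easy to see with Python's built-in sqlite3 module; the schema below is a made-up example of the engine enforcing integrity on the application's behalf.

```python
# Structured storage, a query, and a constraint enforced by the DBMS.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT UNIQUE NOT NULL   -- integrity rule lives in the engine
    )
""")
conn.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))
conn.commit()

for row in conn.execute("SELECT id, email FROM users"):
    print(row)

try:
    conn.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))
except sqlite3.IntegrityError as e:
    print("duplicate rejected:", e)
```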
In practice, a hybrid cloud operates by melding resources and services from multiple computing environments, which necessitates effective coordination, orchestration, and integration to work efficiently. Tailoring resource allocation efficiently ensures faster application performance in alignment with organizational demands.