Big Data, Engineering and Processing - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

Greenplum Database is a massively parallel processing (MPP) SQL database that is built and based on PostgreSQL. It can scale towards a multi-petabyte level data workload without a single issue, and it allows access to a cluster of powerful servers that will work together within a single SQL interface where you can view all of the data.

Big Data

Big Data Database Artificial Intelligence Open Source

Sustainability: Thoughts from a software engineer

Dynatrace

MARCH 17, 2025

Until recently, improvements in data center power efficiency compensated almost entirely for the increasing demand for computing resources. The rise of big data, cryptocurrencies, and AI means the IT sector contributes significantly to global greenhouse gas emissions. However, this trend is now reversing.

Software Engineering

Software Engineering Engineering Software Software

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing is the immediate need in many practical applications. Modularity and flexibility.

Big Data

Big Data Processing Lambda Database

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Scalegrid

NOVEMBER 25, 2019

ScyllaDB is an open-source distributed NoSQL data store, reimplemented from the popular Apache Cassandra database. Released just four years ago in 2015, Scylla has averaged over 220% year-over-year growth in popularity according to DB-Engines. So what are some of the reasons why users would pick ScyllaDB vs. Cassandra?

Big Data

Big Data Database Open Source Azure

Data Engineers of Netflix?—?Interview with Kevin Wylie

The Netflix TechBlog

JULY 15, 2021

Data Engineers of Netflix?—?Interview Interview with Kevin Wylie This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin, what drew you to data engineering?

Data Engineering

Data Engineering Engineering Entertainment Big Data

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

by Jun He , Yingyi Zhang , and Pawan Dixit Incremental processing is an approach to process new or changed data in workflows. The key advantage is that it only incrementally processes data that are newly added or updated to a dataset, instead of re-processing the complete dataset.

Processing

Processing Big Data Efficiency Engineering

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Data Engineers of Netflix?—?Interview Interview with Pallavi Phadnis This post is part of our “ Data Engineers of Netflix ” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024 The Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

What is IT operations analytics? Extract more data insights from more sources

Dynatrace

MAY 1, 2023

IT operations analytics is the process of unifying, storing, and contextually analyzing operational data to understand the health of applications, infrastructure, and environments and streamline everyday operations. ITOA automates repetitive cloud operations tasks and streamlines the flow of analytics into decision-making processes.

Analytics

Analytics Artificial Intelligence Big Data Open Source

Auto-Diagnosis and Remediation in Netflix Data Platform

The Netflix TechBlog

JANUARY 13, 2022

By Vikram Srivastava and Marcelo Mayworm Netflix has one of the most complex data platforms in the cloud on which our data scientists and engineers run batch and streaming workloads. Pensive collects logs for the failed jobs launched by the step from the relevant data platform components and then extracts the stack traces.

Big Data

Big Data Infrastructure Metrics Games

Data Engineers of Netflix?—?Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Samuel Setegne Samuel Setegne This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

Data Engineers of Netflix?—?Interview with Dhevi Rajendran

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Dhevi Rajendran Dhevi Rajendran This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix.

Data Engineering

Data Engineering Engineering Software Engineering Big Data

What is IT automation?

Dynatrace

JULY 6, 2022

At its most basic, automating IT processes works by executing scripts or procedures either on a schedule or in response to particular events, such as checking a file into a code repository. Adding AIOps to automation processes makes the volume of data that applications and multicloud environments generate much less overwhelming.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

What is software automation? Optimize the software lifecycle with intelligent automation

Dynatrace

JUNE 26, 2023

This, in turn, accelerates the need for businesses to implement the practice of software automation to improve and streamline processes. This involves big data analytics and applying advanced AI and machine learning techniques, such as causal AI. Automate DevSecOps processes at scale.

Software

Software Software Analytics Big Data

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Now, imagine yourself in the role of a software engineer responsible for a micro-service which publishes data consumed by few critical customer facing services (e.g. You are about to make structural changes to the data and want to know who and what downstream to your service will be impacted.

Infrastructure

Infrastructure Big Data Transportation Architecture

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3.

Latency

Latency Storage Big Data Tuning

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud. However, this cannot be done without efficient, scalable data analytics.

Big Data

Big Data Analytics AWS Cloud

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as go to container orchestration platform for data engineering teams. In 2018, a widespread adaptation of Kubernetes for big data processing is anitcipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges.

Big Data

Big Data Storage Benchmarking Hardware

Path to NoOps part 1: How modern AIOps brings NoOps within reach

Dynatrace

OCTOBER 25, 2022

NoOps is a concept in software development that seeks to automate processes and eliminate the need for an extensive IT operations team. But it might also result in the entire software development process falling apart. Can organizations really function without an operations team? What is NoOps? Evolution of NoOps.

DevOps

DevOps Big Data Cloud Innovation

How to Optimize Elasticsearch for Better Search Performance

DZone

JULY 29, 2019

In today's world, data is generated in high volumes and to make something out of it, extracted data is needed to be transformed, stored, maintained, governed and analyzed. These processes are only possible with a distributed architecture and parallel processing mechanisms that Big Data tools are based on.

Big Data

Big Data Government Open Source Storage

Big / Bug Data: Analyzing the Apache Flink Source Code

DZone

DECEMBER 21, 2020

Applications used in the field of Big Data process huge amounts of information, and this often happens in real time. Naturally, such applications must be highly reliable so that no error in the code can interfere with data processing. This engine is written in Java and Scala. So, what is Apache Flink?

Code

Code Java Big Data Open Source

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Causal AI—which brings AI-enabled actionable insights to IT operations—and a data lakehouse, such as Dynatrace Grail , can help break down silos among ITOps, DevSecOps, site reliability engineering, and business analytics teams. “The weakness of a data lake is they fail when you need to access them fast,” Pawlowski said.

Analytics

Analytics Infrastructure Storage Architecture

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

Our customers have frequently requested support for this first new batch of services, which cover databases, big data, networks, and computing. See the health of your big data resources at a glance. Azure HDInsight supports a broad range of use cases including data warehousing, machine learning, and IoT analytics.

Azure

Azure Cloud Big Data Virtualization

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

JULY 21, 2022

We at Netflix, as a streaming service running on millions of devices, have a tremendous amount of data about device capabilities/characteristics and runtime data in our big data platform. With large data, comes the opportunity to leverage the data for predictive and classification based analysis.

Big Data

Big Data Cache Engineering Data Engineering

What is container orchestration?

Dynatrace

MARCH 24, 2023

Container orchestration is a process that automates the deployment and management of containerized applications and services at scale. Container orchestration enables organizations to manage and automate the many processes and services that comprise workflows.

Infrastructure

Infrastructure Open Source Operating System Cloud

An overview of end-to-end entity resolution for big data

The Morning Paper

DECEMBER 13, 2020

An overview of end-to-end entity resolution for big data , Christophides et al., It’s an important part of many modern data workflows, and an area I’ve been wrestling with in one of my own projects. The processing mode – traditional batch (with or without budget constraints), or incremental. Block processing.

Big Data

Big Data Open Source Processing Analytics

What is APM?

Dynatrace

JUNE 1, 2020

However, with today’s highly connected digital world, monitoring use cases expand to the services, processes, hosts, logs, networks, and of course, end-users that access these applications – including your customers and employees. With our AI engine, Davis, at the core Dynatrace provides precise answers in real-time. AI-Assistance.

Artificial Intelligence

Artificial Intelligence Social Media Monitoring IoT

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

The following figure depicts imaginary “evolution” of the major NoSQL system families, namely, Key-Value stores, BigTable-style databases, Document databases, Full Text Search Engines, and Graph databases: NoSQL Data Models. Data duplication and denormalization are first-class citizens. Conceptual Techniques.

Database

Database Ecommerce Efficiency Engineering

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Stop worrying about log data ingest and storage — start creating value instead. Dynatrace® Grail , an additional core technology for the Dynatrace® Software Intelligence platform , is the world’s first data lakehouse with massively parallel processing (MPP) for context-rich observability, business, and security analytics.

Analytics

Analytics Artificial Intelligence Storage Serverless

Optimizing dbt and Google’s BigQuery

DZone

DECEMBER 21, 2020

Setting up a data warehouse is the first step towards fully utilizing big data analysis. Still, it is one of many that need to be taken before you can generate value from the data you gather. An important step in that chain of the process is data modeling and transformation.

Big Data

Big Data Google Scalability Processing

DynatraceGo! APAC 2021: Lessons in thick data and keeping pace with the market

Dynatrace

AUGUST 10, 2021

Vikash Chhaganlal , GM of Engineering and Infrastructure at Kiwibank said it. Dynatrace CMO Mike Maciag talked about the dangers of “status quo” and failing to get where you need to go because of loyalty to legacy APM software, or hanging onto outdated processes. Investing in data is easy but using it is really hard”.

DevOps

DevOps Innovation Big Data Cloud

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Dynatrace

DECEMBER 15, 2022

In fact, Gartner estimates that 80% of enterprises will shut down their on-premises data centers by 2025. This transition to public, private, and hybrid cloud is driving organizations to automate and virtualize IT operations to lower costs and optimize cloud processes and systems. So, what is ITOps? What is ITOps? ITOps vs. AIOps.

Artificial Intelligence

Artificial Intelligence DevOps Hardware Virtualization

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Most Kubernetes clusters in the cloud (73%) are built on top of managed distributions from the hyperscalers like AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). Big data : To store, search, and analyze large datasets, 32% of organizations use Elasticsearch.

Open Source

Open Source Java Operating System Programming

Business Insights extends support for optimizing Core Web Vitals

Dynatrace

APRIL 21, 2021

To do this effectively, you need a big data processing approach. The three challenges to optimizing Core Web Vitals is exactly why the Dynatrace Business Insights team have built the Insights Analytics Engine. How do you know where to focus first with failing pages? How Dynatrace Business Insights can help.

Traffic

Traffic Mobile Metrics Analytics

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

DZone

DECEMBER 27, 2023

Welcome to the first post in our exciting series on mastering offline data pipeline's best practices, focusing on the potent combination of Apache Airflow and data processing engines like Hive and Spark. Working together, they form the backbone of many modern data engineering solutions.

Best Practices

Best Practices Data Engineering Big Data Games

Turbocharge Your Apache Spark Jobs for Unmatched Performance

DZone

JULY 17, 2023

Apache Spark is a leading platform in the field of big data processing, known for its speed, versatility, and ease of use. Understanding Apache Spark Apache Spark is a unified computing engine designed for large-scale data processing.

Big Data

Big Data Performance Open Source Tuning

What is AIOps? Everything you wanted to know

Dynatrace

OCTOBER 14, 2021

Gartner defines AIOps as the combination of “big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination.” But what is AIOps, exactly? And how can it support your organization? What is AIOps? Two approaches to AIOps.

Artificial Intelligence

Artificial Intelligence DevOps Innovation Metrics

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

MARCH 4, 2024

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. In this way, no human intervention is required in the remediation process. Multi-objective optimizations.

Tuning

Tuning Efficiency Big Data Engineering

Building a Rule-Based Platform to Manage Netflix Membership SKUs at Scale

The Netflix TechBlog

FEBRUARY 16, 2021

Membership Engineering at Netflix is responsible for the plan and pricing configurations for every market worldwide. To solve the challenges mentioned above and meet our rapidly evolving business needs, we re-architected the legacy SKU catalog from the ground up and partnered with the Growth Engineering team to build a scalable SKU platform.

Mobile

Mobile Engineering Infrastructure Scalability

Applying real-world AIOps use cases to your operations

Dynatrace

OCTOBER 17, 2022

Artificial intelligence for IT operations, or AIOps, combines big data and machine learning to provide actionable insight for IT teams to shape and automate their operational strategy. CloudOps includes processes such as incident management and event management. The four stages of data processing.

DevOps

DevOps Artificial Intelligence Healthcare Innovation

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

by Jun He , Akash Dwivedi , Natallia Dzenisenka , Snehal Chennuru , Praneeth Yenugutala , Pawan Dixit At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations.

Java

Java Scalability Traffic Architecture

Hybrid cloud infrastructure explained: Weighing the pros, cons, and complexities

Dynatrace

JUNE 29, 2022

A hybrid cloud, however, combines public infrastructure and services with on-premises resources or a private data center to create a flexible, interconnected IT environment. Hybrid environments provide more options for storing and analyzing ever-growing volumes of big data and for deploying digital services.

Infrastructure

Infrastructure Cloud Azure AWS

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

Berkeley Packet Filter (BPF) is an in-kernel execution engine that processes a virtual instruction set, and has been extended as eBPF for providing a safe way to extend kernel functionality. What is BPF? In some ways, eBPF does to the kernel what JavaScript does to websites: it allows all sorts of new applications to be created.

Network

Network Transportation AWS Cloud

What is Application Performance Monitoring?

Dynatrace

JUNE 1, 2020

However, with today’s highly connected digital world, monitoring use cases expand to the services, processes, hosts, logs, networks, and of course end-users that access these applications – including your customers and employees. With our AI engine, Davis, at the core Dynatrace provides precise answers in real-time. AI-Assistance.

Monitoring

Monitoring Performance Social Media Artificial Intelligence

What is Greenplum Database? Intro to the Big Data Database

Sustainability: Thoughts from a software engineer

Trending Sources

In-Stream Big Data Processing

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Data Engineers of Netflix?—?Interview with Kevin Wylie

Incremental Processing using Netflix Maestro and Apache Iceberg

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

A Recap of the Data Engineering Open Forum at Netflix

What is IT operations analytics? Extract more data insights from more sources

Auto-Diagnosis and Remediation in Netflix Data Platform

Data Engineers of Netflix?—?Interview with Samuel Setegne

Data Engineers of Netflix?—?Interview with Dhevi Rajendran

What is IT automation?

What is software automation? Optimize the software lifecycle with intelligent automation

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

Driving down the cost of Big-Data analytics - All Things Distributed

Kubernetes for Big Data Workloads

Path to NoOps part 1: How modern AIOps brings NoOps within reach

How to Optimize Elasticsearch for Better Search Performance

Big / Bug Data: Analyzing the Apache Flink Source Code

Conducting log analysis with an observability platform and full data context

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

What is container orchestration?

An overview of end-to-end entity resolution for big data

What is APM?

NoSQL Data Modeling Techniques

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Optimizing dbt and Google’s BigQuery

DynatraceGo! APAC 2021: Lessons in thick data and keeping pace with the market

What is ITOps? Why IT operations is more crucial than ever in a multicloud world

Kubernetes in the wild report 2023

Business Insights extends support for optimizing Core Web Vitals

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

Turbocharge Your Apache Spark Jobs for Unmatched Performance

What is AIOps? Everything you wanted to know

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

Building a Rule-Based Platform to Manage Netflix Membership SKUs at Scale

Applying real-world AIOps use cases to your operations

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

Hybrid cloud infrastructure explained: Weighing the pros, cons, and complexities

How Netflix uses eBPF flow logs at scale for network insight

What is Application Performance Monitoring?

Stay Connected