It scales to multi-petabyte workloads without issue and exposes a cluster of powerful servers behind a single SQL interface through which all of the data can be queried. This feature-packed database delivers powerful, rapid analytics on data at petabyte volumes.
The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the big data community quite a long time ago. This system was designed to supplement and succeed the existing Hadoop-based system, whose data-processing latency and maintenance costs were too high.
Built on Azure Blob Storage, Azure Data Lake Storage Gen2 is a suite of features for big data analytics. Data Lake Storage Gen2 combines the capabilities of Azure Data Lake Storage Gen1 with those of Azure Blob Storage: for instance, it offers scale, file-level security, and file system semantics.
By Vikram Srivastava and Marcelo Mayworm. Netflix has one of the most complex data platforms in the cloud, on which our data scientists and engineers run batch and streaming workloads. As our subscribers grow worldwide and Netflix enters the world of gaming, the number of batch workflows and real-time data pipelines increases rapidly.
This happens at an unprecedented scale and introduces many interesting challenges; one of them is how to provide visibility into Studio data across multiple phases and systems to facilitate operational excellence and empower decision making.
IT operations analytics is the process of unifying, storing, and contextually analyzing operational data to understand the health of applications, infrastructure, and environments and streamline everyday operations. ITOA collects operational data to identify patterns and anomalies for faster incident management and near-real-time insights.
Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability and Efficiency. By Di Lin, Girish Lingappa, and Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker, staring at a metric on a dashboard, about to make a critical business decision, but pausing to ask a question: “Can…
This aspect of NoSQL is well-studied both in practice and theory because specific non-functional properties are often the main justification for NoSQL usage and fundamental results on distributed systems like the CAP theorem apply well to NoSQL systems. Many techniques that are described below are perfectly applicable to this model.
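One fundamental result of that kind is the strict quorum condition many Dynamo-style NoSQL stores use to trade availability against consistency: with N replicas, a read quorum R and write quorum W guarantee that every read overlaps the latest write whenever R + W > N. A minimal sketch of that arithmetic follows; the replica counts are illustrative, not from the article.

```python
# Strict quorum rule used by many Dynamo-style NoSQL stores: a read
# quorum R and write quorum W over N replicas guarantee every read
# intersects the most recent write whenever R + W > N.

def is_strictly_consistent(n: int, r: int, w: int) -> bool:
    """Return True if any read quorum must intersect any write quorum."""
    return r + w > n

# N=3 replicas: W=2, R=2 guarantees overlapping quorums (consistent reads);
# W=1, R=1 favors availability and latency but can return stale data.
assert is_strictly_consistent(3, 2, 2) is True
assert is_strictly_consistent(3, 1, 1) is False
```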
By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers who generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.
I love data. I have spent virtually my entire career looking at data. Synthetic data, network data, system data, and the list goes on. As much as I love data, data is cold; it lacks emotion. Often, 4s is too slow.
Data Engineers of Netflix: Interview with Pallavi Phadnis. This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer on the Product Data Science and Engineering team.
Until recently, improvements in data center power efficiency compensated almost entirely for the increasing demand for computing resources. However, this trend is now reversing: the rise of big data, cryptocurrencies, and AI means the IT sector contributes significantly to global greenhouse gas emissions.
A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.
by Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, Pawan Dixit. At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations.
This year’s conference agenda was packed full of choices, including: Keynotes: Topics included accelerating digital transformation, with Dynatrace CIO Mike Maciag, and Spatial Collapse: The Great Acceleration of Turning Data Into an Asset, with Tricia Wang from Sudden Compass. We’ve all heard it: data is one of your biggest assets.
Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce.
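As a minimal sketch of the MapReduce pattern the excerpt refers to, here is the classic word-count example written as plain single-process Python rather than a Hadoop job; the function names and sample records are illustrative.

```python
from collections import defaultdict
from itertools import chain

# Minimal single-process illustration of MapReduce: map each record to
# (key, value) pairs, shuffle (group) by key, then reduce each group.

def map_phase(record: str):
    for word in record.split():
        yield word.lower(), 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

records = ["big data is big", "data beats intuition"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(r) for r in records)))
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'beats': 1, 'intuition': 1}
```

A real Hadoop or Spark job distributes the map and reduce phases across machines, with the shuffle handled by the framework, but the data flow is the same.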
By Anupom Syam. At Netflix, our current data warehouse contains hundreds of petabytes of data stored in AWS S3, and each day we ingest and create additional petabytes. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.
Werner Vogels weblog on building scalable and robust distributed systems. Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud.
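For a sense of what using Spot Instances with EMR looks like in practice, here is a hedged boto3 sketch; the release label, instance types, bid price, and S3 bucket are placeholders, and the default EMR IAM roles must already exist in the account.

```python
import boto3

# Sketch: launch an EMR cluster whose core nodes run on Spot Instances.
# All concrete values below are placeholders, not from the announcement.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="spot-analytics-cluster",
    ReleaseLabel="emr-6.15.0",
    Instances={
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "Market": "ON_DEMAND", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "Market": "SPOT", "BidPrice": "0.10",  # max hourly bid in USD
             "InstanceType": "m5.xlarge", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://my-bucket/emr-logs/",  # placeholder bucket
)
print(response["JobFlowId"])
```

Running interruptible analytics on Spot capacity is exactly the cost lever the announcement describes: core work proceeds at the on-demand baseline while Spot nodes absorb the bulk of the compute at a discount.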
Software analytics offers the ability to gain and share insights from data emitted by software systems and related operational processes to develop higher-quality software faster while operating it efficiently and securely. This involves big data analytics and applying advanced AI and machine learning techniques, such as causal AI.
Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next. Key challenges: performance.
The study analyzes real-world Kubernetes production data from thousands of organizations worldwide that are using the Dynatrace Software Intelligence Platform to keep their Kubernetes clusters secure, healthy, and high performing. Kubernetes is emerging as the “operating system” of the cloud. Kubernetes moved to the cloud in 2022.
Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience. These backend systems are consistently being evolved and optimized to meet and exceed customer and product expectations. This technique facilitates validation on multiple fronts.
A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. This guide delves into how these systems work, the challenges they solve, and their essential role in businesses and technology.
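One common technique such systems use to spread data over multiple servers is consistent hashing. Below is a minimal sketch of a hash ring, with made-up node names and without the virtual nodes or replication a production system would add.

```python
import bisect
import hashlib

# Minimal consistent-hash ring: a key is stored on the first server
# clockwise from its hash, so adding or removing a server only remaps
# the keys in that server's neighborhood rather than rebalancing all data.

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        self.ring = sorted((_hash(s), s) for s in servers)
        self.keys = [h for h, _ in self.ring]

    def server_for(self, key: str) -> str:
        idx = bisect.bisect(self.keys, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["storage-a", "storage-b", "storage-c"])  # hypothetical nodes
print(ring.server_for("user:42"), ring.server_for("user:43"))
```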
Operational automation (including, but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing) is key to the success of modern data platforms. We have also noted great potential for further improvement through model tuning (see the Rollout in Production section).
Big data is at the center of all business decisions these days. It refers to large volumes of data generated through different sources, and this data then provides the foundation for business decisions. There are different ways to process data; in batch processing, the size of the data is known in advance.
“An overview of end-to-end entity resolution for big data,” Christophides et al., ACM Computing Surveys, Dec. 2020. It’s an important part of many modern data workflows, and an area I’ve been wrestling with in one of my own projects. On open-source ER systems: all of the discussed approaches require schemas.
This can include the use of cloud computing, artificial intelligence, big data analytics, the Internet of Things (IoT), and other digital tools. One of the significant challenges that come with digital transformation is ensuring that software systems remain reliable and secure. This is where software testing comes in.
Big data is like the pollution of the information age. The Big Data Struggle and Performance Reporting. Alternatively, a number of organizations have created their own internal home-grown systems for managing and distilling web performance and monitoring data. Insights at Any Level. Automatic Updates.
Various software systems are needed to design, build, and operate this CDN infrastructure, and a significant number of them are written in Python. The configuration of these devices is controlled by several other systems, including a source of truth, application of configurations to devices, and backup.
AIOps combines big data and machine learning to automate key IT operations processes, including anomaly detection and identification, event correlation, and root-cause analysis. To achieve these AIOps benefits, comprehensive AIOps tools incorporate four key stages of data processing, beginning with collection and aggregation.
I’ve called out the data field’s rebranding efforts before; but even then, I acknowledged that these weren’t just new coats of paint. Each time, the underlying implementation changed a bit while still staying true to the larger phenomenon of “Analyzing Data for Fun and Profit.” Goodbye, Hadoop.
Scripts and procedures usually focus on a particular task, such as deploying a new microservice to a Kubernetes cluster, implementing data retention policies on archived files in the cloud, or running a vulnerability scanner over code before it’s deployed. The range of use cases for automating IT is as broad as IT itself.
Having a distributed and scalable graph database system is highly sought after in many enterprise scenarios. Do not be misled: designing and implementing a scalable graph database system has never been a trivial task.
We at Netflix, as a streaming service running on millions of devices, have a tremendous amount of data about device capabilities and characteristics, plus runtime data, in our big data platform. With large data comes the opportunity to leverage it for predictive and classification-based analysis.
Retail is one of the most important business domains for data science and data mining applications because of its prolific data and numerous optimization problems such as optimal prices, discounts, recommendations, and stock levels that can be solved using data analysis methods.
Complex cloud computing environments are increasingly replacing traditional data centers. In fact, Gartner estimates that 80% of enterprises will shut down their on-premises data centers by 2025. The IT help desk creates a ticketing system and resolves service request issues. So, what is ITOps? Why is IT operations important?
Early implementations of NoOps were just ‘lift and shift’ efforts that replicated existing systems to the cloud. AIOps, a term coined by Gartner in 2016, combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination. Evolution of NoOps.
Honestly, these two terms have recently been doing the rounds in the big data world. Over the years, EDI has become a standard document exchange system, whereas API is on its way to becoming a popular alternative to EDI. API and EDI essentially fulfill the same function: getting data to and from two or more partners or recipients.
Several pain points have made it difficult for organizations to manage their data efficiently and create actual value. Limited data availability constrains value creation. Modern IT environments — whether multicloud, on-premises, or hybrid-cloud architectures — generate exponentially increasing data volumes.
At much less than 1% of CPU and memory on the instance, this highly performant sidecar provides flow data at scale for network insight. Flow Collector consumes two data streams: IP address change events from Sonar via Kafka, and eBPF flow log data from the Flow Exporter sidecars.
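The post doesn’t publish Flow Collector’s code; as a hedged sketch of the shape of that consumption, here is a kafka-python consumer that maintains an IP-to-workload map from a hypothetical address-change topic and uses it to attribute flow records. Topic names, message fields, and the helper are all made up for illustration.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical sketch: keep an IP -> workload map current from a Kafka
# stream of address-change events, then use it to attribute flow records.
# Topic name and message fields are invented; the real Flow Collector's
# implementation is not public.
consumer = KafkaConsumer(
    "sonar-ip-change-events",            # hypothetical topic name
    bootstrap_servers=["kafka:9092"],
    value_deserializer=lambda b: json.loads(b.decode()),
)

ip_to_workload: dict[str, str] = {}

def attribute_flow(flow: dict) -> dict:
    """Tag an eBPF flow record with source/destination workload names."""
    flow["src_app"] = ip_to_workload.get(flow["src_ip"], "unknown")
    flow["dst_app"] = ip_to_workload.get(flow["dst_ip"], "unknown")
    return flow

for event in consumer:
    ip_to_workload[event.value["ip"]] = event.value["workload"]
    # Flow records arriving from the exporter sidecars would be enriched
    # here, e.g. attribute_flow({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"})
```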
After every experiment run, Akamas changes the application, runtime, database, or cloud configuration based on monitoring data it captured during the previous experiment run. Supported technologies include cloud services, big data, databases, OS, containers, and application runtimes like the JVM. Or do you pull different percentiles?
Artificial intelligence for IT operations, or AIOps, combines big data and machine learning to provide actionable insight for IT teams to shape and automate their operational strategy. There are four stages of data processing, among them collecting raw data and analyzing the data.
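As a hedged, end-to-end toy version of those stages, the sketch below collects raw latency samples, aggregates them, and analyzes them for anomalies; the sample values, window size, and 2-sigma z-score detector are illustrative choices, not the vendor’s method.

```python
import statistics

# Toy walk through the stages named above: collect raw data, aggregate
# it, and analyze it for anomalies. All numbers are illustrative.

def collect() -> list[float]:
    # Stand-in for pulling raw latency samples (ms) from agents or APIs.
    return [102.0, 98.5, 101.2, 99.8, 240.0, 100.4]

def aggregate(samples: list[float], window: int = 3) -> list[float]:
    # Roll raw samples up into fixed-window averages.
    return [statistics.fmean(samples[i:i + window])
            for i in range(0, len(samples), window)]

def analyze(samples: list[float], threshold: float = 2.0) -> list[float]:
    # Flag samples more than `threshold` standard deviations from the mean.
    mu, sigma = statistics.fmean(samples), statistics.pstdev(samples)
    return [x for x in samples if sigma and abs(x - mu) / sigma > threshold]

raw = collect()
print("aggregates:", aggregate(raw))
print("anomalies:", analyze(raw))  # flags the 240.0 ms spike
```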
Containers enable developers to package microservices or applications with the libraries, configuration files, and dependencies needed to run on any infrastructure, regardless of the target system environment. This means organizations are increasingly using Kubernetes not just for running applications, but also as an operating system.