Big Data and Engineering - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

When handling large amounts of complex data, or big data, chances are that your main machine might start getting crushed by all of the data it has to process in order to produce your analytics results. Greenplum features a cost-based query optimizer for large-scale, big data workloads. Query Optimization.

Big Data

Big Data Database Artificial Intelligence Open Source

Sustainability: Thoughts from a software engineer

Dynatrace

MARCH 17, 2025

Until recently, improvements in data center power efficiency compensated almost entirely for the increasing demand for computing resources. The rise of big data, cryptocurrencies, and AI means the IT sector contributes significantly to global greenhouse gas emissions. However, this trend is now reversing.

Software Engineering

Software Engineering Engineering Software Software

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. The design of the in-stream processing engine itself was driven by the following requirements: SQL-like functionality. Strict fault-tolerance is a principal requirement for the engine.

Big Data

Big Data Processing Lambda Database

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Scalegrid

NOVEMBER 25, 2019

ScyllaDB is an open-source distributed NoSQL data store, reimplemented from the popular Apache Cassandra database. Released just four years ago in 2015, Scylla has averaged over 220% year-over-year growth in popularity according to DB-Engines. Wondering which wide-column store to use for your deployments? of all cloud deployments.

Big Data

Big Data Database Open Source Azure

Data Engineers of Netflix?—?Interview with Kevin Wylie

The Netflix TechBlog

JULY 15, 2021

Data Engineers of Netflix?—?Interview Interview with Kevin Wylie This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin, what drew you to data engineering?

Data Engineering

Data Engineering Engineering Entertainment Big Data

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Data Engineers of Netflix?—?Interview Interview with Pallavi Phadnis This post is part of our “ Data Engineers of Netflix ” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer at Netflix.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024 The Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

OCTOBER 17, 2018

To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks … The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Big Data

Big Data Latency Transportation Traffic

Databook: Turning Big Data into Knowledge with Metadata at Uber

Uber Engineering

AUGUST 3, 2018

Data powers Uber’s global marketplace, enabling more reliable and seamless user experiences across our products for riders, … The post Databook: Turning Big Data into Knowledge with Metadata at Uber appeared first on Uber Engineering Blog.

Big Data

Big Data Transportation Engineering Storage

Data Engineers of Netflix?—?Interview with Dhevi Rajendran

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Dhevi Rajendran Dhevi Rajendran This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix.

Data Engineering

Data Engineering Engineering Software Engineering Big Data

What is IT operations analytics? Extract more data insights from more sources

Dynatrace

MAY 1, 2023

Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information. Here are the six steps of a typical ITOA process : Define the data infrastructure strategy. Why use a data lakehouse for causal AI? Why is ITOA important? Apache Spark.

Analytics

Analytics Artificial Intelligence Big Data Open Source

What Is Chaos Engineering?

DZone

JANUARY 15, 2021

It is no surprise that Site Reliability Engineers have risen to prominence in the last decade. Modern IT infrastructure requires robust systems thinking and reliability engineering to keep the show on the road. Downtime is not an option. 88% showed that 60 minutes of downtime costs their business more than $300,000.

Engineering

Engineering Infrastructure Cloud Systems

Data Engineers of Netflix?—?Interview with Samuel Setegne

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix?—?Interview Interview with Samuel Setegne Samuel Setegne This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. What drew you to Netflix?

Data Engineering

Data Engineering Engineering Big Data Healthcare

Auto-Diagnosis and Remediation in Netflix Data Platform

The Netflix TechBlog

JANUARY 13, 2022

By Vikram Srivastava and Marcelo Mayworm Netflix has one of the most complex data platforms in the cloud on which our data scientists and engineers run batch and streaming workloads. Pensive collects logs for the failed jobs launched by the step from the relevant data platform components and then extracts the stack traces.

Big Data

Big Data Infrastructure Metrics Games

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud. Driving down the cost of Big-Data analytics. Comments ().

Big Data

Big Data Analytics AWS Cloud

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as go to container orchestration platform for data engineering teams. In 2018, a widespread adaptation of Kubernetes for big data processing is anitcipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges.

Big Data

Big Data Storage Benchmarking Hardware

Engineering SQL Support on Apache Pinot at Uber

Uber Engineering

JANUARY 15, 2020

Uber leverages real-time analytics on aggregate data to improve the user experience across our products, from fighting fraudulent behavior on Uber Eats to forecasting demand on our platform. .

Engineering

Engineering Analytics Big Data Database

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Now, imagine yourself in the role of a software engineer responsible for a micro-service which publishes data consumed by few critical customer facing services (e.g. You are about to make structural changes to the data and want to know who and what downstream to your service will be impacted.

Infrastructure

Infrastructure Big Data Transportation Architecture

Python at Netflix

The Netflix TechBlog

APRIL 29, 2019

Python has long been a popular programming language in the networking space because it’s an intuitive language that allows engineers to quickly solve networking problems. Demand Engineering Demand Engineering is responsible for Regional Failovers , Traffic Distribution, Capacity Operations and Fleet Efficiency of the Netflix cloud.

Open Source

Open Source Network Infrastructure Big Data

DataCentral: Uber’s Big Data Observability and Chargeback Platform

Uber Engineering

MARCH 21, 2024

Discover real-time query analytics and governance with DataCentral: Uber’s big data observability powerhouse, tackling millions of queries in petabyte-scale environments.

Big Data

Big Data Government Analytics

What is software automation? Optimize the software lifecycle with intelligent automation

Dynatrace

JUNE 26, 2023

Software analytics offers the ability to gain and share insights from data emitted by software systems and related operational processes to develop higher-quality software faster while operating it efficiently and securely. This involves big data analytics and applying advanced AI and machine learning techniques, such as causal AI.

Software

Software Software Analytics Big Data

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

JULY 21, 2022

We at Netflix, as a streaming service running on millions of devices, have a tremendous amount of data about device capabilities/characteristics and runtime data in our big data platform. With large data, comes the opportunity to leverage the data for predictive and classification based analysis.

Big Data

Big Data Cache Engineering Data Engineering

An overview of end-to-end entity resolution for big data

The Morning Paper

DECEMBER 13, 2020

An overview of end-to-end entity resolution for big data , Christophides et al., It’s an important part of many modern data workflows, and an area I’ve been wrestling with in one of my own projects. ACM Computing Surveys, Dec. 2020, Article No.

Big Data

Big Data Open Source Processing Analytics

What is IT automation?

Dynatrace

JULY 6, 2022

As organizations continue to adopt multicloud strategies, the complexity of these environments grows, increasing the need to automate cloud engineering operations to ensure organizations can enforce their policies and architecture principles. This requires significant data engineering efforts, as well as work to build machine-learning models.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou Netflix has more than 195 million subscribers that generate petabytes of data everyday. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

How to Optimize Elasticsearch for Better Search Performance

DZone

JULY 29, 2019

These processes are only possible with a distributed architecture and parallel processing mechanisms that Big Data tools are based on. One of the top trending open-source data storage that responds to most of the use cases is Elasticsearch.

Big Data

Big Data Government Open Source Storage

Big / Bug Data: Analyzing the Apache Flink Source Code

DZone

DECEMBER 21, 2020

Applications used in the field of Big Data process huge amounts of information, and this often happens in real time. Naturally, such applications must be highly reliable so that no error in the code can interfere with data processing. It is an open-source framework for distributed processing of large amounts of data.

Code

Code Java Big Data Open Source

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

Our customers have frequently requested support for this first new batch of services, which cover databases, big data, networks, and computing. See the health of your big data resources at a glance. Azure HDInsight supports a broad range of use cases including data warehousing, machine learning, and IoT analytics.

Azure

Azure Cloud Big Data Virtualization

What is container orchestration?

Dynatrace

MARCH 24, 2023

Docker Swarm First introduced in 2014 by Docker, Docker Swarm is an orchestration engine that popularized the use of containers with developers. The Docker file format is used broadly for orchestration engines, and Docker Engine ships with Docker Swarm and Kubernetes frameworks included.

Infrastructure

Infrastructure Open Source Operating System Cloud

Path to NoOps part 1: How modern AIOps brings NoOps within reach

Dynatrace

OCTOBER 25, 2022

“AIOps platforms address IT leaders’ need for operations support by combining big data and machine learning functionality to analyze the ever-increasing volume, variety and velocity of data generated by IT in response to digital transformation.” – Gartner Market Guide for AIOps platforms.

DevOps

DevOps Big Data Cloud Innovation

DynatraceGo! APAC 2021: Lessons in thick data and keeping pace with the market

Dynatrace

AUGUST 10, 2021

Vikash Chhaganlal , GM of Engineering and Infrastructure at Kiwibank said it. She dispelled the myth that more big data equals better decisions, higher profits, or more customers. Investing in data is easy but using it is really hard”. The fact is, data on its own isn’t meaningful. And they were. That speaks to me.

DevOps

DevOps Innovation Big Data Cloud

A guide to Autonomous Performance Optimization

Dynatrace

SEPTEMBER 15, 2020

Without automation, Performance engineers and developers can no longer ensure that applications perform as planned, and costs are minimized.”. Supported technologies include cloud services, big data, databases, OS, containers, and application runtimes like the JVM.

Performance

Performance Java Metrics Cloud

What is APM?

Dynatrace

JUNE 1, 2020

With our AI engine, Davis, at the core Dynatrace provides precise answers in real-time. Trying to manually keep up, configure, script and source data is beyond human capabilities and today everything must be automated and continuous. Some customers even say, having Davis is like having a whole team of engineers on their side.

Processing Big Data Efficiency Engineering

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

DZone

DECEMBER 27, 2023

Welcome to the first post in our exciting series on mastering offline data pipeline's best practices, focusing on the potent combination of Apache Airflow and data processing engines like Hive and Spark. Working together, they form the backbone of many modern data engineering solutions.

Best Practices

Best Practices Data Engineering Big Data Games

What is Greenplum Database? Intro to the Big Data Database

Sustainability: Thoughts from a software engineer

Trending Sources

In-Stream Big Data Processing

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Data Engineers of Netflix?—?Interview with Kevin Wylie

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

A Recap of the Data Engineering Open Forum at Netflix

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Databook: Turning Big Data into Knowledge with Metadata at Uber

Data Engineers of Netflix?—?Interview with Dhevi Rajendran

What is IT operations analytics? Extract more data insights from more sources

What Is Chaos Engineering?

Data Engineers of Netflix?—?Interview with Samuel Setegne

Auto-Diagnosis and Remediation in Netflix Data Platform

Driving down the cost of Big-Data analytics - All Things Distributed

Kubernetes for Big Data Workloads

Engineering SQL Support on Apache Pinot at Uber

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

Python at Netflix

DataCentral: Uber’s Big Data Observability and Chargeback Platform

What is software automation? Optimize the software lifecycle with intelligent automation

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

An overview of end-to-end entity resolution for big data

What is IT automation?

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

How to Optimize Elasticsearch for Better Search Performance

Big / Bug Data: Analyzing the Apache Flink Source Code

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

What is container orchestration?

Path to NoOps part 1: How modern AIOps brings NoOps within reach

DynatraceGo! APAC 2021: Lessons in thick data and keeping pace with the market

A guide to Autonomous Performance Optimization

What is APM?

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Conducting log analysis with an observability platform and full data context

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

How Our Paths Brought Us to Data and Netflix

Scaling Uber’s Apache Hadoop Distributed File System for Growth

Kubernetes in the wild report 2023

Building a Rule-Based Platform to Manage Netflix Membership SKUs at Scale

Business Insights extends support for optimizing Core Web Vitals

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Incremental Processing using Netflix Maestro and Apache Iceberg

Offline Data Pipeline Best Practices Part 1:Optimizing Airflow Job Parameters for Apache Hive

Stay Connected