Big Data - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

When handling large amounts of complex data, or big data, chances are that your main machine might start getting crushed by all of the data it has to process in order to produce your analytics results. Greenplum features a cost-based query optimizer for large-scale, big data workloads. Query Optimization.

Big Data

Big Data Database Artificial Intelligence Open Source

Data Storage Formats for Big Data Analytics: Performance and Cost Implications of Parquet, Avro, and ORC

DZone

SEPTEMBER 9, 2024

Efficient data processing is crucial for businesses and organizations that rely on big data analytics to make informed decisions. One key factor that significantly affects the performance of data processing is the storage format of the data.

Big Data

Big Data Storage Analytics Benchmarking

3 Performance Tricks for Dealing With Big Data Sets

DZone

AUGUST 21, 2021

This article describes 3 different tricks that I used in dealing with big data sets (order of 10 million records) and that proved to enhance performance dramatically. Trick 1: CLOB Instead of Result Set.

Big Data

Big Data Performance Tuning Mobile

Cutting Big Data Costs: Effective Data Processing With Apache Spark

DZone

SEPTEMBER 14, 2023

In today's data-driven world, efficient data processing plays a pivotal role in the success of any project. Apache Spark , a robust open-source data processing framework, has emerged as a game-changer in this domain.

Big Data

Big Data Processing Open Source Games

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. Other flows are more sophisticated: one Storm topology can pass the data to another topology via Kafka or Cassandra. Towards Unified Big Data Processing. Apache Spark [10].

Big Data

Big Data Processing Lambda Database

Write Optimized Spark Code for Big Data Applications

DZone

MARCH 7, 2023

Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. PySpark is the Python API for Apache Spark , which allows Python developers to write Spark applications using Python instead of Scala or Java.

Big Data

Big Data Code Tuning Open Source

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

Scalegrid

NOVEMBER 25, 2019

Google Cloud does offer their own wide column store and big data database called Bigtable which is actually ranked #111, one under ScyllaDB at #110 on DB-Engines. Google Cloud Platform (GCP) was the second most popular cloud provider for ScyllaDB, coming in at 30.4% of all cloud deployments.

Big Data

Big Data Database Open Source Azure

How Amazon is solving big-data challenges with data lakes

All Things Distributed

JANUARY 20, 2020

Back when Jeff Bezos filled orders in his garage and drove packages to the post office himself, crunching the numbers on costs, tracking inventory, and forecasting future demand was relatively simple.

Big Data

Big Data Logistics Retail Government

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber Engineering

OCTOBER 17, 2018

To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks … The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Big Data

Big Data Latency Transportation Traffic

Databook: Turning Big Data into Knowledge with Metadata at Uber

Uber Engineering

AUGUST 3, 2018

From driver and rider locations and destinations, to restaurant orders and payment transactions, every interaction on Uber’s transportation platform is driven by data.

Big Data

Big Data Transportation Engineering Storage

Sustainability: Thoughts from a software engineer

Dynatrace

MARCH 17, 2025

Until recently, improvements in data center power efficiency compensated almost entirely for the increasing demand for computing resources. The rise of big data, cryptocurrencies, and AI means the IT sector contributes significantly to global greenhouse gas emissions. However, this trend is now reversing.

Software Engineering

Software Engineering Engineering Software Software

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud. Driving down the cost of Big-Data analytics. Comments ().

Big Data

Big Data Analytics AWS Cloud

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as go to container orchestration platform for data engineering teams. In 2018, a widespread adaptation of Kubernetes for big data processing is anitcipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges. Performance.

Big Data

Big Data Storage Benchmarking Hardware

DataCentral: Uber’s Big Data Observability and Chargeback Platform

Uber Engineering

MARCH 21, 2024

Discover real-time query analytics and governance with DataCentral: Uber’s big data observability powerhouse, tackling millions of queries in petabyte-scale environments.

Big Data

Big Data Government Analytics

Microsoft Azure Event Hubs

DZone

FEBRUARY 23, 2023

Introduction With big data streaming platform and event ingestion service Azure Event Hubs , millions of events can be received and processed in a single second. Any real-time analytics provider or batching/storage adaptor can transform and store data supplied to an event hub.

Azure

Azure Big Data Storage Analytics

Master the Art of Querying Data on Amazon S3

DZone

JUNE 3, 2024

This is especially the case when it comes to taking advantage of vast amounts of data stored in cloud platforms like Amazon S3 - Simple Storage Service, which has become a central repository of data types ranging from the content of web applications to big data analytics.

Big Data

Big Data AWS Storage Analytics

Performance Monitoring Dashboards in the Age of Big Data Pollution

Rigor

MAY 22, 2019

Big data is like the pollution of the information age. The Big Data Struggle and Performance Reporting. As the big data era brings in multiple options for visualization, it has become apparent that not all solutions are created equal. Conclusion.

Big Data

Big Data Monitoring Performance Metrics

What is IT operations analytics? Extract more data insights from more sources

Dynatrace

MAY 1, 2023

Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information. Here are the six steps of a typical ITOA process : Define the data infrastructure strategy. Why use a data lakehouse for causal AI? Why is ITOA important? Apache Spark.

Analytics

Analytics Artificial Intelligence Big Data Open Source

An overview of end-to-end entity resolution for big data

The Morning Paper

DECEMBER 13, 2020

An overview of end-to-end entity resolution for big data , Christophides et al., It’s an important part of many modern data workflows, and an area I’ve been wrestling with in one of my own projects. ACM Computing Surveys, Dec. 2020, Article No.

Big Data

Big Data Open Source Processing Analytics

Moving HPC to the Cloud: A Guide for 2020

High Scalability

SEPTEMBER 14, 2020

This is a guest post by Limor Maayan-Wainstein , a senior technical writer with 10 years of experience writing about cybersecurity, big data, cloud computing, web development, and more. High performance computing (HPC) enables you to solve complex problems which cannot be solved by regular computing.

Cloud

Cloud Big Data Virtualization Efficiency

Introduction to Azure Data Lake Storage Gen2

DZone

FEBRUARY 1, 2023

Built on Azure Blob Storage, Azure Data Lake Storage Gen2 is a suite of features for big data analytics. Azure Data Lake Storage Gen1 and Azure Blob Storage's capabilities are combined in Data Lake Storage Gen2. For instance, Data Lake Storage Gen2 offers scale, file-level security, and file system semantics.

Azure

Azure Storage Big Data Analytics

Scaling for Success: Why Scalability Is the Forefront of Modern Applications

DZone

JUNE 13, 2023

The reason is straightforward, today, applications generate enormous amounts of data. As we embrace new technologies like cloud computing, big data analysis, and the Internet of Things (IoT), there is a noticeable spike in the amount of data generated from different applications.

Scalability

Scalability IoT Big Data Internet

Understanding gRPC Concepts, Use Cases, and Best Practices

DZone

JANUARY 19, 2023

Because with the advent of cloud providers, we are less worried about managing data centers. This leads to an increase in the size of data as well. Big data is generated and transported using various mediums in single requests. Everything is available within seconds on-demand.

Best Practices

Best Practices Transportation Big Data Latency

Auto-Diagnosis and Remediation in Netflix Data Platform

The Netflix TechBlog

JANUARY 13, 2022

This blog will explore these two systems and how they perform auto-diagnosis and remediation across our Big Data Platform and Real-time infrastructure. The streaming platform recently added Data Mesh , and we need to expand Streaming Pensive to cover that.

Big Data

Big Data Infrastructure Metrics Games

What is software automation? Optimize the software lifecycle with intelligent automation

Dynatrace

JUNE 26, 2023

Software analytics offers the ability to gain and share insights from data emitted by software systems and related operational processes to develop higher-quality software faster while operating it efficiently and securely. This involves big data analytics and applying advanced AI and machine learning techniques, such as causal AI.

Software

Software Software Analytics Big Data

Stream Processing vs. Batch Processing: What to Know

DZone

JANUARY 31, 2023

Big data is at the center of all business decisions these days. It refers to large volumes of data generated through different sources, and this data then provides the foundation for business decisions. There are different ways through which we can process data.

Processing

Processing Big Data Systems

Data Engineers of Netflix?—?Interview with Kevin Wylie

The Netflix TechBlog

JULY 15, 2021

I was later hired into my first purely data gig where I was able to deepen my knowledge of big data. After that, I joined MySpace back at its peak as a data engineer and got my first taste of data warehousing at internet-scale. In the data engineering space, very little of the same technology remains.

Data Engineering

Data Engineering Engineering Entertainment Big Data

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

The Netflix TechBlog

OCTOBER 28, 2021

Netflix’s unique work culture and petabyte-scale data problems are what drew me to Netflix. During earlier years of my career, I primarily worked as a backend software engineer, designing and building the backend systems that enable big data analytics.

Data Engineering

Data Engineering Engineering Big Data Software Engineering

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Dynatrace

JULY 6, 2020

Our customers have frequently requested support for this first new batch of services, which cover databases, big data, networks, and computing. See the health of your big data resources at a glance. Azure HDInsight supports a broad range of use cases including data warehousing, machine learning, and IoT analytics.

Azure

Azure Cloud Big Data Virtualization

Experiences with approximating queries in Microsoft’s production big-data clusters

The Morning Paper

SEPTEMBER 8, 2019

Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., Microsoft’s big data clusters have 10s of thousands of machines, and are used by thousands of users to run some pretty complex queries. VLDB’19. For the larger more production-like query analysed in §4.2.1,

Big Data

Big Data Analytics Latency Azure

Giving data a heartbeat

Dynatrace

SEPTEMBER 9, 2019

I still love data, but I am starting to love emotion-filled data. Big” data helps us make the right decisions and focus on the right things. It takes data back to its human roots and brings human data to our big data world. The post Giving data a heartbeat appeared first on Dynatrace blog.

Big Data

Big Data Metrics Virtualization Network

How to Optimize Elasticsearch for Better Search Performance

DZone

JULY 29, 2019

In today's world, data is generated in high volumes and to make something out of it, extracted data is needed to be transformed, stored, maintained, governed and analyzed. These processes are only possible with a distributed architecture and parallel processing mechanisms that Big Data tools are based on.

Big Data

Big Data Government Open Source Storage

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Netflix’s diverse data landscape made it challenging to capture all the right data and conforming it to a common data model. Spark is the primary big-data compute engine at Netflix and with pretty much every upgrade in Spark, the spark plan changed as well springing continuous and unexpected surprises for us.

Infrastructure

Infrastructure Big Data Transportation Architecture

Path to NoOps part 1: How modern AIOps brings NoOps within reach

Dynatrace

OCTOBER 25, 2022

“AIOps platforms address IT leaders’ need for operations support by combining big data and machine learning functionality to analyze the ever-increasing volume, variety and velocity of data generated by IT in response to digital transformation.” – Gartner Market Guide for AIOps platforms.

DevOps

DevOps Big Data Cloud Innovation

What is IT automation?

Dynatrace

JULY 6, 2022

This kind of automation can support key IT operations, such as infrastructure, digital processes, business processes, and big-data automation. Big data automation tools. These tools provide the means to collect, transfer, and process large volumes of data that are increasingly common in analytics applications.

Artificial Intelligence

Artificial Intelligence Tuning Strategy Big Data

Optimizing dbt and Google’s BigQuery

DZone

DECEMBER 21, 2020

Setting up a data warehouse is the first step towards fully utilizing big data analysis. Still, it is one of many that need to be taken before you can generate value from the data you gather. An important step in that chain of the process is data modeling and transformation.

Big Data

Big Data Google Scalability Processing

What Should You Know About Graph Database’s Scalability?

DZone

JANUARY 20, 2023

Having a distributed and scalable graph database system is highly sought after in many enterprise scenarios.

Scalability

Scalability Big Data Hardware Internet

EDI and API: Which Trends Are Transforming the Modern Supply Chain Management?

DZone

JULY 22, 2022

Honestly, these two terms have recently been doing rounds in the big data world. These technologies specialize in transmitting large amounts of data across different trading partners and companies.

Big Data

Big Data Technology Technology Systems

Top 15 Software Testing Trends to Watch Out in 2021

DZone

DECEMBER 28, 2020

Nowadays, Big Data tests mainly include data testing, paving the way for the Internet of Things to become the center point. Digital transformation is yet another significant focus point for the sectors and the enterprises that are ranking top on cloud and business analytics. Besides, AI and ML seem to reach a new level.

Software

Software Software Testing Big Data

Big / Bug Data: Analyzing the Apache Flink Source Code

DZone

DECEMBER 21, 2020

Applications used in the field of Big Data process huge amounts of information, and this often happens in real time. Naturally, such applications must be highly reliable so that no error in the code can interfere with data processing. It is an open-source framework for distributed processing of large amounts of data.

Code

Code Java Big Data Open Source

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

The Netflix TechBlog

JULY 21, 2022

We at Netflix, as a streaming service running on millions of devices, have a tremendous amount of data about device capabilities/characteristics and runtime data in our big data platform. With large data, comes the opportunity to leverage the data for predictive and classification based analysis.

Big Data

Big Data Cache Engineering Data Engineering

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Dynatrace

OCTOBER 4, 2022

While data lakehouses combine the flexibility and cost-efficiency of data lakes with the querying capabilities of data warehouses, it’s important to understand how these storage environments differ. Data warehouses. Data warehouses were the original big data storage option.

Artificial Intelligence

Artificial Intelligence Storage Analytics Government

Python at Netflix

The Netflix TechBlog

APRIL 29, 2019

Orchestration The Big Data Orchestration team is responsible for providing all of the services and tooling to schedule and execute ETL and Adhoc pipelines. These libraries are the primary way users interface programmatically with work in the Big Data platform.

Open Source

Open Source Network Infrastructure Big Data

DynatraceGo! APAC 2021: Lessons in thick data and keeping pace with the market

Dynatrace

AUGUST 10, 2021

She dispelled the myth that more big data equals better decisions, higher profits, or more customers. Investing in data is easy but using it is really hard”. The fact is, data on its own isn’t meaningful. Tricia quoted the statistic that companies typically use 3% of their data to inform decisions.

DevOps

DevOps Innovation Big Data Cloud

What is Greenplum Database? Intro to the Big Data Database

Data Storage Formats for Big Data Analytics: Performance and Cost Implications of Parquet, Avro, and ORC

Trending Sources

3 Performance Tricks for Dealing With Big Data Sets

Cutting Big Data Costs: Effective Data Processing With Apache Spark

In-Stream Big Data Processing

Write Optimized Spark Code for Big Data Applications

ScyllaDB Trends – How Users Deploy The Real-Time Big Data Database

How Amazon is solving big-data challenges with data lakes

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Databook: Turning Big Data into Knowledge with Metadata at Uber

Sustainability: Thoughts from a software engineer

Driving down the cost of Big-Data analytics - All Things Distributed

Kubernetes for Big Data Workloads

DataCentral: Uber’s Big Data Observability and Chargeback Platform

Microsoft Azure Event Hubs

Master the Art of Querying Data on Amazon S3

Performance Monitoring Dashboards in the Age of Big Data Pollution

What is IT operations analytics? Extract more data insights from more sources

An overview of end-to-end entity resolution for big data

Moving HPC to the Cloud: A Guide for 2020

Introduction to Azure Data Lake Storage Gen2

Scaling for Success: Why Scalability Is the Forefront of Modern Applications

Understanding gRPC Concepts, Use Cases, and Best Practices

Auto-Diagnosis and Remediation in Netflix Data Platform

What is software automation? Optimize the software lifecycle with intelligent automation

Stream Processing vs. Batch Processing: What to Know

Data Engineers of Netflix?—?Interview with Kevin Wylie

Data Engineers of Netflix?—?Interview with Pallavi Phadnis

No need to compromise visibility in public clouds with the new Azure services supported by Dynatrace

Experiences with approximating queries in Microsoft’s production big-data clusters

Giving data a heartbeat

How to Optimize Elasticsearch for Better Search Performance

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

Path to NoOps part 1: How modern AIOps brings NoOps within reach

What is IT automation?

Optimizing dbt and Google’s BigQuery

What Should You Know About Graph Database’s Scalability?

EDI and API: Which Trends Are Transforming the Modern Supply Chain Management?

Top 15 Software Testing Trends to Watch Out in 2021

Big / Bug Data: Analyzing the Apache Flink Source Code

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

What is a data lakehouse? Combining data lakes and warehouses for the best of both worlds

Python at Netflix

DynatraceGo! APAC 2021: Lessons in thick data and keeping pace with the market

Stay Connected