Greenplum Database is a massively parallel processing (MPP) SQL database built on PostgreSQL. It can scale to multi-petabyte data workloads, presenting a cluster of powerful servers that work together behind a single SQL interface through which you can query all of the data.
This article describes three tricks I used when dealing with big data sets (on the order of 10 million records) that proved to enhance performance dramatically. Trick 1: CLOB Instead of Result Set.
ScyllaDB is an open-source distributed NoSQL data store, reimplemented from the popular Apache Cassandra database. We’ve heard a lot about this rising database from the DBA community and our users, and decided to become a sponsor for this year’s Scylla Summit to learn more about the deployment trends from its users.
The shortcomings and drawbacks of batch-oriented data processing were recognized by the big data community quite a long time ago. This system was designed to supplement and succeed an existing Hadoop-based system whose data-processing latency and maintenance costs were too high.
NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. Many techniques that are described below are perfectly applicable to this model.
This happens at an unprecedented scale and introduces many interesting challenges; one of the challenges is how to provide visibility of Studio data across multiple phases and systems to facilitate operational excellence and empower decision making. With the latest Data Mesh Platform, data movement in Netflix Studio reaches a new stage.
IT operations analytics is the process of unifying, storing, and contextually analyzing operational data to understand the health of applications, infrastructure, and environments and streamline everyday operations. ITOA collects operational data to identify patterns and anomalies for faster incident management and near-real-time insights.
Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency. By: Di Lin, Girish Lingappa, Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard, about to make a critical business decision but pausing to ask a question: “Can
By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers that generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.
Data Engineers of Netflix: Interview with Kevin Wylie. This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin Wylie is a Data Engineer on the Content Data Science and Engineering team.
Recently, I faced an issue related to a very high load on the database layer: the database had too many connections open in parallel. I had to review my application’s database connection pool (DBCP) properties very closely.
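The excerpt doesn’t show the final settings, so here is a minimal sketch of the kind of pool properties involved, assuming Apache Commons DBCP 2 and a hypothetical MySQL backend; the URL, credentials, and limits below are illustrative, not the author’s values:

```java
import org.apache.commons.dbcp2.BasicDataSource;

public class PoolConfig {
    public static BasicDataSource buildPool() {
        BasicDataSource ds = new BasicDataSource();
        ds.setUrl("jdbc:mysql://db-host:3306/appdb"); // hypothetical connection string
        ds.setUsername("app_user");
        ds.setPassword("secret");

        // Cap how many connections the pool will ever open,
        // so a traffic spike cannot exhaust the database's connection limit.
        ds.setMaxTotal(50);
        // Keep a modest number of idle connections warm between bursts.
        ds.setMaxIdle(10);
        ds.setMinIdle(5);
        // Fail fast instead of queueing callers indefinitely when the pool is saturated.
        ds.setMaxWaitMillis(2_000);
        // Validate connections before handing them out.
        ds.setTestOnBorrow(true);
        ds.setValidationQuery("SELECT 1");
        return ds;
    }
}
```

Tuning the cap (setMaxTotal) against the database’s own connection limit is usually the first lever when the symptom is “too many connections in parallel.”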
How do you get more value from petabytes of exponentially exploding, increasingly heterogeneous data? The short answer: The three pillars of observability—logs, metrics, and traces—converging on a data lakehouse. To solve this problem, Dynatrace launched Grail, its causational data lakehouse, in 2022.
Modern organizations ingest petabytes of data daily, but legacy approaches to log analysis and management cannot accommodate this volume of data. A speaker from a financial services group discussed how the bank uses log monitoring on the Dynatrace platform, with an emphasis on observability and security data.
By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of petabytes of data stored in AWS S3, and each day we ingest and create additional petabytes. We built AutoOptimize to efficiently and transparently optimize the data and metadata storage layout while maximizing their cost and performance benefits.
The study analyzes factual Kubernetes production data from thousands of organizations worldwide that are using the Dynatrace Software Intelligence Platform to keep their Kubernetes clusters secure, healthy, and high performing. The strongest Kubernetes growth areas are security, databases, and CI/CD technologies. Java, Go, and Node.js
Driving down the cost of big data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud. However, this cannot be done without efficient, scalable data analytics.
Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce.
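As a toy illustration of the map/group/reduce shape such processing takes — a single-machine sketch using Java streams rather than the Hadoop API, with invented sample records — consider:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ClickCount {
    // Hypothetical record type standing in for one line of a web-analytics log.
    record Click(String country, long bytes) {}

    public static void main(String[] args) {
        List<Click> log = List.of(
            new Click("US", 1200), new Click("DE", 800), new Click("US", 300));

        // "Map" each record to a key, then "reduce" per key --
        // the same shape a MapReduce job has, but in-process.
        Map<String, Long> bytesByCountry = log.stream()
            .collect(Collectors.groupingBy(Click::country,
                     Collectors.summingLong(Click::bytes)));

        System.out.println(bytesByCountry); // e.g. {US=1500, DE=800}
    }
}
```

A real MapReduce job distributes the grouping (“shuffle”) and the per-key reduction across many machines, but the logical decomposition is the same.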
In addition to providing visibility for core Azure services like virtual machines, load balancers, databases, and application services, we’re happy to announce support for the following 10 new Azure services, with many more to come soon: Virtual Machines (classic ones). Effortlessly optimize Azure database performance.
By Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, and Pawan Dixit. At Netflix, data and machine learning (ML) pipelines are widely used and have become central to the business, representing diverse use cases that go beyond recommendations, predictions, and data transformations.
As cloud and big data complexity scales beyond what traditional monitoring tools can handle, next-generation cloud monitoring and observability are becoming necessities for IT teams. With agent monitoring, third-party software collects data and reports from the component to which the agent is attached.
A distributed and scalable graph database system is highly sought after in many enterprise scenarios. Do not be misled: designing and implementing a scalable graph database system has never been a trivial task.
Heading into 2024, SQL databases will remain essential in data management, increasingly using distributed systems to meet growing needs for scalability and reliability. According to 2023 statistics, 49% of web applications use an SQL-based database, with SQL having a 75% adoption rate in the IT industry.
Stefano started his presentation by showing how much cost and performance optimization is possible when you know how to properly configure your application runtimes, databases, or cloud environments: correct configuration of JVM parameters can save up to 75% in resource utilization while delivering the same or better performance!
The variables that can impact an application’s performance range from coding errors or ‘bugs’ in the software, database slowdowns, and hosting and network performance to operating system and device type support. And I’m sure we’ve all experienced frustration when an application crashes, is slow to load, or doesn’t load at all.
Retail is one of the most important business domains for data science and data mining applications because of its prolific data and numerous optimization problems such as optimal prices, discounts, recommendations, and stock levels that can be solved using data analysis methods.
Hybrid cloud architecture is a computing environment that shares data and applications across a combination of public clouds and on-premises private clouds. A hybrid cloud combines public infrastructure and services with on-premises resources or a private data center to create a flexible, interconnected IT environment.
Log4Shell required many organizations to take devices and applications offline to prevent malicious attackers from gaining access to IT systems and sensitive data. As a result, organizations need to be vigilant in identifying and addressing vulnerabilities to protect their systems and data.
In the fourth part of the series, I’ll show you how I used Dynatrace’s raw problem and event data to find the best fit for optimized anomaly detection settings. I took a big-data-analysis approach, which started with another problem visualization. Statistically analyzing Dynatrace’s event and problem data.
Choosing the right database often comes down to MongoDB vs MySQL. This article will help you understand the core differences in data structure, scalability, and use cases. Whether you need a relational database for complex transactions or a NoSQL database for flexible data storage, we’ve got you covered.
Over the past few years, two important trends disrupting the database industry have been mobile applications and big data. The explosive growth in mobile devices and mobile apps is generating a huge amount of data, which has fueled the demand for big data services and high-scale databases.
A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and manageable. Understanding distributed storage is imperative as data volumes and the need for robust storage solutions rise.
DROAM - Dreaming about Cheap Data Roaming. The one thing I have always struggled with during my travels is the data plans of the cell phone companies. This international data mess has been a frequent conversation topic with fellow travelers, and no one has a good, simple, and reliable solution.
This article compares different in-memory map options and their performance, so that an application can move away from traditional RDBMS tables for frequently accessed data. The migration lets the application vet a physician with a quick map lookup rather than querying the database table for vetting each time.
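As a minimal sketch of the pattern being described — not the article’s actual implementation — here is an in-process lookup using a standard Java ConcurrentHashMap; the Physician record, field names, and loading strategy are invented for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PhysicianCache {
    // Hypothetical value type; the real application would load this from the RDBMS.
    record Physician(String npi, boolean vetted) {}

    private final Map<String, Physician> byNpi = new ConcurrentHashMap<>();

    // Bulk-load once (or on a refresh schedule) instead of querying the table per request.
    public void load(Iterable<Physician> fromDatabase) {
        fromDatabase.forEach(p -> byNpi.put(p.npi(), p));
    }

    // O(1) in-memory lookup replaces a SELECT against the vetting table.
    public boolean isVetted(String npi) {
        Physician p = byNpi.get(npi);
        return p != null && p.vetted();
    }
}
```

The trade-off, of course, is that the map must be kept in sync with the source table, typically via periodic reloads or change notifications.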
Job Openings in AWS - Senior Leader in Database Services. This week it is an opening for senior leaders with AWS Database Services. AWS Database Services is responsible for setting the database strategy and delivering distributed structured storage services to our AWS customers.
We live in a world where massive volumes of data are generated from websites, connected devices, and mobile apps. In such a data-intensive environment, making key business decisions such as running marketing and sales campaigns, logistics planning, financial analysis, and ad targeting requires deriving insights from this data.
On the surface this is a paper about fast data ingestion from high-volume streams, with indexing to support efficient querying (PVLDB’20, emphasis mine). Helios also serves as a reference architecture for how Microsoft envisions its next generation of distributed big-data processing systems being built.
In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store’s performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios.
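To make that contrast concrete, here is a small sketch assuming the Jedis and spymemcached client libraries against local instances on their default ports; the keys and values are invented, and this is an illustration of the two data models rather than anything from the article:

```java
import redis.clients.jedis.Jedis;
import net.spy.memcached.MemcachedClient;
import java.net.InetSocketAddress;
import java.util.Map;

public class CacheDemo {
    public static void main(String[] args) throws Exception {
        // Redis: can model a session as a hash and read or update individual fields.
        try (Jedis redis = new Jedis("localhost", 6379)) {
            redis.hset("session:42", "user", "alice");
            redis.hset("session:42", "cart_items", "3");
            redis.expire("session:42", 3600);
            Map<String, String> session = redis.hgetAll("session:42");
            System.out.println(session);
        }

        // Memcached: a flat key -> string (or serialized blob), nothing more --
        // which is exactly what high-throughput page/fragment caching needs.
        MemcachedClient memcached =
            new MemcachedClient(new InetSocketAddress("localhost", 11211));
        memcached.set("page:/home", 300, "<html>...</html>");
        System.out.println(memcached.get("page:/home"));
        memcached.shutdown();
    }
}
```

The richer Redis types (hashes, lists, sorted sets) are what make it the better fit for complex data models, while Memcached’s simplicity keeps its per-operation overhead minimal.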
This critical insight helped us re-envision the SKU catalog as a seamless, scalable platform that empowers our stakeholders to make rapid changes with confidence while the platform ensures suitable guardrails for data accuracy and integrity. SKUDB: SKU catalog data was migrated from the metadata configuration files to a relational database.
Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices, Gan et al., ASPLOS’19. Seer uses a lightweight RPC-level tracing system to collect request traces and aggregate them in a Cassandra database.
Some startups, such as Facebook, Uber, Pinterest, and many more, adopted MySQL in its early days; they are now big, successful companies that prove MySQL can run on large databases and heavily used sites. For instance, in Percona Managed Services, we have many clients with terabytes’ worth of data whose databases perform well.
We at Percona talk a lot about how Kubernetes Operators automate the deployment and management of databases. Operators seamlessly handle lots of Kubernetes primitives and database configuration bits and pieces, all to remove toil from operation teams and provide a self-service experience for developers.
Incoming data is saved into data storage (historian database or log store) for query by operational managers who must attempt to find the highest priority issues that require their attention. The following diagram illustrates a typical workflow. What’s missing in this picture?
To our shareowners: Random forests, naïve Bayesian estimators, RESTful services, gossip protocols, eventual consistency, data sharding, anti-entropy, Byzantine quorum, erasure coding, vector clocks. To do so, we’ve leaned heavily on the core principles from the distributed systems and database research communities and invented from there.