article thumbnail

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

When handling large amounts of complex data, or big data, chances are that your main machine might start getting crushed by all of the data it has to process in order to produce your analytics results. Greenplum features a cost-based query optimizer for large-scale, big data workloads. Query Optimization.

Big Data 321
article thumbnail

An overview of end-to-end entity resolution for big data

The Morning Paper

An overview of end-to-end entity resolution for big data , Christophides et al., It’s an important part of many modern data workflows, and an area I’ve been wrestling with in one of my own projects. Learning-based methods train classifiers for pruning. ACM Computing Surveys, Dec. 2020, Article No.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber

Uber Engineering

Michelangelo , Uber’s machine learning (ML) platform, powers machine learning model training across various use cases at Uber, such as forecasting rider demand , fraud detection , food discovery and recommendation for Uber Eats , and improving the accuracy of … The post Productionizing Distributed XGBoost to Train Deep Tree Models with Large (..)

article thumbnail

Python at Netflix

The Netflix TechBlog

We also use Python to detect sensitive data using Lanius. Orchestration The Big Data Orchestration team is responsible for providing all of the services and tooling to schedule and execute ETL and Adhoc pipelines. These libraries are the primary way users interface programmatically with work in the Big Data platform.

article thumbnail

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data…

The Netflix TechBlog

Operational automation–including but not limited to, auto diagnosis, auto remediation, auto configuration, auto tuning, auto scaling, auto debugging, and auto testing–is key to the success of modern data platforms. In particular, we use a simple Feedforward Multilayer Perceptron (MLP) with two heads, one to predict each outcome.

Tuning 219
article thumbnail

What is IT automation?

Dynatrace

AI that is based on machine learning needs to be trained. This requires significant data engineering efforts, as well as work to build machine-learning models. This kind of automation can support key IT operations, such as infrastructure, digital processes, business processes, and big-data automation.

article thumbnail

Experiences with approximating queries in Microsoft’s production big-data clusters

The Morning Paper

Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., Microsoft’s big data clusters have 10s of thousands of machines, and are used by thousands of users to run some pretty complex queries. Creating training datasets for machine learning ! VLDB’19.