Big Data, Engineering and Scalability - Technology Performance Pulse

What is Greenplum Database? Intro to the Big Data Database

Scalegrid

MAY 13, 2020

This feature-packed database provides powerful and rapid analytics on data that scales up to petabyte volumes. Greenplum uses an MPP database design that can help you develop a scalable, high performance deployment. High performance, query optimization, open source and polymorphic data storage are the major Greenplum advantages.

Big Data

Big Data Database Artificial Intelligence Open Source

In-Stream Big Data Processing

Highly Scalable

AUGUST 20, 2013

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. The design of the in-stream processing engine itself was driven by the following requirements: SQL-like functionality. Strict fault-tolerance is a principal requirement for the engine.

Big Data

Big Data Processing Lambda Database

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

What is IT operations analytics? Extract more data insights from more sources

Dynatrace

MAY 1, 2023

Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information. Here are the six steps of a typical ITOA process: Define the data infrastructure strategy. Identify data use cases and develop a scalable delivery model with documentation.

Analytics

Analytics Artificial Intelligence Big Data Open Source

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Now, imagine yourself in the role of a software engineer responsible for a micro-service which publishes data consumed by few critical customer facing services (e.g. You are about to make structural changes to the data and want to know who and what downstream to your service will be impacted.

Infrastructure

Infrastructure Big Data Transportation Architecture

Data Engineers of Netflix — Interview with Dhevi Rajendran

The Netflix TechBlog

JUNE 1, 2021

Data Engineers of Netflix — Interview with Dhevi Rajendran. This post is part of our “Data Engineers of Netflix” interview series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix.

Data Engineering

Data Engineering Engineering Software Engineering Big Data

Driving down the cost of Big-Data analytics - All Things Distributed

All Things Distributed

AUGUST 18, 2011

Werner Vogels weblog on building scalable and robust distributed systems. Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud.

Big Data

Big Data Analytics AWS Cloud

Building a Rule-Based Platform to Manage Netflix Membership SKUs at Scale

The Netflix TechBlog

FEBRUARY 16, 2021

Membership Engineering at Netflix is responsible for the plan and pricing configurations for every market worldwide. To solve the challenges mentioned above and meet our rapidly evolving business needs, we re-architected the legacy SKU catalog from the ground up and partnered with the Growth Engineering team to build a scalable SKU platform.

Mobile

Mobile Engineering Infrastructure Scalability

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

Causal AI—which brings AI-enabled actionable insights to IT operations—and a data lakehouse, such as Dynatrace Grail, can help break down silos among ITOps, DevSecOps, site reliability engineering, and business analytics teams. “It’s quite a big scale,” said an engineer at the financial services group.

Analytics

Analytics Infrastructure Storage Architecture

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

By Tianlong Chen and Ioannis Papapanagiotou. Netflix has more than 195 million subscribers that generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

Latency

Latency Storage Big Data Tuning

Kubernetes for Big Data Workloads

Abhishek Tiwari

DECEMBER 27, 2017

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2] and data workloads are up next. Key challenges.

Big Data

Big Data Storage Benchmarking Hardware

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

As big data and ML became more prevalent and impactful, the scalability, reliability, and usability of the orchestrating ecosystem have become increasingly important for our data scientists and the company. Another dimension of scalability to consider is the size of the workflow.

Java

Java Scalability Traffic Architecture

What is container orchestration?

Dynatrace

MARCH 24, 2023

Docker Swarm: First introduced in 2014 by Docker, Docker Swarm is an orchestration engine that popularized the use of containers with developers. The Docker file format is used broadly for orchestration engines, and Docker Engine ships with Docker Swarm and Kubernetes frameworks included.

Infrastructure

Infrastructure Open Source Operating System Cloud

Optimizing dbt and Google’s BigQuery

DZone

DECEMBER 21, 2020

Setting up a data warehouse is the first step towards fully utilizing big data analysis. Still, it is one of many that need to be taken before you can generate value from the data you gather. An important step in that chain is data modeling and transformation.

Big Data

Big Data Google Scalability Processing

Kubernetes in the wild report 2023

Dynatrace

JANUARY 16, 2023

Most Kubernetes clusters in the cloud (73%) are built on top of managed distributions from the hyperscalers like AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). Through effortless provisioning, a larger number of small hosts provide a cost-effective and scalable platform.

Open Source

Open Source Java Operating System Programming

Any analysis, any time: Dynatrace Log Management and Analytics powered by Grail

Dynatrace

OCTOBER 4, 2022

Dynatrace built and optimized it for Davis® AI, the game-changing Dynatrace artificial intelligence engine that processes billions of dependencies in the blink of an eye. Grail addresses today’s challenges of big data and cloud everywhere: Grail is highly scalable, cost-effective, and super-fast.

Analytics

Analytics Artificial Intelligence Storage Serverless

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

Berkeley Packet Filter (BPF) is an in-kernel execution engine that processes a virtual instruction set, and has been extended as eBPF for providing a safe way to extend kernel functionality. The data is also used by security and other partner teams for insight and incident analysis. What is BPF?

Network

Network Transportation AWS Cloud

NoSQL Data Modeling Techniques

Highly Scalable

MARCH 1, 2012

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. This user-oriented nature had vast implications: The end user is often interested in aggregated reporting information, not in separate data items, and SQL pays a lot of attention to this aspect. 1) Denormalization.

Database

Database Ecommerce Efficiency Engineering

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

It also improves the engineering productivity by simplifying the existing pipelines and unlocking the new patterns. Whether in analyzing A/B tests, optimizing studio production, training algorithms, investing in content acquisition, detecting security breaches, or optimizing payments, well structured and accurate data is foundational.

Processing

Processing Big Data Efficiency Engineering

Ensuring Performance, Efficiency, and Scalability of Digital Transformation

Alex Podelko

DECEMBER 19, 2019

Performance engineering as it is done at Alibaba, which is emerging as a major cloud provider. Clearly a hot topic, and the most interesting point here would be how it is changing performance engineering. Meeting of the Minds: Performance Engineering, a Panel Discussion. You can’t always get what you want.

Efficiency

Efficiency Artificial Intelligence Scalability Performance

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 14, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). This is going to be a challenging journey for any backend engineer! Learn to balance architecture trade-offs and design scalable enterprise-level software. Please apply here.

Education

Education Software Engineering Scalability Engineering

Post: InterviewCamp.io, Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

High Scalability

APRIL 28, 2020

Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). This is going to be a challenging journey for any backend engineer! Learn to balance architecture trade-offs and design scalable enterprise-level software. Please apply here.

Education

Education Software Engineering Scalability Engineering

MySQL vs MongoDB: Best Choice for You

Scalegrid

FEBRUARY 11, 2025

This article will help you understand the core differences in data structure, scalability, and use cases. Whether you need a relational database for complex transactions or a NoSQL database for flexible data storage, we’ve got you covered. Choosing the right database often comes down to MongoDB vs MySQL.

Scalability

Scalability Database Storage IoT

Expanding the Cloud: Introducing Amazon QuickSight

All Things Distributed

OCTOBER 7, 2015

However, the data infrastructure to collect, store and process data is geared toward developers (e.g., In AWS’ quest to enable the best data storage options for engineers, we have built several innovative database solutions like Amazon RDS, Amazon RDS for Aurora, Amazon DynamoDB, and Amazon Redshift. Big data challenges.

Cloud

Cloud Big Data AWS Analytics

The Amazon.com 2010 Shareholder Letter Focusses on Technology.

All Things Distributed

APRIL 27, 2011

And while many of our systems are based on the latest in computer science research, this often hasn’t been sufficient: our architects and engineers have had to advance research in directions that no academic had yet taken.

Technology

Technology AWS Storage

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

Some of the optimizations are prerequisites for a high-performance data warehouse. Sometimes Data Engineers write downstream ETLs on ingested data to optimize the data/metadata layouts to make other ETL processes cheaper and faster. Other Components: Iceberg. We use Apache Iceberg as the table format.

Storage

Storage Latency Efficiency Data Engineering

Post: Scrapinghub, Fauna, Sisu, Educative, PA File Sight, Etleap, Triplebyte, Stream

JULY 19, 2021

And it can maintain contextual information about every data source (like the medical history of a device wearer or the maintenance history of a refrigeration system) and keep it immediately at hand to enhance the analysis.

IoT

IoT Big Data Analytics Architecture

Dutch Enterprises and The Cloud

All Things Distributed

SEPTEMBER 6, 2013

Shell leverages AWS for big data analytics to help achieve these goals. Shell’s scientists, especially the geophysicists and drilling engineers, frequently use cloud computing to run models. Essent – supplies customers in the Benelux region with gas, electricity, heat and energy services.

Cloud

Cloud Energy AWS Healthcare

Amazon EC2 Cluster GPU Instances - All Things Distributed

All Things Distributed

NOVEMBER 15, 2010

A third generation of APIs, however, left the graphics-specific interfaces behind and instead focused on exposing the pipeline as a generic highly parallel engine supporting task and data parallelism.

AWS

AWS Programming Latency Architecture

Expanding the Cloud - Cluster Compute Instances for Amazon EC2.

All Things Distributed

JULY 13, 2010

Cluster Compute Instances are similar to other Amazon EC2 instances but have been specifically engineered to provide high performance compute and networking.

Cloud

Cloud AWS Automotive Latency

