Big Data, Scalability and Tuning - Technology Performance Pulse

Write Optimized Spark Code for Big Data Applications

DZone

MARCH 7, 2023

Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. In addition, PySpark applications can be tuned to optimize performance and achieve better execution time, scalability, and resource utilization.

Big Data

Big Data Code Tuning Open Source

Migrating Critical Traffic At Scale with No Downtime - Part 1

The Netflix TechBlog

MAY 4, 2023

By Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, and Devang Shah. Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. This technique facilitates validation on multiple fronts.

Traffic

Traffic Latency Tuning Systems

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

The Netflix TechBlog

MARCH 25, 2019

Central engineering teams provide paved paths (secure, vetted and supported options) and guard rails to help reduce variance in choices available for tools and technologies to support the development of scalable technical architectures.

Infrastructure

Infrastructure Big Data Transportation Architecture

How Netflix uses eBPF flow logs at scale for network insight

The Netflix TechBlog

JUNE 7, 2021

Summary: Providing network insight into the cloud network infrastructure at scale is made possible by eBPF flow logs and a highly scalable and efficient flow collection pipeline. After several iterations of the architecture and some tuning, the solution has proven able to scale.

Network

Network Transportation AWS Cloud

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

The Netflix TechBlog

OCTOBER 27, 2020

Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3. Moving data with Bulldozer at Netflix.

Latency

Latency Storage Big Data Tuning

Conducting log analysis with an observability platform and full data context

Dynatrace

APRIL 20, 2023

With the extent of observability data going beyond human capacity to manage, Grail is the first purpose-built causational data lakehouse that allows for immediate answers with cost-efficient, scalable storage. Business leaders can decide which logs they want to use and tune storage to their data needs.

Analytics

Analytics Infrastructure Storage Architecture

A Recap of the Data Engineering Open Forum at Netflix

The Netflix TechBlog

JUNE 20, 2024

This talk will delve into the creative solutions Netflix deploys to manage this high-volume, real-time data requirement while balancing scalability and cost. Last but not least, thank you to the organizers of the Data Engineering Open Forum: Chris Colburn, Xinran Waibel, Jai Balani, Rashmi Shamprasad, and Patricia Ho.

Data Engineering

Data Engineering Engineering Entertainment Software Engineering

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

The Netflix TechBlog

OCTOBER 18, 2022

As big data and ML became more prevalent and impactful, the scalability, reliability, and usability of the orchestrating ecosystem have become increasingly important for our data scientists and the company. Another dimension of scalability to consider is the size of the workflow.

Java

Java Scalability Traffic Architecture

Mastering Distributed SQL™ Databases in 2025

Scalegrid

JANUARY 10, 2025

Heading into 2024, SQL databases will remain essential in data management, increasingly using distributed systems to meet growing needs for scalability and reliability. They keep the features that developers like but can handle much more data, similar to NoSQL systems.

Database

Database Scalability Best Practices Blockchain

Streaming SQL in Data Mesh

The Netflix TechBlog

NOVEMBER 3, 2023

This makes the query service lightweight, scalable, and execution agnostic. Stay tuned for more updates! Streaming SQL in Data Mesh was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Processing

Processing Engineering Infrastructure Latency

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

NOVEMBER 20, 2023

Whether in analyzing A/B tests, optimizing studio production, training algorithms, investing in content acquisition, detecting security breaches, or optimizing payments, well-structured and accurate data is foundational. Backfill: Backfilling datasets is a common operation in big data processing.

Processing

Processing Big Data Efficiency Engineering

Optimizing data warehouse storage

The Netflix TechBlog

DECEMBER 21, 2020

Orient: Gather tuning parameters for a particular table that changed. AutoOptimize relies on some of the Iceberg-specific features such as snapshot and atomic operations to perform the optimizations in an accurate and scalable manner. AutoAnalyze: In short, AutoAnalyze finds the best tuning/configuration parameters for a table.

Storage

Storage Latency Efficiency Data Engineering

Music to my Ears - All Things Distributed

All Things Distributed

MARCH 28, 2011

Werner Vogels' weblog on building scalable and robust distributed systems. We see that with our Amazon customers; when they hear a great tune on the radio they may identify it using the Shazam or Soundhound apps on their mobile phone and buy that song instantly from the Amazon MP3 store. Driving down the cost of Big-Data analytics.

AWS

AWS Cloud Storage Internet

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 30, 2018

In this year's CFP we’re looking for topics covering the latest trends and best practices in cloud computing, containerization, machine learning, big data, infrastructure, scalability, DevOps, IT management, automation, reliability, monitoring, performance tuning, security, databases, programming, datacenters, and more.

DevOps

DevOps Network Best Practices Programming

QCon London: Lessons Learned From Building LinkedIn’s AI/ML Data Platform

InfoQ

APRIL 15, 2024

He specifically delved into Venice DB, the NoSQL data store used for feature persistence. At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. By Rafal Gancarz

Artificial Intelligence

Artificial Intelligence Big Data Data Engineering Latency

World’s Top Web Performance Leaders To Watch

Rigor

SEPTEMBER 11, 2019

He has a keen interest in web technologies, performance tuning, security, and the practical use of technology. Doug is a freelance mobile performance expert, a popular speaker (particularly on the topic of web tuning and image optimization), and the author of High Performance Android Apps. Doug Sillars.

Performance

Performance Education Google Website

Will AWS Have Anything New To Say About Sustainability at re:Invent 2024?

Adrian Cockcroft

NOVEMBER 18, 2024

Learn how remote sensing, Internet of Things, and AI technologies on AWS can be used to detect and quantify methane sources, offering a cost-effective and efficient approach to scalable environmental monitoring. Discover how Scepter, Inc. Raman Pujani, Solutions Architect, AWS NOTE: This is an interesting new topic.

AWS

AWS Energy Lambda Government

Should You Use ClickHouse as a Main Operational Database?

Percona

JANUARY 14, 2019

However, ClickHouse is super efficient for timeseries and provides “sharding” out of the box (scalability beyond one node). Currently, an issue has been opened to make the “tailing” based on the primary key much faster: slow order by primary key with small limit on big data.

Database

Database Analytics Blockchain Healthcare
