Remove Data Engineering Remove Latency Remove Scalability
article thumbnail

Supporting Diverse ML Systems at Netflix

The Netflix TechBlog

In addition to Spark, we want to support last-mile data processing in Python, addressing use cases such as feature transformations, batch inference, and training. Occasionally, these use cases involve terabytes of data, so we have to pay attention to performance. Internally, we use a production workflow orchestrator called Maestro.

Systems 238
article thumbnail

Optimizing data warehouse storage

The Netflix TechBlog

Some of the optimizations are prerequisites for a high-performance data warehouse. Sometimes Data Engineers write downstream ETLs on ingested data to optimize the data/metadata layouts to make other ETL processes cheaper and faster. Both automatic (event-driven) as well as manual (ad-hoc) optimization.

Storage 214
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Netflix at AWS re:Invent 2019

The Netflix TechBlog

1pm-2pm NFX 207 Benchmarking stateful services in the cloud Vinay Chella , Data Platform Engineering Manager Abstract : AWS cloud services make it possible to achieve millions of operations per second in a scalable fashion across multiple regions. We explore all the systems necessary to make and stream content from Netflix.

AWS 38
article thumbnail

Incremental Processing using Netflix Maestro and Apache Iceberg

The Netflix TechBlog

Whether in analyzing A/B tests, optimizing studio production, training algorithms, investing in content acquisition, detecting security breaches, or optimizing payments, well structured and accurate data is foundational. Users configure the workflow to read the data in a window (e.g. data arrives too late to be useful).

article thumbnail

Presentation: Azure Cosmos DB: Low Latency and High Availability at Planet Scale

InfoQ

Mei-Chin Tsai, Vinod discuss the internal architecture of Azure Cosmos DB and how it achieves high availability, low latency, and scalability. By Mei-Chin Tsai, Vinod Sridharan

Latency 52
article thumbnail

These 7 Edge Data Challenges Will Test Companies the Most in 2025

VoltDB

Edge computing has transformed how businesses and industries process and manage data. By bringing computation closer to the data source, edge-based deployments reduce latency, enhance real-time capabilities, and optimize network bandwidth. Increased latency during peak loads. High costs of training and retaining talent.

IoT 52
article thumbnail

Netflix at AWS re:Invent 2019

The Netflix TechBlog

1pm-2pm NFX 207 Benchmarking stateful services in the cloud Vinay Chella , Data Platform Engineering Manager Abstract : AWS cloud services make it possible to achieve millions of operations per second in a scalable fashion across multiple regions. We explore all the systems necessary to make and stream content from Netflix.

AWS 100