Greenplum Database is an open-source, hardware-agnostic MPP database for analytics, based on PostgreSQL and developed by Pivotal, which was later acquired by VMware. This feature-packed database provides powerful and rapid analytics on data that scales up to petabyte volumes. What exactly is Greenplum? At a glance – TL;DR.
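Because Greenplum is built on PostgreSQL, it speaks the standard Postgres wire protocol, so an ordinary Postgres driver can issue analytic queries against it. Here is a minimal sketch using psycopg2; the host, credentials, and sales table are hypothetical placeholders.

```python
# Minimal sketch: querying Greenplum with a standard PostgreSQL driver.
# Host, credentials, and the "sales" table are hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(
    host="gp-master.example.com",  # Greenplum master host (assumed)
    port=5432,
    dbname="analytics",
    user="gpadmin",
    password="secret",
)
with conn, conn.cursor() as cur:
    # Greenplum fans this aggregation out across its segment hosts.
    cur.execute("""
        SELECT region, count(*) AS orders, sum(amount) AS revenue
        FROM sales
        GROUP BY region
        ORDER BY revenue DESC
    """)
    for row in cur.fetchall():
        print(row)
conn.close()
```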
Apache Spark is a powerful open-source distributed computing framework that provides a variety of APIs to support big data processing. PySpark is the Python API for Apache Spark, which allows Python developers to write Spark applications using Python instead of Scala or Java.
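A minimal PySpark sketch: the DataFrame API is driven from Python while execution happens on the Spark engine. The data and column names are illustrative.

```python
# Minimal PySpark sketch: the Spark DataFrame API, driven from Python.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

df = spark.createDataFrame(
    [("alice", 3), ("bob", 5), ("alice", 7)],
    ["user", "events"],
)

# The aggregation is planned in Python but executed by the Spark engine.
df.groupBy("user").agg(F.sum("events").alias("total")).show()

spark.stop()
```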
The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the big data community quite a long time ago. The article is based on a research project developed at Grid Dynamics Labs. In addition, we survey the current and emerging technologies and provide a few implementation tips.
Then, big data analytics technologies, such as Hadoop, NoSQL, Spark, or Grail, the Dynatrace data lakehouse technology, interpret this information. Here are the six steps of a typical ITOA process: Define the data infrastructure strategy. Identify data use cases and develop a scalable delivery model with documentation.
Werner Vogels' weblog on building scalable and robust distributed systems. Driving down the cost of Big-Data analytics. The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud.
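With boto3, Spot capacity is requested per instance group when launching an EMR cluster. A sketch under illustrative assumptions; the release label, instance types, roles, and log bucket are placeholders.

```python
# Sketch: launching an EMR cluster whose core nodes run on EC2 Spot capacity.
# Release label, instance types, roles, and the log bucket are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="spot-analytics",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "Market": "ON_DEMAND", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            # Core nodes bid on Spot capacity, trading interruption risk for cost.
            {"Name": "core", "InstanceRole": "CORE",
             "Market": "SPOT", "InstanceType": "m5.xlarge", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://my-logs-bucket/emr/",
)
print(response["JobFlowId"])
```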
We adopted the following mission statement to guide our investments: “Provide a complete and accurate data lineage system enabling decision-makers to win moments of truth.” Nonetheless, Netflix's data landscape (see below) is complex, and many teams collaborate effectively to share responsibility for managing our data systems.
As cloud and big data complexity scales beyond the ability of traditional monitoring tools to handle, next-generation cloud monitoring and observability are becoming necessities for IT teams. Measure cloud resource consumption to ensure resources are scalable and keep up with business requirements. What is cloud monitoring?
This is a guest post by Limor Maayan-Wainstein, a senior technical writer with 10 years of experience writing about cybersecurity, big data, cloud computing, web development, and more. High performance computing (HPC) enables you to solve complex problems that cannot be solved by regular computing.
Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. In 2018, widespread adoption of Kubernetes for big data processing is anticipated. Organisations are already using Kubernetes for a variety of workloads [1] [2], and data workloads are up next. Key challenges. Performance.
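In practice, a data team's batch job can be submitted to Kubernetes programmatically. A sketch using the official kubernetes Python client; the job name, container image, and command are hypothetical.

```python
# Sketch: submitting a containerized batch data job to Kubernetes with the
# official Python client. Job name, image, and command are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="nightly-etl"),
    spec=client.V1JobSpec(
        backoff_limit=2,  # retry a failed pod up to twice
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="etl",
                        image="registry.example.com/etl-job:latest",
                        command=["python", "run_etl.py", "--date", "2018-01-01"],
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```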
Through effortless provisioning, a larger number of small hosts provide a cost-effective and scalable platform. On-premises data centers invest in higher capacity servers since they provide more flexibility in the long run, while the procurement price of hardware is only one of many cost factors.
On April 18th, 2024, we hosted the inaugural Data Engineering Open Forum at our Los Gatos office, bringing together data engineers from various industries to share, learn, and connect. At the conference, our speakers shared their unique perspectives on modern developments, immediate challenges, and future prospects of data engineering.
By embracing public cloud and hybrid cloud computing environments, IT teams can further accelerate development and automate software deployment and management. Container technology enables organizations to efficiently develop cloud-native applications or to modernize legacy applications to take advantage of cloud services.
As big data and ML have become more prevalent and impactful, the scalability, reliability, and usability of the orchestration ecosystem have become increasingly important for our data scientists and the company. Another dimension of scalability to consider is the size of the workflow.
By Alok Tiagi, Hariharan Ananthakrishnan, Ivan Porto Carrero, and Keerti Lakshminarayan. Netflix has developed a network observability sidecar called Flow Exporter that uses eBPF tracepoints to capture TCP flows in near real time. The data is also used by security and other partner teams for insight and incident analysis.
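Flow Exporter itself is internal to Netflix, but the underlying mechanism can be sketched with the open-source bcc toolkit: attach to the sock:inet_sock_set_state kernel tracepoint and log connections as they become established. This is an illustration of the technique, not Netflix's code; it requires Linux, root, and bcc.

```python
# Minimal sketch of eBPF-based TCP flow capture using bcc (not Netflix's
# Flow Exporter). Requires Linux, root privileges, and the bcc toolkit.
from bcc import BPF

prog = r"""
TRACEPOINT_PROBE(sock, inet_sock_set_state) {
    // Log flows as sockets transition into TCP_ESTABLISHED (state 1).
    if (args->newstate == 1) {
        bpf_trace_printk("tcp flow: sport=%d dport=%d\n",
                         args->sport, args->dport);
    }
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing TCP flows... Ctrl-C to stop")
b.trace_print()
```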
Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. The processed data is typically stored as data warehouse tables in AWS S3. Moving data with Bulldozer at Netflix.
which would be great to attend to keep up with recent developments and their impact on my area. How is DevOps changing the modern software development landscape? Today's hottest question for development: how we build performance engineering into continuous integration. A panel discussion.
Key Takeaways Distributed storage systems benefit organizations by enhancing data availability, fault tolerance, and system scalability, leading to cost savings from reduced hardware needs, energy consumption, and personnel. These distributed storage services also play a pivotal role in big data and analytics operations.
With the launch of the AWS Europe (London) Region, AWS can enable many more UK enterprise, public sector and startup customers to reduce IT costs, address data locality needs, and embark on rapid transformations in critical new areas, such as big data analysis and the Internet of Things. Fraud.net is a good example of this.
This article will help you understand the core differences in data structure, scalability, and use cases. Whether you need a relational database for complex transactions or a NoSQL database for flexible data storage, we've got you covered. Choosing the right database often comes down to MongoDB vs MySQL.
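A sketch of the contrast under illustrative assumptions (local servers, placeholder credentials and tables): MongoDB accepts nested, schema-flexible documents, while MySQL offers multi-statement ACID transactions over a fixed schema.

```python
# Sketch contrasting the two models. Connection details are placeholders.
import mysql.connector           # pip install mysql-connector-python
from pymongo import MongoClient  # pip install pymongo

# MongoDB: schema-flexible documents, nested data stored as-is.
mongo = MongoClient("mongodb://localhost:27017")
mongo.shop.orders.insert_one(
    {"user": "alice", "items": [{"sku": "A1", "qty": 2}], "total": 19.98}
)

# MySQL: rigid schema, but multi-statement ACID transactions.
db = mysql.connector.connect(host="localhost", user="app",
                             password="secret", database="shop")
cur = db.cursor()
cur.execute("UPDATE accounts SET balance = balance - 20 WHERE user = 'alice'")
cur.execute("UPDATE accounts SET balance = balance + 20 WHERE user = 'bob'")
db.commit()  # both updates succeed or fail together
```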
This system allows for scalability and efficiency, demonstrating RabbitMQ’s versatility in real-world applications where speed and reliability are crucial. This makes RabbitMQ an attractive option for developers and enterprises seeking to optimize their software architecture.
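A minimal sketch of such a setup with pika, the standard Python client for RabbitMQ; the broker address and queue name are placeholders. Durable queues, persistent messages, and explicit acknowledgements are what buy the reliability mentioned above.

```python
# Sketch: a durable RabbitMQ work queue using pika.
# Broker address and queue name are placeholders.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue="tasks", durable=True)  # queue survives broker restarts

# Producer: mark the message persistent so it is not lost on restart.
ch.basic_publish(
    exchange="",
    routing_key="tasks",
    body=b"resize-image:42",
    properties=pika.BasicProperties(delivery_mode=2),
)

# Consumer: fair dispatch plus explicit acks give at-least-once delivery.
def handle(channel, method, properties, body):
    print("processing", body)
    channel.basic_ack(delivery_tag=method.delivery_tag)

ch.basic_qos(prefetch_count=1)
ch.basic_consume(queue="tasks", on_message_callback=handle)
ch.start_consuming()
```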
Heading into 2024, SQL databases will remain essential in data management, increasingly using distributed systems to meet growing needs for scalability and reliability. They keep the features that developers like but can handle much more data, similar to NoSQL systems.
Whether in analyzing A/B tests, optimizing studio production, training algorithms, investing in content acquisition, detecting security breaches, or optimizing payments, well structured and accurate data is foundational. Backfill: Backfilling datasets is a common operation in big data processing, regardless of the write mode used (append, overwrite, etc.).
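As an illustration, a backfill of one day of a partitioned table can be expressed in plain Spark with dynamic partition overwrite, so only the affected partitions are rewritten. The paths, partition column, and date are hypothetical.

```python
# Sketch: backfilling one day of a partitioned warehouse table with Spark.
# Paths, partition column, and the date are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("backfill").getOrCreate()

# Overwrite only the partitions present in the new data, not the whole table.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

recomputed = (
    spark.read.parquet("s3://lake/raw/events/")
         .where("date = '2021-06-01'")  # the day being backfilled
)

recomputed.write \
    .mode("overwrite") \
    .partitionBy("date") \
    .parquet("s3://lake/warehouse/events/")
```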
After the launch of the AWS APAC (Hong Kong) Region, there will be 19 Availability Zones in Asia Pacific for customers to build flexible, scalable, secure, and highly available applications. They chose to use AWS in order to focus on developing their platform, instead of managing infrastructure.
Scrapinghub is hiring a Senior Software Engineer (Big Data/AI). You will be designing and implementing distributed systems: large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms, large datasets, creating a development platform for other company departments, etc.
In the 2010 Shareholder Letter, Jeff Bezos writes about the unique technologies developed at Amazon.com over the years. The storage systems we've pioneered demonstrate extreme scalability while maintaining tight control over performance, availability, and cost.
More details about the website feature of Amazon S3 can be found here and in Jeff Barr's blog post on the AWS developer blog.
Caching has become a standard component in many applications to achieve fast and predictable performance, but maintaining a collection of cache servers in a reliable and scalable manner is not a simple task.
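A common way applications use such a cache fleet is the cache-aside pattern: check the cache, fall back to the slow source on a miss, then populate the cache. A sketch with pymemcache against a hypothetical Memcached endpoint; the host and the slow query are placeholders.

```python
# Sketch of the cache-aside pattern against a Memcached endpoint.
# The host and the "expensive query" are hypothetical placeholders.
from pymemcache.client.base import Client

cache = Client(("cache.example.com", 11211))

def expensive_query(user_id):
    # Placeholder for a slow database call.
    return f"profile-for-{user_id}"

def get_profile(user_id):
    key = f"profile:{user_id}"
    value = cache.get(key)
    if value is None:                      # cache miss: compute and store
        value = expensive_query(user_id).encode()
        cache.set(key, value, expire=300)  # expire after 5 minutes
    return value

print(get_profile(42))
```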
Given this, enterprises, public sector bodies, startups, and small businesses are looking to adopt agile, scalable, and secure public cloud solutions. The new Canada (Central) Region offers a robust suite of infrastructure, management, and developer services that can enable innovators to deploy market-leading applications.
These companies can now benefit from the fact that the new São Paulo Region is similar to all other AWS Regions, which enables software developed for other Regions to be quickly deployed in South America as well.
Flexibility is one of the key principles of Amazon Web Services: developers can select any programming language and software package, any operating system, any middleware, and any database to build systems and applications that meet their requirements.
In this comparison of Redis vs Memcached, we strip away the complexity, focusing on each in-memory data store's performance, scalability, and unique features. Redis is better suited for complex data models, and Memcached is better suited for high-throughput, string-based caching scenarios.
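The difference in data models is easy to see in code: Redis can update one field of a structured value in place, while a string-only cache must serialize and replace the whole value. A sketch assuming a local Redis at the default port (the string case is shown with Redis strings for brevity, since it mirrors Memcached's model).

```python
# Sketch: Redis hashes vs. flat string caching. Assumes a local Redis
# at the default port; keys and values are illustrative.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Redis: update one field of a structured value in place.
r.hset("user:1", mapping={"name": "alice", "visits": "1"})
r.hincrby("user:1", "visits", 1)
print(r.hgetall("user:1"))  # {'name': 'alice', 'visits': '2'}

# Memcached-style string caching: the whole value is serialized and
# replaced on every update.
r.set("user:1:json", json.dumps({"name": "alice", "visits": 1}))
doc = json.loads(r.get("user:1:json"))
doc["visits"] += 1
r.set("user:1:json", json.dumps(doc))
```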
This approach allows companies to combine the security and control of private clouds with the scalability and innovation potential of public clouds. Developing your hybrid cloud strategy: numerous critical elements must be considered when devising one. A hybrid cloud strategy could be your answer.
If you have a largely static site, you can rely on the enormous power of S3 to make serving your content highly scalable and storing it extremely durable. Cactus is a static website generator developed by Koen Bok of Made By Sofa (recently acquired by Facebook).
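Enabling this takes one API call to configure the bucket as a website, plus an upload of the pages. A boto3 sketch; the bucket name is a placeholder, and in practice the bucket also needs a public-read policy.

```python
# Sketch: serving a static site from S3 with boto3. The bucket name is a
# placeholder; the bucket also needs a public-read policy in practice.
import boto3

s3 = boto3.client("s3")
bucket = "my-static-site-example"

s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "404.html"},
    },
)

s3.put_object(
    Bucket=bucket,
    Key="index.html",
    Body=b"<html><body><h1>Hello from S3</h1></body></html>",
    ContentType="text/html",
)
# The site is then served at http://<bucket>.s3-website-<region>.amazonaws.com
```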
In such a data-intensive environment, making key business decisions such as running marketing and sales campaigns, logistics planning, financial analysis, and ad targeting requires deriving insights from these data. However, the data infrastructure to collect, store and process data is geared toward developers (e.g.,
In the world of web development, those who become experts usually do so by learning from their predecessors. Reading and following the right web development blogs makes it much easier to get a solid education. That’s why we’ve compiled an exhaustive list of web development blogs and newsletters to make this process easier.
After the launch of the AWS EU (Stockholm) Region, there will be 13 Availability Zones in Europe for customers to build flexible, scalable, secure, and highly available applications. It will also give customers another region where they can store their data with the knowledge that it will not leave the EU unless they move it.
Government and Big Data. One particular early use case for AWS GovCloud (US) will be massive data processing and analytics. The scalability, flexibility, and elasticity of AWS make it an ideal environment for the agencies to run their analytics.
They had taken the approach that they would offer their software not only as a scalable multi-tenant product but also as a single-tenant environment for customers that want to have their own isolated environment.
These companies can now benefit from the fact that the new Asia Pacific (Sydney) Region is similar to all other AWS Regions, which enables software developed for other Regions to be quickly deployed in Australia as well.
At this scale, we can gain significant performance and cost benefits by optimizing the storage layout (records, objects, partitions) as the data lands in our warehouse. Ease of use: AutoOptimize provides a frictionless way to set up optimization with minimal maintenance overhead for Data Engineering.
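AutoOptimize itself is Netflix-internal, but one of the optimizations it describes, compacting many small files in a partition into fewer larger ones, can be sketched in plain Spark. The paths and target file count are hypothetical.

```python
# Sketch of small-file compaction, one storage-layout optimization described
# above (AutoOptimize itself is Netflix-internal). Paths and the target
# file count are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-partition").getOrCreate()

path = "s3://lake/warehouse/events/date=2021-06-01/"
df = spark.read.parquet(path)

# Aim for ~128 MB output files: in practice the count would be derived
# from bytes_scanned / target_file_size rather than hard-coded.
target_files = 8

df.repartition(target_files) \
  .write.mode("overwrite") \
  .parquet("s3://lake/warehouse/events_compacted/date=2021-06-01/")
```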