Alipay: Large-Scale Model Training on Billions of Files

DZone

With the exponential growth of data, we create and optimize infrastructure that enables large-scale model training and overcomes the performance bottleneck while reducing the cost of data storage and computation. The group owns the world’s largest mobile payment platform Alipay, which serves over 1.3

Storage 223

Optimizing InfiniBand Bandwidth Utilization for NVIDIA DGX Systems Using Software RAID Solutions

DZone

Objectives: Modern AI innovations require proper infrastructure, especially for data throughput and storage. While GPUs drive faster results, legacy storage solutions often lag behind, causing inefficient resource utilization and longer project completion times.

Systems 130

Why growing AI adoption requires an AI observability strategy

Dynatrace

An O’Reilly Media survey indicated that two-thirds of respondents have already adopted generative AI, a form of AI that uses training data to create text, images, code, or other types of content in response to its users’ natural language queries. AI requires more compute and storage. AI performs frequent data transfers.

Strategy 226

Measuring the importance of data quality to causal AI success

Dynatrace

While this approach can be effective if the model is trained with a large amount of data, even in the best-case scenarios, it amounts to an informed guess, rather than a certainty. Because IT systems change often, AI models trained only on historical data struggle to diagnose novel events. That’s where causal AI can help.

What is Cloud Computing? According to ChatGPT.

High Scalability

Cloud computing is a model of computing that delivers computing services over the internet, including storage, data processing, and networking. It allows users to access and use shared computing resources, such as servers, storage, and applications, on demand and without the need to manage the underlying infrastructure.

Cloud 201

Causal AI use cases for modern observability that can transform any business

Dynatrace

The logs, metrics, traces, and other metadata that applications and infrastructure generate have historically been captured in separate data stores, creating poorly integrated data silos. Data lakehouses combine a data lake’s flexible storage with a data warehouse’s fast performance.

Stay ahead of the game: Forecast IT capacity with Dynatrace Grail and Davis AI

Dynatrace

Some of our customers run tens of thousands of storage disks in parallel, all needing continuous resizing. Davis AI analyzes the selected time series, automatically chooses the best prediction model based on its characteristics, and then trains that model.

Games 218