How Fast Storage Shapes the Future of AI/ML Infrastructure
Author: Srinivas Kodali, Director - Systems
November 7, 2025
5 min read

The AI Data Explosion
Every generation of AI gets bigger, smarter — and hungrier.
A single foundation model today can consume petabytes of training data and generate terabytes of checkpoints in just days. GPUs have evolved to process data at terabytes per second, yet in most data centers they spend nearly 40% of their time idle, waiting on I/O.
In today's AI landscape, performance is no longer limited by GPU compute power alone - it's defined by how efficiently GPUs move and access massive volumes of data.
"The next AI bottleneck isn't compute - it's I/O starvation."
The Hidden Bottleneck: Storage
When GPUs sit idle, it's not because they're slow - it's because data pipelines can't keep up.
In distributed AI clusters, hundreds of GPUs request millions of small files per second. Traditional file systems, designed for sequential I/O, crumble under that pressure.
A single modern GPU can move data through its on-package memory at roughly 3 TB/s, yet most storage subsystems barely deliver 10–20 GB/s shared across an entire cluster of nodes. That's like fueling a rocket through a soda straw (a rough back-of-envelope sketch follows the list below).
The result:
• Training inefficiencies due to stalled data loading
• Longer time to train models
• Inefficient checkpointing that slows recovery
• Underutilized GPUs, leading to lost ROI
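To make the mismatch concrete, here is a rough back-of-envelope sketch. Every figure in it (cluster size, per-GPU ingest rate, storage throughput) is an illustrative assumption, not a measurement, and it ignores the caching and prefetching that narrow the gap in practice.

```python
# Back-of-envelope estimate of GPU stall time caused by a shared storage tier.
# All figures are illustrative assumptions, not measurements.

num_gpus = 256                  # assumed cluster size
ingest_per_gpu_gbs = 0.5        # GB/s each GPU wants to consume while training (assumed)
storage_gbs = 20.0              # GB/s delivered by the shared storage tier (assumed)

demand_gbs = num_gpus * ingest_per_gpu_gbs     # what the GPUs could collectively absorb
served = min(1.0, storage_gbs / demand_gbs)    # fraction of demand the storage can meet
stalled = 1.0 - served                         # idle fraction if nothing is overlapped

print(f"cluster demand : {demand_gbs:.0f} GB/s")
print(f"demand served  : {served:.0%}")
print(f"worst-case GPU idle time: ~{stalled:.0%}")
```

With these assumed numbers the storage tier covers only a small fraction of demand; real deployments hide part of the gap with prefetching, node-local caching, and compression, which is why observed idle time sits below this worst case.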
"Fast storage is not about bigger drives — it's about removing friction between data and compute."
From HDDs to Optical Fabrics — The Evolution of Storage
Storage for AI didn't evolve overnight — it adapted through decades of performance demands.
• HPC Era: parallel file systems (Lustre, GPFS); key limitation: poor small-file performance
• Big Data Era: object and distributed stores (S3, Ceph, HDFS); key limitations: latency and metadata scaling
• AI/ML Era: hybrid NVMe flash with NVMe-oF and RDMA; key limitations: cost and integration complexity
• Future Era (optical): disaggregated, composable infrastructure; key limitation: still-emerging standards
Each transition reflected a simple truth: as compute gets faster, data needs to move closer to it.
What "Fast" Really Means in Modern AI Clusters
"Fast storage" isn't just about raw bandwidth. For AI/ML workloads, it's about balancing latency, IOPS, and parallelism.
I/O Throughput (GB/s) - Defines how quickly massive datasets can be streamed to GPUs. Depending on cluster size, AI training can demand 100–400 GB/s of aggregate throughput.
IOPS - Workloads built from millions of small files (image shards, tokenized text samples) translate into millions of IOPS per cluster.
Latency - Inference and training loops are latency-sensitive. Sub-100 µs latency can improve GPU utilization by up to 20%.
Parallelism & Scalability - Modern AI frameworks (PyTorch, TensorFlow) read data in parallel across hundreds of nodes — each needing independent access without contention (a minimal loader sketch follows this list).
Resilience - Checkpointing every few minutes means storage must handle high write bursts while maintaining consistency.
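As a concrete illustration of the parallelism point above, here is a minimal PyTorch DataLoader sketch. The ShardDataset class, file paths, and worker counts are placeholders, not a prescribed configuration; the point is that every sample fetch is an independent small read, so the storage tier sees many concurrent requests rather than one sequential stream.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ShardDataset(Dataset):
    """Hypothetical dataset that reads one small, fixed-size sample file per index."""
    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Each call is an independent small read: this is where storage latency
        # and IOPS, not raw bandwidth, dominate.
        with open(self.paths[idx], "rb") as f:
            data = f.read()
        return torch.frombuffer(bytearray(data), dtype=torch.uint8)

paths = [f"/data/shards/sample_{i:06d}.bin" for i in range(100_000)]  # placeholder paths
loader = DataLoader(
    ShardDataset(paths),
    batch_size=256,
    num_workers=16,          # parallel reader processes hide per-file latency
    prefetch_factor=4,       # keep several batches in flight per worker
    pin_memory=True,         # staging buffers for faster host-to-GPU copies
    persistent_workers=True, # avoid re-spawning workers every epoch
)
```

Multiply this by hundreds of nodes running the same loop and the storage system must sustain millions of small, concurrent reads without any single client becoming a hot spot.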
Open Standards + Off-the-Shelf Hardware = Scalable Speed
At Trndx, we believe that the next wave of performance won't come from proprietary black boxes - but from open standards combined with commodity hardware.
By designing around NVMe-over-Fabrics, RDMA, and open parallel file systems, we enable:
• Hyperscaler-level throughput without vendor lock-in
• Modular scaling (add capacity or speed independently)
• Lower TCO through commodity components
• Software-defined intelligence built on open APIs
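In practice, "no vendor lock-in" means a fabric-attached namespace can be brought up with standard Linux tooling. The sketch below drives the stock nvme-cli utility over RDMA from Python; the address, port, and subsystem NQN are placeholders for whatever the storage target actually exports.

```python
# Minimal sketch: attaching a remote NVMe-oF namespace over RDMA using the
# standard nvme-cli tool. All identifiers below are illustrative placeholders.
import subprocess

TARGET_ADDR = "10.0.0.10"                       # assumed storage-node address
TARGET_NQN = "nqn.2025-01.io.example:pool0"     # assumed subsystem NQN
PORT = "4420"                                   # conventional NVMe-oF service port

# Discover the subsystems exported by the target.
subprocess.run(
    ["nvme", "discover", "-t", "rdma", "-a", TARGET_ADDR, "-s", PORT],
    check=True,
)

# Connect; the remote namespace then shows up as a local /dev/nvmeXnY block device.
subprocess.run(
    ["nvme", "connect", "-t", "rdma", "-n", TARGET_NQN, "-a", TARGET_ADDR, "-s", PORT],
    check=True,
)
```

Because the transport, discovery, and connection steps are all defined by the NVMe-oF standard, the same workflow applies whether the target is a commodity JBOF or a software-defined storage node.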
"Trndx's architecture philosophy: open, flexible, and future-proof"
Integrating with the AI Pipeline
Modern frameworks like TensorFlow I/O, PyTorch DataLoader, and RAPIDS cuDF are evolving to exploit direct storage-to-GPU paths.
Technologies such as GPUDirect Storage (GDS) and PCIe Gen5 NVMe-over-Fabrics reduce CPU mediation, cutting latency and boosting data delivery.
In large-scale training, this means terabytes of data can move directly from NVMe arrays to GPU memory at near wire speed.
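A minimal sketch of what such a direct path looks like from Python, using KvikIO, NVIDIA's Python binding for the cuFile/GPUDirect Storage API. The file path and buffer size are placeholders, and on hardware without GDS support KvikIO falls back to a bounce-buffer copy through host memory rather than a true peer-to-peer transfer.

```python
# Minimal sketch of a direct NVMe-to-GPU read via KvikIO (cuFile/GDS bindings).
# Path and size are placeholders; this assumes a GDS-capable NVMe + GPU setup.
import cupy
import kvikio

nbytes = 1 << 30                                  # assume a 1 GiB checkpoint shard
buf = cupy.empty(nbytes, dtype=cupy.uint8)        # destination buffer lives in GPU memory

f = kvikio.CuFile("/data/checkpoints/shard_000.bin", "r")  # placeholder path
read = f.read(buf)                                # DMA from NVMe into GPU memory
f.close()
print(f"read {read} bytes straight into device memory")
```

The data never has to be staged through a CPU-side buffer, which is exactly the friction the CPU-mediated read path imposes today.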
Soon, CXL and optical interconnects will blur the lines between memory and storage — creating a truly composable infrastructure where compute dynamically attaches to the data it needs.
The Road Ahead - Intelligent, Composable, and Open
The future of AI storage won't just be about speed - it'll be about awareness. Storage systems will learn from workloads, predicting data access patterns and rebalancing bandwidth in real time.
We're heading toward a world where:
• Storage is telemetry-driven, adapting to GPU demand.
• Optical fabrics make distance irrelevant.
• Disaggregated storage pools provide memory-like access at rack scale.
• Software-defined control gives enterprises full transparency and choice.
Conclusion — Storage Is the New Accelerator
AI innovation depends on data movement as much as on math. The GPU may be the brain of AI, but storage is its heartbeat — and any delay between them slows the entire organism.
At Trndx, we're building architectures that embrace open standards, off-the-shelf performance, and data-centric design, ensuring that as AI evolves, storage evolves with it — faster, smarter, and more open than ever.
Srinivas Kodali
Director - Systems
TrndX Inc.
