Nvidia and IBM working on new technology to connect GPUs straight to SSDs for performance boost

Promises to be particularly useful for resource-intensive workloads, such as machine learning, AI and analytics

Nvidia has teamed up with IBM and university researchers to unveil a new technology that enables a GPU to connect directly to a computer's SSD storage, without having to go through the CPU.

Dubbed Big Accelerator Memory, or BaM, the technology could give GPUs greater effective memory capacity and higher storage bandwidth while limiting CPU involvement, according to the researchers.

Connecting a GPU to an SSD directly would give a performance gain, particularly for resource-intensive workloads such as machine learning, AI and analytics.

In a typical system, storage I/O is linked to the CPU while the GPU works as a co-processor or sub-processor. When the GPU needs data, it has to get it from the CPU.

That isn't a problem as such, except that in some modern systems the GPU does more work than the CPU, and funnelling every storage request through the CPU can drag down the performance of both processors.

AMD was the first to work in this field. In 2016, it unveiled the Radeon Pro SSG, a workstation GPU with integrated M.2 SSDs. However, the Radeon Pro SSG was designed purely as a graphics solution, whereas Nvidia is now taking the idea a step further, targeting complex, compute-heavy workloads.

The intricacies of Nvidia's technology aside, the bottom line is that the firm wants the GPU to depend less on the CPU and link directly to the data source.

BaM has two prominent features: a software-managed cache held in GPU memory, and a software library that lets GPU threads request data from NVMe SSDs by communicating directly with the drives.
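In rough terms, the interplay of those two components can be sketched as follows. This is a minimal, illustrative Python simulation, not the researchers' actual CUDA implementation; all names (`BaMSketch`, `read`, the 4 KB page size) are hypothetical. A thread asks the library for a byte range, the library checks the software-managed cache, and on a miss it fetches the page from the drive itself rather than asking the CPU.

```python
# Illustrative sketch of BaM's two components (hypothetical names):
# a software-managed cache in GPU memory, and a library call that
# GPU threads use to fetch data straight from an NVMe SSD.

class BaMSketch:
    PAGE = 4096  # assumed cache granularity, in bytes

    def __init__(self, ssd):
        self.ssd = ssd        # stand-in for an NVMe drive's contents
        self.cache = {}       # software-managed cache: page id -> bytes
        self.hits = 0
        self.misses = 0

    def _fetch_page(self, page_id):
        """On a miss, the thread reads from the drive itself --
        no CPU on the data path."""
        start = page_id * self.PAGE
        return self.ssd[start:start + self.PAGE]

    def read(self, offset, length):
        """The library call a GPU thread would make for on-demand,
        fine-grain access to data living on the SSD."""
        out = b""
        while length > 0:
            page_id, page_off = divmod(offset, self.PAGE)
            if page_id in self.cache:
                self.hits += 1
            else:
                self.misses += 1
                self.cache[page_id] = self._fetch_page(page_id)
            chunk = self.cache[page_id][page_off:page_off + length]
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return out

ssd = bytes(range(256)) * 64       # 16 KB of fake SSD contents
bam = BaMSketch(ssd)
first = bam.read(100, 8)           # miss: page 0 pulled from the "SSD"
second = bam.read(200, 8)          # hit: page 0 is already cached
```

The point of the sketch is the control flow, not the data structures: once a page is cached, subsequent fine-grain reads are served from GPU memory without touching the drive again.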

According to the research paper, BaM reduces I/O traffic amplification by allowing GPU threads to read or write small amounts of data as needed, as the computation demands.
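To see why fine-grain access matters, a back-of-the-envelope comparison helps. The numbers below are illustrative assumptions, not figures from the paper: if each access only needs 128 bytes but the traditional path moves a full 4 KB block per access, the coarse path transfers 32 times more data than the computation actually uses.

```python
# Back-of-the-envelope illustration of I/O traffic amplification.
# All figures are assumed for illustration, not taken from the paper.
accesses = 1_000_000   # fine-grain accesses issued by GPU threads
need = 128             # bytes actually needed per access
block = 4096           # transfer size when each access pulls a whole block

coarse_bytes = accesses * block   # block-sized transfers per access
fine_bytes = accesses * need      # BaM-style fine-grain reads
amplification = coarse_bytes / fine_bytes
```

Under these assumptions the amplification factor is 32x, which is the kind of wasted traffic the researchers say fine-grain, on-demand access avoids.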

'The goal of BaM is to extend GPU memory capacity and enhance the effective storage access bandwidth while providing high-level abstractions for the GPU threads to easily make on-demand, fine-grain access to massive data structures in the extended memory hierarchy.'

The researchers claim that the BaM infrastructure software running on GPUs can identify and communicate fine-grain accesses at a high enough rate to fully use the underlying storage devices.

Moreover, even with consumer-grade SSDs, a BaM system can support application performance comparable to a considerably more costly DRAM-only solution. Finally, the researchers state that reducing I/O amplification could result in considerable performance gains.

The researchers used a prototype system with an Nvidia A100 40 GB PCIe GPU, 1 TB of DDR4-3200 memory and two AMD EPYC 7702 CPUs with 64 cores each to demonstrate BaM technology. The system runs Ubuntu 20.04 LTS.

In the best-case scenario, BaM achieved a 4.9x performance boost over existing GPU I/O acceleration techniques.

The team plans to open-source its hardware and software designs in the future, allowing other firms to create similar solutions.