UPDATED: Nvidia promises power-efficient supercomputing with Kepler-based Tesla board
Chip maker brings dual GPU board to high-performance computing systems
Nvidia has brought its Kepler GPU architecture to its high-performance computing (HPC) Tesla range with the K10 accelerator.
The firm's Kepler architecture is best known for its use in three consumer products, including its GeForce GTX 680 desktop video card, but Nvidia has now brought it to a Tesla general-purpose computing on GPU (GPGPU) accelerator board.
The firm's Tesla K10 accelerator board features two Kepler GPUs and is aimed at speeding up single-precision floating point workloads.
Nvidia's Tesla K10 board has two 1536-core Kepler GK104 GPUs, each providing 2.29 teraflops of single-precision performance and 0.095 teraflops of double-precision performance.
Nvidia has increased total board memory to 8GB, meaning larger datasets can be accessed. However, per-GPU memory bandwidth has actually dropped from the previous generation to 160GB/s.
Nvidia has not disclosed clock speeds for the two GK104 GPUs, but Sumit Gupta, senior director of Nvidia's Tesla business unit, said clock speeds would be half those of its Fermi-based Tesla cards. He added that scaling back on clock speeds helped the firm reduce power consumption.
Nvidia said that its Tesla K10 board, with two GPUs, has the same thermal design power as its single-GPU Fermi-based Tesla M2090. While the shift from a 40nm process to 28nm helped save some power, Gupta said that most of the savings came down to other aspects of system design.
"A lot of it was architecture redesign, for example by reducing the clocks [speeds]. We almost halved the clocks and it significantly reduced the power. We also improved the efficiency of the architecture," he said.
For Nvidia's customers the biggest boost actually comes with PCI-Express Gen3 support, meaning bandwidth has doubled to 16GB/s.
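The doubling can be sanity-checked with a rough per-direction estimate. The sketch below assumes a x16 link, which the article does not state, and uses the published PCI-Express signalling rates and line encodings; the function name is purely illustrative.

```python
# Rough per-direction PCI-Express bandwidth estimate.
# Assumption: a x16 link; the lane count is not given in the article.

def pcie_bandwidth_gb_s(transfer_rate_gt_s, payload_bits, encoded_bits, lanes=16):
    """Return usable bandwidth in GB/s for one direction of the link."""
    usable_gbit_s = transfer_rate_gt_s * lanes * payload_bits / encoded_bits
    return usable_gbit_s / 8  # convert gigabits to gigabytes

gen2 = pcie_bandwidth_gb_s(5.0, 8, 10)     # Gen2: 5 GT/s, 8b/10b encoding
gen3 = pcie_bandwidth_gb_s(8.0, 128, 130)  # Gen3: 8 GT/s, 128b/130b encoding

print(f"Gen2 x16: {gen2:.2f} GB/s")
print(f"Gen3 x16: {gen3:.2f} GB/s")
print(f"Ratio:    {gen3 / gen2:.2f}x")
```

Under those assumptions Gen3 comes out slightly under 16GB/s in each direction, roughly twice the Gen2 figure, which is consistent with the rounded number quoted above.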
Nvidia's Tesla K10 accelerator card will allow firms to hit one petaflop of single-precision compute capability using 400kW of power. According to Nvidia, that power consumption figure is close to a tenth of what would be required if standard CPU-only cluster nodes were deployed.
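To put those figures in perspective, here is a back-of-the-envelope sketch using only the numbers quoted in this article. It assumes peak rather than sustained throughput, and it makes no claim about what the 400kW figure covers beyond the boards themselves; the variable names are illustrative.

```python
import math

PETAFLOP = 1e15        # target single-precision throughput, in flops
POWER_W = 400e3        # 400kW, as quoted by Nvidia
GPU_TFLOPS = 2.29      # per-GPU single-precision figure from the article
GPUS_PER_BOARD = 2     # the K10 carries two GK104 GPUs

# Peak throughput of one board, then boards needed to reach a petaflop.
board_flops = GPU_TFLOPS * 1e12 * GPUS_PER_BOARD
boards_needed = math.ceil(PETAFLOP / board_flops)

# Efficiency implied by the headline claim, and the power budget per board
# if the full 400kW were shared evenly across that many boards.
gflops_per_watt = PETAFLOP / POWER_W / 1e9
watts_per_board = POWER_W / boards_needed

print(f"Boards for one petaflop: {boards_needed}")
print(f"Implied efficiency:      {gflops_per_watt:.1f} gigaflops per watt")
print(f"Power budget per board:  {watts_per_board:.0f} W")
```

On these numbers a petaflop takes 219 boards at peak, and the claim works out to 2.5 gigaflops per watt.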
Nvidia has also enabled its GPUDirect technology, which allows GPUs in a cluster to access other GPUs' local memory, giving each GPU direct access to a far larger pool of memory.
Update: Nvidia has released Tesla K10 clock speeds to V3, with the GK104 GPU running at 745MHz and the GDDR5 memory running at 2.5GHz. As Nvidia said, the Kepler-based Tesla K10 GPU clock speeds are significantly lower than those of Fermi-based Tesla boards, with the Tesla M2090 board running its GPU at 1.3GHz.
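The released clock speed lines up with the per-GPU performance figure quoted earlier. A minimal sketch of the arithmetic, assuming each core retires one fused multiply-add (counted as two floating point operations) per clock cycle; that per-cycle figure is an assumption and does not come from Nvidia's statement to V3.

```python
CORES_PER_GPU = 1536            # GK104 core count from the article
CLOCK_HZ = 745e6                # 745MHz GPU clock from the update
FLOPS_PER_CORE_PER_CYCLE = 2    # assumption: one fused multiply-add per cycle

# Theoretical peak: cores x clock x operations per core per cycle.
peak_flops = CORES_PER_GPU * CLOCK_HZ * FLOPS_PER_CORE_PER_CYCLE
peak_tflops = peak_flops / 1e12

print(f"Peak single precision per GPU: {peak_tflops:.2f} teraflops")
```

Under that assumption the result rounds to the 2.29 teraflops per GPU that Nvidia quotes.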
That Nvidia's Tesla K10 boasts close to a four-fold increase in single-precision floating point performance despite lower clock speeds highlights a big improvement in architectural efficiency in Kepler over Fermi.