Supercomputing firms see new challenges in cloud and exascale computing
Energy efficiency and programming difficulties loom, vendors warn
A panel of high-performance computing (HPC) hardware and software executives has outlined what they believe are the primary challenges for the supercomputing industry in the coming years.
Executives from AMD, Cray and other HPC specialists said at an event in San Francisco that energy efficiency and adapting software development platforms will be among the biggest concerns for the next generation of supercomputer clusters.
Chuck Moore, corporate fellow and technology group chief technology officer at AMD, told attendees that HPC systems and services must adopt a different model from that of cloud platforms.
Demand from researchers for compute time will help drive the rise of HPC-as-a-service, but Moore warned that such services are not well suited to the current cloud architecture.
"You tend to write applications that spread out among many systems and come back with a result," he said of cloud platforms.
"While certain types of HPC spread work out, they do so with a very different set of latency and constraints and thinking. It is not like you can just pick up that application and run it on a cloud."
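Moore's distinction can be sketched in a few lines. This is a hypothetical illustration, not anything AMD presented: the "spread out and come back" cloud job is a set of independent tasks, while the HPC-style computation is an iterative stencil in which every step depends on neighbouring values, so inter-node latency is paid on every iteration.

```python
# Hypothetical sketch of the two workload shapes Moore contrasts.

# Cloud-style "spread out and come back": independent tasks, so latency
# between workers barely matters and any one task can be rescheduled freely.
def cloud_style(data, worker):
    return [worker(chunk) for chunk in data]

# HPC-style tightly coupled step: each cell needs its neighbours' latest
# values, so nodes must exchange data on every single time step.
def hpc_stencil_step(grid):
    n = len(grid)
    return [
        (grid[(i - 1) % n] + grid[i] + grid[(i + 1) % n]) / 3.0
        for i in range(n)
    ]

data = [1.0, 4.0, 7.0, 10.0]
print(cloud_style(data, lambda x: x * 2))  # independent tasks, one round trip

state = data
for _ in range(3):           # three coupled iterations, three exchanges
    state = hpc_stencil_step(state)
print(state)
```

Moving the second workload to a generic cloud means paying that per-step communication cost over commodity networking, which is the mismatch Moore describes.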
The experts also see new challenges as the supercomputing industry seeks to develop more powerful exascale clusters.
Some on the panel believe that the performance and energy efficiency challenges of exascale supercomputers can be solved in the next six to seven years, but others maintain that software issues will push back the arrival of exascale systems.
Margaret Williams, senior vice president of HPC systems at Cray, said that programming challenges could delay exascale supercomputing until 2020, citing the difficulty of managing parallelism and hardware failures as systems grow to larger scales.
"The programming models that we have today assume I will have the same resources at the beginning that I will have when the job ends," she explained.
"There are some issues that we need to resolve in terms of the resiliency and programming models."
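One common answer to the resiliency problem Williams raises is checkpoint/restart: periodically save known-good state so that a job can roll back and continue after a node failure, rather than assuming its resources survive from start to finish. The sketch below is a minimal toy illustration of that idea, not Cray's actual approach; the failure is simulated with a random draw.

```python
# Hypothetical checkpoint/restart sketch. A "failure" randomly interrupts the
# job mid-run; instead of aborting, the job rolls back to its last checkpoint.
import random

def run_job(steps, checkpoint_every=2, fail_prob=0.3, seed=7):
    random.seed(seed)
    state = 0
    step = 0
    checkpoint = (0, 0)                  # last known-good (step, state)
    while step < steps:
        if random.random() < fail_prob:  # simulated node failure
            step, state = checkpoint     # roll back and resume
            continue
        state += step                    # the "real" work for this step
        step += 1
        if step % checkpoint_every == 0:
            checkpoint = (step, state)   # persist progress
    return state

print(run_job(6))  # → 15, despite simulated failures along the way
```

Because the work between checkpoints is redone deterministically after each rollback, the job still produces the same result it would without failures; the cost is the redundant recomputation and checkpoint I/O, which is exactly the overhead exascale programming models need to keep manageable.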