Even CERN has to queue for GPUs. Here's how they optimise what they have

'There's a tendency to say that all ML workloads need a GPU, but for inference you probably don't need them'

Image: CERN. Source: Wikimedia

CERN, the European Organisation for Nuclear Research, has long had some of the biggest of big data needs. Its engineers historically built their own IT systems, including for distributed and high-performance computing (HPC), and sometimes still do. But the culture has largely shifted to engaging with open source communities with similar requirements, including adopting OpenStack for the computing needs of the Large Hadron Collider.

Latterly, CERN engineers have moved further down this road, embracing Kubernetes and the cloud-native ecosystem.

"Cloud-native native is not just about the basic infrastructure is really all the layers, the platforms, the security part, the storage part, all of it," CERN computing engineer Ricardo Rocha told Computing.

"The culture change has been about first transitioning from ‘this is my hardware' to ‘I delegate the management of hardware to someone else with our own private cloud', and now even the rest of the tooling. Just engage with the communities and projects and start contributing to a much larger ecosystem."

CERN's scientists sift through enormous reams of data looking for the telltale signatures produced when subatomic particles collide. Increasingly this is being turned over to AI/ML, which the cloud native infrastructure supports. But underneath the software lies hardware, and for AI that means GPUs.

However, adopting GPUs at CERN scale has been a major challenge, in part due to their scarcity and long delivery times. Even renting GPUs from cloud providers is restricted by the current demand. So, while CERN runs several thousand GPU cores on-premises, this is a fraction of its total estate. CERN cannot pull rank and jump the queue, said Rocha; its techies have to wait in line for GPUs with everyone else (although they are privy to early test releases from manufacturers via CERN's openlab public-private partnership).

Optimising GPU usage

The scarcity of high-performance hardware means the Geneva-based organisation needs to turn every dial to ensure it is utilising its GPUs as efficiently as possible.

First, the engineers evaluate which workloads truly require GPUs, as not all machine learning tasks do.

"There's a tendency to say that all machine learning workloads will need a GPU, but for inference you probably don't need them," Rocha told Computing. "You might get a similar performance with CPUs, particularly new generation CPUs. And we already have that kind of capacity in-house."

Another optimisation is around spiky, interactive workloads, such as AI inference or CI/CD, which result in very low overall GPU utilisation of around 20-30%. Here, time-based partitioning techniques can be brought to bear to share the resources more efficiently. This is an area attracting a lot of interest from open source communities with similar needs, and one where rapid progress is being made.
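
One common mechanism for this kind of time-based sharing is the time-slicing option in NVIDIA's Kubernetes device plugin, which advertises a single physical GPU as several schedulable replicas. The article does not say which tools CERN uses, so treat the snippet below as a generic sketch: the keys follow the plugin's documented config format, and the replica count is an arbitrary example.

```python
# Sketch of a GPU time-slicing config for the NVIDIA Kubernetes device plugin.
# One physical GPU is advertised as four schedulable "replicas", so several
# spiky, interactive pods can take turns on it instead of idling a whole card.
import yaml  # pip install pyyaml

time_slicing_config = {
    "version": "v1",
    "sharing": {
        "timeSlicing": {
            "resources": [
                {
                    "name": "nvidia.com/gpu",  # the extended resource being shared
                    "replicas": 4,             # illustrative share count
                }
            ]
        }
    },
}

# The plugin consumes this as YAML, typically mounted via a ConfigMap.
print(yaml.safe_dump(time_slicing_config, sort_keys=False))
```

The trade-off is isolation: time-sliced workloads share the card's memory and can interfere with one another, which is usually tolerable for CI/CD jobs and bursty inference but not for long training runs.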

Then there are batch-type workloads, which are more predictable but run for longer, for which queueing and scheduling are important to minimise idle time. CERN has a great deal of experience in the latter, owing to its work with internally developed HPC systems. All of this should allow scientists to make much greater use of machine learning in the future.
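
To make the batch side concrete, here is a minimal, hypothetical sketch of the scheduling idea: jobs wait in a queue and each is handed to whichever GPU frees up first, so no card sits idle while work is pending. Production batch systems layer priorities, fairness and preemption on top of this.

```python
# Toy batch scheduler: assign queued jobs to the GPU that becomes free first.
# Purely illustrative; durations are in arbitrary hours.
import heapq

def schedule(job_durations, num_gpus):
    """Return (job id, gpu id, start time) for a FIFO queue of jobs."""
    gpus = [(0.0, gpu_id) for gpu_id in range(num_gpus)]  # (free-at time, gpu id)
    heapq.heapify(gpus)
    plan = []
    for job_id, duration in enumerate(job_durations):
        free_at, gpu_id = heapq.heappop(gpus)   # earliest available GPU
        plan.append((job_id, gpu_id, free_at))
        heapq.heappush(gpus, (free_at + duration, gpu_id))
    return plan

# Example: five jobs spread over two GPUs.
for job_id, gpu_id, start in schedule([4.0, 2.0, 1.0, 3.0, 2.5], num_gpus=2):
    print(f"job {job_id} -> GPU {gpu_id} at t={start:.1f}h")
```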

So far the main use of machine learning has been data filtering in detector farms, to separate the wheat from the voluminous digital chaff, but this is only around 5% of its potential, according to Rocha. The next step will be to expand that to a much larger fraction of the overall computing.

"Some of the experiments at CERN have been claiming that in a couple of years, they might be doing 50% of their workloads in machine learning of some sort, so this this is a huge change for us," he said.

Running hot

Because of the ultra-low latency requirements, many CERN experiments have been relying on custom hardware, but advances in GPUs and FPGAs (when they are available) and accompanying software are opening the door for many more applications using standard equipment.

But even if GPUs were readily available, it's not a simple matter of replacing racks of CPU-powered servers with them. The two have very different requirements: GPUs run much hotter and must be racked at much lower densities, or even water cooled.

"Our datacentres were designed for traditional CPU hosting machines and now we have a higher density of GPUs. It's not something that is very easy to solve because the design decisions for the data centres cannot be easily changed after you build them."

The only real workaround is to focus on flexibility, Rocha said.

"I think the main lesson we've learned is to stay as flexible as possible in terms of the infrastructure you can, where you can have your the majority of your predictable workloads running on premises, but also complement that with the ability to burst into external resources."