How TGAC is giving plant evolution a helping hand using SGI supercomputers
TGAC has taken delivery of two new SGI machines to speed up the analysis of the wheat genome
Norwich-based research institute The Genome Analysis Centre (TGAC) has upgraded its SGI supercomputers from the UV 100 model, which it has been using for the past five years, to two new UV 300 machines, each featuring 12TB of DDR4 RAM and Intel E7 Haswell processors. Also included in the package are 32TB of Intel's latest P3700 solid state drives (SSDs). The purpose of this cutting-edge kit is to speed up the processing of the wheat genome, an incredibly memory- and processor-intensive task.
It comes as a bit of a shock to find that, in terms of the size of our genetic code, human beings are beaten hands down by a humble grass.
"The wheat genome is five times the size of a human genome - it has 17 billion base pairs compared with three billion for a human," said Dr Tim Stitt, head of scientific computing at TGAC.
"Not a lot of people know that," he added.
Assembling the enormous wheat genome means loading datasets of around a terabyte in size, the output of gene sequencers, into RAM and then analysing them with community-built multithreaded algorithms. This takes three or four weeks per genome on the current UV 100 machine.
In a testament to the rate at which technology evolves, the new machines promise an 80 per cent increase in processing power as Haswell processors replace the UV 100's Sandy Bridge CPUs. In addition, there is a lot more precious RAM to play with, and the new SSDs should significantly increase I/O throughput.
"We're probably the first to use this combination," Stitt told Computing, adding that Intel was keen for his department to try out the P3700s. "Intel knows we're working on really big datasets and they were happy for us to try the new SSDs."
Not only is the new SGI UV 300 machine more powerful, it also takes up much less space.
"It may not look like much but considering its footprint it's a beast," Stitt said. "It only takes 5U [rack units] of space whereas the UV 100 takes a whole rack, and something like a Cray distributed supercomputer would take more than a rack. So it's powerful, has more memory and it's more energy efficient too."
Unfortunately, the rate of evolution of wheat, the foundation of a large part of the world's food supply, is much slower than that of technology. In addition, human activity is changing the environment, through climate change, soil degradation and changing land use, far faster than traditional plant breeding and selection can keep up with. Yields are dropping and at the same time there are more mouths to feed.
"What we're looking at at TGAC is food security," Stitt explained. "By decoding the wheat genome we can find genes that are susceptible to heat or pathogens. By making this information public, plant breeders can hopefully produce new lines of wheat that are less susceptible to disease and can be grown in warmer climates."
The combination of faster processors, improved I/O and increased RAM should speed the genome assembly process by 30 or 40 per cent, he believes, based on initial trials with existing algorithms.
"That's a significant improvement from a hardware upgrade, without even having to re-engineer the software," he said, adding that the algorithms will be recompiled for the new Haswell processors.
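To put the estimate in context, here is a rough back-of-envelope sketch in Python of what a 30 or 40 per cent speed-up could mean for an assembly that currently takes three or four weeks. The article does not say whether the figure refers to a cut in run time or a rise in throughput, so both readings are shown as assumptions; the function names are illustrative and are not TGAC's own.

```python
# Illustrative arithmetic only. The three-to-four-week baseline and the
# 30 to 40 per cent figure come from the article; the two interpretations
# of "speed up by N per cent" below are assumptions.

def weeks_if_runtime_cut(baseline_weeks: float, pct: float) -> float:
    """Reading 1: run time itself falls by pct per cent."""
    return baseline_weeks * (1 - pct / 100)


def weeks_if_throughput_up(baseline_weeks: float, pct: float) -> float:
    """Reading 2: throughput rises by pct per cent, so run time is divided by (1 + pct/100)."""
    return baseline_weeks / (1 + pct / 100)


def estimate_table(baselines=(3, 4), pcts=(30, 40)):
    """Return (baseline_weeks, pct, reading_1_weeks, reading_2_weeks) rows, rounded to 2 places."""
    rows = []
    for weeks in baselines:
        for pct in pcts:
            rows.append((
                weeks,
                pct,
                round(weeks_if_runtime_cut(weeks, pct), 2),
                round(weeks_if_throughput_up(weeks, pct), 2),
            ))
    return rows


if __name__ == "__main__":
    for weeks, pct, cut, faster in estimate_table():
        print(f"{weeks} weeks, {pct}%: run-time cut -> {cut} weeks, "
              f"throughput gain -> {faster} weeks")
```

On either reading, a three-to-four-week assembly would come down to somewhere between roughly two and three weeks per genome.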
At first the two new appliances will be used separately, but their modular design means that they can be joined together to double up on memory, storage and processing power, meaning that TGAC will be able to scale up when the demand requires it.