29 Jan 2010, Martin Courtney, Computing
Dr Rune Linding, leader of the Institute of Cancer Research (ICR) cellular and molecular logic team, tells Computing how biology research is rapidly overtaking physics in terms of the sheer volume of data being generated for research purposes. All this puts a massive strain on the underlying IT architecture, one reason why the ICR recently installed an SGI Altix UV supercomputer at its London data centre.
Computing: Why did the ICR need a new supercomputer?
Linding: It is part of an initiative that will enable a completely new approach
to biological and cancer research, one that will eventually lead to network-based
cancer models used to streamline the process of drug development. The
supercomputer supports up to 16TB of shared memory in a single system image, and
will run alongside a traditional Linux cluster, with 256 cores and fast
interconnects, as well as a couple of other entry-level high-performance
computing (HPC) clusters we are putting into a separate cloud computing
environment.
C: What applications will the system support?
L: It handles extreme data generation on top of computational and
physics-related projects within the ICR. Biology is becoming the new frontier for data
generation, with multiple types and quite significant data loads, including
magnetic resonance imaging (MRI), mass-spectrometry, phenotyping, genetics and
deep-sequencing across thousands of CPUs. We have around 10-20 instruments at
our Sanger centre, each of which generates 2-3TB of data every week, for example,
and imaging equipment that generates another 2TB per week.
C: Who uses ICR’s supercomputing facilities?
L: It is primarily designed for internal work conducted by our own researchers.
We have around 40 or so using it so far, but there are plans to federate the
system, meaning up to 200 staff can use it eventually. The nice thing is that the
ICR is a broad institute so there are a lot of different groups involved, from
imaging and patient data, to physics models for radiotherapy, for example. In
the future we might also figure out a way to share our processing capacity with
other institutions [via grid computing].
C: How much did it all cost?
L: The computational infrastructure cost millions of pounds over ten years, but
another point is the human resource required to generate the data we need – that
involves employing hundreds of people over many years, and is a big, expensive
project. The money comes from charity funds, but we are now going out to
different agencies to ask for ongoing financial support for maintenance. It is
often easier to get money for installing large systems than it is to fund the
core people you need to run that environment, and supercomputing people are not
easy to find.
C: How much data does the ICR have to store and for how long?
L: Storage capacity now is about 50TB and we will scale up that capacity to
around 250TB in the near future. Some of that data will be closely involved in
product development and clinical trials, and we will have to retain it for
twenty to thirty years. Other data will relate to specific research projects and
we’ll need to keep it for two to three years during the life of the project, and
five to ten years afterwards. Once we have some reasonably accurate growth
models, and as more money comes in and more researchers start to use it, we
expect to scale to petabytes of information in the next decade or so. We also do
spectrum matching and store the data in large SQL databases, which is extremely
compute intensive.
C: What other IT challenges does your team face?
L: Getting all of that data into a computational format as fast as possible. We
are moving away from Excel spreadsheets, because reformatting the data into a
database is a waste of human resource. Security is always a concern. We work
with sensitive data, so we put a lot of effort into making sure we have high
security on all of our systems. We are also spending a lot of time on federating
ideas around cloud computing and the infrastructure needed to support that.
© Incisive Media Investments Limited 2012, Published by Incisive Financial Publishing Limited, Haymarket House, 28-29 Haymarket, London SW1Y 4RX, are companies registered in England and Wales with company registration numbers 04252091 & 04252093