Q&A: Dr Rune Linding, leader of the Institute of Cancer Research cellular and molecular logic team

29 Jan 2010, Martin Courtney, Computing

http://www.computing.co.uk/ctg/analysis/1863071/q-a-dr-rune-linding-leader-institute-cancer-research-cellular-molecular-logic-team

Dr Rune Linding, leader of the Institute of Cancer Research (ICR) cellular and molecular logic team, tells Computing how biology research is rapidly overtaking physics in terms of the sheer volume of data being generated for research purposes. All this puts a massive strain on the underlying IT architecture, one reason why the ICR recently installed an SGI Altix UV supercomputer at its London data centre.

Computing: Why did the ICR need a new supercomputer?
Linding: It is part of an initiative that will enable a completely new approach to biological and cancer research, one that will eventually lead to network-based cancer models used to streamline the process of drug development. The supercomputer supports up to 16TB of shared memory in a single system image, and will run alongside a traditional Linux cluster with 256 cores and fast interconnects, as well as a couple of other entry-level high-performance computing (HPC) clusters we are putting into a separate cloud computing environment.

C: What applications will the system support?
L: It handles extreme data generation on top of computational and physics-related projects within the ICR. Biology is becoming the new frontier for data generation, with multiple types and quite significant data loads, including magnetic resonance imaging (MRI), mass spectrometry, phenotyping, genetics and deep sequencing across thousands of CPUs. We have around 10-20 instruments at our Sanger centre which each generate 2-3TB of data every week, for example, and imaging equipment that generates another 2TB per week as well.
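As a rough illustration of what those figures add up to, the quoted ranges can be combined into a weekly total. This is a back-of-envelope sketch only: the function name and the idea of multiplying the low and high ends of each range are assumptions made for illustration, not a calculation given in the interview.

```python
# Back-of-envelope estimate of weekly data volume from the figures quoted above.
# Assumption: the two ranges vary independently, so low*low and high*high
# bound the instrument total; imaging is taken as a flat 2TB per week.

def weekly_volume_tb(instruments, tb_per_instrument, imaging_tb):
    """Return the (low, high) weekly data volume in TB."""
    low = instruments[0] * tb_per_instrument[0] + imaging_tb
    high = instruments[1] * tb_per_instrument[1] + imaging_tb
    return low, high


low, high = weekly_volume_tb(
    instruments=(10, 20),      # "around 10-20 instruments"
    tb_per_instrument=(2, 3),  # "2-3TB of data each every week"
    imaging_tb=2,              # "another 2TB per week"
)
print(f"Estimated weekly volume: {low}-{high} TB")
```

On these assumptions the instruments and imaging equipment together would produce somewhere between 22TB and 62TB a week.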

C: Who uses ICR’s supercomputing facilities?
L: It is primarily designed for internal work conducted by our own researchers. We have around 40 or so using it so far, but there are plans to federate the system, meaning up to 200 staff can use it eventually. The nice thing is that the ICR is a broad institute, so there are a lot of different groups involved, from imaging and patient data to physics models for radiotherapy, for example. In the future we might also figure out a way to share our processing capacity with other institutions [via grid computing].

C: How much did it all cost?
L: The computational infrastructure cost millions of pounds over ten years, but another point is the human resource required to generate the data we need – that involves employing hundreds of people over many years, and is a big, expensive project. The money comes from charity funds, but we are now going out to different agencies to ask for ongoing financial support for maintenance. It is often easier to get money for installing large systems than it is to fund the core people you need to run that environment, and supercomputing people are not easy to find.

C: How much data does the ICR have to store and for how long?
L: Storage capacity now is about 50TB and we will scale up that capacity to around 250TB in the near future. Some of that data will be closely involved in product development and clinical trials, and we will have to retain it for twenty to thirty years. Other data will relate to specific research projects and we’ll need to keep it for two to three years during the life of the project, and five to ten years afterwards. Once we have some reasonably accurate growth models, and as more money comes in and more researchers start to use it, we expect to scale to petabytes of information in the next decade or so. We also do spectrum matching and store the data in large SQL databases, which is extremely compute-intensive.

C: What other IT challenges does your team face?
L: Getting all of that data into a computational format as fast as possible. We are moving away from using Excel spreadsheets, because reformatting the data into a database is a waste of human resource. Security is always a concern. We work with sensitive data, so we put a lot of effort into making sure we have high security on all of our systems. We are also spending a lot of time on federating ideas around cloud computing and the infrastructure needed to support that.

© Incisive Media Investments Limited 2012, Published by Incisive Financial Publishing Limited, Haymarket House, 28-29 Haymarket, London SW1Y 4RX, are companies registered in England and Wales with company registration numbers 04252091 & 04252093