Q&A: Dr Rune Linding, leader of the Institute of Cancer Research cellular and molecular logic team

By Martin Courtney

29 Jan 2010

Dr Rune Linding, leader of the Institute of Cancer Research (ICR) cellular and molecular logic team, tells Computing how biology research is rapidly overtaking physics in terms of the sheer volume of data being generated for research purposes. All this puts a massive strain on the underlying IT architecture, one reason why the ICR recently installed an SGI Altix UV supercomputer at its London data centre.

Computing: Why did the ICR need a new supercomputer?
Linding: It is part of an initiative that will enable a completely new approach to biological and cancer research, one that will eventually lead to network-based cancer models used to streamline the process of drug development. The supercomputer supports up to 16TB of shared memory in a single system image, and will run alongside a traditional Linux cluster, with 256 cores and fast interconnects, as well as a couple of other entry-level high-performance computing (HPC) clusters we are putting into a separate cloud computing environment.

C: What applications will the system support?
L: It handles extreme data generation on top of computational and physics-related projects within the ICR. Biology is becoming the new frontier for data generation, with multiple types and quite significant data loads, including magnetic resonance imaging (MRI), mass spectrometry, phenotyping, genetics and deep sequencing across thousands of CPUs. We have around 10-20 instruments at our Sanger centre which each generate 2-3TB of data every week, for example, and imaging equipment that generates another 2TB per week as well.

C: Who uses the ICR’s supercomputing facilities?
L: It is primarily designed for internal work conducted by our own researchers. We have around 40 or so using it so far, but there are plans to federate the system, meaning up to 200 staff can eventually use it. The nice thing is that the ICR is a broad institute so there are a lot of different groups involved, from imaging and patient data to physics models for radiotherapy, for example. In the future we might also figure out a way to share our processing capacity with other institutions [via grid computing].

C: How much did it all cost?
L: The computational infrastructure cost millions of pounds over ten years, but another point is the human resource required to generate the data we need – that involves employing hundreds of people over many years, and is a big, expensive project. The money comes from charity funds, but we are now going out to different agencies to ask for ongoing financial support for maintenance. It is often easier to get money for installing large systems than it is to fund the core people you need to run that environment, and supercomputing people are not easy to find.

C: How much data does the ICR have to store and for how long?
L: Storage capacity now is about 50TB and we will scale that up to around 250TB in the near future. Some of that data will be closely involved in product development and clinical trials, and we will have to retain it for twenty to thirty years. Other data will relate to specific research projects and we’ll need to keep it for two to three years during the life of the project, and five to ten years afterwards. Once we have some reasonably accurate growth models, and as more money comes in and more researchers start to use it, we expect to scale to petabytes of information in the next decade or so. We also do spectrum matching and store the data in large SQL databases, and producing that output is extremely compute-intensive.

C: What other IT challenges does your team face?
L: Getting all of that data into a computational format as fast as possible. We want to move away from using Excel spreadsheets, because reformatting the data into a database is a waste of human resource. Security is always a concern. We work with sensitive data, so we put a lot of effort into making sure we have high security on all of our systems. We are also spending a lot of time on federating ideas around cloud computing and the infrastructure needed to support that.
