Cass: data created by Large Hadron Collider experiments is processed on virtual servers

Q&A: CERN group leader of fabric infrastructure and operations, Tony Cass

Cass explains how CERN is processing data from Large Hadron Collider experiments in a virtual environment for the first time

Written by Martin Courtney

The European Organisation for Nuclear Research (CERN), the scientific research facility that uses the Large Hadron Collider (LHC) to explain the mysteries of the universe, is testing a virtualised server environment spanning multiple shared datacentres. The project could eventually see all of the facility’s grid-based number-crunching applications run on virtual, rather than physical, machines.

CERN group leader of fabric infrastructure and operations, Tony Cass, tells Computing how this presents a significant management and security challenge, especially when it comes to convincing 10,000 exacting physicists that virtual servers can run their batch processing jobs properly.

Computing: What is CERN doing with server virtualisation?
Cass: We are working on a virtualisation model for a batch computing environment as a test at the moment, involving 20 machines offered to one of our existing customers, but we plan to roll out virtualisation much more widely next year. We want to convince the researchers accessing our systems that their [batch] jobs will carry on running much as they did in the past. We have 5,000 physical systems each with 16 cores, which can potentially support 80,000 virtual machines (VMs). But we need something to help us manage and understand what is going on in that virtual environment.
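
The 80,000 figure follows from simple arithmetic on the numbers Cass quotes. The short Python sketch below reproduces it; the one-VM-per-core ratio and the function name are assumptions made for illustration, not something stated in the interview.

```python
# Back-of-envelope check of the capacity figure quoted in the interview.
PHYSICAL_SYSTEMS = 5_000   # from the interview
CORES_PER_SYSTEM = 16      # from the interview
VMS_PER_CORE = 1           # assumption: one VM per core, not stated by Cass


def max_virtual_machines(systems: int, cores_per_system: int, vms_per_core: int = 1) -> int:
    """Return the upper bound on VMs for a pool of identical hosts."""
    return systems * cores_per_system * vms_per_core


total_vms = max_virtual_machines(PHYSICAL_SYSTEMS, CORES_PER_SYSTEM, VMS_PER_CORE)
print(total_vms)  # 80000
```

By the same arithmetic, the 20-machine test would top out at 320 VMs if those machines also have 16 cores each, which the interview does not confirm.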

What runs on those CERN servers?
It is a batch environment handling two types of data: output from the LHC experiments in the form of ones and zeros, and raw images like those from a digital camera. About 10,000 physicists around the world process and analyse snapshots of this using the grid, submitting jobs to process the data that is held on a mass disk subsystem at CERN. The specific thing that is relevant is how we manage those jobs, routing them to the compute capacity we have.

We also need to provide enough processing capacity to support the production and analysis of more than 15 petabytes of data per year, including all of the data we collect from the LHC experiments, which we expect to add to year on year.

Do any of the researchers using CERN resources worry that virtualising systems will mean a performance drop off?
They have expressed concerns, but people are always nervous about that sort of thing if there is any chance of disruption. The LHC started last year, ran for 10 days, then stopped and has only just started up again, and people do not want to think anything can stop their data processing. CPU and network access incur no performance penalty; the biggest challenge is accessing data on local disk servers, because there is input/output (I/O) all over the network.

What are the management challenges posed by virtualising servers on this scale?
We have to make sure all of those VMs are up to date with security patches and everything, and provide a policy-driven approach to guarantee a minimum level of service for some customers, while making sure we can expand [compute] capacity for others as necessary. In any server environment that uses live migration with a hypervisor, you have to know where all the VMs are, where the physical hardware is, and what will be affected.

How is CERN addressing this management challenge?
We are using Platform Computing’s ISF adaptive cluster and LSF grid workload management software to manage virtual and physical machines alike, covering both the short-lived VMs used for batch processing and the long-lived machines used for classical server processes running for many days or months.

Back in the mid-1990s CERN was one of the first to support running tens of thousands of jobs on Linux, but now that is mainstream. ISF offers a pretty comprehensive range of software but there are other things we might need, like OpenNebula, an open source toolkit for cloud computing, which works in a virtual management machine. The attraction here is that it will interoperate with other packages as necessary, because we have to have the whole range rather than a single hypervisor management system from one company.
