Cern looks beyond spinning hard disks as data demand explodes

Large Hadron Collider faces large storage problems as the pace of technology can't keep up

Increasing storage costs and slowing innovation have spurred Cern to re-think its data storage model as its Large Hadron Collider (LHC) is expected to create 400 petabytes (PB) of scientific data per year by 2023.

The research organisation has already outlined plans to create a distributed computing model of federated data with Rackspace, and has now revealed some of the motivation behind its move away from traditional data storage methods.

Ian Bird, project leader for LHC computing, said a number of factors in the world of physical storage were driving the organisation into the cloud.

"There's a technology crunch coming," he said, speaking at Cloud Expo Europe in London. "Over the last 30 years we've seen the dollar-to-storage performance cost rise by a factor of two every 18 months."

"The worrying thing is actually the I/O [input/output] performance, which hasn't increased. It means we have to buy more disks to keep the I/O performance up. The other problem is that disk technology has reached the end of the road."

Despite the emergence of new storage technologies such as solid state, Bird said the pace of development is still too slow. "More worrying is that we don't expect to see the increase we've seen over the past few years to continue at the same rate, and this means this becomes a more significant cost factor again.

"We're already spending more money on disks than anything else - we spend 60 percent [of our storage budget] on disks compared to CPU and tape. So we start to rely more on the network for real-time data distribution."

Using federated data - whereby different parts of the same database are stored in different locations - and super high-speed networks, Bird hopes to cut down on the amount of data that needs to be copied when physicists want to access it, thereby reducing the performance demands on its storage arrays.
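The federated approach can be sketched as a global catalogue that redirects each reader to the site holding the data, so only the requested bytes travel over the network rather than whole copies of files. This is an illustrative sketch only; the class and method names (Site, Catalogue, read_range) are hypothetical, not Cern's actual software.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A storage site holding some of the files in the federation."""
    name: str
    files: dict  # filename -> bytes stored at this site

    def read_range(self, filename: str, start: int, length: int) -> bytes:
        # Serve only the requested byte range, so callers never
        # need a full local copy of the file.
        return self.files[filename][start:start + length]


class Catalogue:
    """Global namespace: maps each file to the site that holds it."""

    def __init__(self, sites):
        self.index = {fn: site for site in sites for fn in site.files}

    def locate(self, filename: str) -> Site:
        return self.index[filename]


# Two sites holding different parts of the same logical dataset.
cern = Site("CERN", {"run42/events.dat": b"event-data-at-cern"})
fnal = Site("FNAL", {"run42/calib.dat": b"calibration-at-fnal"})
catalogue = Catalogue([cern, fnal])

# A physicist reads a slice of a remote file without copying it first.
site = catalogue.locate("run42/calib.dat")
print(site.name, site.read_range("run42/calib.dat", 0, 11))
```

In production this role is played by dedicated federation software and wide-area networks, but the principle is the same: the catalogue answers "where is the data?" and the read goes directly to that site.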

Cern is also moving to a software-based storage system, in the hope of no longer relying on huge RAID arrays of spinning disks, which are prone to failure.

Cern also expects the potential speed of its network connections between data centres to increase to 10Tbps, making its distributed computing model even more powerful.