Conquering the data mountain
We talk to IT leaders from a range of sectors about the storage challenges they face and their strategies for surmounting them
Bearpark: The value we add to the application makes the utilisation of the SSD phenomenally cost effective
Businesses today need to store and manage massive amounts of data as the explosive growth in digital content shows no sign of slowing.
The rapid obsolescence of content combined with the need to retain data long term is forcing organisations to look for effective ways of accessing this information, while trying to control the overall cost of storage and manage data efficiently. Unstructured data is also adding to the management challenges and storage overheads as Andrew Gorton, technical co-ordinator at Greater Manchester Police, has discovered.
“The demand for applications that use more rich content is accelerating storage growth,” he says. These include images, audio and video records, automated number plate recognition data, CCTV and digital voice recordings.
“At the same time, we are seeing growth in the structured data produced by central business systems and unstructured data produced by office applications. A key consideration for us is to address the long-term capacity risks of data residing on inappropriate storage. We need to plan ahead to make sure that data does not grow out of control and exceed physical capacity,” he says.
For some organisations, the economic climate is also putting pressure on storage systems. Dave Allerton, IT director at architecture practice RHWL, uses large, mirrored central network attached storage (NAS) devices to support the work of staff located in London, Germany and Qatar.
“In leaner times, when we are hungry and looking for new work, we produce lots of proposal documentation, which far outstrips the regular construction records,” says Allerton. “The guys are creating 200MB architectural models on a daily basis and sometimes working on two or three iterations, so we quickly get into gigabytes of data being created every day.”
For RHWL, a tiered storage system from F5 has reduced the risk of running out of space while satisfying the legal requirements of retaining data for up to 12 years.
“A lot of our data needs to be kept live because of the needs of ongoing projects,” says Allerton.
“However, 60 per cent of our data hadn’t been touched for more than six months, so we were able to move it onto cheaper tier two storage. Overnight we regained 60 per cent of our tier one storage without users noticing, except that it worked faster because all of a sudden they had all this additional storage to play with. Currently, we have 7TB of data and that is growing by about 2TB a year.”
For many businesses the benefits of adopting a virtualised storage environment are becoming more compelling. With the move away from a silo-based infrastructure model of storage towards a centralised strategy of tiered storage and information lifecycle management (ILM), virtualisation adds flexibility, management control and cost effectiveness.
According to John Grove, senior technical architect at utility group Scottish and Southern Energy (SSE), the continuing trend of server virtualisation is having a significant impact on decision-making around storage.
“SSE has made a significant investment in virtualisation technology and the net effect of this is that all the storage that used to be internal to the server has moved into the SAN [storage area network] environment,” he says. “This shift has meant that more blade servers are connected to the SAN, increasing the number of physical switches and naturally the volumes of data moving within the SAN fabric.”
Managing a storage estate fast approaching a petabyte in size is also particularly challenging.
“Storage management tools are still immature; vendors do produce tools that have good capabilities when monitoring their own kit, but monitoring heterogeneous environments is a real challenge as there is not one tool that provides enough depth across multiple vendor products,” says Grove.
Landmark Solutions uses enormous amounts of data to supply a wide range of services around geospatial data, geographic applications, web mapping and geocoding. It has just upgraded its entire storage infrastructure to be more scalable, virtualised and to remove vendor lock-in restrictions.
“We wanted a platform to build multiple solutions without creating islands of storage and a need for constant upgrades – a system that would scale out to a relatively large size but in small increments,” says the firm’s Unix systems administrator, Vic Cornell.
“OCF designed for us a vendor-independent, scalable, flexible virtualised storage environment that has reduced administrative overheads and yet allows us to grow our business without requiring additional staff and resources.”
But for Landmark it is not just a case of throwing more disks at the solution; speed of operations is also important. “While the cost of disk space may be falling, the cost of IOPS [input/output operations per second] is not,” says Cornell. “Having sufficient space isn’t the problem – it’s more about the speed at which you can access your data.”
The cost of storage is not just the price of the drives – what really digs into budgets is the management of the data, which is where those pricey systems from specialist storage companies come into the equation. You may think you are paying over the odds for disk capacity, but what you are really buying is the software that provides the intelligence to manage the storage space effectively.
Roger Bearpark, assistant head of IT at the London Borough of Hillingdon, is one of the first UK users of solid state drive (SSD) technology in a tiered environment.
“We have splashed out on three 350GB drives for our Compellent SAN environment,” he says. “The additional performance varies but it is at least 10 times faster and sometimes 14 times quicker based on putting as little as 30MB of data on the drive – an amazing performance improvement.”
The intelligence built into the SAN automatically migrates blocks of data to the appropriate tier of SSD, Fibre Channel or SATA storage based on usage.
“This automated process is delivered in such a way that the value we add to the application makes the utilisation of the SSD phenomenally cost effective,” says Bearpark.
For many organisations, a key consideration is addressing the long-term capacity risks of data residing on inappropriate storage.
“Greater Manchester Police has recognised the need to create data management policies that dictate which pieces of old data can be deleted to save space, and which need to be retained,” says Gorton. “Best practice for us means more automation, ease of storage management and straightforward retrieval of archived data.”
A tiered, virtualised storage environment together with clear policies for the movement of data through different tiers of storage is providing a reliable, flexible and scalable strategy to address the problems of data storage. But management and speed of access still remain challenges to be addressed.
In part two, we look at real-life examples of cutting-edge storage technology in action
Five key advances in storage technology
Multi-protocol storage access
These new systems will simultaneously support Fibre Channel, iSCSI, NFS, CIFS and other protocols, offering cost savings through the use of a unified pool of storage. The advantages of multi-protocol storage include the ability to adapt to the requirements of different applications, reducing cost and complexity without limiting functionality.
Solid state
Solid-state storage is a non-volatile medium that employs integrated circuits rather than magnetic or optical media, so it contains no moving mechanical parts. The result is a much higher transfer speed to and from the solid-state media, and the absence of moving parts may offer a longer operating life if the drives are well looked after. It is currently an expensive technology, but the price gap is narrowing and it makes for a convenient, compact and fast option, especially when integrated into a tiered-storage solution.
Storage virtualisation
Storage virtualisation pools physical storage from multiple devices and presents it as a single storage space that can be managed from a central console. Commonly used in a storage area network (SAN) environment, virtualisation removes many of the management issues associated with storage. Administrators can identify, provision and manage distributed storage as if it were a single, consolidated resource, increasing availability across applications and the organisation.
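A toy illustration in Python of how a virtualisation layer might map one contiguous virtual volume onto several physical devices; the device names and capacities here are invented for illustration, and real products add mapping tables, thin provisioning and redundancy on top of this basic address translation:

class StoragePool:
    """Present several physical devices as one contiguous virtual volume."""

    def __init__(self, devices):
        # devices: list of (name, capacity_in_blocks) – hypothetical arrays
        self.devices = devices
        self.total_blocks = sum(capacity for _, capacity in devices)

    def locate(self, virtual_block: int):
        """Translate a virtual block number into (device, physical block)."""
        if not 0 <= virtual_block < self.total_blocks:
            raise ValueError("virtual block outside the pool")
        offset = virtual_block
        for name, capacity in self.devices:
            if offset < capacity:
                return name, offset
            offset -= capacity
        raise AssertionError("unreachable")


# Three hypothetical arrays presented to administrators as one 1,750-block volume
pool = StoragePool([("array-a", 1000), ("array-b", 500), ("array-c", 250)])
print(pool.total_blocks)   # 1750
print(pool.locate(1200))   # ('array-b', 200)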
Data de-duplication
Data de-duplication reduces storage needs by identifying duplicate blocks of data and replacing them with pointers to a single master copy of each block. Email systems often contain multiple instances of the same file attachment; by removing the duplicate copies, gigabytes of storage space can easily be reclaimed. De-duplication makes more efficient use of disk space, reduces backup times and allows faster restoration of data, making it an effective means of reducing the impact of data growth and improving storage utilisation.
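A minimal sketch of the idea in Python; the fixed 4KB block size and SHA-256 fingerprints are illustrative assumptions rather than details drawn from any particular product:

import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size


def deduplicate(data: bytes):
    """Split data into fixed-size blocks and store each unique block once.

    Returns a store of unique blocks keyed by fingerprint, plus the
    sequence of pointers (fingerprints) needed to rebuild the data.
    """
    store = {}      # fingerprint -> master copy of the block
    pointers = []   # one pointer per original block
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = block     # keep the first (master) copy
        pointers.append(fingerprint)       # duplicates become pointers
    return store, pointers


def rebuild(store, pointers) -> bytes:
    """Reassemble the original data by following the pointers."""
    return b"".join(store[p] for p in pointers)


# Example: a mailbox holding the same attachment three times
attachment = b"quarterly-report-pdf-bytes" * 1000
mailbox = attachment * 3
store, pointers = deduplicate(mailbox)
assert rebuild(store, pointers) == mailbox
print(f"raw: {len(mailbox)} bytes, deduplicated: "
      f"{sum(len(b) for b in store.values())} bytes")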
Intelligent management of tiered storage
Large-scale storage systems are required to handle and intelligently manage an ever-growing amount of data held within an organisation. Tiered storage systems keep business-critical data in high-end (tier one), more expensive storage, while less critical data is kept in cheaper, lower tiers of storage. The next generation of intelligent storage management systems will be able to automatically move blocks of data according to information life cycle management rules set by the organisation.
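A simplified Python sketch of the rule-driven tiering described above; the tier names, age thresholds and in-memory catalogue are assumptions chosen for illustration, not any vendor's actual policy engine:

from datetime import datetime, timedelta

# Illustrative ILM rules: data untouched for longer than the threshold
# moves down to the named tier (thresholds are assumptions, not defaults).
ILM_RULES = [
    (timedelta(days=365), "tier3-archive"),
    (timedelta(days=180), "tier2-sata"),
    (timedelta(days=0),   "tier1-ssd"),
]


def target_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick the lowest tier whose age threshold the data block exceeds."""
    age = now - last_accessed
    for threshold, tier in ILM_RULES:
        if age >= threshold:
            return tier
    return "tier1-ssd"


def plan_migrations(catalogue: dict, now: datetime) -> list:
    """Return the (block_id, from_tier, to_tier) moves implied by the rules."""
    moves = []
    for block_id, (tier, last_accessed) in catalogue.items():
        target = target_tier(last_accessed, now)
        if target != tier:
            moves.append((block_id, tier, target))
    return moves


# Example catalogue: block id -> (current tier, last access time)
now = datetime(2010, 1, 1)
catalogue = {
    "blk-001": ("tier1-ssd", now - timedelta(days=2)),
    "blk-002": ("tier1-ssd", now - timedelta(days=200)),
    "blk-003": ("tier2-sata", now - timedelta(days=400)),
}
print(plan_migrations(catalogue, now))
# [('blk-002', 'tier1-ssd', 'tier2-sata'), ('blk-003', 'tier2-sata', 'tier3-archive')]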