Latest Virtualisation posts

It’s all in the detail – just what is cloud recovery all about?

11 Sep 2012

In a previous post, Quocirca discussed how the cloud can be used to provide levels of business continuity and disaster recovery to meet an organisation’s needs around its own business risk profile.

However, data can be stored in many formats, and the granularity of this storage can have an impact on how well an organisation can recover information, function or transactions. There are three basic levels to consider: files, storage images and applications.

First, at the file level, the most common data recovery need is the loss of a single file. A user may have deleted the file by mistake, may have overwritten it or may simply have mislaid it. The best way to recover such a file is to have a mirrored copy of the primary file store it resides in – provided that a degree of intelligence is built in. Direct mirroring of all actions carried out on a file store not only ensures that all files saved are replicated – it also means that all deletions and modifications are reflected in the mirror. Therefore, a user deleting a file in one file store deletes it in all mirrors – and is no better off when trying to recover it.

Phased mirroring can be implemented, but is of limited use. Here, a time delay is built into the mirroring, so that if a file is deleted or changed there is a grace period in which the user can change their mind before the action is reflected in the mirror. However, this delay also applies to file saves – and just how long should such a grace period be: a few seconds, minutes, hours, days?

A far better approach is to build in basic versioning, where a number of copies of the file are kept as they are saved. The number of versions retained should reflect the importance of the data held within the file – information of lower importance may have just one earlier version stored, whereas more important project documents, or information that may be required to feed into governance and compliance systems, may have many more versions enabled.
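As an illustration, such save-time versioning can be sketched in a few lines of Python – a toy model, not any particular product’s mechanism, with the retention count (`max_versions`) chosen to reflect the importance of the data:

```python
import shutil
from pathlib import Path

def _version_path(path: Path, n: int) -> Path:
    # e.g. report.doc -> report.doc.v1 (most recent prior version)
    return Path(f"{path}.v{n}")

def save_with_versions(path: Path, content: str, max_versions: int = 3) -> None:
    """Write content to path, keeping up to max_versions earlier copies."""
    if path.exists():
        # Shift prior versions along: .v(max-1) -> .v(max), ..., .v1 -> .v2
        for n in range(max_versions - 1, 0, -1):
            src = _version_path(path, n)
            if src.exists():
                shutil.move(str(src), str(_version_path(path, n + 1)))
        # The current live copy becomes version 1 before being replaced
        shutil.copy2(str(path), str(_version_path(path, 1)))
    path.write_text(content)
```

Deleting or corrupting the live copy then still leaves `.v1`, `.v2` and so on available for recovery – exactly what a plain mirror cannot offer.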

Doing such version control within an organisation can rapidly give rise to massive growth in storage requirements, and many would struggle to put in place the infrastructure required to manage this. This is where cloud storage comes into its own, as it is easy to "thin provision" storage volumes and manage them dynamically, sharing the underlying costs of a mass storage platform between the cloud provider’s many customers. Another key benefit of using cloud storage in this way is that it abstracts the file from the user’s immediate environment (i.e. offsite) – the data is protected from device failure or even from site failure. Data replicated within an organisation’s own datacentre may not survive a catastrophic large-scale failure.
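A toy model shows why thin provisioning makes the economics work: each customer is presented with a large logical volume, but the provider only has to back the capacity actually written. All names and figures here are illustrative:

```python
class ThinVolume:
    """Toy model of a thin-provisioned volume: a large logical size is
    presented to the customer, but physical capacity is only consumed
    as data is actually written."""
    def __init__(self, logical_gb: int):
        self.logical_gb = logical_gb
        self.written_gb = 0

    def write(self, gb: int) -> None:
        if self.written_gb + gb > self.logical_gb:
            raise ValueError("volume full")
        self.written_gb += gb

def physical_demand(volumes: list) -> int:
    """Physical capacity the provider must actually back right now."""
    return sum(v.written_gb for v in volumes)
```

Ten customers each provisioned with 100 GB but writing only 5 GB apiece present 1,000 GB of logical capacity yet consume just 50 GB of physical storage – the gap is what the provider shares, and prices, across its customer base.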

Beyond replicating files, the second level is the need to back up disk images. A user with a full-function device (such as a PC or laptop with installed software and data) can find themselves unable to work should such a device fail. Rebuilding a machine can take a long time – and if the associated data has been lost, the cost to the organisation can be high.

Taking a full image of a device’s storage systems means that, on failure, the device can be rebuilt very rapidly – or a new device can be provisioned using the saved image and the employee can soon be working again. An image can also be mounted as a virtual device, giving the user access to a virtual desktop while a new physical device is provisioned. Again, the cloud provides a cost-effective means of enabling such functionality, without the customer organisation having to own all the underlying hardware, operating systems and software stacks that underpin it – and again with the benefit of storage being off-site.

Some may think that using such image files negates the need for file mirrors. As all files are included in an image backup, it is possible to recover individual files from these. However, continuous imaging is not an easy task, and so files can be lost between image creation times. To recover a file, the correct image has to be identified, mounted and opened, and the file system interrogated to find the one file the user is looking for – this may not be the best way to do things, and does not easily allow for versioning.

The third level is the need to keep full applications running. In the world of virtualisation, it is now possible to package a complete application up as a virtual machine – this includes anything from the operating system upwards in the stack, and may include application server platform, middleware connectors, additional services the application is dependent on, and so on. Such virtual machines (VMs) should not include any live data, however, as this means that the standby VM has to be kept synchronised with the live instance at all times. Data should be stored outside of the VM and mirrored separately. By creating backups of VMs, should anything happen to the live instance (e.g. a failure in the physical underpinnings, corruption of the image or whatever), then a new instance of the image can be spun up rapidly to enable work to continue.
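The separation of VM image and live data can be sketched as a toy model – the class and function names are illustrative, not any hypervisor’s real API. Because the data volume lives outside the image, a replacement instance spun up from the backed-up image picks up the live data intact:

```python
class DataVolume:
    """Live data, stored and mirrored outside of any VM image."""
    def __init__(self):
        self.records = []

class VmInstance:
    """A running instance built from a backed-up image, with the
    data volume attached rather than embedded."""
    def __init__(self, image_name: str, volume: DataVolume):
        self.image_name = image_name
        self.volume = volume
        self.running = True

def spin_up(image_name: str, volume: DataVolume) -> VmInstance:
    """After a failure, start a fresh instance from the saved image
    and re-attach the separately mirrored data."""
    return VmInstance(image_name, volume)
```

Had the data been baked into the image instead, the replacement instance would only hold whatever was current at backup time – which is exactly why the standby image should stay data-free.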

Each of the three levels of granularity has its part to play in how an organisation should seek to ensure it has the best approach to business continuity and disaster recovery. Although all three could be carried out in-house, cloud computing brings technical and business benefits to the fore – from domain expertise in how to manage data, through economies of scale in providing large storage capabilities, to multi-level data management where the provider’s own backup and restore policies build on your organisation’s own. For many organisations struggling to “do more with less”, the cloud is the only way to gain access to such levels of technical information assurance – cloud brings such large-organisation capabilities within the reach of many mid-market and small and medium enterprise (SME) organisations.

In fact, such capabilities are increasingly available from specialist providers of business continuity and disaster recovery services, and many of these do not even run their own storage infrastructure. How? You’ve guessed it; they turn to other cloud service providers for the functionality itself.

Originally posted at Lunacloud Compute & Storage Blog

When an apple is not an apple

31 Jul 2012

When considering two or more items, there is the concept of “comparing apples with apples” – i.e. making sure that what is under consideration is being compared objectively. Therefore, comparing a car journey against an air flight for getting between London and Edinburgh is reasonable, but the same is not true between London and New York.

The same problems come up in the world of virtualised hosting. Here, the concept of a standard unit of compute power has been wrestled with for some time, and the results have led to confusion. Amazon Web Services (AWS) works against an EC2 Compute Unit (ECU), Lunacloud against a virtual CPU (vCPU). Others have their own units, such as Hybrid Compute Units (HCUs) or Universal Compute Units (UCUs) – while others do not make a statement of a nominal unit at all.

Behind the confusion lies a real problem; the underlying physical hardware is not a constant.  As new servers and CPU chips emerge, hosting companies will procure the best price/performance option for their general workhorse servers. Therefore, over time there could be a range of older and newer generation Xeon CPUs with different chipsets and different memory types on the motherboard. Abstracting these systems into a pool of virtual resource should allow for a method of providing comparable units of compute power – but each provider seems to have decided that their own choice of unit is the one to stick with – and so true comparisons are difficult to work with.  Even if a single comparative unit could be agreed on, it would remain pretty meaningless.

Let’s take two of the examples listed earlier – AWS and Lunacloud.  1 AWS ECU is stated as being the “equivalent of a 1.0-1.2 GHz 2007 (AMD) Opteron or 2007 (Intel) Xeon processor”. AWS then goes on to say that this is also the “equivalent of an early-2006 1.7GHz Xeon processor referenced in our original documentation”.  No reference to memory or any other resource, so just a pure CPU measure here.  Further, Amazon’s documentation states that AWS reserves the right to add, change or delete any definitions as time progresses.

Lunacloud presents its vCPU as the equivalent of a 2010 1.5GHz Xeon processor – again, a pure CPU measure. 

Note the problem here – the CPUs being compared are three years apart, with a 50% spread in clock speed.  Here’s where the granularity also gets dirty – a 2007 Xeon chip could have been manufactured to the Allendale, Kentsfield, Wolfdale or Harpertown Intel architectures.  The first two of these were 65 nm architectures, the second two 45 nm.  The differences in possible performance were up to 30% across these architectures – depending on workload.  A 2010 Xeon processor would have been built to the Beckton 45 nm architecture.

Now, here’s a bit of a challenge: Intel’s comprehensive list of Xeon processors (see here) does not list a 2007 (or any other date) 1.0-1.2 GHz Xeon processor, other than a Pentium III Xeon from 2000. Where has this mysterious 1.0 or 1.2GHz Xeon processor come from? What we see is the creation of a nominal convenient unit of compute power that the hosting company can use as a commercial unit.  The value to the purchaser is in being able to order more of the same from the one hosting company – not to be able to compare any actual capabilities between providers.

Furthermore, the CPU (or a virtual equivalent) is not the end of the problem.  Any compute environment has dependencies between the CPU, its supporting chipsets, the memory and storage systems and the network knitting everything together.  Surely, though, a gigabyte of memory is a gigabyte of memory, and 10GB of storage is 10GB of storage?  Unfortunately not – there are many different types of memory that can be used – and the acronyms get more technical and confusing here.  As a base physical memory technology, is the hosting company using DDR RDIMMS or DDR2 FBDIMMS or even DDR3?  Is the base storage just a RAIDed JBOD, DAS, NAS, a high-speed SAN or an SSD-based PCI-X attached array? How are such resources virtualised, and how are the virtual resource pools then allocated and managed?

How is the physical network addressed?  Many hosting companies do not use a virtualised network, so network performance is purely down to how the physical network is managed.  Others have implemented full fabric networking with automated virtual routing and failover, providing different levels of priority and quality of service capabilities.

A single definition of a “compute unit” that allows off-the-page comparisons between the capabilities of one environment and another for a specific workload is unlikely to emerge.  Even if it could be agreed, it still wouldn’t help to define the complete end user experience, as wide area network connectivity then comes into play.

Can anything be done?  Yes – back in the dim, dark depths of the physical world, a data centre manager would take servers from different vendors when looking to carry out a comparison and run some benchmarks or standard workloads against them.  As the servers were being tested in a standardised manner under the control of the organisation, the results were comparable – so apples were being compared to apples.

The same approach has to be taken when it comes to hosting providers.  Any prospective buyer should set themselves a financial ceiling and then try to create an environment for testing that fits within that ceiling.

This ceiling is not necessarily aimed at creating a full run-time environment, and may be as low as a few tens of pounds.  Once an environment has been created, load up a standardised workload similar to what the run-time workload is likely to be and measure key performance metrics.  Comparing these key metrics will then provide the real-world comparison that is needed – and arguments around ECU, vCPU, HCU, UCU or any other nominal unit become moot.
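As a sketch of the idea, the same fixed workload can be run on each candidate environment and the resulting throughput figures compared directly. The workload below is an arbitrary arithmetic loop, standing in for whatever standardised workload best resembles the real one:

```python
import time

def cpu_benchmark(iterations: int = 200_000) -> float:
    """Run a fixed arithmetic workload and return throughput in
    iterations per second. Run the identical workload on each
    candidate hosting environment and compare the numbers directly -
    the provider's nominal compute unit never enters into it."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += (i * i) % 97  # arbitrary but deterministic work
    elapsed = time.perf_counter() - start
    return iterations / elapsed
```

The absolute figure matters less than the ratio between environments: if provider A’s instance delivers twice the throughput of provider B’s at the same price, that is the apples-to-apples comparison no ECU-to-vCPU conversion could give.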

Only through such real-world measurement will an apple be seen to be an apple – as sure as eggs are eggs.

Originally posted at Lunacloud Compute & Storage Blog

Clive Longbottom, Service Director, Business Process Analysis

The changing face of the datacentre

31 Jan 2012

In February and November 2011, Quocirca carried out research amongst large (>$100m revenues) and very large (>$1b revenues) organisations around the world to better understand what they were doing when it came to addressing the changing needs of the business through investments in their datacentres.  The research was carried out on behalf of Oracle, and the full report on the findings can be downloaded free of charge here.

When the research is cross-referenced, it shows how small changes are leading to a far better overall technical platform.  Although no single change seems major in itself, their aggregation is providing the business with higher levels of systems availability.  This stems from small improvements in organisations’ capacity to control the patching and upgrading of systems and applications, fewer systems outages, and less impact from individual IT component failure thanks to better architecting of the overall platform.

However, one of the more interesting findings is around virtualisation.  When looking at the press, vendor information and analyst reports, it would be easy to feel that virtualisation is a done deal for the majority of organisations.  However, figure 1 shows that this is by no means the case.

Although between February (Cycle I) and November (Cycle II) 2011, the numbers of those having minimal (<10%) virtualisation had fallen from 25% to 13%, the numbers with more than 50% of their servers virtualised had only grown from 31% to 34% – hardly the ubiquitous virtualisation many would have us believe has happened.

However, when this is compared with stated server utilisation rates, it appears that the average utilisation rate has increased by around 5% across the board.  By this, Quocirca means that those who previously stated a 50% server utilisation rate were now running at a 55% rate – a 10% improvement.  For those who previously stated a 10% utilisation rate, however, this was now a 15% rate – a 50% improvement.
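The arithmetic here is worth making explicit – the absolute gain is five percentage points in both cases, but the relative improvement differs greatly:

```python
def relative_improvement(before_pct: float, after_pct: float) -> float:
    """Relative (not absolute) change in utilisation,
    e.g. 0.10 for a 10% improvement."""
    return (after_pct - before_pct) / before_pct

# Both moves add five percentage points, but:
# relative_improvement(50, 55) -> 0.10 (a 10% improvement)
# relative_improvement(10, 15) -> 0.50 (a 50% improvement)
```

The organisations starting from the lowest utilisation therefore see by far the largest proportional gains – which is where virtualisation pays back fastest.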

This combination of virtualisation and better server utilisation rates can have a deep impact on the datacentre itself.  Firstly, a 50% improvement in utilisation rates can reduce the number of servers required, enabling equipment to be deprovisioned, which in turn means less energy is required to power the remaining servers, and less cooling may be needed to keep the datacentre within safe working temperatures.  On top of this, fewer operating system and application licences will be required – and fewer technical staff to manage the whole environment.

Another aspect of this incremental growth of virtualisation is a concomitant growth in the heterogeneity of the IT platform.  During the 1990s, there was a move to try to attain a homogeneous platform in order to simplify what was under management.  Now, with virtualisation allowing an abstraction of the virtual from the physical platform, the research shows that organisations are moving back to choosing physical platforms based on how well they support specific technical workloads, and are then using common application server platforms to create a cohesive overall environment. This is also leading to increased use of common systems management tools that enable faster root cause analysis and problem resolution.

Another finding is around the use of external datacentre facilities, both through co-location and public cloud services.  Using external facilities where it makes sense, combined with the planned introduction of higher density systems as existing equipment ages as part of an IT lifecycle management (ITLM) strategy, can avoid the need to build a new datacentre – saving the business not only hard cash, but also the disruption of a major forklift upgrade from one datacentre platform to another.

It is apparent that even under highly constrained financial conditions, IT is going through some small but fundamental changes in how it is being positioned to support the organisation dependent upon it.  There does not appear as yet to be a virtualisation or cloud computing revolution in progress – but there does seem to be slow but positive progress in moving towards a more dynamic, highly available technical platform to support the business.

Clive Longbottom, Service Director, Business Process Analysis, Quocirca

The intelligent management of computing workloads

01 Feb 2011

The rapid increase in the availability of on-demand IT infrastructure (infrastructure as a service/IaaS) gives IT departments the flexibility to cope with the ever-changing demands of the businesses they serve. In the future, the majority of larger businesses will be running hybrid IT platforms that rely on a mix of privately owned infrastructure plus that of service providers, while some small businesses will rely exclusively on on-demand IT services.

Even when it comes to the privately owned stuff, the increasing use of virtualisation means it should be easier to make more efficient use of resources through sharing than has been the case in the past.  Quocirca has seen server utilisation rise from around 10% to 70% in some cases where systems have been virtualised. There will of course always be some applications that are allocated dedicated physical resources for reasons of performance and/or security.

Any given IT workload must be run in one of three fundamental computing environments: dedicated physical, private virtualised or shared virtualised (the latter being part of the so-called “public cloud”).

However, the benefits of this flexibility to deploy computing workloads will only be fully realised if the right tools are in place to manage it. In fact, without such tools, costs could start to be driven back up. For example, if the resources of an IaaS provider are used to cope with peak demand and workloads are not de-provisioned as soon as the peak has passed, unnecessary resources will be consumed and paid for.
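A minimal sketch of the point: if instance counts track demand both up and, crucially, back down, the post-peak excess is released rather than paid for (figures are illustrative):

```python
def autoscale(demand_series: list, capacity_per_instance: int) -> list:
    """Instance count per period when scaling both up for a peak and
    back down once it has passed (minimum of one instance).
    Uses ceiling division: the smallest count covering the demand."""
    return [max(1, -(-d // capacity_per_instance)) for d in demand_series]
```

For a demand series of `[10, 80, 20]` with instances that each handle 20 units, this yields `[1, 4, 1]` – whereas without de-provisioning, four instances would keep running, and keep being billed, long after the peak.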

A workload can be defined as a discrete computing task to which four basic resources can be allocated: processing power, storage, disk input/output (i/o) and network bandwidth. There are five workload types:

1. Desktop workloads provide users with their interface to IT
2. Application workloads run business applications, web servers etc.
3. Database workloads handle the storage and retrieval of data
4. Appliance workloads deal with certain network and security requirements and are either self-contained items of hardware or a virtual machine
5. Commodity workloads are utility tasks provided by third parties, usually called up as web services
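The four resources and five workload types above can be modelled directly – a minimal sketch, with illustrative field names:

```python
from dataclasses import dataclass

# The five workload types listed above
WORKLOAD_TYPES = {"desktop", "application", "database", "appliance", "commodity"}

@dataclass
class Workload:
    """A discrete computing task and its four basic resource allocations."""
    name: str
    kind: str              # one of WORKLOAD_TYPES
    cpu_units: float       # processing power
    storage_gb: float      # storage
    iops: int              # disk input/output
    bandwidth_mbps: float  # network bandwidth

    def __post_init__(self):
        if self.kind not in WORKLOAD_TYPES:
            raise ValueError(f"unknown workload type: {self.kind}")
```

A retail web application facing its seasonal peak might then simply be re-provisioned with larger `cpu_units` and `bandwidth_mbps` values, leaving the other two allocations untouched.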

A series of linked workloads interact to drive business processes. Each workload type requires a different mix of resources and this can change with varying demand. For example, a retail web site may see peak demand in the run-up to festivities and require many times the compute power and network bandwidth it needs the rest of the time; a database that relies heavily on fast i/o may need to be run in a dedicated physical environment; virtualised desktop workloads may need plenty of storage allocated to ensure users can always save their work (thin provisioning allows such storage to be allocated, but not dedicated).

Ensuring the right resources are allocated requires an understanding of likely future requirements at the time the workload is provisioned; this is also the time to ensure appropriate security is in place and that the software used by the workload is fully licensed. Once workloads are deployed, it is necessary to measure their activity and monitor the environment they run in – sometimes allocating more resources, or moving the workload from one environment to another – while ensuring security is maintained and the workload always remains compliant (for example, making sure personal data is only processed and stored in permitted locations).
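A sketch of such a post-deployment check might look like this – the thresholds, region names and action labels are all illustrative assumptions, not any vendor’s tool:

```python
def review_workload(util_pct: float, region: str, permitted: set,
                    low: float = 20.0, high: float = 80.0) -> str:
    """Periodic post-deployment review: flag non-compliant placement
    first (e.g. personal data outside permitted locations), then
    recommend a resource action based on measured utilisation."""
    if region not in permitted:
        return "migrate"      # data must move to a permitted location
    if util_pct > high:
        return "scale_up"     # allocate more resources
    if util_pct < low:
        return "scale_down"   # release resources no longer needed
    return "hold"
```

Running such a review on a schedule is what turns provisioning-time guesses about future demand into continuously corrected allocations.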

The intelligent management of workloads is fundamental to achieving best practice in the use of the hybrid public/private infrastructure that is here to stay. Managing workloads in such an environment requires either generic tools from vendors such as Novell, CA or BMC, or virtualisation platform-specific tools from VMware or Microsoft. Such products of course have a cost, but this is offset by more efficient use of resources, avoiding problems with security and compliance, and providing the flexibility for IT departments to better serve the ongoing IT requirements of the businesses they serve.

Quocirca’s report, Intelligent workload management, is freely available here.

Bob Tarzey, 
Analyst and Director,