The cloud - business continuity at an affordable price?

Many organisations look to the cloud to provide some level of contingency against their own systems going down, be it off-site data backup, failover servers for business applications, or the use of high-availability servers and software. The level of disaster recovery (DR) and business continuity (BC) a given organisation chooses to put in place will vary according to its own risk appetite and budget.

The degree to which cloud services can provide that safety net will vary from one case to another. So which approach is right for your organisation?

The following use case scenarios provide some guidance, starting with the most basic level of data backup and moving to full business continuity.

1. Simple data backup - the cloud acts as external storage where files are kept so that if on-premise storage fails, individual files can be recovered, or images of specific machines can be restored to a device. This can be very cost effective, but as with similar on-premise solutions there will be downtime while the data is identified and restored to the live environment. Large amounts of data will also take a long time to recover over the internet, which is why Quocirca recommends recovering data from the cloud to a local physical device that is couriered to the customer's site and then restored to the target storage system at local area network (LAN) speeds. The service provider may also be able to offer additional archiving services that work well for compliance needs (as Quocirca points out in a previous blog post). A minimal sketch of the backup side appears after this list.

2. Secondary data storage - the cloud holds a mirror of existing data. When an on-premise data storage device fails, systems fail over to the data held in the cloud. Although this may look as if it provides good levels of business continuity, organisations must bear in mind that serving data to on-premise applications from outside the data centre may introduce latency, and that synchronising live data may not be as easy as first thought.

3. Primary data storage - no data is stored on-premise; it is held directly in the cloud. Although this should improve data availability, thanks to how the cloud provider architects its storage platform, the latency between the on-premise application and the cloud-held data will generally make this a non-viable option. On the plus side, data backup and restore now happens at LAN speed within the provider's environment.

4. Applications and data held in the cloud, with data backup and restore integrated - this moves the application and data closer to each other, so latency between them is no longer an issue. As long as the application supports web-based access effectively, the user experience should be good. Should the primary data storage be impacted, restores can be carried out at LAN speed, shortening the recovery time objective (RTO). However, this only provides data continuity: if the application goes down, the organisation will still be unable to carry out its business.

5. Applications run as virtual machines with data mirrored - this is getting closer to real business continuity. With applications packaged as virtual machines, the failure of a single instance can be rapidly fixed by spinning up a new one. Data needs to be covered as well, and should be mirrored to a different storage environment so that a high level of data availability is in place. This approach can bring recovery times down to a few minutes, which will be enough for many organisations. It is also known as "cold standby", as the standby virtual machines are not running all the time. A watchdog sketch of this pattern appears after the list.

6. Stand-by business continuity - here, the stand-by application virtual machine is permanently "spinning" (i.e. provisioned and running), but is not part of the live environment. On the failure of the live image, pointers can be moved over to the stand-by image in a matter of seconds, using existing or mirrored data storage. This is also known as "hot standby", as the virtual machines are ready to take over as soon as a failure occurs.

7. Full business continuity - here, everything is provisioned to at least an "N+1" level. Multiple data storage silos are mirrored on a live basis and multiple live application virtual machines are maintained. Workloads are balanced across the virtual machines, and two-phase commit is used on writes so that a problem with the data itself is not mirrored across all the data stores at the same time (a toy illustration follows the list). This is the approach used by large organisations that must be able to keep working through a systems failure, but it has been beyond the budgets of most others. Through economies of scale, cloud computing can bring this capability within reach of many more organisations.
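
To make the first scenario concrete, here is a minimal sketch of pushing a nightly backup archive into cloud object storage. It assumes the provider exposes an S3-compatible API (many do) and uses the boto3 Python library; the endpoint, bucket and source directory are hypothetical placeholders rather than any particular provider's values.

```python
# backup_to_cloud.py - sketch of scenario 1 (simple data backup).
# Assumes an S3-compatible object store; the endpoint, bucket and
# source directory are hypothetical placeholders.
import datetime
import tarfile

import boto3  # works against any S3-compatible endpoint, not just AWS

ENDPOINT = "https://storage.example-cloud.com"  # hypothetical endpoint
BUCKET = "nightly-backups"                      # hypothetical bucket


def make_archive(source_dir: str) -> str:
    """Pack the source directory into a dated tar.gz archive."""
    stamp = datetime.date.today().isoformat()
    archive = f"backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=stamp)
    return archive


def upload(archive: str) -> None:
    """Send the archive to the object store; boto3 switches to
    multipart upload automatically for large files."""
    s3 = boto3.client("s3", endpoint_url=ENDPOINT)
    s3.upload_file(archive, BUCKET, archive)


if __name__ == "__main__":
    upload(make_archive("/srv/data"))  # hypothetical data directory
```

Restores follow the same path in reverse via download_file; for very large restores, the courier-a-disk route described in scenario 1 will usually beat the wire.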
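
The cold standby of scenario 5 can be as little as a watchdog that polls the live application and, on failure, asks the provider to start a fresh instance from the packaged virtual machine image. The sketch below uses boto3's EC2-style run_instances call purely as an illustration; the health URL and image ID are hypothetical, and other providers' provisioning APIs differ in detail but not in shape. Scenario 6's hot standby differs only in that the replacement is already running, so the watchdog repoints traffic instead of provisioning.

```python
# cold_standby_watchdog.py - sketch of scenario 5 (cold standby).
# The health URL and image ID are hypothetical; the EC2-style calls
# stand in for whatever provisioning API your provider offers.
import time
import urllib.request

import boto3

HEALTH_URL = "http://app.example.com/health"  # hypothetical health endpoint
IMAGE_ID = "ami-0123456789abcdef0"            # hypothetical VM image


def healthy() -> bool:
    """Return True if the live application answers its health check."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False


def spin_up_replacement() -> str:
    """Launch a new instance from the packaged application image."""
    ec2 = boto3.client("ec2")
    result = ec2.run_instances(ImageId=IMAGE_ID, InstanceType="t2.small",
                               MinCount=1, MaxCount=1)
    return result["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    while True:
        if not healthy():
            print("Live instance down; starting replacement:",
                  spin_up_replacement())
            break
        time.sleep(30)  # the poll interval drives your recovery time
```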
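
Finally, the two-phase commit in scenario 7 is what stops a bad write being faithfully mirrored everywhere: every replica must vote that it can apply an update before any replica makes it live. The following is a toy, in-memory simulation of that voting shape, not a production protocol (real ones add logging, timeouts and crash recovery).

```python
# two_phase_commit.py - toy simulation of the two-phase commit in scenario 7.
# Shows only the voting shape that stops a bad write reaching every mirror.

class Replica:
    def __init__(self, name: str):
        self.name = name
        self.data = {}     # committed state
        self.staged = None # update awaiting phase 2

    def prepare(self, update: dict) -> bool:
        """Phase 1: validate and stage the write, then vote."""
        if any(v is None for v in update.values()):  # stand-in validity check
            return False                             # vote "abort"
        self.staged = update
        return True                                  # vote "commit"

    def commit(self) -> None:
        """Phase 2: make the staged write live."""
        self.data.update(self.staged)
        self.staged = None

    def abort(self) -> None:
        self.staged = None


def replicate(update: dict, replicas: list) -> bool:
    """Coordinator: commit only if every replica votes yes."""
    if all(r.prepare(update) for r in replicas):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.abort()
    return False


mirrors = [Replica("site-a"), Replica("site-b")]
print(replicate({"order-42": "shipped"}, mirrors))  # True - applied everywhere
print(replicate({"order-43": None}, mirrors))       # False - rejected everywhere
```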

Obviously, costs rise as the amount of cover increases through these scenarios. This is why an organisation must first understand its corporate risk profile, building up a picture of exactly which business risks it cannot afford to carry and which it can. Once a risk profile has been created, the right level of technical "insurance" can be bought from a cloud or hosting provider. The cloud makes the cost less of an issue, as each level is shared across the many organisations using the same infrastructure. An organisation that previously regarded business continuity as out of its reach, and settled for disaster recovery, can therefore now look to the cloud for a more business-capable platform.

Originally posted at Lunacloud Compute & Storage Blog

Clive Longbottom, Service Director, Business Process Analysis, Quocirca