Amazon's EC2 and Microsoft's BPOS are felled by lightning

Important that we learn lessons from these events, says Rackspace manager

Both Amazon's EC2 cloud service and Microsoft's BPOS online desktop application suite went down in Ireland yesterday. The outages were caused by lightning during storms.

According to the service health dashboard on Amazon's web site, there are still problems with the services in the affected EU-West region, although they are operating normally in other regions.

After the outage, some of the Elastic Block Store (EBS) servers that support EC2 had to be recovered manually. On the same dashboard, Amazon said last night that restoring the service required it to make an extra copy of all data, which had consumed most of its spare capacity and slowed the recovery process.

It said: "We anticipate that it will take 24 to 48 hours until the process is completed."

Microsoft released a statement on the outage which read: "On Sunday, August 7, 2011, beginning at approximately 10:50AM PDT / 5:50PM UTC, a widespread power outage in Dublin caused connectivity issues for European BPOS customers.

"Services were restored to all customers by 5:45PM PDT/ 12:45AM UTC. Throughout the incident, we updated our customers regularly on the issue via our normal communication channels."

Riccardo Degli Effetti, datacentre manager at cloud provider Rackspace, argued that the outages were not due to the services being cloud services per se, but were the result of problems with Amazon's and Microsoft's infrastructure.

"It does not matter whether the computer hardware or software you use is located in your home or office, or miles away in the datacentre of a hosting company, things can still go wrong.

"That said, there are ways to protect datacentres from lightning strikes, such as installing lightning and surge protection.

"Both cloud and traditional managed hosting have to rely on well-maintained infrastructure to function," he said.

He was also critical of the way in which Amazon dealt with the outages: "Customers rightly demand a high level of service and transparency when things do go wrong. It is no longer acceptable to post updates on a web site and not communicate directly or through multiple channels.

"Outages, although rare, are painful, and they should motivate cloud and hosting service providers to improve both preventive measures and the overall level of care provided to customers. As the saying goes, lightning never strikes twice, but Amazon has been hit before, in the US in 2009. It is important that lessons are learned from these events," he added.