Disasters come in many shapes and sizes, from the floods, fires and major incidents that most people associate with the word, through to isolated, unpredictable events that can nevertheless spell disaster for the organisation whose systems, core data or operations have been compromised.
The job of a disaster recovery (DR) strategy is to ensure that, whatever the disruptive event, vital data can be recovered and mission-critical applications brought back online in the shortest possible time. Particularly difficult to plan for are compound events, in which one seemingly minor incident leads to others and creates a domino effect.
Computing conducted a survey of 147 senior IT decision-makers to identify both the perceived risks and the actual problems that have caused data centre downtime and data loss in medium and large organisations. We also sought to identify the consequences of such events, the lessons learned and the measures taken to prevent them from happening again.
Causes of data loss and downtime
Respondents were asked about their experiences of failures in the IT infrastructure and also about unexpected occurrences in the wider environment. The top five results in terms of frequency of impact are presented in Figures 1 and 2.
The first thing to notice is that most of the problems suffered by firms originate within the data centre and its support systems, with hardware failure and loss of connectivity topping the list. These events are also the most likely to be included in the DR plan (although some, such as software problems, are poorly represented).