Peter Cochrane: Is it even possible to properly test complex systems?

Systems have reached such a state of complexity that only AI can bring some kind of order, argues Professor Peter Cochrane

Over thousands of years, humankind experimented with materials in the making of tools, weapons and artefacts. The refinement of base metals saw the creation of alloys, and even chrome plating, at the pinnacle of empirical metallurgy and craftsmanship.

And, at some uncertain point, mechanisms and machines were produced to amplify our limited abilities, while continual refinement achieved remarkable advances in reliability, resilience, repeatability and performance.

During the last 100 years or so we have also mastered the art of creating highly reliable hardware systems from relatively low-cost and, sometimes, not-so-reliable components. This has mostly been achieved by being able to specify, design, manufacture with precision, and exhaustively test all the individual components and complete systems.

So when we consider that software has only really been around and evolving for 60 years or so, it is worth remembering its juvenile nature in this historical context of hardware. And whilst it was possible to exhaustively test software, and even software-hardware combinations, in the early days, that is no longer true! We are now defeated by complexity: ‘stochastic’ complexity!

Everything from operating systems, networks, apps, games and websites to the hardware they live on is now fundamentally beyond our ability to exhaustively test and qualify. Consider this example:

Picture a battalion of 1,000 troops in battle, with sensor systems, radios, mobile phones, weapons systems, ground vehicles, drones and helicopters continually updating an electronic HQ with situation reports: dead, wounded or fully fit; food, water and ammunition status; stationary or moving; under fire or not; resources expended; and so on. That is, full dynamic reporting by the second, to be analysed by a machine that presents situational awareness reports to commanders.

Here, the total number of ‘states’ far exceeds the number of atoms in the Universe. So any exhaustive testing through war games and software cycling is fundamentally impossible. The best you might achieve is a statistical view on the back of many Monte Carlo runs.
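To put rough numbers on that claim, here is a minimal Python sketch. Every figure in it is an assumption made for illustration, not a number from the scenario: each soldier is reduced to just 12 combinations (three health states, moving or not, under fire or not), the atom count is the commonly quoted order of magnitude of 10^80 for the observable Universe, and `toy_system_fails` is a made-up fault condition standing in for a real glitch. It first sizes the joint state space, then shows the Monte Carlo fallback: sample random states and report a failure rate with its standard error.

```python
import math
import random

# Illustrative assumptions, not figures from the scenario itself.
TROOPS = 1_000
STATES_PER_SOLDIER = 3 * 2 * 2   # health x movement x under-fire = 12
ATOMS_LOG10 = 80                 # commonly quoted order of magnitude


def log10_joint_states(units: int, states_per_unit: int) -> float:
    """Return log10(states_per_unit ** units) without building the huge integer."""
    return units * math.log10(states_per_unit)


def toy_system_fails(state: list) -> bool:
    """Hypothetical fault: a glitch when more than 10% of units share state 0."""
    return state.count(0) > len(state) // 10


def monte_carlo_failure_rate(runs: int, rng: random.Random):
    """Estimate the failure probability, and its standard error, by random sampling."""
    failures = 0
    for _ in range(runs):
        state = [rng.randrange(STATES_PER_SOLDIER) for _ in range(TROOPS)]
        failures += toy_system_fails(state)
    p = failures / runs
    return p, math.sqrt(p * (1 - p) / runs)


exponent = log10_joint_states(TROOPS, STATES_PER_SOLDIER)
print(f"joint states: about 10^{exponent:.0f}")
print(f"larger than the atom estimate by about 10^{exponent - ATOMS_LOG10:.0f}")

rate, stderr = monte_carlo_failure_rate(runs=500, rng=random.Random(42))
print(f"estimated failure rate: {rate:.3f} +/- {stderr:.3f}")
```

Even with this drastically simplified soldier, the exponent runs past a thousand, so enumeration is hopeless. The sampling run covers a vanishingly small fraction of the space, which is why it yields only a statistical view and never a guarantee.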

This general condition is even manifest on weapons platforms such as aircraft and tanks, with the infamous ‘Big Red Button’ to be hit when a software or system glitch renders the weaponry and/or the platform dysfunctional in some way. Yes, the ‘in flight, mid-battle’ software reboot is a reality.

There is no perfect solution to such complexity. We can expect servers, machines, mobiles, systems and apps to continue locking up and crashing for some considerable time yet, especially when they're needed most. So what to do? We have only one tool in prospect that might just master this situation, and that is artificial intelligence, or AI.

At a fundamental level, AI is supreme at identifying patterns, cause and effect, in near real-time across vast tracts of data. And so, the monitoring of data flows and data states in our ‘battalion scenario’, or indeed any large system or network, presents a perfect fit. The key is to gather and analyse data continuously, 24x7.

So for a fleet of ground vehicles, aircraft, robots, industrial plant or weapons, gathering the data from all, and not just one, is the vital step. When one machine or system experiences a glitch, the entire population learns and gains from that experience.

One further and obvious step here is the deliberate introduction of random and purposeful failures. These will elicit a wealth of extra data in the form of new patterns. In short: all events and failures present precursor indicators, if only we could find them.
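As a toy illustration of pooling fleet data and injecting a fault on purpose, consider the Python sketch below. It is a simple statistical stand-in, not an AI system, and everything in it is hypothetical: the temperature-like signal, the 20-step drift that precedes the injected failure, the fleet of 50 vehicles and the four-sigma alarm threshold are all assumptions chosen to keep the example small.

```python
import random
import statistics


def simulate_vehicle(rng, steps, fail_at=None):
    """Hypothetical telemetry: noisy readings around 70.0, with an upward
    drift over the 20 steps before a deliberately injected failure."""
    readings = []
    for t in range(steps):
        value = rng.gauss(70.0, 1.0)
        if fail_at is not None and fail_at - 20 <= t < fail_at:
            value += 0.5 * (t - (fail_at - 20))  # precursor drift
        readings.append(value)
    return readings


def fleet_baseline(fleet):
    """Pool readings from every vehicle, not just one, into a shared baseline."""
    pooled = [value for readings in fleet for value in readings]
    return statistics.fmean(pooled), statistics.stdev(pooled)


def first_alarm(readings, mean, std, threshold=4.0):
    """Return the first step whose reading sits more than `threshold`
    standard deviations from the fleet baseline, or None if none does."""
    for t, value in enumerate(readings):
        if abs(value - mean) / std > threshold:
            return t
    return None


rng = random.Random(7)
healthy_fleet = [simulate_vehicle(rng, 200) for _ in range(50)]
mean, std = fleet_baseline(healthy_fleet)

# Inject a failure at step 150 and see whether the precursor is caught first.
faulty = simulate_vehicle(rng, 200, fail_at=150)
alarm = first_alarm(faulty, mean, std)
print(f"injected failure at step 150, first alarm at step {alarm}")
```

The baseline here is learned from the whole population, so a precursor seen on one vehicle can be recognised on any other. A real system would face thousands of interacting signals with no known drift shape, which is where pattern-finding at machine scale would have to take over from a single threshold.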

But we can't, whereas the AI can!

Professor Peter Cochrane OBE is an ex-CTO of BT who now works as a consultant focusing on solving problems and improving the world through the application of technology