How do you solve a problem like [REDACTED]?

Developing complex digital solutions to reach the proof-of-concept stage is even more of a challenge when you don’t have access to the sensitive data you need.

The answer lies in creating surrogate problems to solve.

Complex organisational challenges require complex solutions, and this is especially the case where digital, data and technology are concerned. Invariably, they involve focusing on intricate networks of systems, human behaviour, and often secure or sensitive data.

When companies decide where and what to invest in, balancing the risk of failure against the hope of success while also winning engagement, it's critical that digital professionals can deliver a proof-of-concept to help. This tool not only demonstrates the capability of a technology, a data analytics process or a behavioural scenario to achieve a pre-set goal, but also lends the confidence and security needed to inform sound investment decisions.

Testing the recipe first

In the same way that a chef wouldn't add a new dish to a menu without testing it beforehand, no data scientist worth their salt would put their algorithms forward without first demonstrating that the output is ‘true’, understandable, and provides the answer their client is looking for.

But how can you develop a proof-of-concept when you don't have all the ingredients you need to hand? For understandable reasons, some clients with sensitive or confidential data may not be able to release it during the concept design stage. When this happens, it's reassuring to know there are methods that can achieve a proof-of-concept without access to the sensitive data.

One is ‘surrogate problem solving’: the term used when the real data is replaced with similar, but non-sensitive, ‘fake’ data. This allows a closely matched scenario to be built from the ground up and a solution to be developed against it; once that solution is proven to be efficient, the broader task can begin. It's a low-risk, high-reward approach, and should be a logical first step when considering secure data innovation.
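As a rough illustration of what ‘fake’ data can look like in practice, the sketch below generates a small, reproducible set of made-up pipe records. The field names, value ranges and the `make_surrogate_network` function are all hypothetical and chosen only for illustration; they are not the data or tooling used on the project.

```python
import random


def make_surrogate_network(n_pipes, seed=0):
    """Generate a fake pipe network with plausible, non-sensitive values.

    All field names and value ranges are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so the surrogate data is reproducible
    network = []
    for pipe_id in range(n_pipes):
        network.append({
            "pipe_id": pipe_id,
            # Made-up range for pipe length in metres
            "length_m": round(rng.uniform(20.0, 200.0), 1),
            # Made-up range for peak wastewater load in litres per second
            "peak_load_l_per_s": round(rng.uniform(5.0, 80.0), 1),
        })
    return network


network = make_surrogate_network(n_pipes=5, seed=42)
for pipe in network:
    print(pipe)
```

Fixing the seed means everyone working on the proof-of-concept sees the same surrogate network, which makes results easier to compare between runs.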

Sensitive and unavailable data

A recent example AtkinsRéalis encountered was with a water company client, who approached our data intelligence team to build a wastewater optimisation solution to reduce flooding. The client wanted an open-source solution that would speed up and improve the quality of their results, which at the time were produced by an expensive third-party provider. However, the input data on wastewater loads is highly sensitive, and given a short timeframe of just a few weeks, it wasn't possible to release it to the data scientists handling the task.

The necessary step was to create a surrogate optimisation ‘problem’, based on redacted test data generated by AtkinsRéalis. This problem could then be used to create an initial solution and investigate possible software and design options. Whilst the redacted problem had to be kept relatively simple to produce a solution quickly, it also had to be complex enough to give the client a meaningful comparison with the current provider's implementation.

To ensure the surrogate problem captured the essence of the original, we created a multi-objective optimisation problem based on a configuration for a pipe network that had two competing goals or objectives. These goals were aligned with the real-life problem: the need to minimise the cost of the design, whilst also minimising the flooding produced by the network. We also introduced realistic constraints, such as a maximum amount of money that could be spent, and a restriction on how many sizes of pipes the design could use.
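A minimal sketch of how such a problem might be expressed in code is shown below: two objective functions (cost and flooding) plus a feasibility check for the budget and pipe-size constraints. The diameters, unit costs, capacities and the very simple flooding model (load in excess of capacity) are hypothetical placeholders, not the figures or hydraulic model used on the project.

```python
# Hypothetical catalogue of pipe diameters (mm) with made-up unit costs
# and made-up flow capacities.
COST_PER_M = {150: 40.0, 225: 65.0, 300: 95.0, 450: 160.0}
CAPACITY_L_PER_S = {150: 10.0, 225: 25.0, 300: 50.0, 450: 120.0}

# Surrogate network: (length in metres, peak load in litres per second).
PIPES = [(100.0, 8.0), (50.0, 30.0), (80.0, 60.0)]


def cost(design):
    """Objective 1: total cost of a design (one diameter per pipe)."""
    return sum(COST_PER_M[d] * length for d, (length, _) in zip(design, PIPES))


def flooding(design):
    """Objective 2: total load exceeding pipe capacity (a crude flooding proxy)."""
    return sum(
        max(0.0, load - CAPACITY_L_PER_S[d])
        for d, (_, load) in zip(design, PIPES)
    )


def is_feasible(design, budget, max_sizes):
    """Constraints: stay within budget and limit the number of distinct sizes."""
    return cost(design) <= budget and len(set(design)) <= max_sizes


cheap = [150, 150, 150]
mixed = [150, 225, 300]
print(cost(cheap), flooding(cheap))   # cheaper design, more flooding
print(cost(mixed), flooding(mixed))   # dearer design, less flooding
print(is_feasible(mixed, budget=20000.0, max_sizes=2))  # uses three sizes
```

The two example designs show why the objectives compete: the cheaper design floods more, and the design that floods less costs more and breaks the pipe-size restriction.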

Mirroring natural selection

Having set up these parameters, the data scientists chose a suitable open-source code library and, following some research, selected a genetic algorithm as the best approach. The algorithm is so called because it's inspired by the process of natural selection: individual designs of the network compete against each other, in the manner of survival of the fittest, to discover the best-performing solution.
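To make the idea concrete, here is a minimal, self-contained sketch of a genetic algorithm applied to a toy pipe-sizing problem. It is not the project's implementation or the library the team used: all data values are hypothetical, and for brevity the two objectives are folded into a single weighted score with penalties for broken constraints, whereas a true multi-objective method would keep them separate.

```python
import random

# Hypothetical catalogue and surrogate network (all values made up).
DIAMETERS = [150, 225, 300, 450]
COST_PER_M = {150: 40.0, 225: 65.0, 300: 95.0, 450: 160.0}
CAPACITY = {150: 10.0, 225: 25.0, 300: 50.0, 450: 120.0}
PIPES = [(100.0, 8.0), (50.0, 30.0), (80.0, 60.0), (120.0, 15.0), (60.0, 45.0)]
BUDGET = 40000.0      # assumed spending cap
MAX_SIZES = 3         # assumed limit on distinct pipe sizes
FLOOD_WEIGHT = 500.0  # assumed trade-off between flooding and cost
PENALTY = 1e6         # added for each broken constraint


def fitness(design):
    """Lower is better: cost plus weighted flooding plus constraint penalties."""
    cost = sum(COST_PER_M[d] * length for d, (length, _) in zip(design, PIPES))
    flood = sum(max(0.0, load - CAPACITY[d]) for d, (_, load) in zip(design, PIPES))
    penalty = 0.0
    if cost > BUDGET:
        penalty += PENALTY
    if len(set(design)) > MAX_SIZES:
        penalty += PENALTY
    return cost + FLOOD_WEIGHT * flood + penalty


def tournament(population, rng, k=3):
    """Selection: the fittest of k randomly chosen designs becomes a parent."""
    return min(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover: the child takes a prefix of a and a suffix of b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(design, rng, rate=0.1):
    """Mutation: each pipe's diameter is occasionally replaced at random."""
    return [rng.choice(DIAMETERS) if rng.random() < rate else d for d in design]


def run_ga(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    # Start from random designs, plus the cheapest design so that at least
    # one feasible candidate is always present.
    population = [[rng.choice(DIAMETERS) for _ in PIPES] for _ in range(pop_size - 1)]
    population.append([DIAMETERS[0]] * len(PIPES))
    for _ in range(generations):
        children = [min(population, key=fitness)]  # elitism: keep the best so far
        while len(children) < pop_size:
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
    return min(population, key=fitness)


best = run_ga()
print(best, fitness(best))
```

Because the best design of each generation is carried forward unchanged, the final result can never score worse than the cheapest starting design; selection, crossover and mutation then do the work of finding better trade-offs.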

Following this process, an initial implementation was written, tested, and modified to speed up results.

In an evaluation, the new method developed on the surrogate flooding problem was compared with results from the software used by the current provider. The results showed that the new method wasn't just comparable, but in some instances produced higher-quality outputs more quickly. Furthermore, since the new approach used only licence-free software, the cost of implementing it in full for the client was much lower than continuing with the current arrangement they had with their provider.

Using open-source software also means new features can be built in much more flexibly. This is one example of the power of surrogate problems: the team was able to demonstrate a time, cost, and quality benefit to the client without ever having access to the real-life, sensitive data. It produced not only a fast initial proof of concept, but also a valuable tool that allows decision-makers to unlock innovation without taking undue risks.

Sensitive data shouldn't be a barrier to achieving these same benefits in sectors with information security or confidentiality concerns. Surrogate problem solving removes this barrier, opening the way to feasibility testing and potentially a step change in the pace of innovation across the board. Test cases like these are now proving how we can develop faster and better solutions to complex digital, data and technology problems, and deliver them with lower risk to clients.

Matthew House and Oscar Holemans

Matthew House and Oscar Holemans are data scientists at AtkinsRéalis.