'Computer says no': How open data can mitigate the dangers of black box AI

AI is only as good as its training data; making that data as open as possible can make it better, fairer and more trustworthy

Artificial intelligence (AI) is currently enjoying a renaissance in industry and popular imagination. For the first time, we have enough large-scale data for training AI systems: public datasets for computer vision, natural language and speech, as well as many non-public datasets held by businesses and governments. Recent improvements in hardware are also making it more cost-effective to train and run machine learning models.

Currently, most AI-centred innovation is based on a business model where training data is considered protected intellectual property (IP), and AI systems are generally provided as inscrutable 'black boxes' with little way of understanding their internal workings.

This, however, is problematic. This kind of business-model homogeneity can have a chilling effect on innovation and cause a thriving AI sector to stall. Opacity by default also exacerbates the risk that AI systems, with unfair bias encoded in their black boxes, could be misused when making decisions that affect people's lives.

The key to an AI system's inner workings lies in its training data: bias in what is included, as well as in what is left out, can translate into prejudicial systems, as engineers unknowingly encode historic and current data into algorithms that maintain the status quo, reflecting our economies and societies as they are.

This is what technologist Maciej Ceglowski calls "money laundering for bias": the risk that blind faith in the superiority and efficiency of AI will end up crystallising data about the past and the present into future systemic unfairness. High crime rates in a given postcode may end up condemning people who live there to an endless string of automated rejections - both an egregious misunderstanding of statistics, and a terrifying case of "computer says no".

This is not a hypothetical scenario. Some US police departments have been enthusiastic about AI systems that promise crime detection or more efficient sentencing. Without careful design, this can lead to the adoption of flawed, often ineffective and sometimes unfair systems.
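To make that failure mode concrete, here is a deliberately simplified, hypothetical sketch in Python. The postcodes, records and scoring logic are all invented for illustration; no real system is this crude, but the mechanism is the same: a model "trained" on historic arrest data simply replays past enforcement patterns as future risk scores.

```python
# A hypothetical sketch, not any real police system: "training" on historic
# arrest records amounts to memorising where arrests were made, so postcodes
# with heavy past enforcement are flagged as "high risk" indefinitely.
from collections import Counter

# Invented historic records: (postcode, was_arrested)
historic_records = [
    ("AB1", 1), ("AB1", 1), ("AB1", 1), ("AB1", 0),
    ("CD2", 0), ("CD2", 0), ("CD2", 1), ("CD2", 0),
]

arrests = Counter()
totals = Counter()
for postcode, arrested in historic_records:
    arrests[postcode] += arrested
    totals[postcode] += 1

def risk_score(postcode: str) -> float:
    """The predicted 'risk' is just the historic arrest rate for the postcode."""
    return arrests[postcode] / totals[postcode]

print(risk_score("AB1"))  # 0.75 - everyone from AB1 is scored as high risk
print(risk_score("CD2"))  # 0.25 - regardless of anyone's own behaviour
```

Nothing in the sketch ever looks at an individual's own behaviour; the "prediction" is purely a reflection of where past policing happened, which is exactly the pattern a more sophisticated model can absorb from the same training data.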

At the ODI, we have been looking at AI as part of our innovation programme, which aims to advance knowledge and expertise in how data can shape the next generation of services, and create economic growth. We believe that fostering AI innovation requires an open approach that includes open data, open source code and open culture, so that we can capture the benefits of AI while mitigating the risks. How do we do that? We have two suggestions.

1. Better access to data will unlock the potential of data-hungry machine learning systems, but it is also a way of ensuring that the systems we create are safe.

It is important that this focus on opening data is not just limited to government and scientific research data. At the ODI, we have been ramping up our efforts to make data held by the private sector more broadly available, making it as open as possible while protecting people's privacy, commercial confidentiality and national security.

This includes the notion of "data trusts", as suggested in the UK Government's AI review last October, but we need more experimentation to find the right mechanisms for data sharing across the whole data spectrum.

2. We want to create a data economy where rights and responsibilities are adequately distributed, and where more control over the usage and sharing of data is given to the individual.

This is particularly important for fuelling the uptake of AI: machine learning algorithms tend to be trained on personal data. Their ability to spot patterns also makes them very effective at re-identifying individuals in "anonymised" datasets, raising significant concerns about individual and group privacy.
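To see why "anonymised" sits in quotation marks, here is a deliberately simple, hypothetical sketch of a linkage-style re-identification. All records, names and quasi-identifiers are invented; real attacks can use machine learning to match far noisier patterns, but the principle is the same: if a combination of attributes is unique, removing the name is not enough.

```python
# A hypothetical linkage attack on invented data: an "anonymised" dataset
# (names removed, quasi-identifiers kept) is joined against an auxiliary
# public dataset, linking sensitive records back to named individuals.
anonymised_health = [
    {"postcode": "AB1 2CD", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "EF3 4GH", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]

public_register = [
    {"name": "Jane Doe", "postcode": "AB1 2CD", "birth_year": 1984, "sex": "F"},
    {"name": "John Roe", "postcode": "EF3 4GH", "birth_year": 1990, "sex": "M"},
]

quasi_identifiers = ("postcode", "birth_year", "sex")

for record in anonymised_health:
    key = tuple(record[field] for field in quasi_identifiers)
    matches = [
        person for person in public_register
        if tuple(person[field] for field in quasi_identifiers) == key
    ]
    if len(matches) == 1:  # a unique match re-identifies the record
        print(matches[0]["name"], "->", record["diagnosis"])
```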

Here, too, more work will be needed to avoid the twin pitfalls: callous (and possibly illegal) sharing of personally identifiable data on the one hand, and a closed future motivated by the fear of doing wrong on the other.

The future for AI is not predetermined: it is up to us to create and shape the future we want. We hope our efforts will help create a future which is as open as possible and benefits everyone.

Olivier Thereaux is head of technology at the Open Data Institute (ODI)