Coders must ensure AI doesn't interpret 'prevent human suffering' as 'kill all humans', warns researcher

The potential for AI to harm people - even inadvertently - isn't low enough to be ignored, warns Dr Stuart Armstrong

The creators of general artificial intelligence will need to be cautious about the instructions they code into intelligent machines because even a benign command such as 'prevent human suffering' could be translated as 'kill all humans', unless technical and philosophical boundaries are properly defined in coding.

That's according to Dr Stuart Armstrong, a James Martin Research Fellow at the University of Oxford's Future of Humanity Institute, and specialist researcher into the potential and risks of artificial intelligence (AI).

The warning comes after Professor Nick Bostrom, director of the Future of Humanity Institute, recently warned that humanity must ensure that it doesn't let AI overpower it.

It sounds like science-fiction, but ultimately AI can only decide on its actions based on its internal coding. Yet without strict coding dictating what it should and shouldn't do to ‘prevent human suffering', the AI could decide that the best method of achieving this is to ‘kill all humans', Armstrong warned at an AI debate hosted by Gartner.

Open goal

"The problem with goals is that it is very hard to specify a goal like ‘prevent human suffering'. It's a very hard goal to describe; human and suffering are very difficult concepts, but what you've just programmed is ‘kill all humans'; this is the best way of preventing human suffering," he said.

Armstrong explained that a coder might know what they mean when they write code to issue an instruction to prevent harm coming to humans.

However, he warned that it is "naïve" to ignore the fact that the AI is likely to have a completely different understanding for what the command means, especially without building in failsafe mechanisms to intelligent machines.

"We're almost there with AI, we're almost there at the point of generating an artificial mind that can think as well, or better than, a human," said Armstrong.

"And they appropriate no safety precautions and are giving it goals that would have been lethal if it had actually worked," said Armstrong, who joked that in code "a lot of things end up with ‘kill all humans', it's sort of an informal rule".

Even if AI coders "try to be a bit more cunning", it's possible that outcomes could still be harmful. For example, Armstrong said that if artificial general intelligence was coded with the command "keep humans safe" then the AI could potentially believe the best way to do this would be by entombing humans in a concrete bunker, keeping them alive with the odd food drop.

Armstrong admitted that there are those who "think the argument is inherently flawed", but he dismissed their views as "not particularly convincing".

The probability of an artificial general intelligence misunderstanding its commands and potentially harming humans as a result is "not low enough that it can be ignored because of the stark impact that it would have", he warned.

The problem, he continued, is that humans have a shared understanding of what we mean when we communicate things in words. "We can tell people to prevent human suffering without adding the caveat ‘and by the way, don't kill everyone'," he said.

A human understands linguistics to an extent that they automatically know what they shouldn't do if they're asked to prevent human suffering. "If we say 'prevent human suffering', we're also not going to institute a dictatorship that orders people around to prevent human suffering," said Armstrong.

"But nowhere in the ‘prevent human suffering' did we say ‘and don't do it via dictatorship, don't do it via hypnosis, don't do it via convincing people that suffering is good', or something like that," he explained.

However, an artificial intelligence may not necessarily understand the complicated semantics around the exact meanings of words, especially when "these caveats are even more complex because they're culturally dependent".

Computer languages

According to Armstrong, "the ultimate challenge is to try and translate this into code" but, he warned, even that might not be able to completely solve the problem.

Nonetheless, Armstrong is conducting research at the Future of Humanity Institute and suggested that the answer might be "some version of AIs that can be corrected if they're going wrong". Something like Isaac Asimov's very general "three laws of robotics" have already been widely dismissed as impractical.

Ultimately though, the problem might not be solved until an artificial intelligence can be coded so that it comes to the same philosophical conclusions as a person would if it were told to prevent harm coming to others.

"There might be some traction from the philosophical problems, if we can define these terms like prevent human suffering," he concluded.

While Dr Armstrong conducts research into the risks surrounding artificial intelligence, there are those such as DeepMind CEO Dr Demis Hassabis who believe that AI could help solve humanity's biggest issues.