Scaling up fact checking: AI's role

Andy Dudfield of Full Fact

Full Fact's automation chief Andy Dudfield on the web-scale challenge of correcting misinformation

A lie is halfway round the world before the truth has got its boots on. This famous aphorism has never felt more apt, but who coined it? That is less clear. Similar adages bemoaning the human propensity to believe a titillating lie over a more commonplace truth go back through time, but if boots are the factor that makes this one stick in the brain, then we're probably looking at the US newspaper The Portland Gazette in the 1820s, where it appeared thus: "Falsehood will fly from Maine to Georgia, while truth is pulling her boots on."

In the modern era, attributing sources to statements and helping people navigate the landscape between fact and fiction - or as Andy Dudfield, head of automated fact checking at London-based charity Full Fact, puts it, good information and bad information - is the core remit of fact checking organisations.

'Bad information' may be the result of a simple misquote, an error, a joke taken out of context, or satire sailing over the heads of its audience. Or it could be deliberately propagated disinformation: cherry-picked or invented statistics, half-truths or barefaced lies. Whatever the root cause of its wrongness, once bad information is in the wild it takes on a life of its own, so there's little value in trying to categorise it by intent (although intent may play a part when deciding how to address and rectify it). The most important consideration, said Dudfield, is its impact. As we have seen, bad information can have serious online and real-world consequences.

"What is the harm caused by the bad information? Where is it propagating? Who said it? What standards can we hold them to?"

Once information has been identified as factually inaccurate, the race is on to reduce its impact. Fact checks can be published as a corrective wherever the misinformation appears online, and originators and spreaders called out and asked to correct it. Like the erratum section in a newspaper, fact checks provide an opportunity for redress and to correct misunderstandings, but as a product of the modern information landscape their focus is much broader than a single publication.

A web-scale challenge

The nature of falsehoods may not have changed all that much since the 1820s, but both their volume and the speed of their propagation have escalated massively.

Confronting bad information in a timely manner is "a web-scale challenge", according to Dudfield, who heads up a team that's developing technologies to help Full Fact's 35 employees cope with the misinformation tsunami, and it's one in which automation is inevitably playing a vital role.

Not all facts are equal, and Full Fact first needs to parse the 80,000 to 90,000 pieces of information that pass through its systems every day for checkability and relevance. Having identified likely candidates, the next step in the pipeline is to filter out opinions and predictions, which are not in the checkers' remit, and to home in on quantity claims, those with numerical values attached, and voting record claims, which can be verified using trusted sources.
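The shape of this filtering step can be pictured with a toy heuristic. To be clear, Full Fact's real system is a trained machine learning model, not rules like these; everything below (marker lists, category names, example sentences) is invented for illustration.

```python
import re

# Toy stand-in for a claim classifier: crude keyword and pattern heuristics
# that sort sentences into the categories described in the article.
OPINION_MARKERS = {"i think", "i believe", "in my opinion", "should"}
PREDICTION_MARKERS = {"will ", "going to"}

def classify_sentence(sentence: str) -> str:
    s = sentence.lower()
    if any(m in s for m in OPINION_MARKERS):
        return "opinion"                  # out of scope for checkers
    if any(m in s for m in PREDICTION_MARKERS):
        return "prediction"               # out of scope for checkers
    if "voted" in s:
        return "voting record claim"      # checkable against voting records
    if re.search(r"\d", s) or "%" in s:
        return "quantity claim"           # checkable against trusted statistics
    return "not checkable"

claims = [
    "Unemployment fell by 3% last year.",
    "I think the government is doing a bad job.",
    "The MP voted against the bill.",
]
print([classify_sentence(c) for c in claims])
# ['quantity claim', 'opinion', 'voting record claim']
```

A rule set like this would be hopelessly brittle at web scale, which is exactly why the classification is done with a trained language model rather than keywords.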

The system then seeks to enrich the data through entity extraction, looking for names, places and topics and augmenting them with third-party information. Also at this stage any quotations are attributed, following the trail back to the actual source of the quote, as Dudfield elucidated: "Did somebody say something? Did somebody say that somebody else said something, or did a newspaper report that somebody said somebody else said something?"

The firehose of information thus reduced to a manageable trickle, AI's job is done, and the work of analysing chains of events, verifying sources and pinpointing where the bad information crept in is performed by the all-important human fact checkers at the end of the pipeline.

"I always want to make sure that checkers and other staff have the best tools possible, but it's never to fully automate processes. That is technically very difficult," Dudfield explained.

Because while AI is "great at pattern matching and spotting emerging trends", such as a sudden rise in the frequency of a certain word or phrase, people are simply much better at "understanding context, caveats and nuance; these are the things that humans are brilliant at doing, and it's a lot quicker and easier to do the actual fact checking there."
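That "sudden rise in the frequency of a certain word or phrase" signal can be sketched as comparing a phrase's count in the latest window against its historical baseline. The thresholds and data below are arbitrary; this is an illustration of the idea, not Full Fact's implementation.

```python
from collections import Counter

# Illustrative spike detector: flag phrases whose count in the latest window
# is several times their average over earlier windows.
def spiking_phrases(history: list[Counter], latest: Counter,
                    factor: float = 3.0, min_count: int = 5) -> list[str]:
    flagged = []
    for phrase, count in latest.items():
        baseline = sum(day[phrase] for day in history) / max(len(history), 1)
        if count >= min_count and count > factor * max(baseline, 1):
            flagged.append(phrase)
    return flagged

# Seven days of steady mentions, then a sudden jump for one phrase.
history = [Counter({"5g": 2, "vaccine": 40}) for _ in range(7)]
latest = Counter({"5g": 30, "vaccine": 45})
print(spiking_phrases(history, latest))  # ['5g']
```

Spotting the spike is the easy, automatable part; deciding whether the surge is satire, news or misinformation is where the human context and nuance come in.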

For example, much of the false information circulating around Covid could have been, and was, foreseen by looking at past outbreaks of disease.

"Vaccine hesitancy was always going to be something that we could spot. That's not an AI model telling us that, it's just something that is a predictable part of what the information landscape would look like."

Tools of the checking trade

To automatically sort, filter and enrich the data, Full Fact has developed an AI system based on BERT, the natural language processing (NLP) model developed by Google and later open sourced.

"It's a large scale language model which was specifically trained to identify claims, using annotations provided by fact checkers," Dudfield said.

"That means when we're monitoring hundreds of thousands of sentences in different web pages, we can identify the things that seem claim-like, and then we can classify those using this model into different types of claims."

Full Fact's goal is for its fact checks to achieve the maximum possible impact. This is about timing, which at web-scale means providing a vehicle for rapid propagation of fact checks. In common with other fact checkers, it has relationships with the big tech companies, and a fact checking ecosystem that has grown up to ensure the truth can get its boots on as quickly as possible.

Schema.org, the collaborative web vocabulary project, defines a structured markup standard for fact checks called ClaimReview. By annotating their articles with it, fact checkers ensure that search engines and social media sites treat and display the checks in a consistent way, picking them up and showing them alongside relevant search results or feeds in near real time.
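A minimal ClaimReview record, built here as a Python dict and serialised to JSON-LD, looks something like the following. The Schema.org property names are real; every name, URL, date and rating value is invented for illustration.

```python
import json

# A minimal Schema.org ClaimReview record, serialised as JSON-LD.
claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "url": "https://example.org/fact-checks/boots-claim",
    "datePublished": "2022-01-01",
    "author": {"@type": "Organization", "name": "Example Fact Checkers"},
    "claimReviewed": "Falsehood flies from Maine to Georgia in an hour.",
    "itemReviewed": {
        "@type": "Claim",
        "author": {"@type": "Person", "name": "A. N. Example"},
    },
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": 1,
        "bestRating": 5,
        "worstRating": 1,
        "alternateName": "False",
    },
}
print(json.dumps(claim_review, indent=2))
```

Embedded in a fact check article as a JSON-LD script block, markup of this shape is what lets search engines surface the verdict ("False") next to the claim itself.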

There are strategic alliances too: information sharing and annotation with other fact checkers around the world through bodies like the International Fact-Checking Network, and collaborations with other organisations, including tech platforms.

Full Fact and its AI team are working with an organisation called Africa Check in Kenya, Nigeria and South Africa, including checking information about elections.

"25% of the fact checks published by Africa Check were identified by some form of AI model that had been produced by Full Fact. So it's really exciting to make a significant difference in addressing this information," said Dudfield.

The fact checking future

The work with Africa Check shows that facts, too, can get quickly around the world - but so far this has mainly been restricted to English and other widely spoken languages. Social media disinformation has been used to whip up hatred, leading to mass killings in Myanmar and Ethiopia, with automated content filters on platforms like Facebook far less effective in languages other than English. This is something Dudfield and his team are looking to tackle, including working with Meta to improve its systems.

"Can we take what we've done in the English language and make that work in other languages using the same underlying model? That's where we're going to be finding ourselves focusing on the next couple of years."

Given the pace of advancement of NLP and predictive analytics, will the lie one day travel halfway around the world to find a fact check already waiting for it? No. Misinformation will always have a head start, but those tools certainly give truth a fighting chance of staying in the race to stabilise the information landscape.

"In times of crisis and anxiety, people can be susceptible to conspiracy theories, particularly when they feel powerless," said Dudfield. "And so we really want to make sure that people react and have the best information available to them when they're consuming that kind of information."