Peter Cochrane: Tackling fake news and propaganda with AI and machine learning
This year saw the first global 'datathon' to apply natural language processing to identify 'fake news'. Professor Peter Cochrane was there as an advisor.
Every now and then, serendipity draws me into something new, unusual and exciting. And so it was that, following my previous opinion article on the Truth Engine, I received an invitation to become an advisor to the first global datathon to apply ‘natural language processing’ (NLP) to a representative sample of fake news and propaganda, all of which had appeared in the media and on social networks.
The event was co-organised by the Data Science Society and QCRI Qatar, with dedicated platforms hosted in Doha, Bangalore, Riyadh and Sofia. A foundation component of the Datathon was the QCRI/MIT-CSAIL Tanbih project, focussed on detecting bias and propaganda in news publications.
By the time I was engaged as an advisor, more than 200 participants from 30 countries had already registered and formed 40 teams of students and professionals. Their primary Datathon challenge was as follows:
"To develop intelligent systems able to classify entire articles and text fragments as propagandistic or not"
All the teams were given the same standardised AI training dataset, comprising 451 news articles with sentence-level annotations indicating whether each sentence was propaganda or not. And they had fewer than five days (and nights) to come up with workable solutions, demonstrate viable capabilities, compile full reports, and present their results to the judges.
The basic thinking of the organisers is further exemplified by the following definitions and categories:
Propaganda and fake news definition: The spreading of ideas, facts, or allegations deliberately to influence opinions with reference to predetermined ends.
Or, as PolitiFact would have it: "Fake news is made-up stuff, masterfully manipulated to look like credible journalistic reports that are easily spread online to large audiences willing to believe the fictions and spread the word."
And for CBS: "Stories that are provably false, with enormous traction in the culture, and consumed by millions of people".
Difficulty level 1: Build an intelligent system able to detect any propagandistic article;
Difficulty level 2: Detect whether each sentence is propagandistic or not; and,
Difficulty level 3: Locate and identify each propagandistic technique.
So, how did the teams do? The top ten achieved detection accuracies just below or just above a remarkable 86 per cent. This surprised everyone, including me.
How did they do it in such a short time? They combined standard, or readily available, NLP engines and/or components with AI packages that learned to identify emergent patterns of words, phrases, statements and headline types.
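The teams' actual pipelines are not described here, so what follows is only a rough illustration of the kind of word-pattern learning involved. It is a minimal sketch, assuming a multinomial naive Bayes classifier (a stand-in technique chosen for brevity, not what any team is known to have used) trained on counts of single words and adjacent word pairs to label a sentence as propaganda (1) or not (0). The class name `NaiveBayes`, the helper `features` and the training sentences are all hypothetical; the sentences are invented toy examples, not drawn from the 451-article Datathon corpus.

```python
import math
import re
from collections import Counter

def features(sentence):
    """Hypothetical feature extractor: lower-cased words plus adjacent-word bigrams."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]

class NaiveBayes:
    """Toy multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, sentences, labels):
        self.counts = {0: Counter(), 1: Counter()}  # feature counts per class
        self.docs = Counter(labels)                 # sentences per class, for priors
        for s, y in zip(sentences, labels):
            self.counts[y].update(features(s))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def log_score(self, sentence, y):
        # Log prior plus smoothed log likelihood of each known feature.
        score = math.log(self.docs[y] / sum(self.docs.values()))
        denom = self.totals[y] + self.alpha * len(self.vocab)
        for f in features(sentence):
            if f in self.vocab:  # features never seen in training are ignored
                score += math.log((self.counts[y][f] + self.alpha) / denom)
        return score

    def predict(self, sentence):
        # Return whichever class (0 = not propaganda, 1 = propaganda) scores higher.
        return max((0, 1), key=lambda y: self.log_score(sentence, y))

# Invented toy training data (hypothetical, for illustration only).
TRAIN_SENTENCES = [
    "The corrupt elite will destroy our great nation",
    "Only a traitor would question our glorious leader",
    "They want to destroy everything you love",
    "Wake up, the enemy is already among us",
    "The council approved the budget on Tuesday",
    "Rainfall was slightly above average this month",
    "The company reported quarterly revenue of two million",
    "The minister met officials to discuss the budget",
]
TRAIN_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

model = NaiveBayes().fit(TRAIN_SENTENCES, TRAIN_LABELS)
print(model.predict("The traitors will destroy our nation"))  # → 1
print(model.predict("The council discussed the budget"))      # → 0
```

With eight sentences this is nothing like the Datathon systems, but it shows the principle: the model never understands the text, it simply learns which word patterns co-occur with each label and scores new sentences accordingly.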
If AI is good at anything, it is pattern recognition and matching. This is a particularly important quality when identifying patterns hidden in massive data sets that escape human ability. The big question now is: could we significantly improve on these results?
My guess is that the old engineering mantra applies: You get 80 per cent of the result for 20 per cent of the effort, and achieving 100 per cent is probably impossible.
The reality is that fake news and propaganda will most likely need five to ten distinctly different techniques applied at the same time, as suggested in my previous article: How to Build a Truth Engine.
There, I identified fact checkers, together with long-term historical analysis of publications, behaviours, employment, employer, organisation, motivation and hidden agendas, as accessible and workable trending metrics. I am now adding AI applied to NLP to that list.
If there is a negative here, it has to be the ‘dark side’ watching and assessing the Datathon in order to learn about new defence strategies. However, the good news is that human habituality is very hard to hide, and AI will continue to learn and adjust accordingly in near real time. So I think this is a war we might just win, provided we consolidate our global resources.
Professor Peter Cochrane OBE is the ex-CTO of BT, who now works as a consultant focusing on solving problems and improving the world through the application of technology.