App designed to safeguard children online was the result of two years' toil at AI's cutting edge, says product manager Jon Howard
The BBC is in a unique position when it comes to developing AI, says Jon Howard, executive product manager, children's future media, at the broadcaster.
Large enough to make the most of relationships with tech giants working at the cutting edge of the craft, such as Apple and Google's TensorFlow engineers, the BBC also has a remit that enables it to step into areas frequently neglected by VC-funded start-ups.
When looking at ways to tackle the epidemic of online bullying and childhood mental health problems related to living life on screen, Howard's team asked 50 parents and 50 children what they'd like to see in an app dedicated to children's wellbeing. The parents universally wanted a monitoring app while the children universally did not, so that was that. In any case, there are already plenty of parental monitoring apps, said Howard, because that's where the money is, but there was nothing to help children manage their own experiences on social media and the wider online world.
"We saw this as a key space where the BBC could help out," said Howard (pictured), speaking to Computing after the Deskflix Public Sector event where he delivered a keynote presentation. "Here's where our public service principles could play their part."
New worlds
Many kids receive their first smartphone between the ages of 8 and 12 and become active on social media, entering a world governed by a very different set of rules, norms and expectations. It's a time of life when they are exploring new horizons, and unlike in the physical world where a disapproving glance from an adult is enough to tell them they may be acting inappropriately, cyberspace offers little in the way of guardrails.
Children's charity NSPCC has registered a sharp rise in referrals to mental health services, some of which are related to the use of social media. The BBC's Own It app seeks to replace the adult in the room with an AI in the phone, an advisor who can warn the child that the message they're about to send may be hurtful to the recipient, or that sending their phone number to a new contact may not be a good idea.
Own It is a keyboard for iOS and Android plus a companion app that enables the user to record their feelings. Designed in consultation with child psychologists, the Turing Institute and Public Health England, the idea is to help children to better understand their behaviour online and its effect on others, and to navigate the new world safely. It adopts a 'nudge' approach rather than being proscriptive and is focused purely on output - what children are saying rather than what they're seeing. In this way, it can work in conjunction with monitoring apps if parents insist.
You say it you own it
The product of two years' development work, the Own It keyboard deploys machine learning models to support autocorrect, autocomplete and next-word-prediction functions based on how children really communicate online. In this, it works in a similar way to Google's Gboard keyboard, but using kids' vocabulary. But there's also a proactive element: start typing a message with 'I hate you…' and the guardrails kick in. The UI border turns red and a worried-looking emoticon pops up with the message "How would you feel if someone sent you that?" The keyboard does not stop the hateful message being sent, because Own It is about encouraging children to be responsible for their own behaviour, offering guidance rather than policing. Instead it prompts the user to try a different set of words until a happy face and calm blue border appear.
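The nudge-not-block behaviour described above can be sketched in a few lines. This is a minimal illustration only: the phrase list stands in for the app's trained models, and the names `HURTFUL_PHRASES`, `Feedback` and `review_draft` are hypothetical, not taken from the Own It SDK.

```python
from dataclasses import dataclass

# Hypothetical phrase list standing in for the app's trained models.
HURTFUL_PHRASES = ("i hate you", "you are stupid")


@dataclass
class Feedback:
    border: str     # "red" or "blue"
    face: str       # "worried" or "happy"
    prompt: str     # text shown to the child; empty when nothing is flagged
    can_send: bool  # always True here: the keyboard nudges, it never blocks


def review_draft(text: str) -> Feedback:
    """Return UI feedback for a draft message without ever blocking it."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in HURTFUL_PHRASES):
        return Feedback("red", "worried",
                        "How would you feel if someone sent you that?", True)
    return Feedback("blue", "happy", "", True)
```

Note that `can_send` stays true in both branches: the only thing that changes is what the child sees while typing.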
"Kids generally want to be good, they generally want to be nice people," said Howard. "We won't stop them sending a message, we just provide a bit of friction."
The companion app, meanwhile, allows the user to keep a private record of their thoughts and feelings, which has been shown to be an effective psychological approach to combating stress.
The app has won many plaudits, including a CogX award for best innovation in natural language processing (NLP), a UXUK gong for best design for education, and a Banff World Media award for best interactive content for young people, plus there's been international interest from other publishers and broadcasters.
"A lot of people are taking notice", Howard said.
Own It was developed by ten BBC product and project managers, technical architects and UX and editorial content producers who worked with five engineers from Swiss consultancy Privately to develop an SDK containing the machine learning, AI and business logic, and a similar number from Glasgow-based UX specialists Chunk Digital. Other specialists and consultants were brought in as the needs arose, including Apple and Google engineers - who were consulted to ensure the app would work cross-platform with only one development workflow - child psychologists and experts in AI ethics.
"As much as possible we tried to work as a single team, ensuring that communications were constant and dependencies were tightly managed," said Howard.
Keeping up with the yeets
The app's functionality is simple, intentionally so, but this simplicity is the result of a great deal of research, experimentation and the jettisoning of many a bell and a whistle.
Howard's team encountered a number of tricky challenges as they developed the MVP, the first of which was the lack of any coherent dataset on the way young people speak online with which to train the machine learning models. A core dictionary had to be painstakingly pieced together by analysing young people's messages on social media and comparing word frequencies and usage with adult equivalents.
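One simple way to compare word frequencies between two sets of messages is a relative-frequency ratio. The sketch below is an illustration of that general idea, not the team's actual pipeline; the function names, the `min_ratio` threshold and the `floor` smoothing value are all assumptions.

```python
from collections import Counter


def relative_frequencies(messages):
    """Map each lower-cased word to its share of all words in the corpus."""
    counts = Counter(word for msg in messages for word in msg.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def distinctive_words(child_msgs, adult_msgs, min_ratio=2.0, floor=1e-6):
    """Words used at least min_ratio times more often by children than adults.

    floor is an assumed smoothing value, so that words missing from the
    adult corpus do not cause a division by zero.
    """
    child = relative_frequencies(child_msgs)
    adult = relative_frequencies(adult_msgs)
    ratios = {w: f / max(adult.get(w, 0.0), floor) for w, f in child.items()}
    # Most distinctive words first.
    return sorted((w for w, r in ratios.items() if r >= min_ratio),
                  key=lambda w: -ratios[w])
```

Given child messages such as "yeet that can" and adult messages such as "throw that can", shared words like "that" score close to 1 and drop out, while "yeet" surfaces as a candidate for the children's dictionary.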
And of course, language doesn't stand still. Howard mentions yeet, a word meaning to forcefully throw away, which suddenly became popular after a Vine video of a kid dancing and then another featuring a girl hurling a can of soda while yelling 'yeet!' went viral. New words can spring up on social media anywhere in the world and become part of the global lexicon within a few short months, sometimes with local variants. Incidentally, the Urban Dictionary shows that yeet peaked in 2019 and is now, presumably, being used ironically by the cool kids. The app's models need to be able to keep up with nuanced changes like these, or it will find itself yeeted.
"The way new words arrive is absolutely fascinating," Howard said. "I went off on a massive track of learning all about this stuff, building glossaries and then taking words that have just been introduced by children and refining the models."
Neutral territory
Then there was the perennial problem of data-driven bias in machine learning models. Own It deploys four models designed to recognise hate, toxicity, sentiment and emotion, which it uses to decide on the appropriateness of a message. But during development the team uncovered a number of areas where the app overreacted or made a mistake. For example, a message beginning 'Men are...' would cause the concerned face and red border to pop up immediately, before any qualifying words were added. To tackle this issue, the team replaced certain gender-, religion- and ethnicity-specific terms with 'neutral' and were surprised at the results.
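The substitution step might look something like the sketch below, which swaps listed identity terms for a placeholder before text reaches a classifier. The term list here is hypothetical and deliberately tiny (the article does not publish the real one), and `neutralise` is an illustrative name rather than part of the Own It SDK.

```python
import re

# Hypothetical, deliberately tiny term list; the real list is not published.
IDENTITY_TERMS = {"men", "women", "boys", "girls"}

# Match whole words so that "men" inside "cinema" is left alone.
_WORD = re.compile(r"[A-Za-z']+")


def neutralise(text: str, placeholder: str = "neutral") -> str:
    """Replace listed identity terms with a placeholder, leaving the rest as is."""
    def swap(match):
        word = match.group(0)
        return placeholder if word.lower() in IDENTITY_TERMS else word
    return _WORD.sub(swap, text)
```

With this in place, 'Men are' and 'Women are' both reach the models as the same input, so any warning has to be triggered by the words that follow rather than by the group being named.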
"That was a really simple trick and we thought this is never going to work, but it worked really well and it was way better than before," Howard said.
The privacy conundrum
Trust is all-important with an app like this, and Own It adheres strictly to all the principles of Privacy by Design. No information that might identify the user or the device is collected or stored, and all messages are deleted from the keyboard as soon as they have been sent.
Great, but for a machine learning app this presents a problem: how to integrate feedback to improve the models?
For now, this is a semi-manual process using the feedback provided by users, for example a complaint when a warning has been shown during casual friendly banter, to pinpoint areas for improvement. That information is then used to source better data to cover the problem areas. Differential privacy techniques will eventually solve these problems, but they are not ready for primetime just yet.
3.5 billion into 15 million will go
In line with its privacy-preserving credentials, Own It is completely decentralised. Its functions are fully encapsulated on the device and it does not communicate with models in the cloud. This design presented another major challenge: the 3.5GB machine learning models were far too large for a smartphone app. So Howard's team got to work whittling them down to size. Using a variety of methods, including FastText compression and shaving off a few model layers, they succeeded in shrinking the 3.5GB to 40MB, and with further attention to the data reduced the final ML ensemble still further to a mere 15MB, with accuracy actually improved by making the machine learning less generic and more specific.
The result is a package that can be easily updated over the air every few weeks, which is frequent enough even to keep up with the rapid flow of teen memes and text-speak.
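The scale of that reduction is easier to appreciate as ratios. The quick calculation below uses only the sizes quoted above and assumes decimal units (1GB = 1,000MB), which the figures do not specify; `reduction_factor` is an illustrative helper.

```python
def reduction_factor(before_mb: float, after_mb: float) -> float:
    """How many times smaller the model became."""
    return before_mb / after_mb


# Sizes quoted in the article; 1GB = 1,000MB is an assumption.
original_mb = 3.5 * 1000
compressed_mb = 40
final_mb = 15

first_pass = reduction_factor(original_mb, compressed_mb)  # 87.5
second_pass = reduction_factor(compressed_mb, final_mb)    # about 2.67
overall = reduction_factor(original_mb, final_mb)          # about 233
```

In other words, compression and layer pruning did most of the work, with the data clean-up contributing a further reduction of a little under three times.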
What's next?
NLP is improving at an exponential rate. "We were at the cutting edge 18 months ago," quipped Howard, adding that the team is constantly reviewing new developments to see how they might be incorporated, including differential privacy techniques - which would allow models to learn from anonymous user data without the danger of de-anonymisation - and federated learning, in which changes in weights and biases are synchronised between on-device and cloud models. The latter would ultimately allow some of the machine learning training to be performed on the smartphone rather than in the cloud, something Google is already doing with Gboard: "It's really interesting and Google are nailing it now," said Howard.
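For readers unfamiliar with federated learning, the core idea is that devices send weight changes rather than raw messages, and a server averages them. The sketch below shows a generic weighted-averaging step under simplifying assumptions (flat lists of floats, no encryption or added noise); it is not Google's or the BBC's implementation, and the function names are illustrative.

```python
def federated_average(updates):
    """Average per-device weight changes, weighted by local example counts.

    updates: list of (weight_delta, num_examples) pairs, where each
    weight_delta is a list of floats of the same length.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no training examples reported")
    size = len(updates[0][0])
    return [sum(delta[i] * n for delta, n in updates) / total
            for i in range(size)]


def apply_update(global_weights, averaged_delta):
    """Add the averaged change to the shared model's weights."""
    return [w + d for w, d in zip(global_weights, averaged_delta)]
```

A device that trained on three examples counts three times as much as one that trained on a single example, and at no point does the server see the text that produced the changes.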
Another area Howard is keeping an eye on is synthetic data, deploying text-generating neural networks like OpenAI's GPT-2 and the latest GPT-3, which is currently making headlines for its ability to generate meaningful and contextually appropriate sentences with only very small example sets to work from. Models like these could be used to fill in gaps in the data, currently a manual process. "If you give these models some of our phrases they churn out terms which are like them. Experimentally we've found they're really good."
As well as rounding out the dataset, such models can also be used to check for their own biases.
Future iterations will be informed by the results of ongoing efficacy tests by Manchester University which are expected to conclude in the next few months. The team is also developing an efficacy framework to gauge the effectiveness of Own It in helping children make the most of what the web has to offer while avoiding the pitfalls.
"It's about finding out which interventions are most effective and how best to present them," Howard said.
Jon Howard will be speaking at Computing's IT Leaders Festival 2020 - register today!