AI smartwatch tech can detect speech tone

Researchers working on mood AI

Researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Institute of Medical Engineering and Science (IMES) have developed a smartwatch that can detect conversational tone.

The wearable, claim the researchers, can identify whether a conversation is happy, sad or neutral based on a person's speech patterns and other vital signs. The researchers suggest that such a device could help people with conditions such as Asperger's syndrome, who can struggle to fully interpret the emotional content of a conversation.

The system can analyse audio, text and physiological signals to determine the overall tone of a story with 83 per cent accuracy. Using deep-learning techniques, the system can also provide a 'sentiment score' for specific five-second intervals within a conversation.

"As far as we know, this is the first experiment that collects both physical data and speech data in a passive but robust way, even while subjects are having natural, unstructured interactions," says Mohammad Ghassemi, a PhD student at the University.

He added: "Our results show that it's possible to classify the emotional tone of conversations in real-time."

According to the team, if multiple people in a conversation were using the devices at the same time, the system's performance would improve still further.

The prototype system is based on a Samsung Simband, which captures features such as movement, heart rate, blood pressure, blood flow and skin temperature. The system also captures audio data and text transcripts to analyse the speaker's tone, pitch, energy and vocabulary.
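To make the idea concrete, the sketch below shows how continuous sensor and audio streams might be summarised into per-window feature vectors. The window length follows the five-second intervals mentioned in the article; the specific signals, sampling assumptions and statistics are illustrative only, not the researchers' actual pipeline.

```python
import numpy as np

WINDOW_SECONDS = 5  # matches the five-second intervals described in the article

def segment_features(accel, heart_rate, pitch, energy, sample_rate):
    """Summarise each five-second window of (already aligned) signals
    into one feature vector of simple statistics."""
    window = WINDOW_SECONDS * sample_rate
    n_windows = len(accel) // window
    features = []
    for i in range(n_windows):
        sl = slice(i * window, (i + 1) * window)
        features.append([
            np.mean(accel[sl]), np.std(accel[sl]),    # movement
            np.mean(heart_rate[sl]),                  # physiology
            np.mean(pitch[sl]), np.mean(energy[sl]),  # speech tone/energy
        ])
    return np.array(features)

# Tiny synthetic demo: one minute of signals resampled to a common 10 Hz rate.
rate = 10
t = np.arange(60 * rate)
feats = segment_features(np.sin(t), 70 + np.cos(t), 200 + np.sin(t),
                         np.abs(np.sin(t)), rate)
print(feats.shape)  # (12, 5): twelve 5-second windows, five features each
```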

After capturing 31 different conversations of several minutes each, the team trained two algorithms on the data: One classified the overall nature of a conversation as either happy or sad, while the second classified each five-second block of every conversation as positive, negative, or neutral.
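The two-level labelling scheme could look something like the following sketch. The original work used deep learning; here logistic regression and randomly generated data stand in purely to show the idea of one classifier per conversation and one per five-second segment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# One feature vector per conversation, labelled happy (1) or sad (0).
X_conv = rng.normal(size=(31, 20))
y_conv = rng.integers(0, 2, size=31)
conversation_clf = LogisticRegression(max_iter=1000).fit(X_conv, y_conv)

# One feature vector per five-second segment,
# labelled positive (2), neutral (1) or negative (0).
X_seg = rng.normal(size=(500, 20))
y_seg = rng.integers(0, 3, size=500)
segment_clf = LogisticRegression(max_iter=1000).fit(X_seg, y_seg)

print(conversation_clf.predict(X_conv[:1]), segment_clf.predict(X_seg[:1]))
```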

The devices could be further honed in the future to provide more detailed analysis.

Tuka Alhanai, a graduate student who worked on the project, suggested that the researchers had taken a different approach to artificial intelligence. In traditional neural networks, she said, all features about the data are provided to the algorithm at the base of the network. In contrast, her team found that they could improve performance by organising different features at the various layers of the network.
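A minimal sketch of that idea is shown below: rather than concatenating every modality at the input, lower-level signals such as accelerometer features enter the early layers, while more abstract features such as text sentiment are merged in deeper. The layer sizes, feature dimensions and choice of PyTorch are assumptions for illustration, not the team's published architecture.

```python
import torch
import torch.nn as nn

class LayeredFusionNet(nn.Module):
    """Feeds different feature groups in at different depths of the network."""
    def __init__(self, accel_dim=8, physio_dim=4, text_dim=16):
        super().__init__()
        self.low = nn.Sequential(nn.Linear(accel_dim, 32), nn.ReLU())
        self.mid = nn.Sequential(nn.Linear(32 + physio_dim, 32), nn.ReLU())
        self.high = nn.Sequential(nn.Linear(32 + text_dim, 32), nn.ReLU())
        self.out = nn.Linear(32, 3)  # positive / neutral / negative

    def forward(self, accel, physio, text):
        h = self.low(accel)                        # raw movement features first
        h = self.mid(torch.cat([h, physio], -1))   # physiology joins mid-way
        h = self.high(torch.cat([h, text], -1))    # abstract text features last
        return self.out(h)

net = LayeredFusionNet()
logits = net(torch.randn(1, 8), torch.randn(1, 4), torch.randn(1, 16))
print(logits.shape)  # (1, 3)
```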

"The system picks up on how, for example, the sentiment in the text transcription was more abstract than the raw accelerometer data," said Alhanai. "It's quite remarkable that a machine could approximate how we humans perceive these interactions, without significant input from us as researchers."

She added that while the system is not yet reliable enough to provide "social coaching", that is the goal.

"Our next step is to improve the algorithm's emotional granularity so that it is more accurate at calling out boring, tense, and exciting moments, rather than just labelling interactions as ‘positive' or ‘negative'," said Alhanai. "Developing technology that can take the pulse of human emotions has the potential to dramatically improve how we communicate with each other."