
Long Reads: Scarlett Johansson claims OpenAI copied her voice

But first, the company ignored her

Image: Scarlett Johansson. Credit: Shutterstock

Johansson versus Altman is symptomatic of much deeper problems with the development of AI chatbots

If you haven't yet spotted the smoke rising from the Scarlett Johansson versus OpenAI legal binfire, here's your TLDR summary:

Last week, around the time of the launch of GPT-4o, Sam Altman of OpenAI tweeted a single word: "her".

The tweet refers to Her, the 2013 movie in which Scarlett Johansson voiced the character of an AI assistant.

Soon after the launch, many people noticed that ‘Sky', one of the five voice options for ChatGPT, and the one that OpenAI picked for the demo, bore an uncanny resemblance to the voice of Scarlett Johansson. Not least among them was Johansson herself.

At the weekend, OpenAI put out a post that specifically denied having copied Johansson's voice. Indeed, it said, "AI voices should not deliberately mimic a celebrity's distinctive voice".

OpenAI then announced that it had chosen to "pause" Sky (rest assured, Breeze, Cove, Ember and Juniper are still available). The company said:

"We've heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them."

It appears that some of those questions came from Johansson, who set out her version of events on Monday night, confirming that OpenAI CEO Sam Altman asked her to voice GPT-4o last September. She said:

"He [Altman] told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and Al. He said he felt that my voice would be comforting to people."

Johansson declined the offer. Two days prior to last week's demo, Altman asked her to reconsider, but before they could connect, "the system was out there." Johansson said Sky's voice was "so eerily similar to mine that close friends and news outlets could not tell the difference".

There's already a lot to unpack here.

Ignore the answers you don't like

Firstly, it appears that OpenAI asked a woman if it could use her voice. She said no. OpenAI ignored her and went ahead anyway. This fact alone speaks volumes about the attitude of decision makers at OpenAI towards women.

OpenAI's decision to lead with Sky as the voice for the GPT-4o launch also speaks of AI developers' attitudes to paying creatives fairly and properly when their work, image, voice or other unique attribute is used to train models. Or rather, not paying them.

Also troubling is the fact that OpenAI's version of events doesn't quite match Johansson's. Somebody isn't being wholly transparent.

In a later statement, Altman claimed:

"We cast the actor behind Sky's voice before any outreach to Ms. Johansson."

In which case we're being asked to believe that OpenAI noticed the similarity of voices and went to Johansson after the fact. Quite the coincidence, especially given that Altman tweeted, on the day of the launch, an obvious reference to a film that Johansson starred in. A film about the relationship between AI and humanity. Perhaps that was another coincidence.

It beggars belief that OpenAI thought it would get away with all of this. Did the company think Johansson wouldn't notice? Or is it that people like Altman feel legally untouchable?

It also raises the question: if OpenAI hasn't been wholly transparent about this, what else hasn't it been wholly transparent about? Consider this latest legal debacle alongside the fact that Sam Altman leads a company that is trying to build an artificial general intelligence (AGI). Consider also the resignations of Ilya Sutskever and Jan Leike from the team responsible for ensuring the governance and safety of that potential AGI.

It might take more than a woman's tones to soothe away concerns about how that's likely to pan out with the current crop of tech bros at the helm.

Gendered AI

Of course, this is all before we reach the troubled question of gendered AI assistants.

Much female eyerolling accompanied the GPT-4o launch and its subsequent coverage. "Is it me, or does GPT-4o sound hot?" was the general vibe, one that OpenAI seems to have done plenty to encourage. Reporters, commentators and social media users all noted the "warm" and "flirty" tones of the chatbot.

The fact that Altman thought a female voice would "comfort" those uncomfortable with AI-driven voice assistants is a product of all sorts of cultural biases. It's also not new. People (mainly women) have been complaining about the biases reflected and perpetuated by female-presenting voice assistants for as long as voice assistants have been around (there was one in the seventies called Harpy, which certainly sets a tone). The companies behind them do now provide more masculine alternatives, but the default option typically remains female.

That women's voices are somehow easier on the ear also seems to have been broadly accepted. But how different is that, really, from accepting that it's OK for young, attractive women to sell technology or front customer service because they're easy on the eye? That still goes on, of course, but it does at least raise eyebrows nowadays.

Academic, author and speaker Dr Kate Devlin shared her thoughts with Computing on why the female voice is still the digital default.

"There are a number of supposed scientific explanations about why a female voice is the better choice for a conversational interface, but these don't stand up to much scrutiny and, on examining the evidence, it's clear that the main reason is that the men who design and build these systems simply prefer a woman's voice in the subservient role.

"The new LLM voices are yet more of the same. Of course, this time we're being told that there are male voices too, and that we don't have to stick with the female one, but why make it flirtatious and coquettish at all? It's manipulative marketing."

Why indeed? If trust is the issue, as Altman's appeal to Johansson implied, those creating chatbots would be well advised to voice them as a British male with Received Pronunciation (although the idea that posh men know what they're doing has been tested to destruction in recent years), or perhaps as a woman in her sixties with a Yorkshire or Edinburgh accent; both poll at the top of voices considered reassuring and kind. But they don't.

Image: Why doesn't AI sound like this?

Reductive stereotypes

If OpenAI's handling of this proves anything, it's that there were too few women involved in the decision making. Despite the considerable efforts of many people and organisations, there are nowhere near enough women developing AI, or shaping technology more broadly.

A big part of the reason for the lack of women embarking on careers in technology is the stereotypes we all absorb from childhood. Girls subconsciously rule themselves out of this line of work because they simply can't envision themselves in it. Nobody sets out to make this happen, but a multitude of cultural, familial and societal factors means it happens anyway.

This is why the actions of people like Sam Altman matter.

Image: Sue Turner OBE

Sue Turner OBE, AI governance and ethics specialist, explains why:

"The issue goes beyond annoying some people. When AI voices consistently portray women as obsequious and flirty, it reinforces harmful stereotypes.

"Some of us are working hard to attract young women into the world of data and AI. If they are interacting with AI bots that are so limited in how they portray women, that's not going to help the next generation of women in AI feel powerful in this sector. Worse still, if women's voices are always overly eager to please or suggestive, people can stop noticing that this is not normal behaviour.

"We need people to notice when AI is badly set up so that we - humans - can intervene and improve it."

TechSheCan is a charity working with industry, government and schools to change the persistently low ratio of women to men working in tech. Dr Claire Thorne, Co-CEO, explained why what some people still perceive as minor decisions can be so damaging.

"Decisions about how ‘flirty' or ‘warm' and what gender an assistant's voice are, are not simply minor design features; the creators have a responsibility to get it right because when ubiquitous tech has an upgrade, the consequences are instant and global. This relatively small number of organisations have disproportionate power to define and perpetuate (or quash) harmful stereotypes.

"We've all seen the way humanoids have been positioned as predominantly white, ‘female' and subservient to men in films and the media for decades. Is the same now happening - by stealth - with AI-enabled assistants?

"Tech She Can exists because only one in four workers in tech are women, which means the world being created doesn't work for everyone. Our recent research with Templeton shows that harmful gender stereotypes of STEM careers start even earlier than we suspected: before primary age. Meanwhile, we're racing to keep up with demand from schools for inclusive, industry-relevant tech careers education featuring relatable women role models and the urgent need from our industry partners for more women in their tech workforce, now."

Image: Dr Claire Thorne, Co-CEO of TechSheCan

"The release of Sky feels like a huge setback. You might think of role models as people we can see. But actually what we hear matters too, perhaps more so when it's multiple times a day. What signal does Sky send to a young girl who hears this in her pocket, in her school, in her home, again and again? It can undo years of work we've done to dismantle harmful gender stereotypes in tech.

"It's not surprising but it should be a wake-up call, because this is not just about a voice. It very publicly illustrates the bias already being baked into tech. We need to take a step back and question who's designing, developing, testing, deploying and regulating these tools? And, crucially, who isn't? Who's involved (and who's not) at every level of the organisation?

"Tech's image problem and talent problem go hand-in-hand. We need inclusive cultures, diverse teams, and accountability. Representation and language (the words, semantics, tone and intent) matters - to create a world that works for all.

"You often hear about ‘responsible AI' and 'safe AI' but, in fact, neither is possible without ‘inclusive AI'. And inclusion is a choice."

Whether this latest PR disaster will have any real impact on the fortunes of OpenAI remains to be seen. Early indications are that it won't – the launch saw a big spike in revenue for the paying tiers of the ChatGPT mobile app.

Nonetheless, if OpenAI and those racing to develop competitive technology want to ward off the next PR calamity, they would do well to consider this: if they value women's voices so highly, then maybe they should start listening to them.
