OpenAI: Creation of AI tools 'impossible' without copyrighted material

As analysis shows cloud giants are failing to protect customers from IP claims


OpenAI has said it would be "impossible" to develop AI tools like its chatbot, ChatGPT, without access to copyrighted material.

Several AI firms are currently facing lawsuits over the content used to train their products.

Chatbots and image generators, such as ChatGPT and Stable Diffusion, rely on vast datasets sourced from the internet, much of which falls under copyright protection.

The New York Times recently filed a lawsuit against OpenAI and its investor Microsoft, accusing them of "unlawful use" of its work in creating their AI products.

The NYT claimed "millions" of its articles were used in the training of ChatGPT, accusing OpenAI of "massive copyright infringement, commercial exploitation, and misappropriation" of its intellectual property.

The newspaper further argued that the AI tool now competes with it as an information source.

OpenAI defended its practices in a submission [pdf] to the House of Lords communications and digital select committee, arguing that without access to copyrighted materials it would be impossible to develop large language models like GPT-4.

"Because copyright today covers virtually every sort of human expression... it would be impossible to train today's leading AI models without using copyrighted materials," stated OpenAI in its submission.

The organisation argued that limiting training data to out-of-copyright works would lead to AI systems that could not meet the needs of contemporary society.

The defence presented by AI companies, including OpenAI, often hinges on the legal doctrine of "fair use," which allows copyrighted content to be used in specific circumstances without the owner's permission.

OpenAI reiterated in its submission that it believes "legally, copyright law does not forbid training."

A new era for copyright law

The New York Times lawsuit is not the only legal challenge launched against OpenAI and its competitors.

Last year, OpenAI faced a federal class-action lawsuit in California accusing it of unlawfully using personal data for training purposes. The lawsuit cited multiple violations, including breaches of the US Computer Fraud and Abuse Act and the Electronic Communications Privacy Act.

Getty Images is suing Stability AI, the creator of Stable Diffusion, for alleged copyright breaches.

Responding to concerns about AI safety, OpenAI expressed support for independent analysis of its security measures. The organisation advocates for "red-teaming," where third-party researchers assess the safety of AI products by simulating the behaviour of rogue actors.

Cloud giants failing to protect AI customers

While cloud giants such as Amazon, Microsoft and Google are eager to promote their new AI tools, they are leaving their business customers exposed to the risk of copyright lawsuits, an analysis by the Financial Times has warned.

It says that while the big three cloud companies boast of defending customers from IP claims, analysis of their indemnity clauses shows that these protections only apply to the use of AI models that were developed by or with the oversight of Google, Amazon and Microsoft.

This means that businesses that use AI models developed by other companies are not protected from copyright lawsuits.

So, if you're using an AI tool from, say, Anthropic (backed, but not developed, by Amazon and Google), a copyright lawsuit could land at your doorstep, even if you access the tool through Amazon's or Google's platform.

This selective protection has businesses wary.

According to the FT, Amazon only extends coverage to content generated by its proprietary models, such as Titan, and various AI applications it has developed. Likewise, Microsoft offers protection exclusively for tools operating on its internal models and those created by OpenAI.

Despite the limited protection, there are silver linings for users. Legal experts believe claims might be difficult to win.

A recent US court case dismissed part of a lawsuit against AI companies, highlighting the "problem" of proving every generated image relies on copyrighted material.

While generative AI technology holds immense potential, the unprecedented copyright claims it has prompted call for caution. Businesses considering AI tools should carefully review the terms of service and indemnity clauses before making a decision.
