OpenAI: Creation of AI tools 'impossible' without copyrighted material

Several AI firms are currently facing lawsuits over the content used to train their products.

Chatbots and image generators, such as ChatGPT and Stable Diffusion, rely on vast datasets sourced from the internet, much of it protected by copyright.

The New York Times recently filed a lawsuit against OpenAI and its investor Microsoft, accusing them of "unlawful use" of its work in creating their AI products.

The NYT claimed "millions" of its articles were used in the training of ChatGPT, accusing OpenAI of "massive copyright infringement, commercial exploitation, and misappropriation" of its intellectual property.

The newspaper further argued that the AI tool now competes with it as an information source.

OpenAI defended its practices in a submission to the House of Lords communications and digital select committee, arguing that without access to copyrighted materials, it would be impossible to develop large language models like GPT-4.

"Because copyright today covers virtually every sort of human expression... it would be impossible to train today's leading AI models without using copyrighted materials," stated OpenAI in its submission.

The organisation argued that limiting training data to out-of-copyright works would lead to AI systems that could not meet the needs of contemporary society.

The defence presented by AI companies, including OpenAI, often hinges on the legal doctrine of "fair use," which allows the use of copyrighted content in specific circumstances without the owner's permission.

OpenAI reiterated in its submission that it believes "legally, copyright law does not forbid training."

A new era for copyright law

The New York Times lawsuit is not the only legal challenge launched against OpenAI and its competitors.

Last year, OpenAI faced a federal class-action lawsuit in California that accused it of unlawfully using personal data for training purposes. The lawsuit cited multiple violations, including breaches of the US Computer Fraud and Abuse Act and the Electronic Communications Privacy Act.

Getty Images is suing Stability AI, the creator of Stable Diffusion, for alleged copyright breaches.

Responding to concerns about AI safety, OpenAI expressed support for independent analysis of its security measures. The organisation advocates for "red-teaming," where third-party researchers assess the safety of AI products by simulating the behaviour of rogue actors.

Cloud giants failing to protect AI customers

While cloud giants such as Amazon, Microsoft and Google are eager to promote their new AI tools, they are leaving their business customers exposed to the risk of copyright lawsuits, a new report by The Financial Times has warned.

It says that while the big three cloud companies boast of defending customers from IP claims, analysis of their indemnity clauses shows that these protections only apply to the use of AI models that were developed by or with the oversight of Google, Amazon and Microsoft.

This means that businesses that use AI models developed by other companies are not protected from copyright lawsuits.

So, if you're using an AI tool from, say, Anthropic (backed, but not developed, by Amazon and Google), a copyright lawsuit could land at your doorstep, even though you're using it on Amazon's or Google's platform.

This selective protection has made businesses wary.

According to the FT, Amazon only extends coverage to content generated by its proprietary models, such as Titan, and various AI applications it has developed. Likewise, Microsoft offers protection exclusively for tools operating on its internal models and those created by OpenAI.

Despite the limited protection, there are silver linings for users. Legal experts believe claims might be difficult to win.

A US court recently dismissed part of a lawsuit against AI companies, highlighting the "problem" of proving that every generated image relies on copyrighted material.

While generative AI technology holds immense potential, the unprecedented claims in copyright law require caution. Businesses that are considering using AI tools should carefully review the terms of service and indemnity clauses before making a decision.