House of Lords Committee slams government inaction on LLM copyright misuse

'Government's reticence to take meaningful action amounts to a de facto endorsement of tech firms' practices,' it says


On Thursday, the House of Lords Select Committee on Communications and Digital sent a letter to Science, Innovation and Technology Secretary Michelle Donelan, criticising the government's "failed series of roundtables" on the issue.

"The issues with copyright are manifesting right now and problematic business models are fast becoming entrenched and normalised. It is worth exploring whether these trends suggest a few larger publishers will obtain some licensing deals while a longer tail of smaller outlets lose out," Baroness Stowell, chair of the committee, wrote in the letter.

"Government's reticence to take meaningful action amounts to a de facto endorsement of tech firms' practices. That reflects poorly on this Government's commitment to British businesses, fair play and the equal application of the law," she added.

The dispute stems from the Committee's in-depth inquiry into LLMs and generative AI, published in February.

While the report acknowledged the potential of AI, it raised serious concerns about copyright infringement by companies training these models.

The Committee found evidence that AI vendors were scraping the web – essentially copying vast amounts of content – to train their systems, potentially violating copyright laws and harming rights-holding businesses.

The report also raised concerns about the potential for market dominance by a small number of tech giants, urging policymakers to prioritise open competition and transparency to prevent stifling innovation.

"The government has a duty to act. It cannot sit on its hands for the next decade until sufficient case law has emerged," the February report said.

In April, the UK Government responded to the Committee's findings, suggesting that the issue was complex and required further legal clarification.

"The basic position under copyright law is that making copies of protected material will infringe copyright unless it is licensed, or an exception applies. However, this is a complex and challenging area, and the interpretation of copyright law and its application to AI models is disputed; both in the UK and internationally," Donelan said in her letter to the Committee chair.

The government's response also pointed to ongoing legal cases in the US as one reason for its cautious approach.

However, in her letter last week, Baroness Stowell argued that the government could still take action without prejudicing ongoing legal cases, by expressing its commitment to upholding copyright principles in the digital age.

The letter highlights the dissonance between the government's high-profile initiatives, like the recently announced £400 million AI Safety Institute, and its lack of action on copyright concerns.

Concerns around copyrighted material – text, audio and images – being used to train LLMs have been present since the launch of popular models like ChatGPT in November 2022.

In December, the New York Times sued OpenAI and Microsoft, alleging copyright infringement and IP abuse over AI training materials.

The NYT claimed "millions" of its articles were used in the training of ChatGPT, accusing OpenAI of "massive copyright infringement, commercial exploitation, and misappropriation" of its intellectual property.

This week, eight American newspapers joined the NYT in suing OpenAI and Microsoft.

OpenAI defended its practices in a submission to the House of Lords Communications and Digital Select Committee, arguing that without access to copyrighted materials, it would be impossible to develop large language models like GPT-4.

"Because copyright today covers virtually every sort of human expression... it would be impossible to train today's leading AI models without using copyrighted materials," OpenAI said.

It argued that limiting training data to out-of-copyright works would lead to AI systems that could not meet the needs of contemporary society.