Copyright disputes with generative artificial intelligence companies that unlawfully use scholarly research without permission or citation are costing academic publishers tens of thousands of dollars to resolve, the head of a leading US university press has said.
Speaking at the Research and Scholarly Publishing Forum at the London Book Fair, Christie Henry, director of Princeton University Press, said the cost of litigation with technology companies was beginning to mount as many more authors discovered their work had been absorbed by large language models (LLMs) whose written answers did not provide attribution to their source material.
“Remedies have been in the twenties of thousands of dollars,” Ms Henry said at the 12 March event, which was dominated by discussion of the growing tension between publishers and technology companies about how AI firms were using published works without citations or remuneration to authors or publishing houses.
Authors were rightly concerned that their published outputs were being used to train LLMs that would then produce distorted versions of their work without any recompense or proper attribution, Ms Henry told Times Higher Education.
“When authors ask me if their work is being used to train LLMs, I can’t say that it’s not – that’s an uncomfortable position for a publisher,” Ms Henry said about what she considered to be clear breaches of existing copyright and licensing rules.
However, these breaches were often very hard to spot given the nature of generative AI, she added. “The content is disaggregated and then reformed – there are often pieces of chapters or books that appear uncredited.”
“I’m not persuaded by the arguments from big tech that it’s too expensive or complicated to set up the licensing agreements that are needed, and I’m certainly concerned by their arguments that publishers are the ones who are obstructing knowledge,” said Ms Henry, who argued that LLMs should simply follow agreed rules and principles on citation and licensing.
Academics were becoming increasingly concerned about how their scholarly work was being reformulated by AI, other speakers told the conference.
“I’ve spoken to people who published under CC-BY [licences] and did not expect their work to be used in this way – what academics do [with scholarly material] is very different to the way robots are stitching together materials to create a facsimile of research,” said Leslie Lansman, global permissions manager at Springer Nature.
The forum also heard concerns that it was proving impossible for publishers and technology firms to reach an agreement on AI content use, with work on a UK voluntary code being dropped last month after a lengthy stalemate.
Caroline Cummins, director of policy and affairs at the Publishers’ Association, said the breakdown in relations occurred because “some AI firms do not accept that what they have done amounts to mass copyright infringement”.
“If you do not have that acceptance, it’s hard to have a dialogue,” she explained.
Catriona MacLeod Stevenson, general counsel and deputy chief executive of the Publishers’ Association, said she also had concerns about the European Union’s new legislation on AI regulation, which would require authors to “opt out” of their work being used to train AI models – a position that in effect “turned copyright law on its head”, given that protections usually occur automatically.
But Richard Mollet, head of European government affairs at RELX, which owns Elsevier, was more optimistic about the EU’s new rules, as they would require LLMs to state which resources they had used.
“If you are an AI, you have to have a sufficient summary of what has been used…and we need to know what [has been used] in an LLM now and in the future,” he said, adding that these content summaries should be welcomed by both technology and publishing firms.
“Any time you hear someone from Meta or another tech company say they believe in trustworthy AI, that should mean they know what is being used [in an LLM],” he said.