Publishing

Monitoring the papers that are fed to AI

New tool tracks which academic publishers’ content has been used to train large language models

by Dalmeet Singh Chawla, special to C&EN
October 23, 2024 | A version of this story appeared in Volume 102, Issue 34

A collage evoking the feeding of academic papers into a large language model.
Credit: Madeline Monroe/Shutterstock/C&EN

A new tool is tracking which scholarly publishers are signing deals that allow tech firms to train large language models on academic papers and data.

The deals have drawn attention from scientists and bodies representing scholars, who argue that researchers should be compensated if their work is used to train artificial intelligence and that they should also be able to opt out of having their work used in this way.

Beyond compensation, scholars are also concerned about the accuracy of generative AI tools and whether their work will be presented with appropriate context.

The tracker, launched on Oct. 15 by Ithaka S+R, a nonprofit that serves the academic and cultural communities, so far lists seven deals between academic publishers and tech firms. The agreements were or will be signed by Cambridge University Press, Oxford University Press, Taylor & Francis, Sage Publications, and John Wiley & Sons, according to the tracker.

“These deals are not just happening as a one-off thing—there’s a pattern here,” says Roger Schonfeld, vice president of libraries, scholarly communications, and museums at Ithaka S+R. “Scholarly publishers are interested in this market. There’s strategic engagement, and there will be more of it going forward.”

For now, the tool’s creators are keeping tabs on such deals and manually adding them to the tool. Schonfeld says some deals have not been made public and thus aren’t in the tracker. The team is looking for ways to communicate them while maintaining confidentiality, he says.

A spokesperson for the American Chemical Society, which publishes C&EN, says ACS hasn’t yet entered any licensing agreements to train AI tools. “Such a decision involves a number of strategic, technical, business, and legal considerations,” the spokesperson says.

While most of the scholarly publishers struck their deals without informing authors, Cambridge University Press has started consulting authors who publish in its journals on whether they would allow AI models to be trained on their work. “I think this is a response to some of that reaction we’ve seen by authors in the market,” says Maya Dayan, program manager for strategic research and market analysis at Ithaka S+R.

Dayan says the tracker will note when publishers let authors opt out of participation and when authors are remunerated for allowing their work to be used. “The details of these deals can be a little hard to find,” she says.

A spokesperson for Wiley, which has three deals listed in the tracker, says the firm has “thoughtfully considered” licensing journal content for AI training.

“This includes which content to include for specific use cases, clearly defined limited grants of rights, protections against misuse and copyright infringement, compensation in accordance with contractual terms, and advocating for attribution, where possible and appropriate,” the Wiley spokesperson says.

“With these appropriate guardrails in place, we believe that AI has the potential to transform knowledge-based industries and that it is in the public interest for these emerging technologies to be trained on high-quality, reliable information,” the spokesperson says.
