OpenAI has alleged The New York Times “intentionally manipulated” its chatbot to regurgitate whole lines from the newspaper’s articles, as it fights a copyright lawsuit from the newspaper that poses a threat to how it develops its technology.
The lawsuit, filed just after Christmas, was “without merit”, according to a blog post published by the artificial intelligence company on Monday, which added that the newspaper was not “telling the full story”.
In the lawsuit filed on December 27, the US media company accused the AI start-up and its chief backer Microsoft of taking a “free ride” by using millions of articles to build its chatbot technology, which is capable of responding in detail to natural language prompts.
Copyright is an increasingly fraught issue for AI companies such as OpenAI, whose models work by ingesting huge amounts of data from across the internet. The suit, which is seeking billions of dollars in damages, claims that OpenAI has profited from the “exploitation and misappropriation of The Times’s intellectual property”.
That suit has been followed by a proposed class action from a pair of non-fiction authors, who claim OpenAI infringed their copyright by training its large language model on their work. Notable fiction authors including John Grisham and Jodi Picoult previously filed a similar lawsuit.
In its blog, OpenAI claims to have first heard about the Times’ lawsuit from a news article published by the paper on December 27. Before that, it claims, it had been engaged in productive discussions with the media organisation about a partnership, and had explained that Times “content didn’t meaningfully contribute to the training of our existing models”.
In its copyright case, the Times claimed OpenAI’s chatbot had regurgitated whole excerpts of its articles — a phenomenon described by OpenAI as “inadvertent memorisation”, which the company has explicitly attempted to avoid.
The Times also called on OpenAI to destroy any training data and chatbot models that had used its copyrighted material.
The examples put forward by the Times are from old articles that have been published on a number of third-party sites, according to OpenAI. “It seems [the Times] intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate.”
“Our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts,” OpenAI wrote.
OpenAI and other AI companies have argued that processing reams of publicly available data from the internet constitutes protected “fair use” under US copyright law.
The brewing conflict comes as OpenAI seeks to strike a series of deals with other news organisations to license their content. In early December, the company reached a landmark agreement with German publisher Axel Springer, worth tens of millions of euros a year, which could provide a template for future tie-ups between publishers and AI companies.
“We regard The New York Times’ lawsuit to be without merit. Still, we are hopeful for a constructive partnership with The New York Times and respect its long history,” OpenAI wrote in Monday’s blog.
The New York Times did not immediately respond to a request for comment.