OpenAI asked a judge to throw out parts of The New York Times’ lawsuit against it, alleging that the media company “paid someone to hack OpenAI products,” such as ChatGPT, to generate 100 examples of copyright infringement for its case.
In a filing Monday in Manhattan federal court, OpenAI alleged that it took the Times “tens of thousands of trials to generate the highly anomalous results” and that the publication did so using “deceptive prompts that flagrantly violate OpenAI’s terms of use.”
“Normal people do not use OpenAI products in this way,” OpenAI wrote in the filing.
The “hacking” that OpenAI alleges in the filing could also be called prompt engineering or “red teaming,” a common way for AI trust and safety teams, ethicists, academics and tech companies to “stress test” AI systems for vulnerabilities. It is a widespread practice in the AI industry and a popular way to alert companies to problems in their systems, similar to how cybersecurity professionals test companies’ websites for weaknesses.
“In this filing, OpenAI does not dispute—nor can it—that they copied millions of The Times’s works to build and power their commercial products without our permission,” Ian Crosby, a partner at Susman Godfrey and lead counsel for the Times, said in a statement to CNBC.
He added: “What OpenAI strangely mischaracterizes as ‘hacking’ is simply the use of OpenAI products to look for evidence that they have stolen and reproduced The Times’s copyrighted works. And that’s exactly what we found. In fact, the scale of OpenAI’s copying is far greater than the 100 examples cited in the complaint.”
The filing comes as a broader battle rages between OpenAI and publishers, authors and artists over the use of copyrighted material for AI training data, including the high-profile Times lawsuit that some see as a watershed moment for the industry. The news outlet’s lawsuit, filed in December, seeks to hold Microsoft and OpenAI responsible for billions of dollars in damages.
OpenAI has said in the past that it is “impossible” to train top AI models without copyrighted works.
“Because copyright today covers almost every kind of human expression—including blog posts, photos, forum posts, snippets of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” OpenAI wrote in a filing last month in response to an inquiry by the United Kingdom’s House of Lords.
“Limiting training data to public domain books and drawings created more than a century ago may make for interesting experimentation, but would not provide AI systems that meet the needs of today’s citizens,” OpenAI continued in the filing.
As recently as last month in Davos, Switzerland, OpenAI CEO Sam Altman said he was “surprised” by the Times’ lawsuit, saying OpenAI’s models did not need to train on the publisher’s data.
“We don’t really need to train on their data,” Altman said at an event hosted by Bloomberg in Davos. “I think that’s something that people don’t understand. Any particular source of learning, it doesn’t move the needle for us that much.”
While a single publisher may not make much difference to ChatGPT’s capabilities, OpenAI’s filing suggests that many publishers opting out could have an effect. In recent months, the company has begun courting publishers to allow their content to be used as training data.
The company already has deals with Axel Springer, the German media conglomerate that owns Business Insider, Morning Brew and other publications, and is also reportedly in talks with CNN, Fox Corp. and Time to license their work.
“We expect our ongoing negotiations with others to lead to additional partnerships soon,” OpenAI wrote in the filing.
In its filing and blog posts, OpenAI has highlighted its opt-out process for publishers, which allows outlets to deny the company’s web crawler access to their websites. But in the filing, OpenAI said that content is vital to training today’s AI models.
“While we look forward to continuing to develop additional mechanisms to enable rights holders to opt out, we are actively engaging with them to find mutually beneficial arrangements to gain access to material that is otherwise unavailable, as well as to display content in ways that go beyond what copyright law otherwise allows,” the company wrote.
— CNBC’s Ryan Browne contributed to this report.
https://www.cnbc.com/2024/02/27/openai-alleges-new-york-times-hacked-chatgpt-for-lawsuit-evidence.html