400 newspapers sue OpenAI, Microsoft over AI training data use

A coalition of nearly 400 local and regional newspapers sued OpenAI and Microsoft on Wednesday, alleging the companies systematically scraped copyrighted articles to train ChatGPT and Copilot without permission or payment.

A coalition of nearly 400 newspapers sued OpenAI and Microsoft for systematically scraping copyrighted articles to train ChatGPT and Microsoft Copilot, alleging the practice threatens the survival of local journalism.

"The publishers' journalism was essential to the defendants' explosive growth, and unless defendants are held accountable for stealing, stripping and misusing the publishers' content, the AI boom will be a death knell for local journalism," the publishers said in their complaint filed in the US District Court for the Southern District of New York.

The lawsuit, led by Long Island-based Richner Communications and represented by former New Jersey Attorney General Matthew Platkin, accuses OpenAI and Microsoft of copying hundreds of thousands of articles — including content behind paywalls — onto their servers, stripping copyright management information such as author credits and publication names, and using the material to train large language models that reproduce the work in response to user prompts.

The case adds to a growing wave of copyright litigation against AI developers that includes lawsuits by the New York Times, CNN, Reddit, and Merriam-Webster against Perplexity AI, as well as a separate suit by Encyclopedia Britannica and Merriam-Webster against OpenAI. OpenAI, which reported an $852 billion valuation after a $122 billion fundraising round in March, said its models are "grounded in fair use."

The Scale of the Allegations

The publishers claim the defendants "systematically and secretly crawled" their websites, copying original works onto their own servers while simultaneously stripping out copyright management information. The complaint seeks statutory damages and injunctive relief for copyright infringement and violations of the Digital Millennium Copyright Act.

Microsoft, which made an initial $1 billion investment in OpenAI in 2019, is described in the complaint as "an indispensable partner in virtually every aspect of OpenAI's commercial enterprise." The publishers argue that the generative AI products built using their content have generated billions of dollars in market value for the defendants, with not "a cent of it" going to the content creators.

What's at Stake for the AI Industry

This lawsuit represents the largest legal effort led by local and regional newspapers in the fight over AI training data. Previous litigation, including the New York Times' suit against OpenAI, has focused on national outlets, leaving local journalism largely absent from the conversation.

"Local news is a trusted news source for the vast majority of Americans," Platkin said in an interview. "It's the lifeblood of our democracy, and this business model has really put local news at risk of extinction."

The outcome could reshape how AI companies source training data. If courts reject OpenAI's fair use defense, the industry may face billions of dollars in retroactive licensing costs and be forced to negotiate content deals with thousands of publishers. OpenAI has already struck licensing agreements with some outlets including the Associated Press and Axel Springer, but the coalition of local publishers argues those deals don't cover the vast majority of newsrooms.

OpenAI spokesperson Drew Pusateri said in a statement that the company's "models empower innovation, are trained on publicly available data, and are grounded in fair use." Microsoft didn't immediately respond to a request for comment.

The case is Richner Commc'ns, Inc. v. Microsoft Corp., No. 1:26-cv-05320, in the Southern District of New York.

This article is for informational purposes only and does not constitute investment advice.