ARTICLE
8 January 2026

OpenAI Loses Privacy Gambit: 20 Million ChatGPT Logs Likely Headed To Copyright Plaintiffs

Jones Walker

Contributor

by Andrew R. Lee

When OpenAI proposed producing 20 million anonymized ChatGPT logs in the sprawling AI copyright litigation against it, the generative AI behemoth likely assumed it could control what those logs would reveal. That assumption appears to have backfired. This week (January 5, 2026), US District Judge Sidney Stein affirmed a magistrate judge's order compelling OpenAI to produce the entire 20 million-log sample, not just the cherry-picked conversations implicating plaintiffs' works that OpenAI wanted to hand over.

The ruling marks a significant discovery victory for the news organizations and authors suing OpenAI, and it offers a window into how courts treat user-privacy claims when AI companies face copyright liability.

The Play That Didn't Work

The discovery dispute arose in In re: OpenAI, Inc. Copyright Infringement Litigation (MDL, SDNY), a consolidated action that combines 16 copyright lawsuits. Plaintiffs include The New York Times, the Chicago Tribune, and numerous authors whose works were allegedly used to train ChatGPT without permission.

As we have previously discussed here, these lawsuits pose a multi-billion-dollar question: Can AI developers train their models on copyrighted works without permission under fair use?

News plaintiffs initially requested 120 million ChatGPT logs from the tens of billions of OpenAI logs that it has preserved. OpenAI countered with 20 million — 0.5% of its logs — arguing that was "surely more than enough." The plaintiffs agreed. Then, OpenAI changed course in October 2025, proposing to run keyword searches and produce only conversations that implicated plaintiffs' specific works.

Magistrate Judge Ona T. Wang rejected that approach in November 2025. District Judge Sidney Stein has now affirmed her ruling in full.

Why "Unrelated" Logs Matter

OpenAI's central argument was that logs that did not contain plaintiffs' works were irrelevant and that producing them would unnecessarily invade the privacy of ChatGPT users. The court disagreed on both counts.

On relevance, Judge Wang found that even output logs without reproductions of plaintiffs' works are discoverable because they bear on OpenAI's fair use defense. Fair use analysis examines, among other factors, how the challenged use affects the market for the original works. Logs showing what ChatGPT produces across a broad range of queries (not just those involving plaintiffs' content) could reveal patterns relevant to whether ChatGPT's outputs compete with or substitute for copyrighted works.

On privacy, the court acknowledged that ChatGPT users have "sincere" privacy interests in their conversations. But Judge Stein found those interests adequately protected by three safeguards: reducing the sample from tens of billions to 20 million logs, OpenAI's de-identification process removing personally identifiable information, and the existing protective order governing discovery materials.

Critically, Judge Stein distinguished the situation from a securities case that OpenAI had relied on, in which wiretapped phone calls were at issue. ChatGPT users, unlike wiretap subjects, "voluntarily submitted their communications" to OpenAI. That distinction proved fatal to OpenAI's privacy objection.

Implications for AI Companies and In-House Counsel

For AI companies facing discovery, this decision confirms that user-privacy arguments will not automatically shield user data from production. Courts will weigh privacy interests against relevance and will expect de-identification and protective-order safeguards rather than wholesale withholding.

For in-house counsel, the ruling reinforces what we outlined in our AI legal hold survival guide: AI conversation logs are discoverable electronically stored information. OpenAI's preservation of tens of billions of logs made this dispute possible; the question was never whether the data existed, but how much would be produced.

What Happens Next

OpenAI must now produce 20 million de-identified ChatGPT logs to both news plaintiffs and class plaintiffs. Plaintiffs' experts will analyze these logs for evidence bearing on fair use, market harm, and the nature of ChatGPT's outputs.

This discovery could prove pivotal. If plaintiffs demonstrate that ChatGPT routinely generates outputs that compete with or substitute for copyrighted content — even when users aren't specifically requesting plaintiffs' works — OpenAI's fair use defense becomes considerably harder to sustain.

The multidistrict litigation remains a high-stakes test of how copyright law applies to generative AI. With 20 million data points now headed to the plaintiffs, the evidence base for answering that question just expanded dramatically.


Key Takeaways:

Voluntary submission limits privacy protection: Courts distinguish between users who voluntarily share information with AI systems and subjects of covert surveillance. This distinction will likely shape discovery disputes across AI litigation.

De-identification and protective orders may not fully address privacy concerns: Companies cannot simply refuse to participate in discovery on privacy grounds; they must propose and implement sensible safeguards that allow discovery to proceed while protecting user privacy. Whether those safeguards are truly sufficient, however, depends entirely on what a user has revealed in a chatbot conversation.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.

