Remember that time you asked ChatGPT to review your company's litigation strategy? Or when your HR team used it to draft termination letters? Those conversations could be among the 20 million that just became fair game in a lawsuit your company has nothing to do with.
Earlier this month, US Magistrate Judge Ona Wang of the Southern District of New York ordered OpenAI to hand over 20 million ChatGPT conversations to lawyers representing The New York Times and other publishers, who seek the logs to support copyright infringement claims. The magistrate judge dismissed OpenAI's privacy concerns, apparently convinced that "anonymization" is the solution. While the court indicated it was attempting to balance discovery needs with privacy protection, its order reveals how challenging it is to apply traditional legal frameworks to AI-generated data.
The Unprecedented Order
The magistrate judge ordered OpenAI to turn over a sample of 20 million chat logs as part of the sprawling multidistrict litigation in which publishers are suing AI companies. The users behind these 20 million conversations weren't asked or notified. They have no say in the matter.
OpenAI has sought reconsideration, arguing in part:
As OpenAI repeatedly argued before this Court, neither common sense nor the Federal Rules justify the forced production of a massive trove of irrelevant personal user conversations. ... The Court overruled these important concerns in an order that does not discuss relevance nor proportionality, but nonetheless directs OpenAI to produce millions of conversations belonging to individuals who have no role, voice, or stake in these proceedings.
The Anonymization Challenge
Unlike traditional datasets, ChatGPT conversations contain uniquely personal information that's extraordinarily difficult to truly anonymize.
As reported in Techdirt, research underscores this challenge. Researchers downloaded and analyzed 1,000 leaked ChatGPT conversations spanning over 43 million words. Among them, they discovered multiple chats that explicitly mentioned personally identifiable information (PII), such as full names, addresses, and ID numbers.
The Washington Post also illustrated the seriousness of the issue in its analysis of 47,000 ChatGPT chat logs that were accidentally revealed through ChatGPT's "share" feature. The logs revealed reams of personal information, including email addresses, phone numbers, and intimate personal details that could be used to identify individuals even with names removed.
Why This Matters for Your Organization
When people interact with AI, they often share information they'd never put in an email or search engine — from mental health concerns to confidential business strategies. Anyone who has used ChatGPT since its launch now faces the possibility that their personal conversations will be handed over to a legion of lawyers and their clients in a copyright action in which they have no stake or interest.
The practical implications are sobering:
- Every AI interaction is potentially discoverable evidence.
- "Anonymization" offers limited protection for conversational data.
- Your organization's AI policies likely need immediate updating.
As we explored in a previous post about the challenges of AI legal holds, the intersection of AI and discovery obligations creates unprecedented complexity. This ruling sharply amplifies those concerns.
The Path Forward
If upheld, the New York court's ruling represents a troubling precedent: it suggests that anyone who files a lawsuit against an AI company can demand production of tens of millions of conversations without first narrowing for relevance.
For in-house counsel, the message is clear: treat every AI interaction as if it could become public. Audit current AI usage, review and update your policies as needed, train your teams, and consider implementing AI tools with stronger privacy protections.
The courts are grappling with how to apply 20th-century discovery rules to 21st-century technology. Until the law catches up, the burden falls on organizations to protect themselves.
"Whether or not the parties had reached agreement to produce the 20 million Consumer ChatGPT Logs in whole—which the parties vehemently dispute—such production here is appropriate. OpenAI has failed to explain how its consumers' privacy rights are not adequately protected by: (1) the existing protective order in this multidistrict litigation or (2) OpenAI's exhaustive de-identification of all of the 20 million Consumer ChatGPT Logs."
– U.S. Magistrate Judge Ona Wang
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.