In a significant development for AI developers, on June 23 U.S. District Judge William Alsup ruled in Bartz v. Anthropic PBC that training AI models on copyrighted books may qualify as fair use under U.S. copyright law, provided the training serves a transformative purpose. The ruling provides a measure of legal clarity for developers and data processors alike, though it draws a sharp boundary around the lawful acquisition and storage of training data.
Key Holding: AI Training is "Exceedingly Transformative"
Plaintiffs Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson alleged that Anthropic unlawfully used pirated copies of their books to train its Claude AI models. Judge Alsup held that the training itself was not intended to replace the plaintiffs' creative works, but rather to produce outputs that are "new, different, and non-infringing." The court analogized AI training to a human writer studying prior works for inspiration, emphasizing that no substantial part of any plaintiff's book was reproduced verbatim by Claude in downstream applications.
The court thus ruled that the AI training process itself qualified as fair use. For AI developers, this decision affirms that the transformation and purpose of the training process—rather than the commercial nature of the model—are critical in the fair use analysis.
Storage of Pirated Books Remains Copyright Infringement
While training use may be protected, the court also found that Anthropic's centralized storage of more than 7 million allegedly pirated books was unlawful. According to Judge Alsup, storing entire copyrighted works in a way "unconnected to the direct purpose of training" constitutes infringement, regardless of whether those works were later used in a transformative manner. A trial is set for December 2025 to determine damages, which could reach up to $150,000 per willfully infringed work under the Copyright Act.
Implications for Developers, Distributors, and Telecom-Adjacent AI Platforms
This ruling has substantial implications for platform operators and service providers in telecommunications, cloud AI, and analytics spaces. AI developers should take care to:
- Ensure that training data is sourced lawfully, preferably under license or from public domain sources;
- Keep thorough documentation distinguishing datasets used strictly for training versus those retained for other business purposes;
- Establish rigorous data lifecycle protocols to avoid passive storage of infringing content;
- Consider the inclusion of indemnity clauses and warranties in vendor or licensing agreements involving third-party data.
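As a practical illustration of the documentation and lifecycle recommendations above, a developer might maintain a machine-readable provenance manifest for each training dataset. The sketch below is purely hypothetical: the record fields, category labels, and flagging criteria are illustrative assumptions, not requirements drawn from the ruling or from any regulatory standard.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical provenance record; field names are illustrative only.
@dataclass
class DatasetRecord:
    name: str
    source: str            # e.g. "licensed", "public_domain", "unknown"
    license_ref: str       # pointer to the license or purchase agreement
    purpose: str           # "training_only" or "retained"
    retention_expiry: str  # date after which stored copies are deleted

def flag_risky(records):
    """Return records lacking a lawful source, or retained beyond
    training without a documented license reference."""
    return [
        r for r in records
        if r.source not in ("licensed", "public_domain")
        or (r.purpose != "training_only" and not r.license_ref)
    ]

records = [
    DatasetRecord("corpus-a", "licensed", "LIC-2024-017",
                  "training_only", "2026-01-01"),
    DatasetRecord("corpus-b", "unknown", "", "retained", ""),
]

# Surface entries that warrant legal review before use or storage.
for r in flag_risky(records):
    print(json.dumps(asdict(r)))
```

A manifest like this does not itself establish fair use, but it supports the distinction the court drew between data used strictly for training and data passively retained for other purposes.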
For telecom providers that offer integrated AI services, such as customer analytics or intelligent routing, this decision underscores the need to vet third-party model suppliers and avoid exposure through indirect access to infringing data repositories.
Next Steps for Businesses
While the court's fair use determination offers a favorable signal to the AI industry, the unresolved storage claim and potential appellate review leave uncertainty. All AI stakeholders should:
- Review internal data acquisition and storage practices;
- Audit compliance with copyright licensing obligations;
- Prepare for heightened scrutiny over how training datasets are acquired, stored, and shared.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.