Court Finds AI Training a "Quintessentially Transformative" Fair Use, But Distinguishes Library of Pirated Works
In a significant development for artificial intelligence and copyright law, the U.S. District Court for the Northern District of California has issued a ruling in a case brought by a group of authors against AI company Anthropic. In Bartz v. Anthropic, Judge William Alsup granted summary judgment in favor of Anthropic on the key question of whether using copyrighted books to train its large language model (LLM), Claude, constitutes fair use. The court deemed the training process "quintessentially transformative," offering a potential shield for AI companies facing similar copyright challenges. The decision was not a complete victory for the AI developer, however: the court found that Anthropic's separate act of building a library of training material from pirated copies of books constituted copyright infringement.
Regarding the first fair use factor, the purpose and character of the use, Judge Alsup drew a compelling analogy, stating, "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different." This reasoning suggests that when copyrighted materials are used as part of a learning or training process to develop a new and original system, rather than to merely reproduce the original content, the use is transformative. The court also found that the digitization of print books that Anthropic had legally purchased was a fair use, viewing it as a permissible format change to create more convenient and searchable digital copies for its library.
Despite these findings for Anthropic, the court drew a firm line at the use of illicitly sourced materials for purposes other than immediate training. The plaintiffs alleged that Anthropic's training data was derived in part from "The Pile," an open-source dataset which includes "Books3," a collection described as a "trove of pirated books." Judge Alsup rejected Anthropic's argument that its creation of a permanent, general-purpose library from these pirated copies was also a fair use. The court determined that in this context, "piracy was the point: To build a central library that one could have paid for... but without paying for it." This part of the ruling underscores that the fair use defense has its limits and may not protect the wholesale acquisition and retention of infringing works, even if they might eventually be used for a transformative purpose.
The case will now proceed to a trial scheduled for December to determine the damages Anthropic must pay for the infringement related to its library of pirated material. This ruling is one of the first to substantively address the fair use doctrine in the context of generative AI training, and its distinction between transformative training and infringing library-building will be closely watched. Although the case is ongoing, the ruling already provides guidance for both copyright holders and AI developers navigating the intersection of intellectual property and cutting-edge technology.