In consecutive decisions last week, two federal judges in California issued landmark rulings in favor of generative artificial intelligence (AI) developers, finding that their use of copyrighted books to train large language models (LLMs) can qualify as fair use. While the decisions in Bartz et al. v. Anthropic PBC, Case No. 3:24-cv-05417, and Kadrey v. Meta Platforms Inc., Case No. 3:23-cv-03417, mark a significant turning point in the debate over how copyright law applies to AI training and LLMs, many questions remain undecided.
Fair use is an affirmative defense to copyright infringement that allows limited use of copyrighted material without permission in certain contexts, such as commentary, education or research. Courts evaluate four nonexclusive and flexible factors: (1) the purpose and character of the use (including whether it is transformative); (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the effect of the use on the market for the original.
In Anthropic, Judge William Alsup held that Anthropic's ingestion of over seven million pirated and scanned books — used to build training datasets for its Claude chatbot — was transformative and, as such, qualified as fair use. Alsup emphasized that training a model to generate language is fundamentally different from copying or reading a book, calling the purpose "quintessentially transformative." Even though Anthropic used full works, the court found this justified by the nature of the training process and by the absence of evidence that the copying risked market harm to the copyright holders, weighing in favor of fair use. The court concluded that "[i]f this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use" and therefore were protected as fair use. And while the plaintiffs contended that such copying would result in a deluge of competing works, Alsup derided this argument as "no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works."
Just two days later, in Kadrey v. Meta Platforms, Inc., Judge Vince Chhabria reached a similar result in Meta's favor, allowing Meta's use of works by authors including Sarah Silverman and Ta-Nehisi Coates to train its LLaMA models. His analysis, however, diverged from Alsup's. While Chhabria agreed that Meta's use of copyrighted works for training was transformative, his finding of fair use rested on the plaintiffs' failure to present meaningful evidence of market harm. Chhabria criticized the plaintiffs for making "half-hearted" arguments on the fourth fair use factor, adding that although his conclusion might be "in significant tension with reality," it was compelled by the plaintiffs' underdeveloped theories of market harm. Chhabria cautioned that his holding was narrow: "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one."
Chhabria refused to brush aside the potential market effects of "enabling the rapid generation of countless works that compete with the originals," directly challenging Alsup's analogies. Chhabria wrote that comparing AI model training to teaching children to write was "inapt." He further warned that, left unchecked, generative AI could severely damage creative markets, which in turn would "dramatically undermine the incentive for human beings to create things the old-fashioned way." Chhabria also noted that his decision was not issued in a class action and would affect only the 13 authors who were plaintiffs in the case, stating that "in many [other] circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission."
Both Anthropic and Meta welcomed the rulings, with an Anthropic spokesperson stating, "We are pleased that the Court recognized that using 'works to train LLMs was transformative — spectacularly so'" and affirming that "Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different." A Meta spokesperson similarly stated: "Open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology."
Questions remain over piracy claims
Despite the courts' fair use findings, neither case is over. In Anthropic, Alsup allowed separate "piracy" claims to proceed to trial, based on allegations that the company retained and distributed infringing works and enabled its chatbot to reproduce excerpts verbatim. In Meta, claims relating to the company's alleged downloading of pirated books and removal of authorship information remain pending.
Key takeaways
These two rulings offer groundbreaking, though nuanced, judicial support for the idea that AI training using copyrighted works may be considered fair use, especially when the use is highly transformative and plaintiffs cannot demonstrate actual market harm. However, both courts acknowledged limits to the application of fair use to LLM training: Outputs that simply copy protected material remain an unresolved issue, and the two judges disagreed on how to evaluate piracy-related conduct and downstream risks, with Chhabria expressing open reluctance about his own fair use finding.
As scores of similar cases continue to unfold across the country, the divergent tone between these two rulings underscores the legal uncertainty surrounding AI and copyright law. Appeals are likely, and eventual Supreme Court review seems almost certain. In the meantime, AI developers should continue to refine safeguards to prevent memorization and unauthorized reproduction of protected works. And content creators should closely follow these developments to understand how their works may be used — and what legal remedies may remain available. License agreements may be one way to protect against unauthorized training. Both sides have a stake in how the boundaries of fair use in the AI context are ultimately drawn.