Second District Court Finds Fair Use In Evaluating Use Of Copyrighted Works For LLM Training

On June 25, 2025, the U.S. District Court for the Northern District of California issued an order in Kadrey v. Meta Platforms, Inc., granting the defendant's motion for summary judgment on its fair use defense to claims that using the plaintiffs' books for large language model (LLM) training is copyright infringement.

The court found that because the defendant's use of copyrighted works for LLM training is "highly transformative," the plaintiffs needed to win decisively on the fourth fair use factor. According to the court, however, the plaintiffs did not prevail on the fourth factor. They also presented no meaningful evidence of market dilution, which was the theory of market harm the court considered most promising.

Background

Defendant Meta Platforms (Meta) is the developer of a series of LLMs named "Llama." An LLM is a type of generative AI model designed to understand and generate text. Thirteen authors sued Meta for downloading their books from online "shadow libraries" and using the books to train Meta's Llama models. The plaintiffs sought to represent a class of all owners of copyrighted works used as training data for Llama.

The court's order characterized Meta's use of plaintiffs' books as follows:

- Meta procured plaintiffs' books by downloading them from "shadow libraries." A shadow library is an online repository that provides things like books, academic journal articles, music, or films for free download.
- To download large datasets, Meta torrented them. "Torrenting" is a file-sharing technique that entails the simultaneous distribution of small portions of a larger file from many different sources.
- Meta added the books it downloaded to the datasets it used to train the Llama models. Meta also post-trained its models to prevent them from "memorizing" and outputting certain text from their training data, including copyrighted material.

Fair Use Defense

The parties filed cross-motions for partial summary judgment. The plaintiffs argued that Meta's conduct is not fair use, and Meta argued that its conduct must be considered fair use as a matter of law. Meta also moved for summary judgment on the plaintiffs' claim under the Digital Millennium Copyright Act (DMCA), which the court said it would grant in a separate order.

Neither side moved for summary judgment on the plaintiffs' claim that Meta infringed their copyrights by distributing their works in connection with torrenting. That claim therefore remains a live issue in the case.

The court evaluated Meta's use of the copyrighted works for training under each of the four fair use factors.

First Factor: Purpose and Character of the Use

The court concluded that this factor favors fair use, reasoning that "[t]here is no serious question that Meta's use of the plaintiffs' books had a 'further purpose' and 'different character' than the books—that it was highly transformative." The court found that the purpose of Meta's copying was to train its LLMs, which are innovative tools that can generate diverse text and perform a wide range of functions. By contrast, the purpose of the plaintiffs' books is to be read for entertainment or education.
The court rejected the plaintiffs' argument that Meta's use is not transformative because Llama will output material that "mimics" the plaintiffs' work or writing styles. The court noted that style is not copyrightable. It also found that Llama will not produce more than 50 words of any of the plaintiffs' books, even when given "adversarial" prompts designed to make it regurgitate its training data.
Although downloading the books was a "use" distinct from using them to train Llama, the court held that "downloading must still be considered in light of its ultimate, highly transformative purpose: training Llama." Specifically, "[b]ecause Meta's ultimate use of the plaintiffs' books was transformative, so too was Meta's downloading of those books."

Second Factor: Nature of the Copyrighted Work

The court found that this factor favored the plaintiffs, as their books are "highly expressive works." The court noted, however, that the second factor "has rarely played a significant role in the determination of a fair use dispute," and so having the second factor favor the plaintiffs "doesn't mean much for the analysis as a whole."
The court rejected Meta's argument that it used the plaintiffs' books only to gain access to their "functional elements," not to capitalize on their creative expression. LLMs may learn only "statistical relationships" from training data. Even so, the court found that "those relationships are the product of creative expression."

Third Factor: Amount and Substantiality of the Portion Used

The court found that this factor favored fair use, even though Meta copied the plaintiffs' books in their entirety. Noting that "feeding a whole book to an LLM does more to train it than would feeding it only half of that book," the court held that the amount Meta copied was reasonable given its relationship to Meta's transformative purpose.

Fourth Factor: Effect of the Use Upon the Market for or Value of the Copyrighted Work

The court discussed three ways a plaintiff might assert that using copyrighted works to train generative AI models harms the market for the copyrighted works: (1) claiming the model "regurgitates" the copyrighted works; (2) pointing to the market for licensing their works for AI training and contending that the unauthorized copying harms that market (or precludes the development of that market); and (3) arguing the market is "diluted" by generating works that are similar enough (in subject matter or genre) that they will compete with the originals and thereby indirectly substitute for them. Applying these approaches to this case, the court rejected the first and second approaches, and held that while the third is more promising, "the plaintiffs' presentation is so weak that it does not move the needle, or even raise a dispute of fact sufficient to defeat summary judgment." Accordingly, the court found this factor weighed in favor of finding fair use.
- Approach #1: The court rejected this approach. It noted that Llama does not allow users to generate any meaningful portion of the plaintiffs' books, and that Llama will not produce more than 50 words of any of those books even when manipulated. Accordingly, the court found that Llama's ability to regurgitate minuscule portions of the plaintiffs' books, if manipulated into doing so, does not threaten to have a "meaningful or significant effect 'upon the potential market for or value of the plaintiffs' books.'"
- Approach #2: The court rejected this approach. It reasoned that if the "potential market is defined as the theoretical market for licensing the use at issue," copyright holders will always succeed in arguing market harm. To keep the fourth-factor analysis from becoming circular and favoring the rightsholder in every case, the court explained that "harm from the loss of fees paid to license a work for a transformative purpose is not cognizable."
- Approach #3: The court found that "the concept of market dilution becomes highly relevant" here, because this case involves a technology that "can generate literally millions of secondary works, with a miniscule fraction of the time and creativity used to create the original works it was trained on." The court determined, however, that the plaintiffs had not shown market dilution. They had not even asserted it as a harm in their Complaint or in their own summary judgment motion. The court declined to infer market harm because that would require too great an inferential leap: "inferring that Llama (and not just any LLM) can be used to create such books, that it will be used to create such books, that consumers will purchase those books instead of books written by human authors, that consumers will buy those books instead of the plaintiffs' books in particular, and that Llama is meaningfully better at creating those books because it was trained on copyrighted material."

Weighing the Fair Use Factors

Having determined that the first, third, and fourth factors weighed in favor of fair use, with only the second factor favoring the plaintiffs, the court granted Meta's motion for summary judgment on its fair use defense.

Takeaways

Meta proposed seeking summary judgment on the named plaintiffs' individual claims before class certification. As a result, the decision binds only the named plaintiffs, "leaving all other members of the proposed class free to sue on the same claims." In addition, the court's analysis of the fair use factors may influence how other courts view the use of copyrighted works for AI model training. Numerous such cases are pending.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
