District Court Finds That Using Copyrighted Works To Train Large Language Models Is Fair Use

Anna B. Chauvet, Finnegan, Henderson, Farabow, Garrett & Dunner, LLP

In a significant ruling issued on June 23, 2025, the U.S. District Court for the Northern District of California held that using copyrighted works to train large language models (LLMs) is fair use under U.S. copyright law. The court found such use is "exceedingly transformative" and does not displace demand for or harm the market for the original works.

The court granted summary judgment for defendant Anthropic PBC, finding both that using copyrighted works for training LLMs was a fair use, and that Anthropic's creation of a digital library by purchasing and digitizing millions of print books was a fair use. The court, however, found the use of downloaded pirated copies to build Anthropic's central library was not justified by fair use, and thus denied summary judgment for Anthropic that the pirated library copies must be treated as training copies.

Background

Defendant Anthropic is an AI software firm that offers an AI software service named Claude. When a user prompts Claude with text, Claude quickly responds with text — mimicking human reading and writing. To enable such functionality, Anthropic trained Claude — or rather trained large language models (LLMs) underlying various versions of Claude — using books and other texts selected from a central library Anthropic assembled.

Plaintiffs are the authors of multiple copyrighted books. The plaintiffs filed a class action complaint against Anthropic, alleging that Anthropic had infringed their copyrights by, among other things, using their copyrighted books to train its Claude LLMs. The plaintiffs did not, however, allege that any infringing copy of their works was or would ever be provided to users by the Claude service.

The court's order characterized Anthropic's use of plaintiffs' books as follows:

- Assembling copies of plaintiffs' books into a central library, copying further various sets and subsets of those library copies to include in various "data mixes," and using these mixes to train various LLMs.
- Keeping the library copies in place as a permanent, general-purpose resource (whether the copies were to be used for training or not). Anthropic assembled its library by downloading pirated copies of copyrighted books and by purchasing millions of print books, which it scanned into digital form.
Each work selected for training any given LLM was copied in four main ways:
1. "[C]opied from the central library to create a working copy for the training set."
2. "[C]leaned to remove a small amount of lower-valued or repeating text (like headers, footers, or page numbers), with a 'cleaned' copy resulting."
3. "[T]ranslated into a 'tokenized' copy. Some words were 'stemmed' or 'lemmatized' into simpler forms (e.g., 'studying' to 'study'). And, all characters were grouped into short sequences and translated into corresponding number sequences or 'tokens' according to an Anthropic-made dictionary. The resulting tokenized copies were then copied repeatedly during training."
4. "Compressed" copies were retained in each fully trained LLM (which the court's order took for granted for purposes of deciding Anthropic's motion for summary judgment).
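
As a rough illustration of the first three copying steps quoted above (working copy, cleaned copy, tokenized copy), the Python sketch below walks a short passage through each stage. It is a toy under stated assumptions, not Anthropic's pipeline: the function names, the page-number rule, the suffix-stripping "stemmer," and the word-level vocabulary are all hypothetical simplifications (the order describes grouping characters into short sequences, which real tokenizers do at a sub-word level).

```python
import re


def clean(text: str) -> str:
    """Return a 'cleaned' copy: drop blank lines and lines that are only a page number.

    The page-number pattern is an assumption made for this illustration.
    """
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if re.fullmatch(r"(page\s+)?\d+", stripped, flags=re.IGNORECASE):
            continue
        kept.append(stripped)
    return " ".join(kept)


def stem(word: str) -> str:
    """Crude stand-in for stemming: strip a trailing 'ing' from longer words."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    return word


def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Return a 'tokenized' copy: map each stemmed word to an integer ID.

    New words are added to `vocab` in order of first appearance.
    """
    tokens = []
    for word in re.findall(r"[a-z]+", text.lower()):
        word = stem(word)
        if word not in vocab:
            vocab[word] = len(vocab)
        tokens.append(vocab[word])
    return tokens


# Step 1: a working copy taken from the (hypothetical) library text.
library_copy = "Page 12\nShe was studying law.\n12\nShe will study more."
working_copy = str(library_copy)

# Step 2: the cleaned copy. Step 3: the tokenized copy.
cleaned_copy = clean(working_copy)
vocabulary: dict[str, int] = {}
tokenized_copy = tokenize(cleaned_copy, vocabulary)
```

In this sketch, "studying" and "study" end up sharing one token ID, which is the effect the order attributes to stemming or lemmatizing before the text is translated into number sequences.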

Fair Use Defense

Anthropic filed a motion for summary judgment, asserting that its uses of the plaintiffs' books were fair use and thus noninfringing.

The court evaluated Anthropic's use of the copyrighted works for training and to create a central library (from pirated and purchased sources) under each of the four fair use factors. At a high level, the court concluded that:

- Use of the copyrighted books for training the LLMs was exceedingly transformative and a fair use.
- Digitization of the purchased books was a fair use because Anthropic merely replaced the print copies it had purchased with more convenient space-saving and searchable digital copies for its central library (without adding new copies, creating new works, or redistributing existing copies).
- Creating a permanent, general-purpose library using pirated copies of copyrighted works was not itself a fair use.

First Factor: Purpose and Character of the Use

Copies Used to Train the LLMs

The court concluded that Anthropic's use of the authors' works to train LLMs to receive text inputs and return text outputs was "spectacularly" transformative. Specifically, "Anthropic used copies of [the plaintiffs'] copyrighted works to iteratively map statistical relationships between every text-fragment and every sequence of text-fragments so that a completed LLM could receive new text inputs and return new text outputs as if it were a human reading prompts and writing responses." The court determined that "Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different."
The court emphasized that the plaintiffs had alleged only that use of their books for training the LLMs constituted infringement, not that any LLM output to users was infringing.
The court held that the plaintiffs cannot exclude anyone from learning or training from their works "as such," and so rejected the plaintiffs' arguments that they should be able to exclude Anthropic from using their works to train Claude to "read and write."
Because the plaintiffs had not alleged that any LLM output to users was infringing, the court rejected the plaintiffs' argument that Anthropic's training was intended to memorize the creative elements of their works.

Use of Purchased Copies to Build a Central Library

The court found Anthropic's use of the purchased library copies — converted from print to digital, a mere format change — was a transformative use. Specifically, the court held that "[s]torage and searchability are not creative properties of the copyrighted work itself but physical properties of the frame around the work or informational properties about the work." The court also noted that every purchased print copy was copied in order to save storage space and to enable searchability as a digital copy, that the print original was destroyed (i.e., one replaced the other), and there was no evidence that the new, digital copy was shown, shared, or sold outside the company. According to the court, that Anthropic is a commercial entity is indicative, not dispositive.

Use of Pirated Copies to Build a Central Library

The court rejected the notion that the use of the pirated copies for a central library can be excused as fair use merely because some will eventually be used to train LLMs. The court stated that "piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded." The court noted, however, that it did not need to decide the case on that rule, because Anthropic had not used the pirated copies only for training its LLMs. Rather, it retained pirated copies even after deciding it would never again use them, or copies made from them, for training its LLMs. The court determined that pirating copies to build a research library without paying for it, and retaining those copies in case they proved useful for one thing or another, was its own use, and not a transformative one. Therefore, the court held that this factor weighed against a finding of fair use.

Second Factor: Nature of the Copyrighted Work

The court determined that the second factor weighed against fair use for all the copies used. Anthropic had accepted that the plaintiffs' books contained expressive elements. The court accepted the plaintiffs' view that their works were chosen by Anthropic for their expressive qualities, and so this factor weighed against fair use for all copies used for all purposes.

Third Factor: Amount and Substantiality of the Portion Used

Copies Used to Train the LLMs

The court found that the copying was reasonably necessary to the transformative use (i.e., training the LLMs), and so this factor weighed in favor of fair use. The court held that this factor focused not on the amount used to make the copy, but rather on the amount that was made accessible to the public and served as a competing substitute for the original. As the plaintiffs had not alleged that any of the Claude service's output was infringing, the court found the copying was reasonable. The court and all parties agreed that Anthropic needed billions of words to train any given LLM. The plaintiffs had not contested that the volume of text required to train an LLM is monumental, and so the court concluded that using so many works for training was reasonably necessary.

Use of Purchased Copies to Build a Central Library

The court found that the purpose of the copying — more favorable storage and searchability properties — required copying the entire work. This factor thus weighed in favor of fair use.

Use of Pirated Copies to Build a Central Library

Because Anthropic was not entitled to hold the pirated copies at all, the court held that this factor weighed against a finding of fair use.

Fourth Factor: Effect of the Use Upon the Market for or Value of the Copyrighted Work

Copies Used to Train the LLMs

The court found that this factor weighed in favor of fair use for multiple reasons:
- As the plaintiffs had conceded "that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public," the copies used to train specific LLMs did not and will not displace demand for copies of their works.
- The court rejected the plaintiffs' generic contention that training LLMs will result in an explosion of works competing with their works, noting that "[t]his is not the kind of competitive or creative displacement that concerns the Copyright Act," which "seeks to advance original works of authorship, not to protect authors against competition." The court also rejected the plaintiffs' argument that training LLMs displaced (or will displace) an emerging market for licensing their works for the narrow purpose of training LLMs, as "such a market for that use is not one the Copyright Act entitles Authors to exploit."

Use of Purchased Copies to Build a Central Library

The court found this factor was neutral, as any losses from a format change "did not relate to something th[e] Copyright Act reserves for [the plaintiffs] to exploit." The court also noted that the record did not show any intent to redistribute library copies once acquired or an inability to secure that valuable library against outside actors.

Use of Pirated Copies to Build a Central Library

The court found this factor to weigh against fair use, stating that "[t]he copies used to build a central library and that were obtained from pirated sources plainly displaced demand for [the plaintiffs'] books — copy for copy."

Weighing the Fair Use Factors

After weighing the four factors for each type of use, the court granted summary judgment for Anthropic that the training use was a fair use and that the print-to-digital format change was a fair use. The court found that the downloaded pirated copies used to build a central library were not justified by fair use, and denied summary judgment for Anthropic that the pirated library copies must be treated as training copies.

The case will proceed to trial on the pirated copies used to create Anthropic's central library and the resulting damages, actual or statutory (including for willfulness).

Takeaways

This is the first substantive decision issued by a U.S. court concerning fair use and generative AI model training. The court's determination that using copyrighted works to train large language models is "exceedingly transformative" and fair use is likely to influence how other courts view the issue in the numerous pending cases that raise it.

The case is Bartz v. Anthropic PBC, Case No. 24-cv-5417 (N.D. Cal. June 23, 2025).

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
