3 February 2026

Out Of The Shadow Library: Fair Use And AI Training Data

Baker Botts LLP

Since the launch of the first Large Language Models (LLMs), authors, musicians, and news organizations have filed a wave of copyright lawsuits alleging that their works were misappropriated to build today's most powerful generative AI tools. In response, AI companies have asserted that such use is non-infringing fair use. These lawsuits target a spectrum of alleged copyright infringement: some focus solely on the unauthorized use of works as training inputs, others on the models' ability to generate allegedly infringing outputs, and many premise liability on both.

In 2025, two rulings offered the first judicial perspectives on one end of this spectrum: whether unauthorized use of copyrighted works as training "inputs" constitutes fair use. While the outcomes of these cases have been widely reported—both rulings suggest training generative AI models on copyrighted works may be fair use—the legal landscape is far from settled. Fair use remains a famously fact-specific inquiry, and these holdings do not broadly shield users of third-party content from liability for unlawful data acquisition or retention, nor do they foreclose claims premised on harm to the market for a copyrighted work. This article evaluates how last year's holdings in Bartz v. Anthropic1 and Kadrey v. Meta2 on copyright law and AI training may shape the law and the market in 2026.

Fair Use Refresher
To prove infringement, a plaintiff must show that a defendant, without authorization, exercised one of the exclusive rights granted to a copyright owner: the rights to reproduce, distribute, perform, display, or adapt an original work.3 But not every unauthorized use results in liability. Copyright protection incorporates certain limitations—including fair use—designed to balance creative incentives with the public interest. Historically, fair use has accommodated technological innovation, such as home video recorders and internet search engines, where functional development requires reproducing protected works. Courts analyze fair use by assessing and balancing four statutory factors: (1) the purpose and character of the use, (2) the nature of the original work, (3) the amount and substantiality of the portion used, and (4) market effects.4 No single factor is determinative, but courts often focus their analysis on two factors: the purpose and character of the use (including whether the use is "transformative") and its effect on the potential market.

In Bartz v. Anthropic and Kadrey v. Meta, the plaintiffs were groups of authors who alleged that the unauthorized use of their literary works as "inputs" to train AI models constituted copyright infringement. In both cases, the question was not whether the AI companies' use of the plaintiffs' works for AI training was authorized, but whether the unauthorized use at issue in each case was a fair use under the Copyright Act. The fair use holdings in Bartz and Kadrey suggest an openness to treating even unauthorized use of copyrighted works to train generative AI models as fair use, but each case identifies important caveats for rightsholders and users to keep in mind.

Bartz v. Anthropic
In Bartz v. Anthropic, the court found Anthropic's use of copyrighted works to train an AI model to be fair use, at least when the works were lawfully sourced.5 Anthropic compiled training data from both lawful sources (i.e., purchased books) and unlawful sources (i.e., pirated books) to train its model—and retained all of that data in a central library. The court split its fair use analysis into two categories: the act of training the model and Anthropic's retention of the data.

Regarding the first category, the act of training the model, the court found the first and fourth factors favored fair use because creating the model was "quintessentially transformative" and did not produce infringing substitutes.6 The third factor also favored fair use because the amount copied was "especially reasonable" for the purpose of using high-quality writing to train an LLM.7 While the second factor favored the plaintiffs, the court determined it was not dispositive in light of the model's transformative purpose.8

Regarding the second category, the act of retaining unlawfully obtained copies, however, the court found "every factor point[ed] against fair use"9 (emphasis added). The court explained that the piracy of otherwise available copies is "inherently, irredeemably infringing," even if the copies are immediately used or discarded.10 The court stated, however, that it did not need to decide the case on that rule alone because Anthropic retained the pirated copies even after deciding not to use them.11 By splitting its analysis, the court cautioned that while the training process is transformative and may be protected, the unlawful acquisition of the materials used to facilitate AI training is not.

Kadrey v. Meta
In Kadrey v. Meta, the court similarly found the unauthorized reproduction of copyrighted works to train an AI model to be fair use. Although Meta, like Anthropic, downloaded and reproduced pirated sources for portions of its training data, the court reached a different conclusion about the relevance of data acquisition in its fair use analysis.

Even with the use of training data from so-called "shadow libraries" (digital repositories of copyrighted works made available regardless of authorization), the court found the first factor weighed in favor of fair use.12 While the court acknowledged that Meta's use of shadow libraries was relevant to the first factor, it ultimately found this factor favored Meta because the downloading was considered an integral step toward the ultimate transformative goal.13 The second factor favored the plaintiffs because the books were expressive works, but the court found the third factor favored Meta because copying the entirety of the works was reasonably necessary to achieve the transformative purpose of training the model.14 Critically, the court leaned heavily on the fourth factor, finding that it favored Meta due to the plaintiffs' perceived failure to provide evidence that Meta's model harmed the market for the plaintiffs' works.15

That said, as in Bartz, Meta's reliance on pirated works was not excused from all potential liability. In addition to alleging that Meta had reproduced their works without permission, the plaintiffs brought a separate claim alleging that Meta's use of peer-to-peer torrenting protocols to download the shadow libraries resulted in the unauthorized distribution of their works to other users.16 Because neither party moved for summary judgment on that issue, whether Meta's acquisition of training data through torrenting was itself a fair use remains to be determined.

Takeaways
Together, these two decisions suggest a trend toward finding the use of copyrighted works as training inputs to a generative AI model to be fair use. Both rulings make clear, however, that this protection is not absolute. Beyond the unique facts of each party's AI training considered by the courts, the specter of copyright liability remains for the acquisition of data used to train AI. While both defendants in Bartz and Kadrey used "shadow libraries" of works to train their models, the two cases' theories of liability differed. Anthropic faced copyright liability for its continued retention of pirated works. Meta, by contrast, continued to face copyright liability not because it merely used pirated libraries, but because of how it acquired them: the court noted that while reproduction for training might be fair use, the distribution of protected works through a torrent network was a separate basis for liability.17 These differences show that even courts reaching similar conclusions on training may identify and weigh the acquisition issue differently.

Both cases also suggest that a transformative purpose alone may not be sufficient to avoid liability, particularly in the face of strong allegations of market harm. Significantly, both courts noted the absence of evidence that the AI models could produce substantially similar outputs capable of substituting for the plaintiffs' original works. Thus, whether and to what extent the fair use analysis may differ if a model were shown to produce outputs that directly compete with its training data remains an open question.

Newly filed litigation shows that rightsholders are aware of the open questions surrounding shadow libraries and market harm. In a case filed on January 28, 2026, a group of music publishers sued Anthropic for the allegedly unauthorized use of their musical compositions.18 Unlike the first lawsuit of this type, however, the publishers specifically alleged violations of their distribution right arising from Anthropic's use of shadow libraries,19 the avenue of liability arguably left open in Kadrey. The publishers' complaint also carefully articulates their allegations of market harm, alleging that the model has memorized and can regurgitate their works at a user's instruction, acting as a direct substitute for those works.20

These recent fair use rulings and infringement allegations provide important but preliminary roadmaps for how generative AI intersects with copyright law. As the technology and the legal landscape continue to shift, however, myriad avenues for liability remain, particularly depending on how AI training data is obtained, retained, and used in generating outputs.

*Jasmine Boyer, a Law Clerk at Baker Botts, assisted in the preparation of this article.

Footnotes

1 Bartz v. Anthropic PBC, 787 F. Supp. 3d 1007 (N.D. Cal. 2025).

2 Kadrey v. Meta Platforms, Inc., 788 F. Supp. 3d 1026 (N.D. Cal. 2025).

3 17 U.S.C. §§ 106, 501.

4 17 U.S.C. § 107.

5 See Bartz, 787 F. Supp. 3d at 1033.

6 Id. at 1022.

7 Id. at 1030.

8 Id. at 1033.

9 Id.

10 Id. at 1025.

11 Id.

12 See Kadrey, 788 F. Supp. 3d at 1047-1048.

13 Id.

14 Id. at 1050.

15 Id. at 1050-1051.

16 Id. at 1043 n.4.

17 Id.

18 Complaint, Concord Music Group, Inc. v. Anthropic PBC, No. 3:26‑cv‑00880 (N.D. Cal. Jan. 28, 2026).

19 See id. at ¶ 4; see also id. at ¶¶ 64-66.

20 See id. at ¶¶ 108-110.


