Copyright Office Releases Pre-Publication Report On Copyrighted Works In Generative AI Training


On May 9, 2025, the U.S. Copyright Office (USCO) released a pre-publication version of Part 3 of its Copyright and Artificial Intelligence (AI) Report (the Report), addressing the use of copyrighted works in generative AI training. Unlike Parts 1 and 2 of the Report, Part 3 was issued in a highly unusual pre-publication form, released less than 24 hours before the dismissal of the U.S. Register of Copyrights, Shira Perlmutter.¹ The USCO noted that a final version, "without any substantive changes expected," will be published in the near future.² It is unclear how much weight this version will carry, but it should have some persuasive authority in the multiple lawsuits focusing on copyright and AI. Part 3 provides guidance on how the USCO views whether acts of infringement involving AI can be excused under the Fair Use doctrine. Overall, the Report leans toward limiting Fair Use claims, especially given the potential harms inflicted on copyright owners in the market for AI training data.

Mechanical Elements of Generative AI Training Examined

The Report begins by examining how generative AI training works and how copyrighted works are used in that training. The Report acknowledges that, although generative AI training is an iterative process, there are two key phases that unquestionably implicate copyrights: data collection and memorization.

At the data collection and curation stage, generative AI models rely on large, high-quality datasets, which often include copyrighted materials that are copied into the training dataset. To the USCO, this is prima facie evidence of infringement of the right of reproduction under Section 106 of the Copyright Act.³

At the memorization stage, the datasets are ingested by the AI model, and the model "learns" by adjusting its "weights,"⁴ a mathematical representation of the copies in the dataset, to produce the desired outputs and minimize errors. The Report notes that these "weights" may be considered protectable expression and may, therefore, infringe the right to prepare derivative works under Section 106 of the Copyright Act if the model has retained or memorized protectable expression from the works in the dataset.

Fair Use

Once infringement is established, the legal question becomes whether these uses of copyrighted works to train generative AI may nonetheless be protected as Fair Use under Section 107 of the Copyright Act. The USCO responded that "[m]any uses fall somewhere in between" and proper analysis of the Fair Use factors based on the facts is necessary.⁵

Under copyright law, four factors must be weighed to determine whether a use is excused from liability for copyright infringement. The Report evaluates each of these factors and opines on the actions of the alleged infringer.

Factor One

The first Fair Use factor asks whether the infringing use has a further purpose or a different character from the purpose underlying the copyrighted work, namely the original expression of creativity. Courts examine this factor through the "transformativeness"⁶ and "commerciality" of the infringer's use.⁷

Transformativeness

Transformativeness turns on how the AI model is deployed and for what purposes. The Report determined that, when a generative AI model is trained on copyrighted works for "analysis"⁸ or "research,"⁹ the use is more likely transformative. However, when the model scrapes a dataset of copyrighted works for commercial use and generates outputs substantially similar to copyrighted works, the use is less transformative. Because there are no bright-line rules, this factor must be evaluated case by case.

Generative AI advocates argue that the use is highly transformative because copyrighted works are used for statistical analysis, not for their original expressive or consumptive purposes. They argue the process is abstract and non-expressive. In opposition, pro-copyright stakeholders contend that AI training takes the expressive elements of copyrighted works and re-encodes, rather than transforms, the works. They maintain that AI training relies on and extracts the expressive choices of authors, not just the facts or ideas presented.

In response, the USCO rejected the argument that the use is completely non-expressive because generative AI models are capable of creating expressive content.¹⁰ For example, "[i]mage models are trained on curated datasets of aesthetic images because those images lead to aesthetic outputs."¹¹ When the model then generates an aesthetic output that reproduces a copyrighted work, the use is expressive. The USCO also refused to equate AI training to human learning, given AI's superhuman speed and scale. Thus, the Report emphasized that each distinct and specific use of copyrighted works in AI training must be analyzed separately and in the context of the overall use.

Commerciality

Commerciality considers the unfair use of copyrighted works for financial benefit. The Report noted that commerciality turns on the commercial purpose of a use, not the commercial status of the entity conducting the AI training. The USCO, however, considered the knowing use of pirated or illegally accessed copyrighted works to weigh against a finding of Fair Use, especially under Factor Four below.

Factor Two

For the second Fair Use factor, the nature of the copyrighted work, the Report noted that this factor will turn on the facts of the "quantity" of material used and its "quality" and importance. Using more creative or expressive works, especially unpublished ones, weighs against Fair Use, whereas use of factual or functional works is more likely to be considered Fair Use by the USCO.

Factor Three

The third Fair Use factor examines the amount and substantiality of the portion of the copyrighted work used.

The Report concluded that there may not be Fair Use when the AI uses full copies of an entire work, even for training. A Fair Use defense is more likely to succeed, however, when the training involves a transformative process and the copies of the copyrighted works are not made accessible to the public.

Factor Four

Following the Supreme Court's recent decision in Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith,¹² the USCO agreed with courts that the fourth Fair Use factor, the potential for market harm, is often the decisive factor in Fair Use analysis and should be for generative AI training.¹³ The Report considered harm to different types of markets.

As to traditional lost sales of the original work, AI-generated outputs that substitute for or reduce demand for the original works weigh against Fair Use.

The USCO expanded market harm to include market dilution, where outputs that are not direct copies but imitate the style or feel of the original works can similarly dilute the value of the original and harm the author's market position.

Further, despite the relative infancy of the licensing market for the use of copyrighted works in AI training, the USCO treated the loss of licensing for similar works as another market harm weighing against Fair Use.

The Report expressed concern that unlicensed AI training could erode the overall value of original works by saturating the market with non-human-generated content, making it harder for human authors to compete.

Key Takeaways and Implications

As the legal and policy infrastructure continues to catch up with evolving AI technologies, the pre-publication version of Part 3 of the Copyright and AI Report provides guidance to stakeholders on how to train generative AI models on copyrighted works with minimal risk of infringement. The Report emphasized that users of generative AI should consider implementing guardrails against unauthorized or illegal copying of copyrighted works and against potentially infringing outputs.

Conclusion

Manatt is experienced in obtaining positive outcomes for clients in licensing AI training data content in the U.S. and global markets. In the Generative AI space, Manatt supports its clients with legal guidance, compliance strategies, advocacy and litigation, as necessary, in this amazingly fast-growing environment, especially in the AI Training Data space.

Footnotes

1. See Katherine Tully-McManus, Trump fires top US copyright official, Politico, May 10, 2025.

2. U.S. Copyright Office, Copyright and Artificial Intelligence Part 3: Generative AI Training Pre-Publication Version (May 2025), available at https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf.

3. Id. at 26.

4. Id. at 28.

5. Id. at 46.

6. Id.

7. Id.

8. Id. at 42.

9. Id. at 46.

10. Id. at 48.

11. Id. at 47.

12. 598 U.S. 508, 143 S. Ct. 1258, 215 L. Ed. 2d 473 (2023).

13. U.S. Copyright Office, Copyright and Artificial Intelligence Part 3: Generative AI Training Pre-Publication Version (May 2025), available at https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf, at 61.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
