17 June 2025

U.S. Copyright Office Issues "Pre-Publication Version" Of Report Regarding Generative AI Training

Finnegan, Henderson, Farabow, Garrett & Dunner, LLP


Finnegan, Henderson, Farabow, Garrett & Dunner, LLP is a law firm dedicated to advancing ideas, discoveries, and innovations that drive businesses around the world. From offices in the United States, Europe, and Asia, Finnegan works with leading innovators to protect, advocate, and leverage their most important intellectual property (IP) assets.

On May 9, 2025, the U.S. Copyright Office (USCO) made available a "pre-publication version" of the third part of its much-anticipated policy report concerning copyright and artificial intelligence (AI). This section addresses copyright issues and generative AI model training.

The report states that "[a] final version will be published in the near future, without any substantive changes expected in the analysis or conclusions." Shira Perlmutter, who served as the Register of Copyrights at the time the pre-publication version was made available, no longer heads the USCO. Accordingly, whether new leadership at the USCO will endorse the positions in the pre-publication version or issue a final report remains unclear.

In addition, how courts will weigh the USCO's positions is uncertain. In Loper Bright Enterprises v. Raimondo (2024), the U.S. Supreme Court held that courts, not federal agencies, are better suited to interpret ambiguities in federal law. That same decision eliminated the highest level of deference courts had previously afforded to federal agencies' practices and legal interpretations. In reaching at least one conclusion in the pre-publication version of the report, the USCO itself acknowledges that it is venturing into "uncharted territory."

In the U.S., the "fair use" doctrine permits the use of copyrighted works without permission in certain cases, such as parody, teaching, and news reporting. Below is a high-level summary of the USCO's general positions in the pre-publication version of the report, followed by its positions on each of the four fair-use factors: 1) the purpose and character of the use; 2) the nature of the copyrighted work; 3) the amount of the work used and its importance to the work as a whole; and 4) the effect of the use on the potential market for or value of the copyrighted work.

Summary of USCO's Positions in the "Pre-Publication Version" of the Report

  • No changes to current copyright law are necessary, as the fair use doctrine is flexible enough to address issues raised by generative AI systems.
  • As fair use is a case-by-case inquiry, it is not possible to prejudge the results in any particular case.
  • Various uses of copyrighted works in AI training are likely to be transformative under the first fair-use factor. The extent to which the uses are fair will depend on what works were used, from what source, for what purpose, and with what controls on the outputs.
  • Given the growth of voluntary licensing, government intervention would be premature at this time.
  • A compulsory licensing regime for AI training would have significant disadvantages. Voluntary licensing agreements for AI training should continue to develop, extending into more contexts as soon as possible.
  • Many applications of generative AI promise great benefits for the public, as does the production of expressive works.
  • The first and fourth fair-use factors can be expected to assume considerable weight in courts' analysis of issues involving generative AI.

First Fair-Use Factor (Purpose and Character of the Use)

  • Copyrighted works are used in different ways during the development and deployment of generative AI models, and such different uses require separate consideration under the first factor.
  • While it is important to identify the specific act of copying during development, fair use must also be evaluated in the context of the overall use, because compiling a dataset or training a model is rarely the ultimate purpose in itself.
  • Training a generative AI foundation model on a large and diverse dataset will often be transformative. The process converts a massive collection of training examples into a statistical model that can generate a wide range of outputs across a diverse array of new situations. Many AI models are meant to perform a variety of functions, some of which may be distinct from the purpose of the copyrighted works they are trained on (i.e., human enjoyment and education).
  • Transformativeness is a matter of degree, and how transformative or justified a use is will depend on the functionality of the model and how it is deployed.
  • Because generative AI models may simultaneously serve transformative and nontransformative purposes, restrictions on their outputs can shape the assessment of the purpose and character of the use. Where such restrictions are effective, the system will be less capable of fulfilling the purpose of the original works, and the use of those works in training may be more transformative.
  • Training a model to generate outputs that are substantially similar to copyrighted works in the dataset may not be transformative.
  • The knowing use of a dataset that consists of pirated or illegally accessed works should weigh against fair use without being determinative.

Second Fair-Use Factor (Nature of the Copyrighted Work)

  • As generative AI models are regularly trained on a variety of works—both expressive and functional, published as well as unpublished—the facts will vary depending on the model and works at issue.

Third Fair-Use Factor (Amount of Work Used and Importance to the Whole)

  • Relevant considerations may include how much of each work is used; the reasonableness of the amount in light of the purpose of the use; and the amount made accessible to the public.
  • Although a large, general-purpose model does not need to copy any particular amount of any specific work, research supports commenters' assertions that internet-scale pre-training data, including large amounts of entire works, may be necessary to achieve the performance of current-generation models. To the extent there is a transformative purpose, the use of entire works on that scale could be reasonable.
  • This factor may weigh less heavily against generative AI training where there are effective limits on the trained model's ability to output protected material from works in the training data. Most outputs from generative AI systems do not contain any protected expression from their training data, and models can be deployed in ways that entirely obscure outputs from users or result in non-expressive outputs. Where a model can output expression, however, the question is whether the AI developer has adopted adequate safeguards to limit the exposure of copyrighted material.

Fourth Fair-Use Factor (Effect on Potential Market For or Value of Copyrighted Work)

  • Lost Sales: There are instances where the use of works in generative AI training could lead to a loss in sales. For example, the use of pirated collections of copyrighted works to build a training library, or the distribution of such a library to the public, would harm the market for access to those works. Where training enables a model to output verbatim or substantially similar copies of the works trained on, and those copies are readily accessible by end users, they can substitute for sales of those works.
  • Market Dilution: The speed and scale at which AI systems generate content may pose a serious risk of diluting markets for works of the same kind as in their training data. Market harm can also stem from AI models' generation of material stylistically similar to works in their training data.
  • Lost Licensing Opportunities: Where licensing markets are available to meet AI training needs, unlicensed uses will be disfavored under the fourth factor. But if barriers to licensing prove insurmountable for parties' uses of some types of works, there will be no functioning market to harm and the fourth factor may favor fair use. There are copyright sectors where licensing infrastructure does not yet exist and may be difficult to build, and the amount of training data needed to produce state-of-the-art models may vary by content type or type of training. Administrative or transactional costs can pose particular challenges when works are created outside of professional creative industries or are not intended to be monetized, or when ownership is diffuse.

Takeaway

If you are interested in these topics and wish to discuss AI and copyright issues and/or policy studies, please contact Anna Chauvet (anna.chauvet@finnegan.com) for more information. Anna is currently a partner at Finnegan, Henderson, Farabow, Garrett & Dunner, LLP and leader of Finnegan's copyright practice. Anna previously served as associate general counsel at the USCO.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
