Generative AI engines and large language models (LLMs) have increasingly gained traction, spurring necessary discourse about how exactly such innovations ought to coexist with the law. One particularly difficult issue is the case of intellectual property, as LLM platforms, and the methods used to "train" them, threaten the sanctity of copyright law. As policy debates over the structure of AI and copyright law continue to heat up, it's important to understand how this new technology may impact your intellectual property rights.
"Fair use" and AI training
One of the primary clashes between AI and copyright law stems from disputes regarding whether training LLM engines with copyright-protected material constitutes "fair use" of that material. Generally speaking, using a copyrighted work without the copyright owner's permission is copyright infringement. One main exception to this principle is the doctrine of fair use, which permits use of a copyrighted work without the owner's permission when that use involves some sort of transformation of the original work, such that potential consumers of the original material would not use the "copied" version in lieu of the work it is derived from. Commentary, criticism, and parody are common examples of fair use, as these serve a fundamentally different creative purpose than the original work.
However, feeding copyrighted material into AI models to train them has sparked a heated new fair use debate. Because LLMs are currently incapable of generating truly "original" content, AI firms train their models on pre-existing materials. This practice has prompted many to ask whether the use of copyrighted material for AI training constitutes a legitimate instance of fair use under copyright law.
What have the courts held?
Copyright challenges to AI have prompted more than 40 lawsuits, and the Northern District of California recently handed down the first two decisions in these cases. The first opinion, issued by the Honorable William Alsup in Bartz v. Anthropic, reasons that training LLMs on copyrighted material is akin to aspiring writers reading famous works of literature in order to hone their craft. Accordingly, the court held that owners of copyrighted works need not be compensated when their works are used in LLM training.
The second ruling, issued by the Honorable Vincent Chhabria in Kadrey v. Meta, takes a slightly more favorable position for copyright holders. There, Judge Chhabria held that there is a fundamental difference between AI-generated and human-crafted content, noting that while human creation requires copious amounts of time and creativity, AI generation needs neither. While the court indicated that using copyrighted works to train AI was likely not fair use in many circumstances, Judge Chhabria nevertheless ruled against the author plaintiffs because they had not successfully developed the argument that output from Meta's AI models would "dilute the market for their own works."
How are artists responding?
Rightsholders have claimed that using their copyright-protected material to train LLM engines without their approval or a license is a clear overstep of fair use. These creators assert that AI firms have engaged in a massive, nationwide breach of intellectual property rights.
Notable authors, such as David Baldacci, have testified about these practices at a U.S. Senate hearing convened by Sen. Josh Hawley (R-Missouri), where they received sympathy from prominent lawmakers on both sides of the aisle.
Proposed legislation affecting AI
Multiple members of Congress are attempting to find solutions that balance the competing interests of copyright protection and technological innovation. Senators Peter Welch (D-Vermont) and Marsha Blackburn (R-Tennessee) have proposed the TRAIN Act, which would allow creators to determine whether a company used their copyrighted intellectual property to train its AI.
Senators Josh Hawley (R-Missouri) and Richard Blumenthal (D-Connecticut) recently introduced the AI Accountability and Personal Data Protection Act, legislation that would place further restrictions on AI firms and allow individuals to sue companies that use copyrighted material or personal data without "express, prior consent." The bill would effectively force AI firms to obtain licenses before using any copyrighted materials to train LLMs.
What does the future hold for copyright and AI?
The debate over copyright and AI is far from settled. Many more cases remain pending in federal courts, and the aforementioned opinions are likely to be appealed to higher courts. If those courts follow the opinion of Judge Alsup in Bartz, then rightsholders will have very little control over how AI firms use their copyrighted material to train LLMs. However, if courts agree with Judge Chhabria's reasoning in Kadrey – or if the legislation proposed by Senators Hawley and Blumenthal is enacted – AI firms may have to pay to license content from rightsholders before they can include it in LLM training.
Either way, the development of LLMs and their relation to intellectual property rights is guaranteed to generate much more discussion, litigation, and eventual legislation. More questions are sure to arise as AI technology and the law develop.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.