The Northern District of California recently issued two decisions in rapid succession on the legality of using copyrighted works to train artificial intelligence large language models ("LLMs"). The decisions address fair use and copyright infringement, and they offer insight into the future of IP litigation and licensing in the AI era.
The first case, Bartz v. Anthropic PBC, No. C 24-05417 WHA (N.D. Cal. June 23, 2025), involved a group of authors who sued Anthropic (the developer of Claude) for copyright infringement, alleging that it used their works without authorization to train the LLMs underlying Claude. Anthropic argued that any use of copyrighted materials to train Claude was a fair use under U.S. copyright law. Judge Alsup concluded that training an LLM like Claude on copyrighted works was a "spectacularly transformative" use of those works and therefore a fair use under the Copyright Act. The court stated that "[a]uthors cannot rightly exclude anyone from using their works for training and learning as such" and "[l]ike any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use." The court also explained that converting a legitimately owned or licensed print library copy into a digital library copy was a fair use. However, the court made clear that using, copying or storing pirated or otherwise unauthorized copies to build a central LLM training library is not fair use but copyright infringement. Anthropic allegedly stored up to seven million pirated copies of books, which it used to train its LLM without paying the copyright holders. Although the AI training itself was found to be fair use, Anthropic must now face trial for copyright infringement based on its use of the pirated books, including the damages associated with using, copying and storing the pirated works.
The second case, Kadrey v. Meta Platforms, Inc., No. 23-cv-03417-VC (N.D. Cal. June 25, 2025), involved a group of authors suing Meta on similar grounds: that Meta used their copyrighted works for LLM training without permission or authorization. Judge Chhabria likewise found that Meta's use of books to train its LLM was a fair use under the Copyright Act. The court described the use as "highly transformative," but reached that conclusion through different reasoning than Judge Alsup used in the Anthropic case. Judge Chhabria gave heavy weight to the fourth factor of the fair use analysis, which evaluates the detrimental effect of the alleged fair use on the market for the copyright owner's works (usually in the form of lost licensing opportunities), and in particular to the theory of "market dilution." The judge excoriated the plaintiffs for failing to provide any "meaningful evidence on market dilution at all." The court went on to say that if the plaintiffs had "presented any evidence that a jury could use to find in their favor on the issue, factor four would have needed to go to a jury. Or perhaps the plaintiffs could even have made a strong enough showing to win on the fair use issue at summary judgment." In the court's view, future copyright plaintiffs will stand a better chance of defeating fair use arguments for AI training if they can provide "better-developed records on the market effects of the defendant's use." That is especially so, the court noted, if artificial intelligence takes in information and produces endless streams of competing works that ruin the markets for the original copyrighted works.
These cases establish a baseline reasoning that could become a trend in future litigation: deference to LLM training as a highly transformative fair use, tempered by respect for the traditional rights and financial interests of copyright owners. Simply put, anyone who wants to use copyrighted material to train an LLM should license it or pay for it. So far, the courts have focused on preventing AI companies from using and storing pirated and unauthorized copies of copyrighted works rather than on attacking the training uses themselves. If this trend plays out, we will likely see the emergence of a copyright licensing model for AI training, along with large clearinghouses like those seen in other industries. New copyright infringement cases against AI companies continue to be filed, and each decision will continue to shape the future of AI training and IP licensing.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
 
                    