On 17 January 2023, Getty Images announced that it had initiated High Court proceedings in London against Stability AI for copyright infringement. Separately, a class action lawsuit has been launched in California against the generative AI companies Stability AI, Midjourney and DeviantArt.
Stability AI created Stable Diffusion, the text-to-image diffusion model that Lensa AI uses in its Magic Avatars app. Lensa AI became a viral phenomenon after it rolled out Magic Avatars, which uses AI and several photos supplied by the user to create AI-assisted portraits in a variety of styles. However, Magic Avatars has also been the subject of controversy, as some artists have expressed their unhappiness at how the AI is trained.
LAION-5B is one of the largest text/image datasets available today. It has been used by numerous companies to create deep learning models. One such deep learning model is called Stable Diffusion, on which new AI apps such as Lensa AI rely. LAION-5B is a dataset of 5.85 billion image-text pairs, which is 14 times larger than LAION-400M, the previous biggest openly accessible image-text dataset in the world.
According to LAION, "To create image-text pairs, we parse through WAT files from common crawl and parse out all HTML IMG tags containing an alt-text attribute. At the same time, we perform a language detection on text with three possible outputs...". The Common Crawl corpus contains petabytes of data collected since 2008, comprising raw web page data, extracted metadata and text extractions. In effect, LAION identifies images on the internet that have descriptive text accompanying them and pairs each image with that text.
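The extraction step LAION describes can be illustrated with a short Python sketch. It is a simplified illustration only, not LAION's actual pipeline: it assumes raw HTML as input rather than Common Crawl WAT metadata files, omits the language detection step, and the names `ImgAltExtractor`, `extract_image_text_pairs` and `sample_page` are hypothetical.

```python
from html.parser import HTMLParser


class ImgAltExtractor(HTMLParser):
    """Collect (image URL, alt text) pairs from <img> tags that carry alt text."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # A bare or missing alt attribute yields None, so fall back to "".
        alt = (attributes.get("alt") or "").strip()
        # Keep only images that have both a source and non-empty alt text.
        if src and alt:
            self.pairs.append((src, alt))


def extract_image_text_pairs(html):
    """Return the image-text pairs found in one HTML document (hypothetical helper)."""
    parser = ImgAltExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical sample page: only the first image has usable alt text.
sample_page = """
<html><body>
  <img src="https://example.com/cat.jpg" alt="A cat asleep on a sofa">
  <img src="https://example.com/spacer.gif">
  <img src="https://example.com/logo.png" alt="">
</body></html>
"""

pairs = extract_image_text_pairs(sample_page)
print(pairs)
```

The point relevant to the legal analysis is that a pair of this kind records where an image is located and the text that accompanied it. Nothing in the process checks who owns the copyright in the image.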
Interestingly, LAION provides the dataset under a Creative Commons licence, which it says poses no particular restrictions. However, LAION recommends that the images should only be used for research purposes. It also points out that the images themselves are under copyright.
The issue being addressed in both sets of proceedings is whether using copyright works to train AI constitutes infringement. The alleged infringement that is the subject of those proceedings relates to works such as artworks and photographs. However, the principles involved could easily be applied to other forms of generative AI systems, such as those that output text or music.
In the Californian class action suit, the plaintiffs are claiming:
- direct copyright infringement;
- vicarious copyright infringement;
- DMCA violations;
- right of publicity violations; and
- unfair competition.
The plaintiffs claim that the defendants reproduced the works, prepared derivative works, distributed copies of the works, performed the works, and displayed the works without the necessary authorisation. The plaintiffs also claim that passing off is occurring due to the ability to create art "in the style of" a particular artist. They say this has led to imposters selling fake artworks while passing themselves off as established artists. The plaintiffs say that the defendants are liable for the sale of these so-called "fake" artworks on the basis of vicarious liability.
Interestingly, an important part of the claim is the plaintiffs' allegation that the defendants removed copyright management information (CMI) metadata from the works, which would have allowed them to be properly identified. The plaintiffs allege that this is not permitted under the DMCA. In the EU, Article 7 of Directive 2001/29/EC on the Harmonisation of Certain Aspects of Copyright and Related Rights in the Information Society would apply where CMI was removed. This article obliges Member States to provide adequate legal protection against any person knowingly and without authority removing or altering any electronic rights management information.
Data protection plays an important part in the proceedings too; one of the contractual claims against DeviantArt is that it unlawfully shared and processed the plaintiffs' personal data.
In the EU, under GDPR, a person's name is their personal data, and its processing must follow data protection rules. Linking an identifiable individual's name to an image, and then using that linked data, is a form of processing of personal data. It will be interesting to see what avenue the Californian courts take on the issue. The Californian case also raises the question of whether data protection will feature in future EU AI infringement litigation.
The class action will be interesting as it is sure to test the limits of the US fair use doctrine. While reproductions may be made when creating datasets and training AI, it is unclear if an output image constitutes a derivative work. The Getty Images proceedings will be especially interesting, as they may show how the narrower 'fair dealing' exception is applied in the UK, or whether it is relied on at all. Given the similarity between Irish and UK copyright law, the UK proceedings are sure to be watched closely from Ireland.
If a case similar to the Californian case were under way in Europe, the reproduction exception for text and data mining in Directive (EU) 2019/790 on Copyright and Related Rights in the Digital Single Market would play a prominent role, and it would be interesting to see how it is applied.
As AI music generation systems become more prevalent, we are likely to see substantially similar litigation over music copyright.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.