In a landmark decision, a German district court recently held that copying images to create a data set that may be used to train generative artificial intelligence (AI) systems does not infringe German copyright law where the copying serves scientific research. Robert Kneschke v. Large Scale Artificial Intelligence Open Network, GRUR-RS 2024, 25458 (Hamburg District Court Sept. 27, 2024).
The nonprofit Large Scale Artificial Intelligence Open Network (LAION) created a data set containing 5.85 billion image-text pairs based on images publicly available on the internet. The data set can be used to train generative AI systems. To create it, LAION relied on a preexisting data set of uniform resource locators (URLs) referencing images and their descriptions. First, LAION extracted the URLs and downloaded the referenced images. These included a copyrighted photograph by Robert Kneschke, even though a reservation of use against web scraping had been declared on a subpage of the website hosting it. LAION then used a software application to analyze the downloaded images together with their descriptions and to exclude image-text pairs whose text and image content did not match sufficiently. Only the validated image-text pairs were added to the data set.
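The judgment does not describe how LAION's software measured whether an image and its description "matched sufficiently." A common approach for filters of this kind is to map each image and each caption to a numeric embedding vector and keep only pairs whose cosine similarity reaches a threshold. The minimal sketch below assumes that approach; the `ImageTextPair` type, the toy two-dimensional embeddings, the example URLs, and the threshold of 0.3 are all hypothetical and are not taken from the case or from LAION's actual pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageTextPair:
    url: str
    caption: str
    # Hypothetical: in practice these would come from an image encoder and a
    # text encoder; here they are hand-written toy vectors.
    image_embedding: list[float]
    text_embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs: list[ImageTextPair], threshold: float) -> list[ImageTextPair]:
    """Keep only pairs whose image and caption embeddings are similar enough."""
    return [
        p for p in pairs
        if cosine_similarity(p.image_embedding, p.text_embedding) >= threshold
    ]


candidates = [
    # Caption roughly matches the image: similarity is close to 1.
    ImageTextPair("https://example.com/a.jpg", "a red bicycle", [1.0, 0.0], [0.9, 0.1]),
    # Caption does not match the image: similarity is 0.
    ImageTextPair("https://example.com/b.jpg", "a bowl of soup", [1.0, 0.0], [0.0, 1.0]),
]
kept = filter_pairs(candidates, threshold=0.3)
print([p.caption for p in kept])
```

Only the first pair survives the filter, mirroring the step in which mismatched image-text pairs were excluded before the data set was compiled.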
Kneschke sued LAION for copyright infringement based on LAION's download of his photograph.
The district court found that LAION's downloading of Kneschke's photograph amounted to a reproduction under German copyright law. It held, however, that the reproduction was justified under, and complied with, Section 60d(1) of the German Act on Copyright and Related Rights (UrhG), a scientific research exception.
Section 60d(1) UrhG permits reproductions made for text and data mining for scientific research purposes by research organizations. The district court clarified that creating the data set constituted text and data mining, even though its purpose was AI training. As the district court explained, analyzing an image to compare it with a preexisting description is an automated analysis for the purpose of obtaining information, which is how Section 44b(1) UrhG defines text and data mining. The district court further held that creating the data set, which could form the basis for training AI systems, should itself be regarded as pursuing a scientific purpose (i.e., an activity in pursuit of new knowledge, irrespective of any immediate knowledge gain or subsequent research success). The court found the creation of the data set to be a fundamental step toward using it to gain knowledge later. Notably, the data set was published free of charge and thus also made available to researchers working on AI. Because the training and development of AI systems, even by commercial enterprises, still constitutes scientific research, the district court considered it irrelevant that commercial enterprises could also use the data set to train or develop their AI systems.
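Although the point did not affect the outcome, the district court indicated that it considered the reservation of use, declared in natural language (English) on a subpage, to be machine-readable and therefore effective. This matters under the general text and data mining exception in Section 44b UrhG, which does not apply where the rightsholder of a work available online has reserved such use in machine-readable form.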
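Practice Note: This judgment may have far-reaching implications for rightsholders seeking to rely on copyright to prevent their works from being used to train AI systems, particularly where data sets are created for scientific research.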