On September 28, 2024, Governor Gavin Newsom signed into law AB 2013, a generative artificial intelligence ("AI") law that requires developers to post information on their websites about the data used to train their generative AI systems. Below are the key points regarding this new law:
- Who does the law apply to? The law applies to AI developers, a term defined broadly to mean any person, government agency, or entity that either develops an AI system or service or "substantially modifies" one, i.e., creates "a new version, new release, or other update to a generative artificial intelligence system or service that materially changes its functionality or performance, including the results of retraining or fine tuning."
- What does the law regulate? The law regulates "generative artificial intelligence," defined as AI "that can generate derived synthetic content, such as text, images, video, and audio, that emulates the structure and characteristics of the artificial intelligence's training data." The law also adopts a definition of AI common to other laws, such as the EU AI Act, Colorado's AI law, and the recently passed California AI Transparency Act: AI is "an engineered or machine-based system that varies in its level of autonomy and that can, for explicit or implicit objectives, infer from the input it receives how to generate outputs that can influence physical or virtual environments."
- When does the law go into effect? The law applies to generative AI released on or after January 1, 2022, and developers must comply with its provisions by January 1, 2026.
- What do developers need to do for compliance? If a developer makes a generative AI system publicly available to Californians, it must post on its website documentation regarding the data used to train the system or service. The developer must include the following elements on the website (a sample documentation structure follows the list):
- The sources or owners of the datasets;
- A description of how the datasets further the intended purpose of the AI system or service;
- The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets;
- A description of the types of data points within the datasets (e.g., types of labels used or general characteristics);
- Whether the datasets include any data protected by copyright, trademark, or patent or whether the datasets are entirely in the public domain;
- Whether the developer purchased or licensed the datasets;
- Whether the datasets include "personal information" or "aggregate consumer information" as those terms are defined under the California Consumer Privacy Act;
- Whether the developer cleaned, processed, or modified the datasets and the intended purpose of those efforts in relation to the AI system or service;
- The time period during which the data in the datasets was collected, including a notice if the data collection is ongoing;
- The dates the datasets were first used during the development of the AI system or service; and
- Whether the generative AI system or service used or continuously uses synthetic data generation in its development. The developer may include in its answer a description of the synthetic data's functional need or desired purpose based on the intended purpose of the AI system or service.
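To make the statutory elements concrete, here is a minimal sketch of how a developer might structure this documentation internally before publishing it. The class and field names are our own illustration, not terminology from AB 2013, and the statute prescribes the content of the disclosure, not its format.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json

@dataclass
class TrainingDataDisclosure:
    """Illustrative record of the AB 2013 disclosure elements for one dataset.

    All field names are hypothetical; the statute does not mandate a schema.
    """
    source_or_owner: str                    # source or owner of the dataset
    purpose_description: str                # how the dataset furthers the system's intended purpose
    data_point_count: str                   # may be a general range; estimate for dynamic datasets
    data_point_types: str                   # e.g., types of labels used or general characteristics
    contains_ip_protected_data: bool        # data protected by copyright, trademark, or patent
    entirely_public_domain: bool
    purchased_or_licensed: bool
    contains_personal_information: bool     # "personal information" as defined under the CCPA
    contains_aggregate_consumer_info: bool  # "aggregate consumer information" under the CCPA
    cleaning_or_processing: Optional[str]   # description and purpose of any cleaning/processing; None if none
    collection_period: str                  # e.g., "2019-01 to 2023-06"
    collection_ongoing: bool                # notice required if data collection is ongoing
    first_used: str                         # date the dataset was first used in development
    synthetic_data_used: bool
    synthetic_data_purpose: Optional[str] = None  # optional functional-need description

def to_published_json(disclosures: list[TrainingDataDisclosure]) -> str:
    """Serialize the records for posting on the developer's website."""
    return json.dumps([asdict(d) for d in disclosures], indent=2)
```

Whether a structured file like this or a prose webpage best satisfies the posting requirement is a judgment for counsel; the sketch simply shows that each statutory element maps naturally to a field a developer can track per dataset.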
- Are there any exemptions? Yes. The law does not apply to generative AI systems or services (A) whose sole purpose is to help ensure security and integrity, such as AI intended to detect security incidents; resist malicious, deceptive, fraudulent, or illegal actions; and ensure the physical safety of natural persons; (B) whose sole purpose is to operate aircraft in the national airspace; and (C) developed for national security, military, or defense purposes and that are made available only to a federal entity.
Takeaways
California's new AI law underscores the importance of AI developers maintaining a data provenance record that traces the lineage of data used to train AI systems (a minimal sketch follows below) and taking steps to be transparent about how they develop AI, including through trust centers on their websites. Developers should consider adopting technology that automates this process to operate at scale. Moreover, companies that integrate their AI offerings with a foundation model should consider the impact of this new law, because it could apply to developers that fine-tune or retrain AI systems or services.
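For illustration only, here is a minimal sketch of the kind of data provenance record the takeaway describes, assuming a simple append-only event log per dataset. All names and example entries are hypothetical, not drawn from any particular tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceEvent:
    """One step in a dataset's lineage (hypothetical structure)."""
    timestamp: str
    action: str       # e.g., "acquired", "cleaned", "filtered", "fine-tune input"
    actor: str        # team or pipeline responsible
    details: str

@dataclass
class DatasetLineage:
    dataset_name: str
    origin: str                      # source or owner, mirroring the AB 2013 disclosure element
    events: list[ProvenanceEvent] = field(default_factory=list)

    def record(self, action: str, actor: str, details: str) -> None:
        """Append an event so the lineage can back later public disclosures."""
        self.events.append(ProvenanceEvent(
            timestamp=datetime.now(timezone.utc).isoformat(),
            action=action,
            actor=actor,
            details=details,
        ))

# Example: logging the steps a disclosure would later summarize (hypothetical entries)
lineage = DatasetLineage("web_corpus_v3", origin="Licensed from ExampleData Inc.")
lineage.record("acquired", "data-eng", "Licensed snapshot of 2024-06 crawl")
lineage.record("cleaned", "data-eng", "Deduplicated and removed personal information per CCPA review")
lineage.record("fine-tune input", "ml-team", "Used to fine-tune base model v2")
```

A log like this is useful because several statutory elements, such as the collection period, cleaning and processing efforts, and the date a dataset was first used, are easiest to answer accurately when they were captured at the time each step occurred.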