ARTICLE
27 January 2026

AI Legal Updates: California's AI Training Data Transparency Law Takes Effect

Davis+Gilbert LLP

Davis+Gilbert LLP is a strategically focused, full-service mid-sized law firm of more than 130 lawyers. Founded over a century ago and located in New York City, the firm represents a wide array of clients – ranging from start-ups to some of the world's largest public companies and financial institutions.
A new California law, effective January 1, 2026, marks a significant shift toward AI transparency that could reshape how developers operate nationwide.

The Bottom Line

  • California's AB 2013, effective January 1, 2026, requires developers of generative AI systems to publicly disclose detailed information about the data used to train their models, including dataset sources, types of data, whether copyrighted materials were used, and whether personal information is included.
  • Developers must post the required information on their websites before making a covered AI system publicly available and update that disclosure whenever a substantial modification to the AI system is made.
  • Certain exemptions apply, including AI systems used solely for security and integrity purposes, national airspace operations, or national security and defense purposes.

California's Generative AI Training Data Transparency Act (TDTA), also known as AB 2013, imposes significant new transparency obligations on generative AI developers, requiring them to publicize details about how their training data was sourced and what that training data includes.

The Act arrives at a critical moment. Mandated disclosure of details about AI training data will have significant implications for many of the lawsuits currently pending against AI developers, in which plaintiffs allege that the developers misappropriated their pre-existing intellectual property and personal data to build and train generative AI systems without their knowledge or consent.

Requiring developers to make their training data public will now make it easier to determine whether generative AI systems do, in fact, violate any third-party proprietary rights.

What the Law Requires: Core Disclosure Obligations

The TDTA requires developers of generative AI systems to publicly disclose detailed information about the data used to train their models, including where data sets were obtained, the types of data they include, and whether the data includes copyrighted materials and personal information. For purposes of the law, "training" includes testing, validating, or fine-tuning the AI system.

Covered AI providers must post documentation containing these details on their websites before making their AI systems publicly available or releasing any substantial modification to their AI systems – such as a new version or release, or an update that materially changes the AI system's functionality or performance, including as a result of retraining or fine-tuning.

The law specifically applies to generative AI systems, defined as AI that can generate synthetic content – text, images, video, and audio – that emulates the structure and characteristics of its training data. Any generative AI system or service made available to Californians since January 1, 2022 is covered under the law, casting a wide net over existing generative AI platforms in the marketplace as well as those yet to be built.

Who is Impacted?

The TDTA applies to "developers" of generative AI systems, defined as any person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies an AI system or service for use by members of the public. This captures:

  • Major AI companies like OpenAI, Anthropic, and Google
  • Smaller developers building generative AI tools
  • State or local government entities creating public-facing AI systems

However, internal AI systems available only within an entity's corporate family (such as affiliates and subsidiaries) are exempt, as they're not accessible to members of the public.

Required Documentation: What You Must Disclose

Covered developers must post a "high-level summary of the datasets" used to train the generative AI system or service on their websites. This summary must cover a broad range of information, including:

  • Origin: The sources and owners of the datasets
  • Purpose alignment: How datasets support the AI system's intended function
  • Data volume: The number of data points (may be expressed as a range or estimate)
  • Data types: A description of data points, including label types for labeled datasets and general characteristics for unlabeled datasets
  • Intellectual property status: Whether datasets include copyrighted, trademarked, or patented materials, or are entirely public domain
  • Commercial arrangements: Whether datasets were purchased or licensed
  • Personal information: Whether datasets contain personal information as defined under the California Consumer Privacy Act (CCPA)
  • Aggregate consumer information: Whether datasets include aggregate consumer information under the CCPA
  • Data processing: Any cleaning, processing, or modifications to datasets, including intended purposes
  • Collection timeframe: The period during which data was collected, with notice if collection is ongoing
  • Usage dates: When datasets were first used in development
  • Synthetic data: Whether the system used or continuously uses synthetic data generation, with optional description of functional need or purpose
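The statute does not prescribe any particular format or schema for the "high-level summary" – developers may publish it as narrative text. Purely as an illustration, the sketch below organizes the enumerated items as a structured record and runs a simple completeness check before publication. All field names and example values here are hypothetical and are not drawn from the text of AB 2013.

```python
# Hypothetical internal record covering the TDTA's enumerated disclosure
# items. Field names and values are illustrative only; AB 2013 does not
# mandate a schema or machine-readable format.
dataset_disclosure = {
    "origin": {"sources": ["Example Web Corpus"], "owners": ["Example Corp."]},
    "purpose_alignment": "Supports general-purpose text generation.",
    "data_volume": {"low": 1_000_000, "high": 5_000_000},  # may be a range or estimate
    "data_types": {"labeled": False, "description": "Unlabeled natural-language text"},
    "ip_status": {"copyrighted": True, "trademarked": False,
                  "patented": False, "entirely_public_domain": False},
    "commercial_arrangements": {"purchased": False, "licensed": True},
    "personal_information": True,          # as defined under the CCPA
    "aggregate_consumer_information": False,
    "data_processing": "Deduplicated and filtered for quality.",
    "collection_timeframe": {"start": "2022-01", "end": "2024-06", "ongoing": False},
    "first_used": "2024-07",
    "synthetic_data": {"used": True, "description": "Optional statement of purpose."},
}

# Completeness check: every enumerated disclosure item should be addressed.
REQUIRED_ITEMS = [
    "origin", "purpose_alignment", "data_volume", "data_types", "ip_status",
    "commercial_arrangements", "personal_information",
    "aggregate_consumer_information", "data_processing",
    "collection_timeframe", "first_used", "synthetic_data",
]
missing = [item for item in REQUIRED_ITEMS if item not in dataset_disclosure]
print("Missing items:", missing)  # an empty list means all items are covered
```

A record like this is only an internal organizing aid; the published summary itself can take whatever form the developer chooses, so long as it addresses each required item.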

Key Exemptions

Several key exemptions exist for specialized AI applications. Developers are not required to post training data documentation for:

  • Generative AI systems whose sole purpose is to help ensure data security and integrity, physical safety, or operation of aircraft in national airspace, or
  • Generative AI systems provided exclusively to a federal government entity for national security, military, or defense purposes.

Broader Context and Next Steps

The TDTA reflects a growing trend toward mandating transparency in AI development, particularly regarding training data. Concerns over copyright infringement, the use of personal information, and a lack of visibility into how AI models are built have driven legislative action across multiple jurisdictions.

California's approach focuses on disclosure rather than substantive restrictions, enabling consumers, creators and regulators to understand what data has been used to train AI systems made available to Californians. However, unlike some other state AI laws that impose civil penalties for violations, the text of the TDTA does not establish a specific enforcement mechanism or penalty structure for those who do not comply.

Leading AI companies are already complying with the TDTA. OpenAI, Anthropic, and Google each published the required documentation on their websites by January 1, with varying degrees of detail.

Not all AI developers are eager to comply. xAI filed a lawsuit against the California Attorney General shortly before the TDTA went into effect, challenging its constitutionality. And, in light of a recent executive order issued by the White House, there is also the possibility of a federal challenge to this state law.

Notwithstanding these uncertainties, calls for increased transparency around generative AI training are only gaining momentum across U.S. states and abroad, and laws like the TDTA will likely become widespread across multiple jurisdictions. AI developers should take steps now to prepare for these requirements and avoid negative repercussions or obstacles down the road.

Developers of generative AI systems should begin preparing for compliance by taking the following steps:

  • Conduct an inventory of all generative AI systems currently offered or planned for release to Californians.
  • Document the information required under the TDTA, including training data sources, types, and any modifications made to datasets.
  • Assess whether any exemptions may apply to specific AI systems.
  • Prepare a written report itemizing or summarizing the information required under the TDTA, to be published online.
  • Develop a process for updating disclosures when substantial modifications are made.
  • Monitor legal challenges and federal developments that could affect the applicability or enforceability of state AI laws.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
