As use of artificial intelligence (AI) becomes increasingly common in business, the need to protect confidential and sensitive information from unauthorized use in training AI tools has become a significant legal issue and business priority.
Traditionally, companies have relied on confidentiality provisions within agreements, as well as stand-alone Non-Disclosure Agreements (NDAs), to protect confidential, proprietary and sensitive information; trade secrets; and personal data by prescribing how that information must be maintained and handled. Because AI technologies pose new risks, these provisions require close review to ensure that protected information is safeguarded before it is introduced into AI-driven workflows and systems.
Why Traditional NDA Language Falls Short
Most NDAs and confidentiality clauses include something similar to the following prohibitory language:
"Confidential information may not be disclosed to third parties without the discloser's prior written consent."
In the context of AI, however, this language may fail to provide adequate protection of confidential information. How so?
Since AI systems rely on large datasets for training, there is an increased risk that confidential information could be fed into AI models during this stage. AI systems also have a capacity for "memorization": unlike people, who forget information over time, AI models can retain confidential information embedded in them indefinitely. Likewise, once confidential information is input into an AI model, it may become so deeply integrated that it is impossible to segregate, remove, destroy or return the data after training is complete (unless the system is programmed to allow this).
Therefore, even if confidential data is not proactively "disclosed," merely using it to train AI creates a risk that a third party may later "discover" the information simply by submitting the right query. About a year ago, two separate Samsung employees copied proprietary code into ChatGPT along with queries such as "how do I fix or optimize this code?" Samsung determined that these inputs could result in a third party discovering proprietary information from ChatGPT in a subsequent answer, and decided to ban employees from using ChatGPT.1
Key Considerations for AI-Specific Confidentiality Language
Companies that have previously relied upon NDAs or confidentiality clauses to protect their confidential and proprietary information should consider reviewing and updating those contractual protections to help mitigate the risks associated with both AI technology and the rise in data breach incidents.
Below are seven considerations for crafting effective, AI-specific confidentiality language:
1. Restrict AI Tools from Storing or Retaining Your Data
AI technology does not always fit neatly into traditional contract language, which typically permits disclosure of confidential data to "employees" or "agents" but restricts sharing with "third parties" absent prior written consent. By explicitly prohibiting AI tools from retaining confidential information without the discloser's consent, a business can significantly mitigate the risk of unintentional disclosure.
2. Prohibit Use in AI Training
Many AI tools train on data shared within the AI platform. Adding specific language to your confidentiality provisions that prohibits the use of confidential information to train AI models or algorithms can help ensure that sensitive data does not find its way into AI models.
3. Include Compliance and Enforcement Requirements
Emphasize accountability and proactive risk mitigation by expanding NDAs and similar clauses to include monitoring provisions, audit rights, reporting obligations, breach notification protocols, and the consequences of noncompliance, such as injunctive relief and financial remedies (e.g., indemnities).
4. Address Third Party Risks Explicitly
Many AI solutions involve third-party vendors, which can introduce additional data vulnerabilities for customers. For example, if a customer negotiates protections with a vendor but not with the vendor's own third-party AI provider, that provider's AI tool might use the customer's data for training unless it is explicitly prohibited from doing so.
Some approaches to mitigating this risk in NDA and confidentiality provisions include requiring data recipients to:
– Disclose any third party data processing tools they use;
– Ensure that all third parties are subject to equivalent confidentiality and data protection standards and are restricted from using confidential information for AI training;
– Agree not to transfer confidential information to third parties (including third party AI tools) without prior written consent; and
– Explicitly acknowledge the remedies, financial and otherwise (including indemnity from the vendor), for breaches by these third parties and third party tools.
5. Implement Data Classification Systems
Include a system for classifying confidential information based on different levels of sensitivity. This allows for a hierarchy of protection measures to be included, with the most sensitive information receiving the highest level of protection. For example, highly sensitive information such as healthcare or financial data might be completely prohibited from any interaction with AI systems, regardless of the purpose or any safeguards.
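Purely as an illustration of how such a tiered policy could be operationalized alongside the contract language, the short Python sketch below encodes hypothetical sensitivity tiers and a lookup of whether each tier may ever be submitted to an AI system. The tier names and rules here are assumptions invented for the example, not drawn from any standard or specific agreement.

```python
from enum import Enum

class Sensitivity(Enum):
    """Hypothetical sensitivity tiers; real tiers should mirror the contract."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4  # e.g., healthcare or financial data

# Illustrative policy map: which tiers may ever be submitted to an AI tool.
AI_ALLOWED = {
    Sensitivity.PUBLIC: True,
    Sensitivity.INTERNAL: True,       # e.g., only with vendor no-training terms
    Sensitivity.CONFIDENTIAL: False,  # e.g., prohibited absent written consent
    Sensitivity.RESTRICTED: False,    # prohibited regardless of safeguards
}

def may_submit_to_ai(label: Sensitivity) -> bool:
    """Return True if data at this tier may be sent to an AI system."""
    return AI_ALLOWED[label]

if __name__ == "__main__":
    print(may_submit_to_ai(Sensitivity.RESTRICTED))  # False
```

In practice, the tiers and handling rules would be taken verbatim from the negotiated agreement, so that the technical policy and the contractual obligations cannot drift apart.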
6. Technical & Operational Safeguards and AI Containment Measures
In addition to legal protections, you could require specific technical and operational protections throughout the lifecycle of confidential information, both within the AI system itself and in any connected environments (e.g., data storage systems).
These requirements may include the following (an illustrative sketch of one such control appears after the list):
- Encryption standards for data at rest and in transit
- Data segregation requirements to isolate confidential information
- Specialized AI monitoring tools that detect risks or vulnerabilities
- Technical controls that prevent AI systems from memorizing or reproducing specific data types
- Mandatory implementation of privacy-enhancing technologies
- Requirements for regular validation testing of security controls, following best practices for protecting confidential information.
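As a purely illustrative sketch of one such control, the Python snippet below shows a pre-submission filter that refuses to pass content to an AI tool if it matches simple patterns for sensitive data types. The patterns and the `submit_prompt` wrapper are hypothetical assumptions for this example, not part of any particular AI vendor's API; a real deployment would rely on a dedicated data loss prevention (DLP) tool rather than ad hoc rules.

```python
import re

# Hypothetical patterns for data types that must never reach an AI system.
BLOCKED_PATTERNS = {
    "US SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def screen_for_ai(text: str) -> list[str]:
    """Return the names of any blocked data types found in the text."""
    return [name for name, pat in BLOCKED_PATTERNS.items() if pat.search(text)]

def submit_prompt(text: str) -> None:
    """Hypothetical wrapper around an AI tool: refuse risky inputs up front."""
    hits = screen_for_ai(text)
    if hits:
        raise ValueError(f"Blocked before submission; detected: {', '.join(hits)}")
    # ... the vetted text would then be sent to the approved AI endpoint ...

if __name__ == "__main__":
    try:
        submit_prompt("Customer SSN is 123-45-6789")
    except ValueError as err:
        print(err)  # Blocked before submission; detected: US SSN
```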
7. Create AI Confidentiality Addenda
Develop templated AI riders or addenda that can supplement existing agreements, avoiding the need to negotiate new or full agreements. In addition to confidentiality protections, these addenda might address accuracy standards (no hallucinations or bias), the use of third-party tools (including not only AI but also open source), and AI-specific IP rights (e.g., who owns the prompts and the unique answers), among other issues.
Conclusion
As AI continues to reshape the business landscape, companies must remain vigilant to avoid unauthorized, AI-driven exposure of proprietary information, which can lead to competitive harm, regulatory violations and legal exposure.
By incorporating precise, AI-specific language in NDAs, embedded confidentiality provisions and/or addenda, businesses can better protect their confidential information, mitigate potential issues before they arise, and build trust in an increasingly AI-driven world.
Finally, customers can take their own precautionary operational steps to protect confidential information by adopting a strict policy against uploading highly sensitive information in the first instance.
Footnotes
1. See articles in PC Mag, CNBC and Cybersecurity Hub.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.