On 11 October 2023, the French data protection authority (the "CNIL") adopted Guidelines on the interplay between the GDPR and the use of AI (hereafter the "Guidelines").

Structured around several key topics, the Guidelines aim to promote a responsible use of AI while protecting the personal data of individuals. In this regard, the CNIL highlights the interplay between the GDPR and the development of AI, including how AI must comply with the general principles of the GDPR.

The Guidelines are currently open for public consultation and the CNIL intends to adopt two additional sets of guidelines on AI in the near future.

1. Scope of the Guidelines
The Guidelines provide recommendations and clarifications exclusively for the development phase of AI systems and the creation of the databases used to train them, thus excluding the deployment phase. More specifically, the development phase is composed of (i) the choice of the AI system's design, (ii) the creation of the database and (iii) the learning phase (training, validation, and testing).

The Guidelines apply where the data used for training an AI system are in whole or in part personal data, within the meaning of the GDPR.

The Guidelines apply to machine learning AI systems (i.e., supervised, unsupervised and reinforcement learning) and to AI systems based on logic and knowledge (i.e., inductive (logical) programming, knowledge-based systems, inference and deduction engines, symbolic reasoning, and expert systems). General-purpose AI systems are also included within the scope of the Guidelines.

2. Determine the applicable legal framework
To determine the legal framework applicable to the data processed during the development phase, two scenarios are to be distinguished:

  • The operational use of the AI system is clearly identified from the development phase.

In such a case, if the processing carried out during the development phase serves exclusively the same purpose as that of the deployment phase, both phases can be considered to fall under the same legal framework.

  • The operational use of the AI system is not clearly defined from the development phase (general-purpose AI systems).

The development phase and the deployment phase of the AI system may not fall under the same legal framework as some AI systems (e.g. general-purpose AI systems) are developed without a specific operational use. Therefore, it is not always possible to clearly identify the purpose of the processing during the deployment phase. In this event, it is generally considered, subject to a case-by-case analysis, that the development phase is subject to the GDPR.

3. Define a purpose
The CNIL distinguishes between two use cases:

  • The operational use of the AI system during the deployment phase has been identified from the development phase.

When the AI system is intended for a single operational use that has been clearly identified from the development phase, the processing carried out during the development and the deployment phases serves one and the same purpose. Consequently, if the purpose in the deployment phase is itself specified, explicit and legitimate, the CNIL considers that it equally applies to the purpose in the development phase.

  • The operational use of the AI system during the deployment phase is not clearly defined from the development phase (general-purpose AI systems).

The operational use of an AI system is not always clearly identified from the development phase, which is the case, for example, with general-purpose AI systems.

In such a case, the purpose of the processing in the development phase can only be considered specified, explicit, and legitimate if it is sufficiently precise and refers to:

  • The type of AI system developed (e.g., the development of a large-scale language model, a 'computer vision' system, or a generative AI system for images, videos, or audio). The AI system must be presented in a sufficiently clear manner, considering the technical complexities and rapid advancements in this field.
  • The technically feasible functionalities and capabilities, which require the controller to compile a list of functionalities that can be reasonably foreseen as of the development phase.

Example: the development of a speech recognition model able to identify a speaker, his/her language, age, gender etc.

4. Determine the legal status of AI system providers within the meaning of the GDPR
According to the CNIL, an AI system provider may be a:

  • Controller if it initiates the development of the AI system and creates a database for training purposes from data it has selected itself and on its own behalf (e.g., for commercialisation purposes).
  • Joint controller if a database is created and supplied by multiple controllers for a collectively defined purpose.
  • Processor if it develops an AI system on behalf of a client, as part of a service that is provided to the client (in such case, the client determines the purposes and means of the processing).

5. Ensure the processing of personal data is lawful
The controller must determine the legal basis on which it relies when creating a database used to train an AI system.

The CNIL considers that the legal basis for training an algorithm will vary on a case-by-case basis depending on the context of the processing in question. The legal basis may be consent, legitimate interest, contractual necessity, or the performance of a task carried out in the public interest.

6. Conducting a Data Protection Impact Assessment (DPIA)
Creating a database for AI system training can pose a high risk to the rights and freedoms of individuals. In this regard, AI system providers must:

  • Identify when a DPIA is necessary.
  • Define the scope of the DPIA.
  • Consider the risks associated with the AI system (e.g., discrimination, bias, misinformation).
  • Implement measures to mitigate related risks.

7. Protecting personal data as of the conception of the AI system
Developers of AI systems must integrate privacy by design into the conception of the AI system, namely with respect to:

  • the purpose of the system that is being developed;
  • the technical architecture of the system it aims to design, which will impact the characteristics of the database on which the machine learning is based;
  • the data sources involved, meaning that the selection of data must be strictly necessary;
  • the validation of such choices, which can take various forms, such as conducting a pilot study or seeking the input of an ethics committee.

8. Protecting personal data in the collection and management process
AI developers must apply, from the outset, the principles of personal data protection ("privacy by design") when building the database used to train the AI system. In particular, they must:

  • comply with the data minimisation principle;
  • clean the data;
  • identify the relevant data;
  • implement adequate measures to mitigate risks (e.g., generalisation and randomisation measures);
  • follow up on and update the data to prevent the mishandling of data;
  • determine a data retention period (set in advance and monitored over time);
  • implement security measures (e.g., stream encryption, authentication methods to access the data);
  • maintain documentation relating to the data used to train the AI system.
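
By way of illustration only, some of the measures listed above (data minimisation, generalisation, randomisation and a retention period) could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a method prescribed by the Guidelines: the field names, the ten-year age bands, the uniform noise and the 365-day retention period are all hypothetical choices made for the example, and whether such measures are sufficient depends on a case-by-case analysis.

```python
import random
from datetime import date, timedelta

# Hypothetical field names and thresholds, chosen only for illustration.
NEEDED_FIELDS = {"age", "language", "collected_on"}
RETENTION = timedelta(days=365)


def minimise(record: dict) -> dict:
    """Keep only the fields identified as necessary for training."""
    return {k: v for k, v in record.items() if k in NEEDED_FIELDS}


def generalise_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band, e.g. 37 becomes '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def randomise(value: float, scale: float, rng: random.Random) -> float:
    """Add uniform noise in [-scale, +scale] to a numeric value."""
    return value + rng.uniform(-scale, scale)


def is_expired(collected_on: date, today: date) -> bool:
    """True when the record has outlived the assumed retention period."""
    return today - collected_on > RETENTION


def prepare(records: list[dict], today: date) -> list[dict]:
    """Minimise each record, drop expired ones, and generalise the age."""
    prepared = []
    for record in records:
        record = minimise(record)
        if is_expired(record["collected_on"], today):
            continue
        record["age"] = generalise_age(record["age"])
        prepared.append(record)
    return prepared
```

In this sketch, a record containing a name, an exact age, a language and a collection date would leave `prepare` without the name and with an age band in place of the exact age, provided it is still within the assumed retention period. The `randomise` helper shows simple noise addition only and does not by itself amount to anonymisation.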

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.