No artificial intelligence (AI) system can exist without massive amounts of data, yet reconciling the creation of learning databases containing personal data with data protection can be tricky.

To clarify this relationship, the CNIL has published initial guidelines - subject to consultation - that define the conditions under which building learning databases and developing AI systems is compatible with the GDPR1.

The guidelines apply the "principles relating to the processing of personal data" to the development of AI systems, and are structured around three stages: (i) system design (including identification of the necessary data), (ii) creation of the database, and (iii) learning.

The "purpose limitation" principle - consisting of a requirement for a specific, explicit and legitimate purpose - is central in that it conditions the application of other essential principles: (i) transparency, since informing data subjects presupposes a clearly defined processing objective, (ii) minimization, which requires that only the data necessary for the purpose of processing be processed, and (iii) storage limitation, since retention periods must be defined according to the objective pursued.

Although the operational use of the AI system may not yet be clearly defined during the development phase, the purpose of processing during that phase can only be considered specific, explicit and legitimate if it is sufficiently precise, i.e. if it refers cumulatively to:

  • the type of system being developed, e.g. an image-generating AI system, with a clear, intelligible presentation;
  • the functionalities and capabilities that are technically feasible, including, according to CNIL recommendations, the foreseeable capabilities that present the highest risk (e.g. processing health data) and the functionalities excluded by design, as well as the conditions of use.

This interpretation rules out any overly general definition of purpose, such as "development and improvement of AI systems".
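
By way of illustration only, the cumulative elements listed above can be pictured as a simple completeness check on a documented purpose. The sketch below is a hypothetical aid, not a tool proposed by the CNIL: the class, the field names and the rule that every element must be filled in are all assumptions made for the example, and a complete record does not replace a legal assessment of whether the purpose is genuinely specific, explicit and legitimate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DevelopmentPurpose:
    """Hypothetical record of a development-phase purpose (field names are illustrative)."""
    system_type: str = ""                      # e.g. "image-generating AI system"
    foreseeable_capabilities: List[str] = field(default_factory=list)
    excluded_functionalities: List[str] = field(default_factory=list)
    conditions_of_use: str = ""


def missing_elements(purpose: DevelopmentPurpose) -> List[str]:
    """Return the names of the elements that have not been documented."""
    missing = []
    if not purpose.system_type.strip():
        missing.append("system_type")
    if not purpose.foreseeable_capabilities:
        missing.append("foreseeable_capabilities")
    if not purpose.excluded_functionalities:
        missing.append("excluded_functionalities")
    if not purpose.conditions_of_use.strip():
        missing.append("conditions_of_use")
    return missing


def is_fully_documented(purpose: DevelopmentPurpose) -> bool:
    """Assumption for this sketch: all four elements must be present (cumulative test)."""
    return not missing_elements(purpose)


# An overly general purpose leaves most elements undocumented.
vague = DevelopmentPurpose(system_type="development and improvement of AI systems")
print(missing_elements(vague))
```

A record that fails this check signals only that the description of the purpose is incomplete; whether the documented elements are precise enough remains a matter of legal analysis.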

As regards the status of AI system providers, it should be noted that "the fact that the same database is used for different customers, in the context of different services, is generally a decisive indication that the provider is the controller of the processing implemented for the creation of the database".

Any processing operation that builds or uses a learning database requires a legal basis (often the consent of the data subject or the legitimate interest of the controller). Where data is re-used, a compatibility test must be carried out between the initial purpose of collection and the re-use, unless the re-use and its associated purpose were provided for in advance - in a sufficiently precise manner - and brought to the attention of the data subjects.

The principle of minimization translates into a selection of data (volume, categories, typology, source). It does not, however, rule out processing a set of data indiscriminately, provided this is necessary and justified.

We recommend setting up a multi-disciplinary and independent ethics committee for the development of AI systems, with a governance process defined in advance.

Data retention periods must be assessed on a phase-by-phase basis, and retention must be limited to the data strictly necessary for the phase that justifies its processing. According to the CNIL, the development phase may require more data than the product maintenance phase.

The guidelines also specify how this principle ties in with the draft AI Act, and in particular with its governance obligations (violation of which will incur the heaviest penalties!). To that end, an appendix provides a model framework to be completed in order to document the traceability of the data sets used and to demonstrate, in particular, that the data have been collected lawfully.

The conditions for reconciling the imperatives of innovation on the one hand, and of data protection on the other, are coming into focus, offering manufacturers greater clarity and security, and individuals greater confidence.

Footnote

1 CNIL's 11-10-2023 AI Practical data sheet

Article also published on DSIH.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.