GDPR

Adatvédelem mindenkinek / Data protection for everyone

Development and deployment of AI models in a GDPR-compliant way

19 December 2024, 11:45 - poklaszlo

The EDPB published its long-awaited opinion

The European Data Protection Board (EDPB), at the request of the Irish Supervisory Authority, issued its opinion under Art. 64(2) GDPR on certain data protection aspects related to the processing of personal data in the context of AI models (the "Opinion").

Below, I briefly summarise the main points of the Opinion and its potential impact on the development and deployment of AI models in the European Union. 

1. What topics are covered in the Opinion?

The Opinion covers the following main topics: 

  • Anonymity of AI models (assessment of “anonymity” and how to demonstrate it),
  • Legitimate interest as a legal basis for the development and for the deployment of AI models,
  • Consequences of unlawful data processing in the development phase on the subsequent processing or operation of the AI model.

On the other hand, several relevant topics are not covered in the Opinion, even though, as the EDPB itself highlights, they are also very important when assessing the data protection compliance of AI models. These topics are as follows:

  • Processing of special categories of data (Art. 9 GDPR);
  • Automated-decision making, including profiling (Art. 22 GDPR);
  • Compatibility of purposes (Art. 6(4) GDPR);
  • Data protection impact assessments (Art. 35 GDPR);
  • Principle of data protection by design (Art. 25(1) GDPR).

2. What is the scope of the Opinion?

The Opinion focuses only on the subset of AI models that are the result of training with personal data. (In this training process, the models learn from data to perform their intended task.)

The Opinion acknowledges that there are various stages within the “life-cycle” of AI models, inter alia, the "creation", "development", "training", "update", "fine-tuning", "operation" or "post-training" of AI models. However, for the purposes of the Opinion, the EDPB distinguishes between the development phase and the deployment phase. The development phase covers all stages before any deployment of the AI model, and includes, inter alia, code development, collection of training personal data, pre-processing of training personal data, and training. The deployment phase covers all stages relating to the use of an AI model and may include any operations conducted after the development phase.

It is also important that the Opinion makes a distinction between "first-party data" and "third-party data". In this context, "first-party data" refers to personal data which the controller has collected from the data subjects, while "third-party data" refers to personal data that controllers have not obtained from the data subjects but have collected or received from a third party, for example from a data broker, or collected via web scraping.

3. Under what circumstances could AI models be considered anonymous and how to demonstrate this?

Whether AI models contain personal data is a highly disputed question, from both a legal and a technological perspective. The Hamburg Data Protection Authority issued a discussion paper on this topic a few months ago.

The discussion paper of the Hamburg Data Protection Authority contained three basic theses:

  1. The mere storage of an LLM does not constitute processing within the meaning of Art. 4(2) GDPR. This is because no personal data is stored in LLMs. Insofar as personal data is processed in an LLM-supported AI system, the processing must comply with the requirements of the GDPR. This applies in particular to the output of such an AI system.
  2. Given that no personal data is stored in LLMs, data subject rights as defined in the GDPR cannot relate to the model itself. However, claims for access, erasure or rectification can certainly relate to the input and output of an AI system of the responsible provider or deployer.
  3. The training of LLMs using personal data must comply with data protection regulations. Throughout this process, data subject rights must also be upheld. However, potential violations during the training phase of LLMs do not affect the lawfulness of using such a model within an AI system.

(For more details, see my blog post about the topic here.)

The Opinion distinguishes between (i) AI models that are specifically designed to provide personal data regarding individuals whose personal data were used to train the model, or in some way to make such data available, and (ii) AI models that are not designed to provide personal data related to the training data.

AI models belonging to category (i) above (i.e. AI models specifically designed to provide personal data regarding individuals whose personal data were used to train the model) "will inherently (and typically necessarily) include information relating to an identified or identifiable natural person, and so will involve the processing of personal data. Therefore, these types of AI models cannot be considered anonymous. This would be the case, for example, (i) of a generative model fine-tuned on the voice recordings of an individual to mimic their voice; or (ii) any model designed to reply with personal data from the training when prompted for information regarding a specific person." (Opinion, Point 29, p. 13)

After making the above statement, the Opinion focuses on AI models belonging to category (ii) above (i.e. AI models that are not designed to provide personal data related to the training data) and considers that "whenever information relating to identified or identifiable individuals whose personal data was used to train the model may be obtained from an AI model with means reasonably likely to be used, it may be concluded that such a model is not anonymous." (Opinion, Point 31, p.13) This means that "AI models trained on personal data cannot, in all cases, be considered anonymous. Instead, the determination of whether an AI model is anonymous should be assessed, based on specific criteria, on a case-by-case basis." (Point 34 of the Opinion)

The Opinion also elaborates on considerations that the competent authorities should take into account when they assess the anonymity of AI models. For an AI model to be considered anonymous, both (i) the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to train the model, and (ii) the likelihood of obtaining, intentionally or not, such personal data from queries, should be insignificant for any data subject. This likelihood should be assessed taking into account "all the means reasonably likely to be used" by the controller or another person, and should also consider unintended (re)use or disclosure of the model. (Point 43 of the Opinion)

The Opinion also provides a non-prescriptive and non-exhaustive list of elements to evaluate the residual likelihood of identification, meaning that other approaches may also be acceptable if they offer an equivalent level of protection.

The list of evaluation criteria provided by the EDPB contains the following elements: 

a) AI model design

  • Selection of sources used to train the AI model: it includes, among other things, (i) the appropriateness of the selection criteria; (ii) the relevance and adequacy of the chosen sources considering the intended purpose(s); and (iii) whether inappropriate sources have been excluded.
  • Data preparation and minimisation: in particular: (i) whether the use of anonymous data and/or personal data that has undergone pseudonymisation has been considered; (ii) where it was decided not to use such measures, the reasons for this decision, taking into account the intended purpose; (iii) the data minimisation strategies and techniques employed to restrict the volume of personal data included in the training process; and (iv) any data filtering processes implemented prior to model training intended to remove irrelevant personal data.
  • Methodological choices regarding the training: including, among others: (i) whether the methodology uses regularisation methods to improve model generalisation and reduce overfitting; and, crucially, (ii) whether the controller implemented appropriate and effective privacy-preserving techniques (e.g. differential privacy; see the sketch after this list).
  • Measures regarding outputs of the model
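To make the differential privacy reference above more concrete, here is a minimal sketch of a DP-SGD-style training step (per-example gradient clipping plus calibrated Gaussian noise). This is an illustration only, not taken from the Opinion: the model, data and hyperparameters are hypothetical, and a production system would rely on a vetted library (such as Opacus) together with a privacy accountant.

```python
import torch
import torch.nn as nn

# Hypothetical toy model and optimiser, purely for illustration.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

CLIP_NORM = 1.0         # per-example gradient clipping bound C (illustrative)
NOISE_MULTIPLIER = 1.1  # sigma; noise std = sigma * C (illustrative)

def dp_sgd_step(xb, yb):
    """One DP-SGD-style step: clip each example's gradient, sum, add noise."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        # Clip this single example's gradient to L2 norm <= CLIP_NORM
        norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
        scale = torch.clamp(CLIP_NORM / (norm + 1e-8), max=1.0)
        for s, p in zip(summed, model.parameters()):
            s += p.grad * scale
    # Add Gaussian noise calibrated to the clipping bound, then average
    for s, p in zip(summed, model.parameters()):
        noise = torch.normal(0.0, NOISE_MULTIPLIER * CLIP_NORM, size=s.shape)
        p.grad = (s + noise) / len(xb)
    optimizer.step()

xb, yb = torch.randn(8, 10), torch.randint(0, 2, (8,))  # toy batch
dp_sgd_step(xb, yb)
```

The intuition behind the design: because each individual's influence on the update is bounded by the clipping norm, the added noise can mask any single person's contribution, which is exactly the property that reduces the likelihood of extracting training data from the model.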

b) AI model analysis, including the analysis of reports of code reviews, as well as a theoretical analysis documenting the appropriateness of the measures chosen to reduce the likelihood of re-identification of the concerned model.

c) AI model testing and resistance to attacks, including, among others, structured testing against: (i) attribute and membership inference; (ii) exfiltration; (iii) regurgitation of training data; (iv) model inversion; or (v) reconstruction attacks. (A minimal membership-inference probe is sketched below.)
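As an illustration of structured testing against membership inference (point (i) above), here is a minimal sketch of a loss-threshold probe: if the model's per-example losses on training members are systematically lower than on held-out non-members, membership can be inferred. The helper name and the numbers below are hypothetical; real evaluations use stronger attacks and proper baselines.

```python
import numpy as np

def membership_inference_auc(member_losses, nonmember_losses):
    """Rank-based AUC of a loss-threshold attack: the probability that a
    random training member has lower loss than a random non-member.
    ~0.5 means no detectable leakage; values near 1.0 mean strong leakage."""
    m = np.asarray(member_losses)[:, None]
    n = np.asarray(nonmember_losses)[None, :]
    return (m < n).mean() + 0.5 * (m == n).mean()

# Illustrative synthetic losses: members fit slightly better than non-members
rng = np.random.default_rng(0)
members = rng.normal(0.4, 0.2, 1000)
nonmembers = rng.normal(0.7, 0.2, 1000)
print(f"membership-inference AUC: {membership_inference_auc(members, nonmembers):.2f}")
```

A documented report of such a test (who ran it, when, on which data, with what result) is precisely the kind of evidence the EDPB expects controllers to keep, as the documentation list below shows.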

Proper documentation (accountability) is a key obligation of the controllers in this respect, also in line with Art. 5(2) GDPR. The EDPB also noted that 

if a SA* is not able to confirm, after assessing the claim of anonymity, including in light of the documentation, that effective measures were taken to anonymise the AI model, the SA would be in a position to consider that the controller has failed to meet its accountability obligations under Article 5(2) GDPR. Therefore, compliance with other GDPR provisions should also be considered. (Point 57 of the Opinion, *"SA" means Supervisory Authority)

According to the Opinion, Supervisory Authorities should verify whether the controller’s documentation includes (see Point 58 of the Opinion):

  1. any information relating to DPIAs, including any assessments and decisions that determined that a DPIA was not necessary;
  2. any advice or feedback provided by the Data Protection Officer (“DPO”) (where a DPO was - or should have been - appointed);
  3. information on the technical and organisational measures taken while designing the AI model to reduce the likelihood of identification, including the threat model and risk assessments on which these measures are based. This should include the specific measures for each source of training datasets, including relevant source URLs and descriptions of measures taken (or already taken by third-party dataset providers);
  4. the technical and organisational measures taken at all stages throughout the lifecycle of the model, which either contributed to, or verified the lack of personal data in the model;
  5. the documentation demonstrating the AI model’s theoretical resistance to re-identification techniques, as well as the controls designed to limit or assess the success and impact of main attacks (regurgitation, membership inference attacks, exfiltration, etc.). This may include, in particular: (i) the ratio between the amount of training data and the number of parameters in the model, including the analysis of its impact on the model; (ii) metrics on the likelihood of re-identification based on the current state of the art; (iii) reports on how the model has been tested (by whom, when, how and to what extent); and (iv) the results of the tests (a hypothetical record structure is sketched after this list);
  6. the documentation provided to the controller(s) deploying the model and/or to data subjects, in particular the documentation related to the measures taken to reduce the likelihood of identification and regarding the possible residual risks.
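Purely as an illustration of how such accountability records might be kept in machine-readable form, here is a hypothetical schema loosely mirroring items 5(i)-(iv) above. The structure is my own sketch, not prescribed by the Opinion.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AttackTestReport:
    """One test run: who tested the model, when, how, and the result."""
    attack_type: str     # e.g. "membership inference", "regurgitation"
    tested_by: str       # internal team or external auditor
    test_date: str       # ISO date
    methodology: str     # short description of the test setup
    result_metric: str   # e.g. "AUC"
    result_value: float

@dataclass
class AnonymityDossier:
    """Hypothetical dossier supporting an anonymity claim for one model."""
    model_name: str
    data_to_parameter_ratio: float      # item 5(i)
    reidentification_metrics: str       # item 5(ii)
    test_reports: List[AttackTestReport] = field(default_factory=list)

dossier = AnonymityDossier(
    model_name="example-model",  # hypothetical
    data_to_parameter_ratio=12.5,
    reidentification_metrics="see internal evaluation report",
    test_reports=[AttackTestReport(
        attack_type="membership inference",
        tested_by="internal red team",
        test_date="2024-12-01",
        methodology="loss-threshold probe on a held-out split",
        result_metric="AUC",
        result_value=0.52,
    )],
)
print(len(dossier.test_reports))
```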

4. Can legitimate interest be relied on as a legal basis for the development and deployment of AI models?

In assessing the applicability of legitimate interest as a possible legal basis for the development and deployment of AI models, the Opinion builds on the EDPB's Guidelines 1/2024 on processing of personal data based on Article 6(1)(f) GDPR.

Some considerations specific to the data protection principles in the context of the development and deployment of AI models are mentioned in the Opinion (see Points 61-65 of the Opinion). This is followed by considerations on the three steps of the legitimate interest assessment in the context of the development and deployment of AI models.

In order to rely on legitimate interest as a legal basis, controllers must conduct a three-step analysis, the so-called legitimate interest assessment (LIA). The cumulative conditions for validly relying on legitimate interest are as follows (a minimal sketch of the cumulative logic follows the list):

  1. the pursuit of a legitimate interest by the controller or by a third party;
  2. the processing is necessary to pursue the legitimate interest; and
  3. the legitimate interest is not overridden by the interests or fundamental rights and freedoms of the data subjects.
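Since the three conditions are cumulative, failing any one of them makes the legal basis unavailable. A trivial checklist sketch (purely illustrative, not taken from the Opinion; the class and field names are my own):

```python
from dataclasses import dataclass

@dataclass
class LegitimateInterestAssessment:
    """Hypothetical checklist mirroring the three cumulative LIA steps."""
    interest_is_legitimate: bool   # step 1: lawful, clearly articulated, real
    processing_is_necessary: bool  # step 2: no less intrusive alternative
    balancing_test_passed: bool    # step 3: data subjects' rights do not override

    def legal_basis_available(self) -> bool:
        # All three conditions must hold; any single failure is decisive.
        return (self.interest_is_legitimate
                and self.processing_is_necessary
                and self.balancing_test_passed)

lia = LegitimateInterestAssessment(True, True, False)
print(lia.legal_basis_available())  # False: the balancing test failed
```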

Step 1: Pursuit of a legitimate interest:

The interest must be

  • lawful,
  • clearly and precisely articulated, and
  • real and present, not speculative. 

According to the Opinion, the following examples may constitute a legitimate interest in the context of AI models: (i) developing the service of a conversational agent to assist users; (ii) developing an AI system to detect fraudulent content or behaviour; and (iii) improving threat detection in an information system. (Point 69 of the Opinion)

Step 2: Necessity test: 

The assessment of necessity entails two elements:

  1. whether the processing activity will allow the pursuit of the purpose, and
  2. whether there is no less intrusive way of pursuing this purpose. 

In the context of AI models, among others, "the intended volume of personal data involved in the AI model needs to be assessed in light of less intrusive alternatives that may reasonably be available to achieve just as effectively the purpose of the legitimate interest pursued. If the pursuit of the purpose is also possible through an AI model that does not entail processing of personal data, then processing personal data should be considered as not necessary." (Point 73 of the Opinion)

"The assessment of necessity should also take into account the broader context of the intended processing of personal data", including the fact whether the controller has a direct relationship with the data subjects (first-party data) or not (third-party data). (Point 74 of the Opinion)

In addition to the above, "implementing technical safeguards to protect personal data may also contribute to meet the necessity test" (e.g. using anonymisation techniques). (Point 75 of the Opinion)

Step 3: Balancing test:

This step consists in identifying and describing the different opposing rights and interests at stake, i.e. on the one side the interests, fundamental rights and freedoms of the data subjects, and on the other side the interests of the controller or a third party. The specific circumstances of the case should then be considered to demonstrate that legitimate interest is an appropriate legal basis for the processing activities at stake. (Point 76 of the Opinion)

The following considerations might be especially relevant for the balancing test:

  • Data subjects' interests specific to the given phase: the risks to different fundamental rights associated with the intended purposes of the AI model must be assessed. Positive effects of the deployment of AI models can also be considered.
  • The impact of the processing on the data subjects is also relevant, including the nature of the data processed by the models; the context of the processing; and the further consequences that the processing may have.
  • Reasonable expectations of data subjects: it may be difficult for data subjects to understand the variety of potential uses of an AI model and the data processing involved; the wider context of processing is also important; and the expectations may vary across the different phases.
  • Mitigating measures: the Opinion provides a non-prescriptive and non-exhaustive list of mitigating measures that might be applicable (e.g. pseudonymisation, using "fake"/synthetic data, transparency measures, facilitating the exercise of data subjects' rights, etc.). A minimal pseudonymisation sketch follows this list.
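As a concrete example of the pseudonymisation measure mentioned above, here is a minimal sketch using a keyed hash. The key name and storage arrangement are assumptions for illustration, not requirements set out in the Opinion.

```python
import hmac
import hashlib

# Hypothetical pseudonymisation helper: replaces a direct identifier with a
# keyed hash. Under Art. 4(5) GDPR the "additional information" (here, the
# key) must be kept separately, e.g. in a key vault, so that the mapping
# cannot be reversed without it.
SECRET_KEY = b"store-me-in-a-separate-key-vault"  # assumption: managed secret

def pseudonymise(identifier: str) -> str:
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

print(pseudonymise("jane.doe@example.com"))  # deterministic token per input
```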

5. What are the consequences of unlawful data processing in the development phase on the subsequent processing or operation of the AI model?

As AI models are generally developed to be further used in different AI systems, it is a very important question what happens if the AI model was developed in a way that breaches data privacy rules. The Opinion considers three scenarios in this context:

  • Scenario 1: A controller unlawfully processes personal data to develop the model, the personal data is retained in the model and is subsequently processed by the same controller (for instance in the context of the deployment of the model)
  • Scenario 2: A controller unlawfully processes personal data to develop the model, the personal data is retained in the model and is processed by another controller in the context of the deployment of the model
  • Scenario 3: A controller unlawfully processes personal data to develop the model, then ensures that the model is anonymised, before the same or another controller initiates another processing of personal data in the context of the deployment

In case of Scenario 1 (i.e. the personal data is retained in the model and is subsequently processed by the same controller): 

  • The power of the SA to impose corrective measures on the initial processing would in principle have an impact on the subsequent processing (e.g. if the SA orders the controller to delete the personal data that was processed unlawfully, such corrective measures would not allow the latter to subsequently process the personal data that was subject to the measures).
  • Whether the development and deployment phases involve separate purposes (thus constituting separate processing activities) and the extent to which the lack of legal basis for the initial processing activity impacts the lawfulness of the subsequent processing, should be assessed on a case-by-case basis, depending on the context of the case (e.g. the fact that the initial processing was unlawful should be taken into account in the legitimate interest assessment).

In case of Scenario 2 (i.e. the personal data is retained in the model and is processed by another controller in the context of the deployment of the model):

  • Ascertaining the roles assigned to these different actors under the data protection framework is an essential step in order to identify which obligations under the GDPR apply and who is responsible for those obligations.
  • SAs should assess the lawfulness of the processing carried out by (i) the controller that originally developed the AI model; and (ii) the controller that acquired the AI model and processes the personal data by itself.
  • The corrective measures applied by the SA to the controller that developed the AI model might have an important impact on the subsequent use of the AI model.
  • SAs should take into account whether the controller deploying the model conducted an appropriate assessment to demonstrate compliance with Article 5(1)(a) and Article 6 GDPR, i.e. to ascertain that the AI model was not developed by unlawfully processing personal data. The depth of the controller's assessment and the level of detail expected by SAs may vary depending on diverse factors, including the type and degree of risks raised by the processing in the AI model during its deployment in relation to the data subjects whose data was used to develop the model.

The Opinion makes an important reference to the AI Act concerning high-risk AI systems: "The EDPB notes that the AI Act requires providers of high-risk AI systems to draw up an EU declaration of conformity, and that such declaration contains a statement that the relevant AI system complies with EU data protection laws. The EDPB notes that such a self-declaration may not constitute a conclusive finding of compliance under the GDPR. It may nonetheless be taken into account by the SAs when investigating a specific AI model." (see Point 131 of the Opinion)

In case of Scenario 3 (i.e. the model is anonymised before the same or another controller initiates another processing):

  • If it can be demonstrated that the subsequent operation of the AI model does not entail the processing of personal data, the EDPB considers that the GDPR would not apply. The unlawfulness of the initial processing should not, therefore, impact the subsequent operation of the model.
  • When the controllers subsequently process personal data collected during the deployment phase, after the model has been anonymised, the GDPR would apply in relation to these processing activities. In these cases, as regards the GDPR, the lawfulness of the processing carried out in the deployment phase should not be impacted by the unlawfulness of the initial processing.

6. Main takeaways

  • The Opinion covers some well-defined aspects of the development and the deployment of AI models (anonymity of AI models; reliance on legitimate interest as a legal basis; consequences of unlawful data processing in the development of AI models); however, many relevant aspects (e.g. processing of special categories of data; compatibility of purposes, etc.) are not covered.
  • The Opinion mostly gives a reasonable and fair assessment of the applicable data protection requirements. Providers of AI systems (including providers of GPAI models) must of course meet high standards from a data protection point of view as well, but given the risks associated with such AI models, this is an acceptable approach.
    For more information regarding the requirements of the AI Act on general-purpose AI models (GPAI models), please see my blog post in the Deep Dive into the AI Act series.
  • Due to the huge diversity of AI technologies, development methods and use cases, case-by-case analysis will continue to play a very important role; the Opinion, however, gives a good basis for such analysis (in the topics that it covers).
  • Legitimate interest remains fully available as a legal basis for the development and the deployment of AI models, and the Opinion also mentions some considerations specific to these phases.
  • The clarifications regarding the consequences of unlawful data processing in the development of AI models on the subsequent processing or operation of the model are also very important, since it is a typical scenario that an AI model developed by a provider (typically, the controller for the development phase) is used as an integrated part of an AI system deployed by another party (typically, the controller for the deployment phase).