Balancing innovation with GDPR compliance
In December 2024, the European Data Protection Board (EDPB) and the UK Information Commissioner’s Office (UK ICO) separately published significant guidance on the application of the GDPR to AI.
The EDPB’s Opinion 28/2024 had been much anticipated and generated significant media coverage, with headlines such as ‘AI developers don’t need permission to scoop up data, EU data watchdogs say’ (Politico). The UK ICO’s response to its year-long consultation on privacy issues in generative AI may have attracted less attention, but it also marked a significant development in how businesses should assess AI from a privacy perspective.
Both the EDPB and the ICO endorsed a pragmatic approach to the application of the GDPR to the novel challenges presented in the development and deployment of AI models. However, both made clear their strict expectations on certain issues. In particular:
- Legitimate interests: The use of personal data for developing and deploying AI models requires the clear articulation of a lawful and specific interest, an ability to demonstrate the necessity of the processing for that interest, and a balancing of that interest against individuals’ rights. In the context of web scraping, the ICO pulled back from its previously permissive view that this data was generally necessary to train LLMs, while the EDPB emphasised specific measures that may be relevant to mitigating risks to individuals from the use of this data.
- Individual rights: Facilitating the exercise of individual rights is key, including with respect to training data sets and models that contain personal data. Both the UK ICO and EDPB stressed the importance of the right to opt out where legitimate interest is relied on as a lawful basis.
- Accountability and transparency: Ensuring robust documentation (including DPIAs) and transparency about data usage are imperative to meet GDPR obligations. However, both the EDPB and the ICO emphasised in different contexts that transparency may not be a complete solution to some AI-related challenges, including ensuring that processing aligns with legitimate expectations and that models are sufficiently accurate considering their purpose.
- Anonymisation: The EDPB made clear that providers claiming that their AI models only process anonymised data must be able to demonstrate that the likelihood of direct or indirect identification of individuals via the model is negligible, considering all reasonably likely means of re-identification. While the UK ICO did not address this issue, the EDPB position is consistent with more general ICO guidance.
The Opinion and Consultation Response both explicitly avoided the topic of special category data in training data sets and also side-stepped issues such as automated decision-making. Given the continued rapid developments in AI technology, the increasing integration of AI across organisations, and these gaps in the guidance, neither the Opinion nor the Consultation Response should be treated as the final word.
In this blog, we consider the practical implications of the Opinion and Consultation Response and offer key takeaways for businesses developing and deploying AI to ensure GDPR compliance.
Practical Implications
Legitimate interests as a legal basis
The EDPB’s guidance and UK ICO’s Consultation Response reinforce the viability of legitimate interests as a legal basis for personal data processing in various AI contexts. However, both the EDPB and UK ICO emphasised the rigour with which that analysis must be undertaken in the context of the development and deployment of AI systems.
Considering each step of the legitimate interest analysis in turn:
- Both the EDPB and UK ICO emphasise the importance of identifying and documenting a lawful and specific interest (eg developing conversational agents or fraud/threat detection systems). In this context, both the EDPB and UK ICO explicitly noted that data processing that breaches other legal requirements (such as intellectual property laws) will also be unlawful under data protection law.
- Businesses must be able to demonstrate the necessity of personal data for achieving the stated interest and evaluate whether less intrusive means are available. On this point, the ICO pulled back from its previously permissive view that web data was generally necessary to train LLMs, instead stating that businesses must be able to demonstrate the necessity of using these data sets. The EDPB underlined the need to take into account the broader context of the processing when considering necessity, including whether the data is first or third-party data and any technical measures to safeguard the data.
- Businesses must conduct a balancing test to ensure data subjects’ rights are not overridden.
  - In this context, the EDPB emphasised the importance of various factors, including the nature of the data and the reasonable expectations of data subjects, while noting that merely meeting the GDPR transparency requirements does not, by itself, bring data processing within those reasonable expectations.
  - The EDPB also identified various measures businesses can take that may be relevant to the balancing exercise in both the development and deployment phases. For example, in the context of web-scraped data, this may include respecting robots.txt or ai.txt protocols (see the sketch after this list).
  - The EDPB also emphasised the relevance to the balancing exercise of the potential benefits to data subjects of using AI.
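To make the web-scraping point concrete, here is a minimal sketch of how a crawler might check robots.txt before fetching a page, using only Python’s standard library. The URL and user-agent string are illustrative assumptions, and ai.txt has no standardised parser, so honouring it would require bespoke handling.

```python
# Minimal sketch: check a site's robots.txt before scraping a URL.
# Uses only the Python standard library; "example-ai-crawler" is a
# placeholder user agent, not a real crawler identity.
from urllib.robotparser import RobotFileParser
from urllib.parse import urlparse

def may_fetch(url: str, user_agent: str = "example-ai-crawler") -> bool:
    """Return True only if the site's robots.txt permits this agent to fetch the URL."""
    parts = urlparse(url)
    parser = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    parser.read()  # downloads and parses the site's robots.txt
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    url = "https://example.com/articles/page1"  # illustrative URL
    if may_fetch(url):
        print(f"robots.txt permits fetching {url}")
    else:
        print(f"robots.txt disallows {url}; skip it")
```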
Both sets of guidance suggest a risk of heightened scrutiny on businesses relying on legitimate interest as a lawful basis. In this context, adequate documentation, including Data Protection Impact Assessments and Legitimate Interest Assessments, remains a valuable tool for mitigating regulatory risk.
Importance of enabling the exercise of individual rights
Both the EDPB’s guidance and UK ICO’s consultation response underscored the importance of respecting individual rights in the development and deployment of AI systems.
The UK ICO reiterated its previous guidance that it is vital that, across the AI lifecycle, organisations have processes in place to enable and record the exercise of information rights. It highlighted particular concerns with respect to web-scraped personal data. Notably, many respondents to the consultation had mentioned output filters as a useful tool for implementing information rights; however, the UK ICO concluded that such filters ‘may not be sufficient, as they do not actually remove the data from the model’. The UK ICO also noted the limits of Article 11 as a basis for avoiding data subject rights, emphasising the need to consider rights requests on a case-by-case basis.
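The ICO’s caveat is easier to see with a toy example. The sketch below shows a hypothetical output filter that redacts email addresses from model output; as the ICO observes, such a filter screens what the model says without removing the underlying data from the model itself. The regex and redaction marker are illustrative assumptions.

```python
# Hypothetical sketch of an output filter: redact anything that looks
# like an email address from model output before it reaches the user.
# This screens outputs only; the underlying personal data may remain
# in the model weights.
import re

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def filter_output(text: str) -> str:
    """Replace email-like strings with a redaction marker."""
    return EMAIL_PATTERN.sub("[REDACTED]", text)

print(filter_output("Contact jane.doe@example.com for details."))
# -> Contact [REDACTED] for details.
```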
Both the EDPB and the UK ICO emphasise the importance of the right to opt out where legitimate interest is relied on as a lawful basis. The EDPB highlights a specific approach with respect to web-scraped data: creating an opt-out list based on the identification of specific websites. Without expressing a view on the availability of the argument, both the EDPB and UK ICO also note the possibility that opt-out requests may be overridden by compelling legitimate grounds.
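By way of illustration only (the Opinion does not prescribe an implementation), an opt-out list of this kind might be applied as a simple pre-training filter over a scraped corpus. The file name and record structure below are assumptions.

```python
# Illustrative sketch: exclude opted-out websites from a training corpus.
# "optout_domains.txt" and the record structure are assumed for demonstration.
from urllib.parse import urlparse

def load_optout_domains(path: str = "optout_domains.txt") -> set[str]:
    """Load an opt-out list with one domain per line, e.g. 'example.org'."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def filter_corpus(records: list[dict], optout: set[str]) -> list[dict]:
    """Drop any record whose source URL belongs to an opted-out domain."""
    kept = []
    for record in records:
        domain = urlparse(record["source_url"]).netloc.lower()
        if domain not in optout:
            kept.append(record)
    return kept
```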
Ensuring accountability and transparency
Both the EDPB and the UK ICO refer in several places to the importance of ensuring robust documentation (including DPIAs) and transparency about data usage.
The UK ICO did not suggest any change from the position articulated in its previous guidance (the First Call for Evidence and the DPIA Guidance) that AI-related processing is generally a high-risk activity requiring a DPIA. The EDPB was less emphatic on this point, noting the importance of DPIAs as an element of accountability but referencing the existing WP29 Guidelines on Data Protection Impact Assessment, which do not explicitly deal with AI.
Transparency was a theme interwoven throughout the Opinion and Consultation Response. The UK ICO was emphatic that it expects ‘developers and deployers to substantially improve how they fulfil their transparency obligations towards people, in a way that is meaningful rather than a token gesture’. It also noted that ‘[w]here insufficient transparency measures contribute to people being unable to exercise their rights, generative AI developers are likely to struggle to pass the [legitimate interest] balancing test’. The EDPB likewise highlighted the importance of transparency in overcoming the risk of information asymmetry between AI developers and deployers on the one hand and data subjects on the other, noting though that the mere fulfilment of transparency requirements is not necessarily sufficient to ensure a processing activity is within a data subject’s reasonable expectations.
Ensuring anonymisation
The EDPB acknowledges a broad understanding of personal data under the GDPR by emphasising that personal data may remain ‘absorbed’ in the parameters of an AI model. This is consistent with the UK ICO’s view that models may contain personal data. For both the EDPB and UK ICO, the question is fact-specific.
The EDPB Opinion encourages supervisory authorities in Europe to scrutinise claims of anonymisation taking into account the following considerations:
- Whether personal data can be extracted from the model’s parameters or outputs taking into account ‘all the means reasonably likely to be used’ by the user of an AI system or another person to identify individuals. In this context, the EDPB appears to endorse different anonymisation standards for AI models depending on whether they are accessible within a business or more broadly.
- Measures taken during the model’s development to minimise identifiability, such as differential privacy techniques, data filtration and other robust data minimisation strategies.
- Regular AI model testing against widely known, state-of-the-art re-identification attacks such as attribute and membership inference, exfiltration, regurgitation of training data, model inversion or reconstruction attacks (an illustrative example of one such test follows this list).
- Documentation on adherence to anonymisation standards, including internal and external audits and evaluations, code reviews and theoretical analysis documenting the appropriateness of the implemented measures. The Opinion also sets out the ‘ideal’ content of this documentation, including:
  - Any information relating to DPIAs, including any assessments and decisions that determined that a DPIA was not necessary.
  - Information on technical and organisational measures to reduce re-identification risk (including the threat model and risk assessments on which these measures are based, and specific measures for each source of training data, including relevant source URLs and descriptions of the measures taken).
  - Any advice or feedback provided by the data protection officer.
  - Documentation regarding theoretical resistance to re-identification techniques and controls to limit or assess the success and impact of the main attacks (including the ratio between the amount of training data and the number of parameters).
  - Metrics on the likelihood of re-identification, including detailed test reports and results.
  - Documentation provided to model deployers and/or data subjects.
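As an illustration of the kind of testing the EDPB describes, the sketch below implements a simple loss-threshold membership inference test: where a model’s loss on a record is conspicuously low, an attacker guesses the record was in the training set. The synthetic losses and threshold heuristic are assumptions for demonstration; real evaluations (and the documentation the EDPB expects) would use far more rigorous protocols.

```python
# Illustrative loss-threshold membership inference test. Synthetic loss
# values stand in for a real model's per-example losses on training
# (member) vs held-out (non-member) data.
import numpy as np

def membership_guess(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Guess 'member' (True) where the per-example loss falls below the threshold."""
    return losses < threshold

def attack_advantage(member_losses, nonmember_losses, threshold) -> float:
    """Attack advantage = true-positive rate minus false-positive rate.
    Values near 0 suggest the model leaks little membership signal."""
    tpr = membership_guess(member_losses, threshold).mean()
    fpr = membership_guess(nonmember_losses, threshold).mean()
    return tpr - fpr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for real per-example losses (assumed distributions).
    member_losses = rng.normal(loc=0.8, scale=0.3, size=1000)
    nonmember_losses = rng.normal(loc=1.2, scale=0.3, size=1000)
    # Simple threshold heuristic: the median of all observed losses.
    threshold = np.median(np.concatenate([member_losses, nonmember_losses]))
    adv = attack_advantage(member_losses, nonmember_losses, threshold)
    print(f"attack advantage: {adv:.2f}")
```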
Conclusion and key takeaways
There are clear risks associated with GDPR compliance in the development and deployment of AI models. In addition to fines, the EDPB’s Opinion highlights the possibility that, where personal data has been unlawfully used, EU regulators may require the deletion of the model or restrict its deployment, provided that the model itself includes personal data. The temporary ban of a generative AI tool by the Italian data protection regulator in March 2023 underscores this enforcement risk.
That said, the EDPB’s Opinion and UK ICO’s Consultation Response offer some hope of a pragmatic, pro-innovation approach to compliance—albeit one that is concerned to maintain the role of each relevant GDPR obligation. In this context, businesses should look for opportunities to:
- Robustly test anonymisation techniques: Efforts to minimise the processing of personal data are important. But anonymisation claims should be made with care. Businesses looking to rely on such claims may wish to look to advanced techniques like differential privacy to meet the high standards of EU regulators (see the sketch after this list).
- Strengthen governance: Establish internal policies for audits, data protection impact assessments, and legitimate interest assessments to ensure accountability. These documents can be an effective mitigant against regulatory enforcement risks.
- Verify data provenance: Conduct due diligence on third-party data sources to confirm lawful data processing.
- Adapt to evolving standards: Stay informed about emerging risks and best practices, and update privacy measures accordingly. The Opinion and Consultation Response highlight the pace of technological change and rapidly evolving regulatory standards.
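For readers wanting a flavour of what a technique like differential privacy involves, the sketch below shows its simplest building block, the Laplace mechanism, applied to releasing a count. The epsilon value is an illustrative choice, not a recommendation; training-time approaches such as DP-SGD build on the same idea but are considerably more involved.

```python
# Toy sketch of the Laplace mechanism, a basic building block of
# differential privacy: noise scaled to sensitivity/epsilon is added
# so the released statistic reveals little about any single individual.
import numpy as np

def private_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy.
    Adding or removing one person changes a count by at most 1 (the sensitivity)."""
    noise = np.random.default_rng().laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

if __name__ == "__main__":
    # epsilon=0.5 is an illustrative privacy budget, not guidance.
    print(f"noisy count: {private_count(true_count=1234, epsilon=0.5):.1f}")
```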