Artificial intelligence ('AI') has undeniably become an integral part of our lives. We use face recognition to unlock our smartphones, AI-powered virtual assistants to manage our schedules, and ChatGPT to write blog posts. There is no doubt that AI improves our lives by enabling higher productivity, faster decision-making, and better healthcare. However, developing AI models requires large amounts of data, often including personal data. This intersection of AI and personal data brings both exciting opportunities and significant responsibilities.
AI Development and GDPR: A Symbiotic Relationship
Consider an electricity supplier that collects and processes its customers' electricity consumption data for the purpose of calculating their electricity costs. The supplier now wants to use this data to develop an AI model that maps the times at which the electricity price is lowest, so that consumption can be optimized accordingly. This should lead to lower electricity costs for customers.
Under the General Data Protection Regulation (GDPR), this constitutes further processing. The electricity supplier must assess whether the new purpose is compatible with the original purpose of data collection. If the AI model is used to optimize electricity costs, this is deemed compatible with the initial purpose of calculating electricity costs, so no additional legal basis is required. At the same time, the electricity supplier must observe the principle of data minimization at all times.
On the other hand, if the electricity supplier intends to use data collected for scoring customer satisfaction to train an AI model for optimizing electricity costs, this is incompatible with the original purpose. In such cases, anonymizing personal data becomes crucial. Indeed, the EU AI Act emphasizes the importance of data anonymization, requiring the use of synthetic or anonymized data where possible. Once anonymized, the data falls outside the scope of the GDPR. Understanding what constitutes anonymous data, the impact of anonymization, and the criteria for selecting appropriate techniques is therefore essential for GDPR-compliant AI development.
What is anonymous data?
Before digging into the impact assessment, it is important to understand the distinction between personal and anonymous data. According to Article 4 of the GDPR, personal data means “any information relating to an identified or identifiable natural person (‘data subject’)”. Examples include full names, dates of birth, social security numbers, credit card numbers, medical records, and mailing addresses. Recital 26 of the GDPR, in turn, describes anonymous data as “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable”. In other words, once individuals are no longer identifiable, the data is rendered anonymous and falls outside the scope of the GDPR, as it satisfies the strict requirement of not being re-identifiable.
Even where direct identifiers of data subjects have been removed, the risk of re-identification by cross-referencing information from different data sources may remain, particularly in the age of big data. For that reason, anonymization cannot be limited to the simple, routine, and passive application of commonly used techniques such as randomization, generalization, or encryption. Your organization must undertake an impact assessment to manage the risk of re-identification, estimate the impact that re-identification could have on the rights and freedoms of individuals, and establish whether additional measures should be implemented to reduce risks to data subjects.
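To make this concrete, below is a minimal Python sketch of two of the commonly used techniques mentioned above, suppression and generalization, applied to a hypothetical customer dataset. All column names, values, and binning choices are illustrative assumptions rather than a prescribed method, and on their own these steps do not guarantee anonymity; they would need to be combined with the impact assessment described in the next section.

```python
import pandas as pd

# Hypothetical customer dataset; all names and values are illustrative.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],               # direct identifier
    "birth_year": [1961, 1984, 1985, 1990],            # quasi-identifier
    "postcode": ["41296", "41301", "41305", "58222"],  # quasi-identifier
    "kwh_used": [320.5, 410.2, 385.9, 290.0],          # attribute the model needs
})

# Suppression: remove the direct identifier entirely.
df = df.drop(columns=["customer_id"])

# Generalization: coarsen quasi-identifiers so records become less distinctive.
df["birth_decade"] = (df["birth_year"] // 10) * 10     # e.g. 1984 -> 1980
df["postcode_area"] = df["postcode"].str[:2]           # e.g. "41296" -> "41"
df = df.drop(columns=["birth_year", "postcode"])

print(df)
```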
What to consider in your impact assessment?
The GDPR does not prescribe specific anonymization processes, allowing flexibility based on the context. The approach should be decided on a case-by-case basis, using a combination of different anonymization techniques and taking into account the practical recommendations in Opinion 05/2014 on Anonymisation Techniques issued by the Article 29 Data Protection Working Party. Before choosing anonymization techniques, you need to identify, evaluate, and determine the following elements in your impact assessment:
- Nature of Personal Data: A data controller must identify the types of personal data that are going to be anonymized and, on that basis, identify the direct identifiers, indirect identifiers, and quasi-identifiers in the data.
- Intended Purposes: A data controller must describe the purpose for which the anonymized data will be used. In addition, it must be clarified whether the anonymized data will be disclosed to third parties or used only internally. If the data is to be made public, attackers with a higher level of expertise must be assumed, and a correspondingly stronger level of anonymization is required.
- Scope of Anonymization: A data controller must determine how much data will be collected, how often it will be used, how long it will be stored, how many individuals will be affected, and which geographical areas will be covered. Moreover, the data controller must abide by the principle of data minimization and not go beyond what is necessary for the stated purpose.
- Risks for rights and freedoms of data subjects in case of re-identification: A data controller must identify the risks to data subjects in the event of re-identification, such as discrimination, identity theft or fraud, financial loss, damage to reputation, or other significant economic or social harm.
- Acceptable re-identification threshold: A data controller is responsible for establishing the acceptable risk limit for re-identification. As anonymization increases, the utility of the data decreases; the controller must therefore decide on the trade-off between utility and re-identification risk (a simple way of quantifying this risk is sketched after this list).
- Organizational and technical measures: A data controller must assess whether the necessary organizational and technical measures are in place to address the identified risks and whether additional measures are required.
- Monitoring and review: In any event, a data controller must continuously monitor and review the risks to the rights and freedoms of natural persons.
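As an illustration of how a re-identification threshold might be operationalized, the sketch below estimates the worst-case re-identification risk of a dataset as 1/k, where k is its k-anonymity level (the size of the smallest group of records sharing the same quasi-identifier values), in the spirit of the AEPD's k-anonymity guidance cited in the sources. The column names, the choice of quasi-identifiers, and the 20% threshold are hypothetical assumptions for illustration only.

```python
import pandas as pd

def worst_case_reidentification_risk(df: pd.DataFrame, quasi_identifiers: list) -> float:
    """Estimate worst-case re-identification risk as 1/k, where k is the size
    of the smallest group of records sharing the same quasi-identifier values
    (the dataset's k-anonymity level)."""
    k = df.groupby(quasi_identifiers).size().min()
    return 1.0 / k

# Hypothetical generalized dataset; column names are illustrative.
df = pd.DataFrame({
    "birth_decade": [1980, 1980, 1980, 1960],
    "postcode_area": ["41", "41", "41", "58"],
    "kwh_used": [410.2, 385.9, 402.3, 320.5],
})

risk = worst_case_reidentification_risk(df, ["birth_decade", "postcode_area"])
threshold = 0.2  # acceptable threshold set by the controller (illustrative)
print(f"Worst-case re-identification risk: {risk:.2f}")  # 1.00 here: one record is unique
if risk > threshold:
    print("Risk exceeds the threshold: generalize further or suppress outliers.")
```

If the computed risk exceeds the chosen threshold, the controller would generalize further, suppress outlier records, or apply additional techniques before using or releasing the data.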
What are the criteria for assessing anonymization techniques?
According to Opinion 05/2014, three criteria ought to be considered to determine whether anonymization has occurred.
- Is it still possible to single out an individual? The possibility of isolating some or all records that identify an individual in the dataset.
- Is it possible to link records relating to an individual? The ability to link at least two records concerning the same data subject or group of data subjects, either within the same database or across two different databases.
- Can information concerning an individual still be inferred? The possibility of deducing, with significant probability, the value of an attribute from the values of a set of other attributes.
If the answer to all three questions is negative, the data can be considered anonymous. It is important to remember that the risk of re-identification may increase over time due to technological developments. Opinion 05/2014 therefore advises data controllers to consider both 'the state of the art in technology at the time of the processing' and 'the possibilities for development during the period for which the data will be processed'.
In short, a combination of anonymization techniques meeting all three of the above criteria will be robust against identification by the most likely and reasonable means that the data controller or any third party may deploy.
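As a rough illustration, the sketch below shows how the first and third criteria might be checked automatically on a tabular dataset: records whose quasi-identifier combination is unique can still be singled out, and groups in which a sensitive attribute takes a single value allow that value to be inferred for anyone known to belong to the group. Column names and data are hypothetical; the linkability criterion depends on which external datasets an attacker could plausibly obtain, so it cannot be captured by a purely internal check like this one.

```python
import pandas as pd

# Hypothetical generalized dataset; column names and values are illustrative.
df = pd.DataFrame({
    "birth_decade": [1980, 1980, 1980, 1960],
    "postcode_area": ["41", "41", "41", "58"],
    "tariff_band": ["high", "high", "low", "low"],  # sensitive attribute
})
qi = ["birth_decade", "postcode_area"]

# Criterion 1 (singling out): records with a unique quasi-identifier combination.
group_sizes = df.groupby(qi)[qi[0]].transform("size")
print("Records that can still be singled out:")
print(df[group_sizes == 1])

# Criterion 3 (inference): groups whose sensitive attribute is homogeneous,
# so its value can be deduced for anyone known to belong to the group.
diversity = df.groupby(qi)["tariff_band"].nunique()
print("Quasi-identifier groups that allow attribute inference:")
print(diversity[diversity == 1])
```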
Final remarks
In this article, we explained that anonymization cannot be limited to the simple, routine, and passive application of commonly used anonymization techniques. To anonymize personal data, you must conduct an impact assessment to identify, prevent, and mitigate risks to the rights and freedoms of natural persons. The combination of techniques you choose must meet the three criteria of effective anonymization, namely no singling out, no linkability, and no inference. If personal data is anonymized in this manner, your organization can use the resulting data for the purpose of developing AI. If you have further questions or need assistance, Eris Law Advokatbyrå can support your AI development with our legal and technical expertise in the field of AI.
Sources:
- AEPD (2023) Anonymization III: The risk of re-identification. Available at https://www.aepd.es/en/prensa-y-comunicacion/blog/anonymization-iii-risk-re-identification
- AEPD (2023) Risk management and Impact Assessment regarding Data Protection. Available at https://www.aepd.es/en/areas/innovation-and-technology
- AEPD (2021) K-anonymity as a Privacy Measure. Available at https://www.aepd.es/guides/k-anonymity-as-a-privacy-measure.pdf
- AEPD and EDPS (2019) Introduction to the Hash Function as a Personal Data Pseudonymisation Technique. Available at https://www.edps.europa.eu/sites/default/files/publication/19-10-30_aepd-edps_paper_hash_final_en.pdf
- Data Protection Commission (2020) Guidance on Anonymisation and Pseudonymisation. Available at https://www.dataprotection.ie/sites/default/files/uploads/2019-06/190614%20Anonymisation%20and%20Pseudonymisation.pdf
- Article 29 Data Protection Working Party (2014) Opinion 05/2014 on Anonymisation Techniques. Available at https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
- Article 29 Data Protection Working Party (2016) Guidelines on Data Protection Impact Assessment (DPIA) and determining whether processing is "likely to result in a high risk" for the purposes of Regulation 2016/679. Available at https://ec.europa.eu/newsroom/article29/items/611236/en
- AEPD and EDPS (2021) 10 Misunderstandings Related to Anonymisation. Available at https://www.edps.europa.eu/system/files/2021-04/21-04-27_aepd-edps_anonymisation_en_5.pdf
- ICO (2012) Anonymisation: managing data protection risk code of practice. Available at https://ico.org.uk/media/1061/anonymisation-code.pdf
- ICO (2021) Chapter 2: How do we ensure anonymisation is effective? Available at https://ico.org.uk/media/about-the-ico/documents/4018606/chapter-2-anonymisation-draft.pdf
- ICO (2023) The ICO’s approach to impact assessment – our draft Impact Assessment Framework. Available at https://ico.org.uk/media/about-the-ico/consultations/4023825/draft-impact-assessment-framework-20230130.pdf
- IMY (2023) The Data Protection Regulation on impact assessments and prior consultation. Available at https://www.imy.se/verksamhet/dataskydd/det-har-galler-enligt-gdpr/konsekvensbedomningar-och-forhandssamrad/dataskyddsforordningen-om-konsekvensbedomningar-och-forhandssamrad/