Effective anonymisation is a challenge for many organisations; however, the process remains a crucial tool in safeguarding privacy rights and ensuring UK and EU GDPR compliance. Here, we examine the challenges of using anonymisation techniques and the considerations needed to assess their limitations.
Anonymisation is the process of removing personal identifiers from data in order to protect the privacy of individuals. For life sciences, this is an important technique and helps organisations make full use of data for research and innovation, while reducing obligations under data protection regulations.
It could be argued that anonymisation is the cornerstone of ethical research, although this is a rather crude overview. Research relies upon identifiable data*, as there is often a need to link back to a particular individual. However, anonymisation is crucial to enabling further or secondary research to be undertaken using an existing dataset. An organisation should therefore consider the most appropriate anonymisation technique in terms of compliance, taking into account any ongoing and future requirements. There are two principal ways in which anonymisation can be achieved:
Using these and other anonymising techniques fortifies regulatory compliance, as well as helping to safeguard against data misuse and assisting with the consistency of results. If a robust process is implemented, this will not only avoid some of the more highly regulated activities that arise from using personal data, but also help to build trust and commercial opportunities.
Life sciences organisations often deal with highly specific and detailed personal data, which can make effective anonymisation particularly challenging. In some projects, it may be essential to hold certain identifiers, making true anonymisation impossible.
Anonymisation could be reframed as an exercise in risk management, much like the general aspects of data protection. But how can the risks posed by a dataset be assessed, and how can we accurately determine whether it has been successfully anonymised?
Reasonable likelihood test
For organisations just beginning their privacy journey, this test may be a sensible starting point.
To outline a little history: The 1995 EU Data Protection Directive first introduced the concept of “reasonably likely” when considering the risks of re-identification. Now, it is more commonly known as the reasonable likelihood test.
Unfortunately, it is not possible to obtain an absolute guarantee that an individual will never be identified from a particular dataset, particularly in the life sciences, where highly specific data is often collected. However, it is an organisation’s legal responsibility to ensure all reasonable attempts have been made to limit the risks.
The considerations for this test concern the methods that are “reasonably likely” to be used by an “intruder” or an “insider”. Sounds simple, but it can be difficult to pin down the methods that are reasonably likely to be used, especially with questions of time and costs, and the technology available. In short, this test has limited value and can end up being a circular argument.
Motivated intruder test
More developed organisations would do well to consider the concept of a “motivated intruder”. This test was introduced in the UK ICO’s 2012 Anonymisation guidance and refers to an external individual who starts without any prior knowledge and does not possess a specialist skillset (i.e. not a hacker, but a reasonably switched-on person).
A motivated intruder would have access to the internet and other publicly available datasets and, for whatever reason, is sufficiently determined to re-identify a particular anonymised dataset.
The risk assessment of this test involves reviewing each “data release” (e.g. public release, third party sharing, etc.), and evaluating whether an imagined motivated intruder could re-identify a given individual.
This exercise highlights individual risks and allows foresight; however, the factors involved are inevitably subjective. As with all tests, there are limitations.
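To make the motivated intruder idea more concrete, here is a minimal sketch of a linkage attack, the kind of re-identification such an intruder might attempt. All records, field names and values are made up for illustration: the “anonymised” study data retains quasi-identifiers (year of birth, partial postcode, sex), and the “public register” stands in for any openly available dataset an intruder could find online.

```python
# Hypothetical data: an "anonymised" study extract and a public register.
anonymised_study = [
    {"yob": 1971, "postcode": "CB1", "sex": "F", "diagnosis": "T2 diabetes"},
    {"yob": 1985, "postcode": "NR2", "sex": "M", "diagnosis": "asthma"},
]

public_register = [
    {"name": "A. Example", "yob": 1971, "postcode": "CB1", "sex": "F"},
    {"name": "B. Example", "yob": 1990, "postcode": "CB1", "sex": "M"},
]

def link(study, register, keys=("yob", "postcode", "sex")):
    """Return (register_record, study_record) pairs that agree on every key."""
    matches = []
    for s in study:
        for r in register:
            if all(s[k] == r[k] for k in keys):
                matches.append((r, s))
    return matches

for person, record in link(anonymised_study, public_register):
    # A unique match on the quasi-identifiers effectively re-identifies the subject.
    print(f"{person['name']} is plausibly the subject of: {record['diagnosis']}")
```

If the quasi-identifiers single out one study record, the intruder needs no specialist skills at all, which is precisely the risk this test is designed to surface.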
K-anonymity
For a more objective approach, organisations often use the k-anonymity method to quantify risk. This can be likened to a “needle in a haystack” strategy and is a popular technique for addressing data privacy concerns.
K-anonymity is a technique whereby individuals’ data is pooled into a much larger group to suppress identifiers. Also called data generalisation, this method is a useful tool for sharing de-sensitised data.
The k-value defines the effectiveness of anonymity. To assess this value, count the records in the anonymised dataset that share identical quasi-identifying attributes. A k-value of 1 means a record is unique, and the data is likely not anonymous. If every record shares its attributes with at least one other, the dataset has a k-value of 2, and so on. Where k is high, it can be assumed the risk of individual re-identification is low.
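As a minimal sketch of how the k-value can be computed, the snippet below groups records by their quasi-identifiers and takes the smallest group size. The dataset, field names and choice of quasi-identifiers are purely illustrative assumptions, not a prescribed implementation.

```python
from collections import Counter

def k_value(records, quasi_identifiers):
    """Smallest equivalence-class size: every record shares its
    quasi-identifier values with at least k-1 others."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical, already-generalised dataset.
dataset = [
    {"age_band": "40-49", "postcode_area": "CB", "sex": "F", "result": 7.1},
    {"age_band": "40-49", "postcode_area": "CB", "sex": "F", "result": 6.4},
    {"age_band": "30-39", "postcode_area": "NR", "sex": "M", "result": 5.9},
]

k = k_value(dataset, ["age_band", "postcode_area", "sex"])
print(f"k = {k}")  # k = 1 here: the third record is unique, so the data is likely not anonymous
```

In practice, records in groups that fall below the chosen k threshold would be further generalised or suppressed before release.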
This sounds straightforward; however, there can be anomalies or outliers, so-called fringe cases that may require different k-values. By definition, these cases are uncommon or unique, but they demonstrate the difficulties that can arise. This is also highlighted by the Singaporean supervisory authority, the Personal Data Protection Commission (PDPC), which suggests a k-value of 3 is appropriate for internal data sharing, while 5 is recommended for external cases.
Singapore Supervisory Authority’s 5-step process
In March 2022, the PDPC published a five-step process which organisations can use to perform anonymisation.
Figure 1: PDPC 5-step anonymisation process
This mature approach incorporates all the previous methodologies:
Step 1: Know your data – be aware of the data attributes and the aspects of identifiability
Step 2: De-identify your data – remove identifiers and apply pseudonyms
Step 3: Apply anonymisation techniques – e.g. data masking, k-anonymity, etc.
Step 4: Compute your risks – assess using tests such as the motivated intruder test
Step 5: Manage your re-identification and disclosure risks
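To illustrate how Steps 2 and 3 might look in practice, here is a minimal sketch that replaces a direct identifier with a keyed pseudonym and generalises a precise value into a band. Everything here is an assumption for illustration: the field names, the record, and the secret key are hypothetical, and this is not the PDPC’s prescribed implementation. Step 4 would then be assessed using tests such as the k-value and motivated intruder checks described above.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # illustrative only; manage real keys securely

def pseudonymise(subject_id: str) -> str:
    # Step 2: replace the direct identifier with a keyed pseudonym, so the
    # mapping cannot be reconstructed without access to the key.
    return hmac.new(SECRET_KEY, subject_id.encode(), hashlib.sha256).hexdigest()[:12]

def generalise_age(age: int) -> str:
    # Step 3: generalise a precise value into a band (data generalisation).
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

raw_record = {"subject_id": "PT-0042", "name": "A. Example", "age": 47, "outcome": "responder"}

deidentified = {
    "pseudonym": pseudonymise(raw_record["subject_id"]),
    "age_band": generalise_age(raw_record["age"]),
    "outcome": raw_record["outcome"],
    # Direct identifiers such as "name" are dropped entirely at Step 2.
}
print(deidentified)
```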
The key strength of this 5-step process is that it is not limited to the act of anonymisation itself. Both before and after implementing the process, organisations need to conduct dataflow mapping, undertake risk assessments and implement appropriate technical and organisational measures.
These controls are common to most developed compliance frameworks, and the PDPC’s five-step process offers a useful tool to complement and enhance any existing practices. Organisations should apply robust processes to assist clinical trial operations while maintaining resilient data protection.
As mentioned earlier, because of outliers and anomalies, there are certain data studies that cannot be anonymised. For these projects, other measures should be considered, including:
To conclude, Contract Research Organisations (CROs), sponsors, and where applicable clinical research sites, all have an important role to play in data protection that extends beyond the application of appropriate pseudonymisation and anonymisation techniques. They need to remain informed of any legislative changes to ensure compliance, in addition to keeping abreast of any new risk reduction techniques that may arise in the future.
Ultimately, the only way for life science organisations to reduce risks is to implement robust technical and organisational measures and to have clear and concise agreements in place.
The contracts between sponsors, CROs and partners must establish contractual responsibilities, at a minimum, for all of these areas:
By focussing on collaboration and risk reduction, life sciences organisations can effectively navigate data protection challenges while promoting innovation and compliance.
*It should be noted that in Case T-557/20, SRB v EDPS, the General Court of the European Union ruled that the pseudonymised data SRB disclosed to a third party could be considered anonymous. The ramifications of this judgement for the wider data protection industry are still unfolding; however, within the context of clinical trials, there are persuasive arguments for delaying any drastic operational model redesigns at this time. The DPO Centre will be monitoring the situation closely. Clients will be provided with expert advice for any necessary adjustments, as and when needed and appropriate.
The DPO Centre offers flexible and tailored data protection support with professional advice and expertise specific to life sciences organisations. We provide outsourced data protection officers (DPOs) who work as integral members of your team, as well as Data Protection Representatives (DPR) in both the EU and UK.
We will help you to quantify your anonymisation risks and support you with your wider data protection obligations, both before and after your trial begins. Our DPOs and EU/UK Representatives assist with privacy maturity reviews, Data Protection Impact Assessments (DPIAs), and dataflow mapping exercises. We review protocols, policies, Privacy Notices, Informed Consent Forms, and other clinical trial documentation, as required. We will also advise in greater detail as to which datasets may be subject to applicable data retention periods, and therefore require specific anonymisation and archival techniques to be applied.
Need more information? See our full range of outsourced Data Protection support services
For more news and insights about The DPO Centre, follow us on LinkedIn
With thanks to James Boyle from Mishcon de Reya for content contributions. Discover more about risk reduction in the next blog, Anonymisation Part 2: Risk Reduction for CROs, Sponsors & Partners Conducting Clinical Trials.