Limitations of de-identification: no reason not to share data

last modified Jun 18, 2015 12:17 PM
Neil Walker, Department of Medical Genetics



In Feb 2015, the Nuffield Council on Bioethics reported [1] on:

The collection, linking and use of data in biomedical research and health care: ethical issues

Under "Law, governance and security" they write:

"In the UK, the Human Rights Act guarantees a right to privacy, except where there is an accepted and overriding public interest.

Data protection law in the UK and Europe controls the processing of certain categories of data and applies enhanced controls to sensitive data such as health data. Specific relationships also generate duties of confidence, such as that between a doctor and a patient.

Where data are to be re-used in other contexts, or for other purposes, procedures to seek the consent of individuals to share data or to de-identify data are typically used in order to ensure their privacy is not breached. However, in the context of modern data initiatives, there can be significant problems with these strategies."


"The de-identification of individual-level data cannot, on its own, protect privacy as it is simply too difficult to prevent re-identification."

Consent has limitations too, as participants

"cannot foresee or comprehend the possible consequences of how their data will be available for linkage or re-use."

This talk focuses on de-identification, with examples from current practice. It will argue that, with clinical data, one should both de-identify and seek consent for data re-use, acknowledging to participants the limitations of both.

Further, given that funders mandate data sharing, including of clinical data, such sharing should be under access control, as is widespread in social science and genetics, in order to honour consents.


A Jan 2015 Institute of Medicine report, "Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risks" [2], includes a substantial appendix on "Concepts and Methods for De-identifying Clinical Trial Data".

Their key example of successful data sharing is the International Stroke Trial database (ISTDB) [3], which is available without access control [4].

They suggest (with evidence from HIPAA) that key areas of re-identification threat, once obvious identifiers are removed [5], are geography, broad demographics and dates. With 842 unique combinations of age, sex and country code, ISTDB fails the Information Commissioner's Office anonymisation guidance [6].
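As an illustration of what counting "unique combinations" involves, here is a minimal Python sketch. The records are invented for the example and are not taken from ISTDB; the function names are hypothetical, and this is not presented as the method used to arrive at the figure of 842.

```python
from collections import Counter

# Hypothetical records: (age, sex, country code). Invented, not from ISTDB.
records = [
    (67, "F", "GB"),
    (67, "F", "GB"),
    (72, "M", "IT"),
    (72, "M", "IT"),
    (72, "M", "IT"),
    (81, "F", "NO"),
    (45, "M", "AR"),
]


def class_sizes(rows):
    """Count how many records share each combination of the three fields."""
    return Counter(rows)


def unique_combinations(rows):
    """Return, in sorted order, the combinations held by exactly one record."""
    return sorted(combo for combo, n in class_sizes(rows).items() if n == 1)


def smallest_class(rows):
    """Size of the smallest group of records sharing one combination."""
    return min(class_sizes(rows).values())


singletons = unique_combinations(records)
k = smallest_class(records)
print(len(singletons), "unique combinations; smallest group size:", k)
```

Any record whose combination appears only once can, in principle, be singled out by someone who already knows that person's age, sex and country. The smallest group size is the quantity usually called k in the k-anonymity literature.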

GSK publish an Anonymisation Standard [7] at the industry data sharing repository. Its general approach is to recode or redact information until data records are no longer unique. Much meaningful data will not survive the process.
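The recode-or-redact approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of the GSK standard: the fields, the band widths and the function names are all hypothetical. Ages are recoded into progressively wider bands until no record is unique; if that fails, the country is redacted from the records that are still unique.

```python
from collections import Counter


def band(age, width):
    """Recode an exact age as a band, e.g. 67 -> '60-69' for width 10."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def singletons(rows):
    """Return the records whose combination of fields occurs only once."""
    counts = Counter(rows)
    return [r for r in rows if counts[r] == 1]


def recode_until_not_unique(rows, widths=(5, 10, 20)):
    """Widen age bands until no record is unique (widths must be non-empty).

    If even the widest band leaves unique records, redact their country
    (set it to None). Redacted records may still be unique, so a real
    process would need further steps.
    """
    for width in widths:
        recoded = [(band(a, width), s, c) for a, s, c in rows]
        if not singletons(recoded):
            return width, recoded
    still_unique = set(singletons(recoded))
    redacted = [
        (a, s, None) if (a, s, c) in still_unique else (a, s, c)
        for a, s, c in recoded
    ]
    return width, redacted


# Hypothetical records: (age, sex, country code).
records = [
    (67, "F", "GB"), (68, "F", "GB"), (61, "F", "GB"),
    (72, "M", "IT"), (79, "M", "IT"), (70, "M", "IT"),
]
width, safe = recode_until_not_unique(records)
print(width, sorted(set(safe)))
```

In this invented example, five-year bands still leave unique records, so the ages end up in ten-year bands. The exact ages are gone, which is the loss of meaningful data described above.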

The Guardian wrote [8] on:

Privacy and the 100,000 Genome Project

As the Department of Health starts to draw a map of thousands of genomes, will it keep its promise to anonymise our data?

Ah, no. The DoH has (deliberately) used "anonymous" as a synonym for "pseudonymised" and therefore, the article argues, the consents are invalid.




[3] Sandercock PA, et al. The International Stroke Trial database. Trials 2011;12(1):101

[5] Hrynaszkiewicz I, et al. Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers. Trials 2010;11:1-5