A recent article in The Economist reported on a new, low privacy-impact method of studying medical records by analysing data in situ.

The method has been pioneered by the OpenSAFELY team of epidemiologists and data scientists led by the University of Oxford and the London School of Hygiene and Tropical Medicine. 

The most distinctive aspect of the method is that the electronic medical records being studied remain in a secure environment under the management of the UK's National Health Service (NHS). No data is 'transferred' to the researchers, who are instead granted remote access.

MedConfidential, an independent lobby group that campaigns for the preservation of the confidentiality of medical records, supports the OpenSAFELY approach, calling it 'important, new, safe research infrastructure that should have existed already'. OpenSAFELY has published its software and research code as open source software tools for general use.

In this post, we will take a look at the implications of the OpenSAFELY method under privacy law in the UK and, for comparison, Hong Kong, Singapore and China.

The OpenSAFELY method

OpenSAFELY’s data analytics platform has been used by researchers to analyse health data in situ at the third-party vendor that stores patient data for the NHS.

The key components of OpenSAFELY’s method are as follows:

  • Computations are run on patient records inside the data centre where those records are held.
  • A subset of the patient records is abstracted into a ‘feature store’ for statistical analysis, in a tiered system relevant to the particular study. Those records are pseudonymised before the researcher is granted access.
  • The only information to ever leave the data centre is summary tables (with low numbers suppressed) from statistical models.
  • Researchers access the abstracted data by a secure remote connection from specific authorised IP addresses and each action is logged for independent review.
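
The components listed above can be illustrated with a minimal sketch. It is not OpenSAFELY's code: the field names (`patient_id`, `age_band`, `outcome`), the keyed-hash pseudonymisation and the suppression threshold of 5 are all assumptions made for illustration, as the description above says only that records are pseudonymised and that low numbers are suppressed.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical values: the key would stay inside the data centre, and the
# threshold of 5 is an assumed cut-off for "low numbers".
SECRET_KEY = b"example-key-held-in-the-data-centre"
SUPPRESSION_THRESHOLD = 5


def pseudonymise(patient_id: str) -> str:
    """Replace an identifier with a keyed hash (one common pseudonymisation technique)."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def build_feature_store(records: list[dict]) -> list[dict]:
    """Keep only the fields a study needs and drop the direct identifier."""
    return [
        {
            "pseudonym": pseudonymise(r["patient_id"]),
            "age_band": r["age_band"],
            "outcome": r["outcome"],
        }
        for r in records
    ]


def summary_table(feature_store: list[dict]) -> dict:
    """Count patients per (age_band, outcome) cell, masking small cells."""
    counts = Counter((row["age_band"], row["outcome"]) for row in feature_store)
    return {
        cell: (n if n >= SUPPRESSION_THRESHOLD else f"<{SUPPRESSION_THRESHOLD}")
        for cell, n in counts.items()
    }


# Made-up records standing in for data held inside the secure environment.
records = (
    [{"patient_id": f"A{i}", "age_band": "60-69", "outcome": "admitted"} for i in range(6)]
    + [{"patient_id": f"B{i}", "age_band": "70-79", "outcome": "admitted"} for i in range(2)]
)

feature_store = build_feature_store(records)
table = summary_table(feature_store)  # only this table would leave the data centre
print(table)
```

In this sketch the cell containing two patients is reported as "<5" rather than as an exact count, so the output table cannot be used to single out individuals in small groups, while the raw records and the pseudonymised feature store never leave the environment in which they are held.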

OpenSAFELY states that by 'building our analytics platform inside the originating EHR vendors’ data centre, we completely avoid transporting large raw primary care datasets which would otherwise present a substantial privacy risk, even when pseudonymised'.

OpenSAFELY is understood to also be working on an update that will enable the platform to take a researcher’s analytical query and automatically return the results to the researcher without the researcher having remote access to the feature store.

At the time of writing, the Twitter feed of the joint principal investigator on the project, Dr Ben Goldacre of the University of Oxford, tantalises with 'oh man we have all manner of interesting news to share very soon' (15 June 2020).

What are the implications of OpenSAFELY’s methods under data privacy law?

UK


Under English legislation, NHS England (as data controller) is permitted to collect data from general practitioners (GPs) directly from their health records vendor. OpenSAFELY states on its website that '[w]e are working on behalf of NHS England, who are acting as Data Controller' and 'each [electronic health record] vendor acts as Data Processor'. It is unclear whether OpenSAFELY regards itself as also acting in the capacity of a data processor.

Researchers who access and analyse data by remote access are still processing personal data and are thus subject to data protection regulation. Medical records constitute special category personal data that is subject to additional protections.

Under article 89 of the EU General Data Protection Regulation (GDPR), as enacted in the UK Data Protection Act 2018, certain data subject rights are disapplied where personal data is processed for scientific research, but only insofar as, among other things, exercising those rights would prevent or seriously impair the purpose of the processing and appropriate safeguards are in place.

However, while this makes certain aspects of processing more straightforward, researchers are still required to have a lawful basis for the processing.

What is the lawful basis for processing that would be relied on?

The lawful basis for processing special category personal data is more restricted than for other types of personal data. 

Given that the NHS is the data controller, the most likely lawful basis for processing the personal data is that the processing is necessary for a task carried out in the public interest, where that task falls within the public body’s remit or is laid down by law.

However, where organisations are seeking to rely on the legitimate interest basis for processing (eg when the data controller is not a public body), OpenSAFELY’s methods may make it easier to rely on this basis compared to when using traditional research methods. This is because the legitimate interest basis requires a balancing between the legitimate interest identified and the individual’s rights. 

Where there is less interference with the personal data, and lower risks for individuals, it is usually easier to find that the legitimate interest is not outweighed by the rights of the individual.

There is a specific lawful basis for the processing of special category personal data for scientific research, although the processing must satisfy a number of conditions to fall within it. This means that not all scientific research will automatically be able to rely on this lawful basis.

However, the nature of the safeguards introduced by keeping the data in the same secure environment, rather than exporting the data set to an external environment, may make these conditions about how and why the research is conducted easier to satisfy using OpenSAFELY’s method.

It will be interesting to see whether, if the OpenSAFELY method is updated to run an analytical query autonomously on the records and return results, presumably without the researcher accessing the underlying data at any level, the system design will mean that OpenSAFELY is not itself acting as a data processor.

Hong Kong

Hong Kong launched a centralised electronic patient records system (EHRSS) in 2016. As of March 2019, the system held the data of one million registered patients (approximately one-eighth of Hong Kong’s population). Participation in the EHRSS is, however, voluntary and patients must give a separate consent for their data to be shared. There have also been challenges in obtaining the participation of private clinics.

The Personal Data (Privacy) Ordinance (PDPO) prohibits a data user from using patient data for a different purpose from the purpose for which it was collected (eg patient diagnosis) except with the patient’s consent. However, section 62(4) of the PDPO exempts this requirement where personal data is used solely for research or preparing statistics and the results are not made available in a form that identifies the data subject.

Very few complaints have been brought before the Privacy Commissioner for Personal Data (PCPD) in relation to this exemption, so there is little available guidance on its application.

Other guidance (PDF) published by the PCPD, while not specifically elaborating on the application of the section 62(4) exemption itself, does stress the need for data controllers (the Electronic Health Record Office) to assess the privacy risks (such as by carrying out a privacy impact assessment) before determining whether or not to grant access to such records. The guidance also emphasises that the burden of proof in satisfying the requirements of any exemption is on the data controller (or 'data user' in Hong Kong terminology).

Other PCPD guidance (PDF) also reminds data users that the resulting statistics or research must not be 'compiled in such a way that makes it reasonably practicable for [patients’] identities to be ascertained'.

It appears to us that the use of the OpenSAFELY method does greatly reduce the risk that a third party would be able to re-identify data subjects in the statistical summary tables. Relevant patient information is extracted from a pseudonymised record for statistical analysis under tightly controlled conditions. If the section 62 exemption is available, researchers in Hong Kong would be able to deploy the OpenSAFELY software tools and method to mine patient records without seeking additional consents. A separate consent should not be required to transfer personal data to the researcher, since the summarised statistical data is unlikely to constitute personal data at that point, and the underlying event data is not transferred out of the data centre.

The Electronic Health Record Sharing System Ordinance, which provides the legal basis for the EHRSS, contains a provision that the information contained in an electronic health record may be used for carrying out research or preparing statistics that are relevant to public health or public safety. However, this provision is not yet in operation and also does not exempt the application of the PDPO.  

China


In China, a national database of health records does not yet exist, but the government aims to create such a nationwide health records system by the end of this year. Different cities and regions have independently adopted their own record systems, which are undergoing various stages of digitalisation.

Under Chinese law, personal data can only be collected and used on the basis of the data subject’s consent. The Personal Information Security Specification, the guiding document for personal information protection in China, exempts the requirement to obtain individuals’ consent where the processing is carried out for the purposes of statistical or other academic research and the data has been de-personalised. This exemption is, however, only available to academic research institutions and the results of the research must be published. A privacy impact assessment should also be carried out before undertaking significant processing activities of this nature.

Consent is also required for the transfer of personal data to a third party. While an advantage of using the OpenSAFELY platform is that the data is analysed in situ, the current iteration of the analytics platform provides remote access for researchers to a sub-set of patient records. If that sub-set still contains personal information, then, based on an analogy to guidelines on the cross-border transfer of data, Chinese regulators are likely to interpret the granting of remote access as a transfer of data.

Singapore


Since 2011, Singapore has deployed a National Electronic Health Record (NEHR) system across its government-run universal health care system. Health care providers across all settings are able to access and contribute records to the system. The Singapore government intended to make contribution to the NEHR mandatory but has deferred this requirement to allow for further testing in the wake of a cyberattack on Singapore’s largest group of healthcare institutions in July 2018.

The Personal Data Protection Commission’s (PDPC) advisory guidelines on the PDPA (PDF) explain that, unlike in other jurisdictions, pseudonymised personal data is not subject to data protection regulation. If researchers analyse and process data that is already pseudonymised, they may do so without the consent of the data subject.

Alternatively, organisations may use personal data without consent under the Personal Data Protection Act if:

  • the research purpose cannot be reasonably accomplished unless personal data is provided in an individually identifiable form;
  • it is impractical to obtain consent;
  • the personal data will not be used to contact the data subject to participate in the research; and
  • the linkage of personal data to other information is not harmful to data subjects and the benefits derived from that linkage are clearly in the public interest.

The PDPC’s guidance (PDF) notes that the public interest requirement is not satisfied merely because research is funded by a public agency. Satisfying the public interest standard will depend on the specific facts of each research project.