UK Biobank Data Breach

The UK Biobank breach shows the need for data management training

Universities increasingly invoke the language of “trusted” and “secure” research. Yet many institutions using these terms do not fully understand what those concepts operationally require. That is why the recent controversies surrounding UK Biobank cannot simply be put down to a few “bad apples” or isolated failures of individual judgement.

Last month, researchers were temporarily blocked from accessing the database of anonymised medical information on more than 500,000 UK volunteers after some of that information was offered for sale online in China.

But while the incident constituted a shocking breach of research ethics, it is equally true that we should have seen this coming.

Most academics receive extensive training in ethics, informed consent and research integrity. But far fewer receive practical training in how to safely manage and share data; on the contrary, modern research culture encourages openness, reproducibility and the widespread sharing of data and code, in ways that make disclosure risks inevitable.

The UK has long been a global leader in managing confidential research data. National statistical agencies, government departments and specialist data services – the UK Data Archive and the Office for National Statistics among them – have spent decades developing systems to enable research access while protecting subjects’ identities.

Central to this is the Five Safes framework. This recognises that safe research depends on multiple interacting safeguards related to the project in question, the people involved, the security of the data analysis environment, the extent to which the data are anonymised and the project’s public outputs.

Governance failures rarely arise from dramatic cyberattacks. More often, they emerge from ordinary human behaviour. Researchers may misunderstand rules, overshare outputs or fail to recognise how multiple seemingly harmless data sources can be combined to identify individuals or groups. This is why good governance does not rely primarily on trust in individuals but rather on layered systems designed around predictable human limitations.

The key concept is the “data access spectrum”. Mature governance systems do not treat all data as simply “open” or “secure”. Instead, they recognise that different levels of detail require different forms of access and control.

At one end of the spectrum are aggregate statistics and heavily anonymised public-use data: these can often be openly shared. Further along are certified downloads of de-identified microdata subject to agreements, training and institutional oversight. At the highest-risk end are secure-use files and source data containing highly detailed record-level information, which require checks on who is using the data, why and how.

A secure research system should function more like a reference library than a lending library. Analysis happens inside, in a secure environment, while the underlying data do not leave. All outputs should undergo independent checking before release. Strong approval processes are also needed, as is researcher training.

UK Biobank diverged from most of these established principles. For example, until very recently, participant-level data could reportedly be downloaded directly by researchers. Even after the introduction of a “secure platform”, exceptions allowing downloads remained possible.

Moreover, it’s easy to see why researchers think nothing of uploading such data to GitHub when they operate in cultures that reward collaboration and rapid dissemination and when universities proclaim their systems to be “trusted” or “secure” on the basis of authentication systems or restricted logins. In reality, cybersecurity is only one component of good governance.

Secure access to detailed data has transformed public interest research across health, economics and the social sciences, and the lesson from UK Biobank is not that research openness has gone too far. But public trust in research, particularly health research, is fragile. Participants contribute data because they believe the information will be handled responsibly and used for public benefit. If institutions blur the distinction between genuinely secure systems and aspirational branding, they risk undermining confidence not only in one organisation, but in data-intensive research more broadly.

Elizabeth Green is a senior lecturer in economics at the University of the West of England. Felix Ritchie is professor of economics at UWE and founded the first safe data governance environment when he was working at the Office for National Statistics.
