The main theme generated by the qualitative analysis was ‘tightrope walking’. This overarching theme of negotiating pathways through multiple, sometimes conflicting, considerations included three sub-themes of ‘a culture of caution’, ‘fuzzy boundaries’ and ‘cultivating the ground’. These are shown in Table 1 with their associated main coding categories identified from the interview data. Selected key findings are summarised below, supported by illustrative quotes.
Tightrope walking
Metaphors such as ‘balancing act’, ‘juggling’ and ‘tightrope walking’ were frequently used in many of our interviewees’ accounts of working towards maximising the safe and secure use and reuse of clinical data. These metaphors referred in particular to four main areas where participants spoke of challenges they felt required continual, careful negotiation:
balancing protections for individual patient privacy and using available health-related data for the ‘public good’ and ‘in the public interest’;
fostering public trust in expanding the secondary uses of HIT-derived data, and engaging patients and the public with the processes;
achieving proportionate governance in secondary uses for dataset linkage research for trustworthy but also faster studies;
efforts to balance perceived costs and rewards among different groups, for example between hospital staff involved in collecting data for clinical care and academics using these data for research purposes.
Support for secondary uses among our sampled interviewees would be expected given that each of our participants was actively involved in secondary uses of HIT-derived data in some capacity. Advocacy for pursuing benefits from secondary uses was, however, consistently related to the concept of the public interest and, in the UK context of publicly funded National Health Services (NHS), to ideas of a social contract and reciprocation.
‘Part of my view would be to say if you’re an NHS patient there are rights and there are responsibilities and part of your responsibility of being treated within a state healthcare system is for your information to be used for the benefit of you and other members of society’. (16: medical professional)
‘The NHS is still a socialised system here [in Scotland], you know, it’s retained that much more so than in England, and I think that makes a difference in terms of making an argument for health research for patient benefit when the overall boundary around that is a socialised healthcare system’. (15: social scientist)
‘There is a level of concern if commercial entities are involved, we know that, but it’s much more subtle than that. It’s not just public – good, and private – bad’. (03: legal academic)
Interviewees acknowledged likely benefits, both economic- and research-related, from attracting, for example, pharmaceutical companies to the UK, provided that appropriate safeguards were in place and public and health professionals’ trust was not jeopardised. A media outcry over the selling of datasets10,11 in England (care.data) had reflected genuine, widespread distaste about commercial exploitation of data, according to participants. They believed such events had negatively influenced public perceptions more widely, even though the systems put in place in the devolved NHS Scotland and elsewhere were quite different from those that were operating in England.
‘In Scotland we don’t have one single large database which is where the Care database in England got itself in trouble. Here all the datasets are all left with the original suppliers of the data, so no-one can get into some big database and get access to all the data’. (05: Farr Institute)
While Scotland had not followed the open access initiative that had been instigated in England, some interviewees still recognised that there could nonetheless be advantages to big data open access:
‘People can do analyses, they can hold public sector to account if they can get access to the data. These all seem like good things. Breaches of individual privacy aren’t a good thing so, again, we’re balancing advantages and disadvantages from making data available’. (05: Farr Institute)
An NHS National Services Scotland (NSS) interviewee suggested that for the most part, he believed people were very happy with the idea of NHS staff conducting research on NHS-derived data, and slightly less so if it was academics carrying out research on those data unless the academics were closely aligned to the NHS. The involvement of commercial bodies and particularly of the pharmaceutical industry, he said, ‘made people more nervy’. Personally he had no bias against working collaboratively with pharmaceutical companies because he believed they did positive work and needed to do research in order to do more of it.
‘We always ensure that we have control over publication, so we maintain a certain level of independence from the pharmaceutical industry … And that so far has been fine, but perhaps that’s mostly because this work has been relatively low scale, small numbers’. (06: epidemiologist)
Another interviewee likened managing the sometimes conflicting priorities encountered in a step-by-step approach to maximising benefits from HIT-derived data to ‘walking a tightrope’, whereby he believed protecting the NHS should always remain the first priority.
‘It (the data) is collected for patient care, firstly, and then for running the health service, so we need to be able to plan how many hospitals, how many doctors, we need to be able to look at quality of care, we need to be able to do those things. Anything which leads to the public withdrawing confidence has the potential to bring down the health service’. (05: Farr Institute)
An exemplar in governance for data linkage studies, highlighted by study participants both from the UK and internationally during interviews, was the principles-based proportionate data governance framework developed collaboratively in Scotland.12 This framework had four elements: an account of the principles and instances of best practice, information on who was a data controller and in what circumstances, a model of proportionate governance and a training element. Combining safeguards with the flexibility of a principles-based approach was considered a model for others in the UK and abroad who also hoped to strike a better balance between researchers’ needs for reasonably fast approvals to access data and the recognised need to protect privacy, confidentiality and data security. The framework had not been designed to be specific to Scotland or to health-related research.
‘We deliberately designed it in a generic fashion so it could be picked up by anybody in any sectors actually considering what needs to be taken into account in data linkage’. (03: legal academic)
The idea of ‘proportionate’ governance in this framework comprised consideration of data anonymisation, consent for using the data and whether a proposed data linkage would be in the public interest. These three considerations were underpinned by the governance framework’s triad of ‘safe people’ (accredited researchers), ‘safe environments’ (for example, accessing research data only from a ‘safe haven’) and ‘safe data’, which covered technological capabilities such as for anonymising, and zipping and unzipping, research datasets.
‘This is about robust research use. That implies that there are appropriate ethical checks and balances, that there is suitable anonymisation, where that’s possible and practical for the research… There’s consent, there’s anonymisation and a third avenue which is authorisation – the idea that you can actually have, for example, ethics bodies that can authorise the linkage of identifiable data in the public interest so long as certain types of criteria are met’. (03: legal academic)
The primary legislation controlling how personal data in the UK could be used currently was the amended 1998 Data Protection Act (DPA).13 The legal and regulatory context in which personal data were either shared or linked was described as one of ‘labyrinthine’ complexity. The DPA itself was reported to be hard to understand and often misunderstood. There were different legal systems within the UK and further legal complexities should secondary uses also pertain to continental Europe or elsewhere internationally. Because it derived from a European Directive, data protection legislation had been implemented differently across European Union (EU) member states. Now, however, it was proposed to replace the various national data protection laws with a single, uniform EU Regulation.14 Uniformity was intended to introduce standardised personal data protection legislation across the EU and facilitate data movement as well as save costs to businesses, but some interviewees feared that the advent of a rigid regulation could instead be ‘potentially a huge threat’ to health-related secondary uses research. Part of that concern lay in fears that any recognition of health research being conducted in the public interest could be overwhelmed by lobbyists for privacy protections with respect to the commercial exploitation of personal data. An interviewee described research funders’ concerns as getting:
‘…drowned out by a lot of extreme views about what should happen in terms of consumer privacy, rather than just patient privacy’. (03: legal academic)
We now in turn consider each of the three sub-themes subsumed under the ‘tightrope walking’ meta-theme.
A culture of caution
Misconceptions about the contents of the current DPA controlling personal data, according to UK interviewees, were a contributing factor to inconsistent attitudes among data controllers to sharing different types of personal data and to the phenomenon of ‘data hugging’ (an overly cautious approach to data sharing) by some. It was believed that professional attitudes could also be a problem, and some perceived general practitioners (GPs) in particular to be unnecessarily cautious about making primary care data available for secondary uses research. Better information and education could challenge data hugging.
‘One of the ways to address the culture of caution is to raise awareness among the custodians, and also researchers who want access to data, that fundamentally the law is there as much to protect privacy as it is to facilitate the responsible sharing of data’. (03: legal academic)
More positively, a culture of caution was deemed fruitful in the context of multidisciplinary collaborators taking incremental steps to lay solid foundations for expanding secondary uses of HIT-derived data. In this context, a past record of personal relationships with multidisciplinary colleagues and scale were both identified as important facilitators. The smaller size of a Canadian province, an Australian state or a large healthcare organisation in the more fragmented and commercialised health systems of the USA meant that individuals working in various disciplines relevant to secondary uses were all likely to know of one another and could ‘pick up the phone’ to each other. The small UK nation of Scotland had the advantage that ‘small-scale’ encompassed national datasets. International interviewees described Scotland’s position in regard to developing secondary uses as enviable.
Fuzzy boundaries
Interview participants identified an array of fuzzy boundaries related to secondary uses. These ranged from a lack of clarity or consensus over terminology, for instance whether secondary uses was an appropriate term and how it should best be defined, to widespread conceptual fuzziness about data ‘ownership’, being a data controller, and the differences among sharing data, linking identifiable data and linking data using anonymised, aggregated datasets. Interviewees also spoke of the fuzzy boundaries of hybrid organisations, where it was no longer always clear whether an organisation could be classed as belonging to the public, private or voluntary sector.
‘Let’s take the NHS. It has research functions within it and a lot of their researchers conduct it in conjunction with universities, and there are joint posts, so you can’t put a boundary around that system very easily. And outside, that boundary is even more fluid because the health system itself is really increasingly a combination of public and private’. (15: social scientist)
Fuzzy boundaries could also be construed as advantageous in so far as they offered possibilities for opening up debate and discussions between diverse stakeholder groups. For example, even when research participants had given consent at the start of a study, unforeseen ethical dilemmas could arise as further information came to light over time, such as in genetics research:
‘The plan is to actually look at some real cases where this does occur and actually see what the individual concerned would like to have done, because the real problem is the balance between alarming people unnecessarily and not rescuing people from situations they need to be rescued from. Unfortunately the reality is, even when people in studies have explicitly said they do not want to have any data back from the study, a lot of them still believe that they would be contacted if a life-threatening thing was found and that’s simply not true’. (18: academic/commercial)
Cultivating the ground
Interviewees in our study highlighted developments in the secondary uses of HIT-derived data, particularly the growth of dataset linkage studies and the introduction of additional health-related datasets, such as genomics and other biotechnology data, primary care data, medical images and laboratory results. Large datasets linked to individual patients would advance developments in precision (also known as personalised) medicine, in which healthcare is individually customised. An example of progress with such work came from North America, where a national network of genetics research had been funded by the National Human Genome Research Institute to support genomic medicine.15 In addition to the potential to deliver faster, cheaper research and to enhance medical knowledge and drug safety, interviewees suggested a more holistic approach to health and health care would develop through a growing number of studies that linked health-related datasets with each other and with datasets from other sectors. Combining HIT-derived data with education, housing and justice datasets, for example, would increasingly generate evidence to support public health initiatives and evidence-based policy making beyond specifically health policy, and could potentially support more robust policy evaluations.
Relationships between different sectors still had to be fully worked out. UK and international interviewees reported this particularly to be the case with respect to establishing mutually satisfactory working relationships with pharmaceutical companies, such that for-profit, private businesses saw benefits from participating in secondary uses research while the public interest principle was clearly safeguarded and public support retained. Most interviewees acknowledged that collaborative working with commercial companies, including the pharmaceutical industry, would be an important component of achieving any aspiration to create economic wealth through secondary uses of digital data. Job creation could be a measurable, medium-to-longer term return on investment in HIT systems. Value that would be harder to define might also accrue, however, such as some future overall improvement in public health as a consequence of evidence-based policies that had reduced environmental damage. That would be, according to an interviewee:
‘… a very different vision of wealth creation’. (15: social scientist)
Natural language processing, according to some interviewees from the UK, should soon allow new datasets derived from unstructured information in electronic health records to become available for research. A further potential resource for secondary uses research was patient-reported measures.
‘It’s another layer that will come … and could be in many ways invaluable and wonderful extra data. It would just require another level of thinking about’. (06: epidemiologist)
Patient-entered data in records would have to be flagged in order for researchers to understand if and how to incorporate those data in research. More generally, data quality in clinical records, and especially the quality of data following dataset linkages, was raised as still being a challenge to conducting robust secondary uses research, by both UK and international interviewees.
‘Data quality is certainly an issue. The primary goal, I think, has probably got to be interoperability, and I would say patient access to those integrated records as well… In order to have interoperability you’ve got to have a standard and the standards are a very technical thing. Getting everyone to agree that that’s enforced and getting the people who are funding the development to understand that it’s important, all of these things are all sort of steps along the way’. (22: GP)
‘Data quality is a huge issue with clinical datasets. … So a measure of weight for instance will depend in part on what the measurement of weight was but also on somebody correctly entering that into the database and there will be typographical errors in that without question. If you’re entering enough data there will be data entry errors, inevitably’. (06: epidemiologist)
In very large epidemiological studies, which were now possible using population datasets, the influence of some data errors should be attenuated by the number of data items. Another issue was how well researchers understood the variables in the datasets with which they were working. An interviewee referred to the initial publication of misleading findings from a study comparing lengths of hospital stays in two Canadian provinces; in this instance, the research team had been unaware that in one of the provinces’ databases, acute patients were recorded as being discharged after a given period of time and then recorded as readmitted as a different type of patient.
‘The provinces are very independent and so making sure you’re making real comparisons, rather than apples and oranges comparisons and assuming that they’re the same thing, is challenging and there are real examples of disasters which have been made across provinces’. (08: Manitoba Centre for Health Policy)
Most participants raised the need for stable and adequate funding to continue supporting organisations that were advancing the secondary uses of HIT-derived data and help address outstanding data quality issues.
‘I just think it should be viewed as unacceptable to spend many millions on running a cohort and then not be willing to spend £100,000 to ensure that if you now do a coanalysis with another cohort that the variables that you’re going to use actually do mean the same thing in the two cohorts’. (18: academic/commercial)
‘We are looking at incorporating genomic data and kind of treating this as a mechanism for reuse of omic-related data. We don’t have funding yet and we’re working on that’. (09: PopData)
In Canada, healthcare is organised by province, and British Columbia holds longitudinal data for the whole of the province’s population at a research support facility known as PopData.16 A repository of Manitoba’s range of research datasets is housed in the Manitoba Centre for Health Policy.17 These are different models from England, where the Health and Social Care Information Centre collects data nationally.18 In Scotland, where datasets are not held centrally, researchers seeking to work with NHS data apply through the electronic Data Research and Innovation Service;19 if a proposal is approved, researchers then access the appropriate, indexed and linked data for their study from a secure ‘safe haven’. In the UK, Wales has also developed a remote access system for researchers called the Secure Anonymised Information Linkage.20,21
Interviewees from abroad and from the UK identified the need for funding for staff development and capacity building in order to grow the available workforce with the range of required skills for further secondary uses development, such as in building HIT infrastructures, mathematics, designing new research methodologies and analysing and interpreting the findings from large dataset studies. Interviewees advocated increased training opportunities in health informatics and data science across a range of existing academic disciplines to foster multidisciplinary collaborations, with stable funding to support new PhD studentships for years to come. In the UK context, the recent substantial investments in the ADRN and the Farr Institute,1,2 with just such organisational aims as these, were seen by interviewees as a vitally important step to support continued progress.
Insufficient funding was, however, suggested by one UK interviewee as an explanation for the UK’s failure to date to implement a national public education and public engagement campaign for secondary uses work with HIT-derived data. Others highlighted a reported example of highly successful public engagement from Western Australia (WA), where WA Data Linkage System22 research projects used state-wide population data and where there was no patient consent to opt in or out. Every WA Data Linkage System project had a consumer advisory group attached to it, and every study had to produce a lay summary of the research carried out, its findings and its impact in clear, everyday language. Over the years, the state’s community had become the main ‘champions’ for data linkage studies using unconsented data, according to a staff interviewee.
‘20 years ago they would have opposed it because they would have been worried about privacy. But we’ve got such a fantastic track record of doing wonderful research which has benefited the community and we get out there and tell everyone about it, so I think that’s an important aspect of it… What our consumers said was that we know you’ve got data on us, if you don’t use it to improve the health system and you don’t use it to avoid harm we will sue’. (14: WA Data Linkage System)
An important factor in accruing positive support for data linkage research in WA was translational capacity, which reflected the close working relationships between WA Data Linkage System staff and local policymakers, as well as a history of significant public involvement at all levels and stages of the research process.
‘The translation starts with involving the policy people and even the practitioners as well as the consumers actually at the beginning of the project’. (14: WA Data Linkage System)