Introduction
The National Institute for Health Research (NIHR) Health Informatics Collaborative (HIC)1 was established in 2014, in response to a challenge by the UK’s Chief Medical Officer Dame Sally Davies to make routinely collected clinical data available for translational research across multiple sites. The NIHR HIC is a programme of infrastructure development across a network of 25 National Health Service (NHS) Trusts, supported by their university partners through the NIHR Biomedical Research Centres. The programme initially comprised five NHS Trusts hosting comprehensive NIHR Biomedical Research Centres: Cambridge University Hospitals NHS Foundation Trust, Guy’s and St Thomas' NHS Foundation Trust, Imperial College Healthcare NHS Trust, Oxford University Hospitals NHS Foundation Trust, and University College London Hospitals NHS Foundation Trust. The aim of the NIHR HIC programme is to improve the quality and availability of routinely collected clinical data, making it available for cross-centre collaborative, translational research. This presents both opportunities and challenges:
Opportunities
The UK’s unified healthcare system (the NHS) generates millions of clinical datapoints each year, which can be leveraged to improve collection of clinical information, address clinical research questions and improve patient care.
The automated collection of data from electronic patient record systems can dramatically reduce the time and cost of data collection for research and provide opportunities for collaboration with both academic and industry partners.
Modern machine learning techniques using neural networks require large datasets to be used effectively;2 the reuse of routinely collected data can provide a cost-effective way of collating these datasets.
Challenges
All NHS trusts are separate organisations, responsible for the protection of the data of their own patients. To enable data to be shared across these separate organisations for research, a governance framework needed to be established.
Each NHS trust has its own electronic patient record, and its own set of customisations, extensions and variations in data entry practice. Alongside the primary electronic patient record, each trust will also have an extensive collection of departmental systems, again subject to customisation and variations in practice.
Data definitions are not all standardised.
Not all data are collected electronically at all sites.
Large amounts of important data are stored in free text rather than discrete values.
Clinical practice can differ between sites.
Data can be produced and collected differently between sites (eg, different laboratory methods or platforms used for tests).
Different trusts have different levels of expertise in clinical informatics.
Projects such as the NIHR HIC require sustained investment before they start to deliver tangible results.
The NIHR HIC aims to overcome these challenges and demonstrate the value of these data for research in key therapeutic areas; the first five areas considered were viral hepatitis, ovarian cancer, critical care, acute coronary syndromes and renal transplantation. This paper focuses on the viral hepatitis theme, which is led by Oxford University Hospitals NHS Foundation Trust.
Viral hepatitis is a global health problem, with an estimated 1.35 million people dying from end-stage liver disease, hepatocellular carcinoma or other viral hepatitis-related diseases in 2015.3 The majority of these deaths result from hepatitis B virus (HBV) and hepatitis C virus (HCV) infections; this death toll is greater than that of tuberculosis, HIV or malaria. Unlike these other infections, the number of viral hepatitis deaths has increased since 1990.4 International targets arising from the United Nations ‘sustainable development goals’ have set a challenge for the elimination of viral hepatitis as a public health threat by the year 2030.5 6 As part of meeting this goal, leveraging existing clinical data is a cost-effective way to answer vital research questions. The NIHR HIC Viral Hepatitis Theme aims to address key research questions (table 1), to demonstrate the utility of the NIHR HIC methodology.
This paper presents a comprehensive methodology that has been proposed, implemented and validated by the NIHR HIC for the development of a new data collection and management pipeline. This development operates under a governance framework that allows data to be collated across multiple centres for collaborative research on viral hepatitis. Under this framework, a data collaboration involves the generation of a research-ready dataset that is broad enough to support a wide range of investigations in a specific clinical area. The dataset is assembled to the same agreed standards at each centre, and the data transformations needed to achieve this, starting from patient records, are documented and shared.