INTRODUCTION
Data collected in electronic medical records for a patient in primary care can span from birth to death and can have enormous benefits in improving health care and public health, and for research. Several systems exist in the United Kingdom (UK) to facilitate the use of research data generated from consultations between primary care professionals and their patients. General Practitioners play a gatekeeper role in the UK’s National Health Service (NHS) because they are responsible for providing primary care services and for referring patients to see specialists.
In recent years, these databases have been supplemented (through data linkage) with additional data from areas such as laboratory investigations, hospital admissions and mortality statistics. Data collected in primary care research databases are now increasingly used for research in many areas, and for providing information on patterns of disease.1 These databases hold clinical and prescription data and can provide information to support pharmacovigilance, including information on demographics, medical symptoms, therapy (medicines, vaccines, devices) and treatment outcomes.1 The major primary care research databases in the UK include the ‘Clinical Practice Research Datalink’ (CPRD), ‘QResearch’ and ‘The Health Improvement Network’ (THIN). For all three systems, the information relating to symptoms, diseases, consultations and other clinical events is recorded using the Read code system. The data made available to researchers are anonymised, and strong patient identifiers such as name, address and postcode, date of birth and NHS number are removed.
The CPRD is jointly funded by the NHS National Institute for Health Research (NIHR) and the Medicines and Healthcare products Regulatory Agency (MHRA).2 It is one of the largest databases of longitudinal medical records derived from primary care in the world.3 The collection of information began in 1987 under the previous name General Practice Research Database (GPRD). GPRD was initially part of Value Added Medical Products, a company that pioneered the design and marketing of a general practice office computer system that allowed individual patient medical records to be recorded. The database was later transferred to government control.4 CPRD now provides nearly 30 years of longitudinal data. As of December 2014, the database contained data for over 13.5 million patients, of whom approximately 5.7 million are currently active.4 As well as primary care data, CPRD now links to a number of other data sets such as ‘Hospital Episode Statistics’ (HES) and mortality data from the Office for National Statistics. It is increasingly being used to enhance clinical trial efficiency (protocol optimisation, feasibility and recruitment) by working with general practitioners, and can provide data for both industry and academic researchers.4 Access to the data is subject to protocol approval by the MHRA Independent Scientific Advisory Committee. Over 1,500 research reports published in peer-reviewed journals have used data from the CPRD and have had direct impacts on public health and disease speciality areas.5
QResearch is a large primary care database derived from the anonymised health records of over 12 million patients.6 The data currently come from over 950 general practices using the Egton Medical Information Systems (EMIS) clinical computer system that is used throughout the UK.6 Although the data contain socio-economic details of patients based on their postcode, they do not include any identifiable information, and access is open only to academic researchers who have ethical approval to receive datasets. QResearch has led many projects such as QFlu, which was used to monitor and track the prevalence of swine flu during the 2009 outbreak, reporting to the Health Protection Agency.6 One of the limitations of QResearch is that although it has links to external databases such as HES, the anonymisation process used in compiling the database means that there is no way to identify patients.
THIN is a collaboration between two companies: In Practice Systems Ltd. (INPS), which developed the Vision software used by General Practitioners in the UK to manage patient data, and Cegedim Healthcare Software.7 THIN data collection started in 2003 and over 500 Vision practices have so far joined the scheme. THIN data currently contain the electronic medical records of 11.1 million patients (3.7 million active patients). This covers 6.2% of the UK population.7 In addition to the main consultations being recorded, most patient data in THIN are linked to postcode-level area-based socioeconomic, ethnicity and environmental indices. The data are based on the patients’ postcodes so that variables at ward level are available.8 The patient is identified only by a code allocated by the GP system and cannot be identified outside the practice.