Original Research

Design and architecture of the CARA infrastructure for visualising and benchmarking patient data from general practice

Abstract

Objective The Collaborate, Analyse, Research and Audit (CARA) project set out to provide an infrastructure that enables Irish general practitioners (GPs) to use the data routinely collected in their patient management software (PMS) to better understand their patient population, disease management and prescribing through data dashboards. This paper explains the design and development of the CARA infrastructure.

Methods The first exemplar dashboard was developed with GPs and focused on antibiotic prescribing to develop and showcase the proposed infrastructure. The data integration process involved extracting, loading and transforming de-identified patient data into data models which connect to the interactive dashboards for GPs to visualise, compare and audit their data.

Results The architecture of the CARA infrastructure includes two main sections: an extract, load and transform (ELT) process, which moves de-identified patient data into data models, and a Representational State Transfer Application Programming Interface (REST API), which provides the security barrier between the data models and their visualisation on the CARA dashboard. CARAconnect was created to facilitate the extraction and de-identification of patient data from the practice database.

Discussion The CARA infrastructure allows seamless connectivity and compatibility with the main PMS in Irish general practice and provides a reproducible template to access and visualise patient data. CARA includes two dashboards, a practice overview and a topic-specific dashboard (the example focuses on antibiotic prescribing), which includes an audit tool, filters (within practice) and between-practice comparisons.

Conclusion CARA supports evidence-based decision-making by providing GPs with valuable insights through interactive data dashboards to optimise patient care, identify potential areas for improvement and benchmark their performance against other practices.

Supplementary file 1. Graphical abstract

What is already known on this topic

  • Health data is generally stored in isolated systems (data silos), making it difficult for healthcare professionals to access data in aggregated form or to compare across different systems. This issue is prominent in primary care, where patient data is commonly recorded using patient management software (PMS) systems, which do not allow analysing the aggregated data stored in the backend. No publications have been identified that presented a data extraction process, any interactive elements to compare or filter or the potential to sustain the infrastructure or dashboard beyond the aim of the project.

What this study adds

  • Health research funders and governments developed data-sharing policies. However, most data sharing fails due to a lack of common data models and infrastructures supporting such models. We developed a comprehensive infrastructure, including a data extraction tool (CARAconnect) to combine data from different general practices with different PMS, circumventing the need to streamline and link data at the source. The tools can easily be adapted to include/exclude specific data fields to adhere to ethical and General Data Protection Regulation requirements. Additionally, the infrastructure supports visualising within-practice and between-practice comparisons (anonymised). The CARA infrastructure offers a blueprint for extracting, transforming and sharing data from data silos and presenting data in interactive visualisations.

How this study might affect research, practice or policy

  • Visualising health data allows comparisons within and between general practices, countries and institutions. However, the challenge with data stored in data silos remains and prevents the possibility of aggregated data visualisation to understand patterns and provide better treatments and interventions. The CARA infrastructure described in this paper supports data extraction, transformation and sharing across data silos.

Introduction

Patient management software (PMS) systems were introduced in the 1990s in Ireland and have a client-server architecture allowing general practitioners (GPs) to access patient information from within the practice.1 2 Data is stored centrally on the practice server in multiple separate tables, and tables have been added during a continuing data modelling process to include more or new information. This has resulted in a complex database with an expansive list of tables interlinked through various relationships defined in consultations, patients, prescriptions and other codes. Despite efforts to achieve data interoperability, the data remains sequestered within the general practice, which acts as a data silo.

PMS is organised around consultation to support the management of individual patients and the organisation of general practices. However, PMS does not allow straightforward practice-level data aggregation to visualise, compare or benchmark patient and consultation data. This hampers overall care improvement as detecting trends, revealing occurrences, reducing the incidence of medical errors and identifying quality improvement opportunities require within-practice and between-practice comparisons.3

Four accredited PMS systems are used in Ireland, but two main PMS systems account for over 90% of the practices.4 The general practice is the data controller, and GPs can analyse their practice data as part of training, teaching, auditing and research.5 An annual clinical audit is a requirement for the renewal of their clinical competence certification by the Medical Council, and its provision as part of a (research) project has been shown to increase participation and commitment from GPs.5

The Irish health system is characterised by a complex structure involving a mix of public and private financing and service delivery.6 Ireland does not have universal health coverage. Approximately 31% of the Irish population are entitled to a general medical services card, while a further 11% have a doctor-visit card and both secure free access to general practice care.7 Medical cards are allocated based on age, economic factors or long-term illnesses. Patients without a medical card pay per GP visit (up to €70) and often take out additional private health insurance.8 Detailed primary care information is only available to the Health Service Executive (HSE) from medical card holders, as part of the HSE’s payment to GPs for their service. The HSE directs the public healthcare system, including acute care, primary care and community care services.9

In 2019, the Irish Medical Organisation, the HSE and the Department of Health reached an agreement to support and maintain general practice services and ensure the effective interoperability between PMS systems and the HSE information technology systems to integrate some information in eHealth Ireland.7 However, records are not accessible by GPs once submitted and no overview of practice-level or individual feedback is provided. Another improvement initiative was developed for antibiotic prescribing in which quarterly feedback is provided on antibiotic prescribing for medical card patients.10 The design of this feedback has been reported to be complex and without an interactive option to query practice data, while it only includes information from public patients, excluding antibiotic prescribing for private patients.11

The COVID-19 pandemic highlighted the importance of data accessibility and sharing across countries and institutions, and the initial fragmented public health response to the pandemic demonstrated the value of timely, publicly available data.12 The pandemic also led to some global bodies collecting datasets at the country level and illustrating the situation in interactive dashboards.13 Visualising COVID-19 data allowed comparisons within and between countries; however, the issue with data stored in isolated data silos remained. This challenge persists in general practice where reusing and sharing accurate and detailed health data recorded by GPs and stored in PMS systems is not possible.14

CARA set out to provide a sustainable infrastructure to help GPs understand their patient population and their disease management in addition to monitoring their prescribing through the use of dashboards. The first exemplar dashboard focused on antibiotic prescribing to develop and showcase the proposed infrastructure, including an audit tool as well as filters (within-practice) and between-practice comparisons. This paper explains the design and development of the CARA infrastructure, which consists of CARAconnect, a data extraction tool, a data model (to combine data from different PMS systems) and the CARA dashboard for use in Irish general practice.

Methods

Definition of requirements

The development of a valuable integrated infrastructure for healthcare data is a complex task and demands a very close interaction with the key stakeholders—GPs. Therefore, an agile development process was applied, in which a user-centred design approach was taken to ensure the prioritisation of GPs’ needs at every project stage.15 16 A number of user stories were created in cooperation with seven GPs to describe possible data exploration scenarios which were used to create data requirements. These requirements informed the data modelling process and a set of variables for the initial data extraction.

For example, one user story focused on a GP’s need to visualise their practice population, which formed the basis for a set of ‘patient characteristics’ variables. This process was iteratively reviewed and adapted to ensure that the extracted data still fulfilled the requirements of the user story. The user story further informed a ‘concept dashboard’, which was used as an example when involving GPs and creating new user stories, thereby actively involving GPs in proposing, developing and identifying visualisations of interest and the consequent extraction requirements. An overview of the requirements, their challenges and the proposed solutions is shown in table 1.

Table 1. Main requirements

Data integration

A set of variables for the initial data extraction was identified, and a data transformation pipeline was created to transform the extracted data into bespoke data models. To this end, practice data was extracted from various tables stored on the practice server and processed before being moved into new data models. Data integration uses the extract, load and transform process (ELT) (figure 1), where data from general practices is ‘Extracted’, ‘Loaded’ into a data warehouse and ‘Transformed’.17 This ensures that data necessary for any transformation is always loaded and does not need to be re-extracted.

Figure 1. Extract, load and transform approach.
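As a minimal sketch of the ELT pattern described above, the following Python example loads extracted rows unchanged into a staging table and only afterwards derives a model inside the warehouse. SQLite stands in for the data warehouse, and every table and column name (`raw_prescriptions`, `cara_prescriptions`, `issue_year` and so on) is a hypothetical placeholder rather than the CARA schema.

```python
import sqlite3


def load_raw(conn, rows):
    """Load step: store extracted rows as-is, without cleaning or reshaping."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_prescriptions "
        "(practice_id TEXT, consultation_id TEXT, atc_code TEXT, issued TEXT)"
    )
    conn.executemany("INSERT INTO raw_prescriptions VALUES (?, ?, ?, ?)", rows)


def transform(conn):
    """Transform step: rebuild the model from the raw table inside the warehouse."""
    conn.execute("DROP TABLE IF EXISTS cara_prescriptions")
    conn.execute(
        """
        CREATE TABLE cara_prescriptions AS
        SELECT practice_id,
               consultation_id,
               UPPER(TRIM(atc_code)) AS atc_code,
               SUBSTR(issued, 1, 4)  AS issue_year
        FROM raw_prescriptions
        WHERE atc_code IS NOT NULL AND TRIM(atc_code) <> ''
        """
    )


conn = sqlite3.connect(":memory:")
extracted = [
    ("p1", "c1", " j01ca04", "2023-02-11"),  # untidy code, cleaned in transform
    ("p1", "c2", None, "2023-03-02"),        # no code, dropped from the model only
    ("p2", "c9", "J01CR02", "2024-01-15"),
]
load_raw(conn, extracted)
transform(conn)
model_rows = conn.execute(
    "SELECT practice_id, atc_code, issue_year "
    "FROM cara_prescriptions ORDER BY practice_id"
).fetchall()
```

Because the raw table is kept, `transform` can be changed and re-run without returning to the practice server, which is the property the ELT approach is chosen for here.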

Coding systems

The data extracted includes the two main coding systems for classifying disease and therapeutic prescriptions, which are used by general practices in Ireland.18 The International Classification of Primary Care (ICPC) was designed to capture the interaction between the GP and the patient and is structured around the consultation. ICPC has fewer diagnosis codes than other systems, such as the International Classification of Diseases (ICD-10).18 PMS includes both standard coding systems for coding consultations and diseases.18 However, GPs are not incentivised to code and few GPs code consultations, making the prevalence of common diseases difficult to measure accurately.2 Coding in Irish general practice has improved for a few selected conditions since the introduction of the chronic disease management (CDM) programme in 2020. CDM was integrated into the four accredited PMS systems resulting in the automatic recording of CDM disease codes.19

Therapeutic prescriptions are coded within Irish PMS using the Anatomical Therapeutic Chemical (ATC) code, a unique code assigned to any medicine according to the organ or system it works on and its action.20 The classification system is maintained by the WHO. ATC codes have five levels: the first (highest) level is a letter ‘X’ for the main group (eg, J), the second level is two digits ‘##’ for the therapeutic group (eg, 01), the third level is a letter ‘X’ for the pharmacological subgroup (eg, C), the fourth level is a letter ‘X’ for the chemical subgroup (eg, A) and the fifth level is two digits ‘##’ for the active substance (eg, 01). This results in ATC codes written as X##XX## (eg, J01CA01 for the antibiotic ampicillin). This allows easy classification of medicines into specific or broader therapeutic groups.20
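The five-level structure can be illustrated with a short Python sketch that splits a full ATC code into its cumulative levels and tests membership of a broader group. The names `parse_atc`, `AtcLevels` and `in_group` are introduced here for illustration only and are not part of the CARA code base.

```python
import re
from typing import NamedTuple

# One letter, two digits, two letters, two digits: X##XX##
ATC_PATTERN = re.compile(r"^([A-Z])(\d{2})([A-Z])([A-Z])(\d{2})$")


class AtcLevels(NamedTuple):
    main_group: str                # level 1, eg J
    therapeutic_group: str         # level 2, eg J01
    pharmacological_subgroup: str  # level 3, eg J01C
    chemical_subgroup: str         # level 4, eg J01CA
    substance: str                 # level 5, eg J01CA01


def parse_atc(code: str) -> AtcLevels:
    """Split a full five-level ATC code into its cumulative prefixes."""
    cleaned = code.strip().upper()
    if ATC_PATTERN.match(cleaned) is None:
        raise ValueError(f"not a full five-level ATC code: {code!r}")
    return AtcLevels(cleaned[:1], cleaned[:3], cleaned[:4], cleaned[:5], cleaned)


def in_group(code: str, group: str) -> bool:
    """True if the code belongs to the given (shorter) ATC group."""
    return group.upper() in parse_atc(code)


levels = parse_atc("J01CA01")
```

Since each level is a prefix of the next, grouping prescriptions by a broader class reduces to comparing prefixes, for example `in_group("J01CA01", "J01")`.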

Results

The CARA data processing infrastructure encompasses the general practice database, a secure data extractor and uploader, a data integration component and a database that connects to the relevant dashboard interfaces. Figure 2 shows the architecture of the CARA infrastructure, which includes the ELT process to extract, load and transform de-identified practice data into data models, and the Representational State Transfer Application Programming Interface (REST API), which provides the security barrier between the data models and their visualisation on the CARA dashboard.

Figure 2. The architecture of the CARA infrastructure.

Data integration

Data extraction and loading

CARAconnect was created to facilitate the data extraction process. The data extraction and loading process encompasses selecting relevant data from the practice database, de-identifying the data and uploading the de-identified data to the CARA remote servers. To conduct this process in an automatic, structured and secure way, CARAconnect was developed as a desktop application that assists the GP and is easy to use once the practice is registered. The link to download CARAconnect is sent in an email to the practice secure email account (see CARA registration process below), and the application can easily be downloaded and started with a double click. On activation, CARAconnect identifies the practice server(s). On the practice server, the database is identified, and selected fields from different tables are automatically extracted and securely uploaded to the infrastructure, where they are processed and saved into the new data models during the data transformation steps. Confirmation by the GP is requested before extraction starts and again before the data is finalised and uploaded.

Through extensive exploration and elimination, a basic understanding of practice databases was developed, which was tested with a reference (anonymous) dataset to finalise specific tables needed to fulfil the data model requirements. CARAconnect facilitates the secure data extraction process based on the variables and tables identified.

Data security and confidentiality

CARAconnect extracts de-identified data from the practice database and ensures data security through practice-specific login-based access with two-factor authentication. GPs can view their own practice data but can only view the data from all other participating practices in aggregated form, to avoid possible identification of another practice. Uploads are for a static period to irrevocably de-identify the practice data. Any new data upload overwrites previously uploaded data. Data extraction does not include any patient identifiers or free text. Nonetheless, specific technical identifiers needed to link together data in different tables are hashed and extracted. As combinations of specific variables with other externally identifiable data sources may potentially lead to identification, two additional processes were applied to facilitate the use of specific technical identifiers to link data in different tables21:

  • Salted hashing is a one-way (unidirectional) transformation, so recovering the original identifier from its hash is almost impossible. Salting adds a random part (a set of strings of fixed length) to the input of the hash function. The random part remains the same within an extraction but is unique to each extraction.21

  • k-anonymisation is applied to the data from all other practices, with k set at 5. This guarantees that a minimum of five similar patients are included in any comparison (between practice) visualisation. In a k-anonymised dataset, each record is indistinguishable from at least k − 1 other records.
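The two processes above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CARAconnect implementation: the source does not name the hash function or the k-anonymisation method, so SHA-256 and simple suppression of small groups are assumed here, and the function names and record fields are hypothetical.

```python
import hashlib
import secrets
from collections import Counter


def new_salt(n_bytes: int = 16) -> bytes:
    """Create a fresh random salt, intended to be generated once per extraction."""
    return secrets.token_bytes(n_bytes)


def hash_identifier(identifier: str, salt: bytes) -> str:
    """One-way salted hash of a technical identifier (SHA-256 is an assumption)."""
    return hashlib.sha256(salt + identifier.encode("utf-8")).hexdigest()


def k_anonymous_rows(rows, quasi_identifiers, k=5):
    """Suppress rows whose quasi-identifier combination occurs fewer than k times."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] >= k]


salt = new_salt()
# The same identifier hashes to the same value within one extraction,
# so rows from different tables can still be linked.
link_a = hash_identifier("patient-123", salt)
link_b = hash_identifier("patient-123", salt)

rows = (
    [{"age_band": "40-49", "sex": "F"}] * 5
    + [{"age_band": "90+", "sex": "M"}] * 2  # fewer than 5, suppressed
)
safe_rows = k_anonymous_rows(rows, ["age_band", "sex"], k=5)
```

With a new salt at the next extraction, the same identifier yields a different hash, so hashed values cannot be matched across uploads.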

To accommodate the patients’ right to object to the processing of personal data, which gives patients the option to exclude their data from processing for research purposes in accordance with General Data Protection Regulation (GDPR) requirements,5 a data entry field was identified in the PMS to indicate the exclusion of a record.
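A minimal Python sketch of how such an exclusion field and a restricted set of extracted fields might be applied during extraction is given below. The field names (`research_opt_out`, `patient_key`, `birth_year`, `sex`) are hypothetical, and treating a missing flag as 'not excluded' is an assumption of this sketch.

```python
# Hypothetical whitelist of fields that are allowed to leave the practice.
ALLOWED_FIELDS = ("patient_key", "birth_year", "sex")


def extract_records(records, allowed_fields=ALLOWED_FIELDS,
                    opt_out_field="research_opt_out"):
    """Skip excluded records and keep only whitelisted fields of the rest."""
    extracted = []
    for record in records:
        if record.get(opt_out_field):
            continue  # the patient objected to processing, so nothing is extracted
        extracted.append({field: record.get(field) for field in allowed_fields})
    return extracted


practice_records = [
    {"patient_key": "a1", "birth_year": 1980, "sex": "F",
     "name": "not extracted", "research_opt_out": False},
    {"patient_key": "b2", "birth_year": 1955, "sex": "M",
     "name": "not extracted", "research_opt_out": True},
]
extracted = extract_records(practice_records)
```

Changing the whitelist is then the only step needed to include or exclude a data field.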

Data transformation

The data extracted and loaded during the previous steps is not fully compatible with the desired output format for analysis and dashboard creation, and data quality, compatibility and usability must be ensured. In order to convert the data into the desired format, bespoke target data models were created based on the data requirements gathered (figure 3) and a number of data transformations were written to populate these models. The bespoke data models ‘CARA Consultations’ and ‘CARA Prescriptions’ represent the main sources of data for the dashboard visualisations and constitute the final product of the data transformation process.

Figure 3. Data requirements gathered to data model.

Data transformations include date formatting, field calculation (such as age), mapping to different standards (such as ATC, ICPC or ICD-10), mapping to created taxonomies (such as green and red antibiotics) and data aggregations (such as antibiotics aggregations per consultation).

For the purpose of visualisations, the ICD-10 codes were mapped to the ICPC codes, taking a pragmatic approach and considering their occurrence in general practice. For the exemplar dashboard for antibiotics, the antibiotic ATC code J01 was used and subdivided into classes (eg, J01C). Additionally, a second categorisation was used, which was implemented as part of a national antimicrobial stewardship initiative and divides antibiotics into green (preferred) and red (non-preferred) antibiotics.10 22
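Three of the transformations named above (a calculated age field, mapping to a created taxonomy and aggregation per consultation) can be sketched in Python as follows. The function names and record fields are hypothetical, and the two-entry `taxonomy` is an illustrative placeholder supplied by the caller, not the national green/red list.

```python
from collections import defaultdict
from datetime import date


def age_on(birth_date: date, on: date) -> int:
    """Age in completed years on a given date."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def classify(atc_code: str, taxonomy: dict) -> str:
    """Map an ATC code to 'green' or 'red', or 'unclassified' if not listed."""
    return taxonomy.get(atc_code.upper(), "unclassified")


def antibiotics_per_consultation(prescriptions, taxonomy):
    """Count antibiotics (ATC group J01) per consultation, split by category."""
    summary = defaultdict(
        lambda: {"antibiotics": 0, "green": 0, "red": 0, "unclassified": 0}
    )
    for prescription in prescriptions:
        code = prescription["atc_code"].upper()
        if not code.startswith("J01"):
            continue  # not an antibacterial for systemic use
        entry = summary[prescription["consultation_id"]]
        entry["antibiotics"] += 1
        entry[classify(code, taxonomy)] += 1
    return dict(summary)


# Placeholder mapping for illustration only; values must be 'green' or 'red'.
taxonomy = {"J01CA04": "green", "J01CR02": "red"}
prescriptions = [
    {"consultation_id": "c1", "atc_code": "J01CA04"},
    {"consultation_id": "c1", "atc_code": "j01cr02"},
    {"consultation_id": "c1", "atc_code": "N02BE01"},  # not J01, ignored
    {"consultation_id": "c2", "atc_code": "J01XX99"},  # J01 but not in the mapping
]
per_consultation = antibiotics_per_consultation(prescriptions, taxonomy)
```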

CARA Consultations and CARA Prescriptions are not the only data models created. Two other data models were created to support the dashboard functionality. ‘Practice User’ represents the general practice and conceptualises its main attributes, including the credentials for login. ‘Practice Details’ describes the practice in more detail, such as its location, type of practice (rural/urban/mixed) and the laboratory to which it sends its samples.

CARA dashboard

The development and testing of the CARA dashboard have been described in detail elsewhere.23 The CARA dashboard was developed as a web application for GPs to explore their data and metrics visually and interactively. The data models created during data transformation connect to the CARA dashboard and enable the analytical features. To create visualisations from the data models, a REST API was set up. A REST API allows applications or devices to communicate with each other using Hypertext Transfer Protocol (HTTP) methods. In this case, it is used for communication between the dashboard running in the GP’s internet browser and the CARA infrastructure. The REST API handles user authentication and registration and provides a secure interface to query the different models using aggregations and filters.
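The role of the API as a barrier between the data models and the dashboard can be sketched as a framework-independent Python handler. This is a simplified illustration, not the CARA API: the function names, the filter names in `ALLOWED_FILTERS` and the record fields are all hypothetical, and authentication is assumed to have already produced `session_practice`.

```python
import json

# Hypothetical filter names; a practice identifier is deliberately not allowed.
ALLOWED_FILTERS = {"year", "age_band"}


def summarise(rows):
    """Aggregate rows into counts; no row-level data leaves this function."""
    return {
        "consultations": len(rows),
        "with_antibiotic": sum(1 for row in rows if row["antibiotic"]),
    }


def handle_query(rows, session_practice, filters=None):
    """Answer a dashboard request for the authenticated practice.

    The requesting practice receives its own aggregate and a single pooled
    aggregate for all other practices, never a per-practice breakdown.
    """
    filters = dict(filters or {})
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    selected = [
        row for row in rows
        if all(row.get(name) == value for name, value in filters.items())
    ]
    own = [row for row in selected if row["practice_id"] == session_practice]
    others = [row for row in selected if row["practice_id"] != session_practice]
    body = {
        "filters": filters,
        "own_practice": summarise(own),
        "other_practices": summarise(others),
    }
    return json.dumps(body, sort_keys=True)


rows = [
    {"practice_id": "p1", "year": 2023, "age_band": "0-4", "antibiotic": True},
    {"practice_id": "p1", "year": 2023, "age_band": "65+", "antibiotic": False},
    {"practice_id": "p2", "year": 2023, "age_band": "0-4", "antibiotic": True},
    {"practice_id": "p3", "year": 2022, "age_band": "0-4", "antibiotic": False},
]
response = json.loads(handle_query(rows, "p1", {"year": 2023}))
```

In a deployed service, a handler of this kind would sit behind an HTTP route, and the pooled comparison data would additionally be subject to the k-anonymisation described earlier.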

The CARA dashboard customises open-source tools and different software frameworks and libraries for data manipulation, transformation and visualisation, using Python, JavaScript and a Postgres database. The dashboard presents data through charts (pie, bar, scatter and line) and allows GPs to compare their data with data from other unidentifiable practices. The visualisations can be used to generate automatic audit reports. The dashboard is based on GP requirements as elicited from a series of interviews. This led to the creation of a prototype that was tested with users before being included in the overall data infrastructure.

CARA registration process

The CARA network registration process for general practices follows a well-defined workflow to ensure secure and efficient onboarding (figure 4). To initiate the registration, new practices provide the necessary details, and the system validates that the email is associated with the closed and secure email service adopted in Irish general practice and hospitals as the primary mechanism for secure communication between health systems (hospitals, laboratories, general practices, pharmacies).7 On entering the registration details, a one-time password (OTP) is generated and sent via email to the provided secure email address. Users are required to enter the OTP within a specified time limit. In case of delay, an option to generate a new OTP is available. Following successful OTP validation, users proceed to fill out the registration form, where confirmation of terms and conditions (GP agreement) is a prerequisite before finalising the registration. The GP agreement includes a detailed list of the data that will be extracted and an explanation of the aggregated use of this data for practice comparisons and for research purposes. Afterwards, an email is sent to the secure email address, facilitating the download of the CARAconnect application. The users can subsequently log in using their registered credentials, ensuring a seamless and secure experience throughout the CARA network registration and login process. At every subsequent data upload, the GP agreement has to be re-confirmed.

Figure 4. CARA registration process for general practices.
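The OTP step of the registration workflow can be sketched in Python as follows. The class name `OtpStore`, the six-digit format and the 300-second time limit are assumptions made for illustration; the source states only that the OTP must be entered within a specified time limit and can be regenerated.

```python
import hmac
import secrets
import time

OTP_TTL_SECONDS = 300  # assumed time limit


class OtpStore:
    """Issue and verify time-limited, single-use one-time passwords."""

    def __init__(self, ttl=OTP_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._pending = {}  # email -> (code, expiry time)

    def issue(self, email: str) -> str:
        """Generate a new code; re-issuing replaces any earlier code."""
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[email] = (code, self._clock() + self._ttl)
        return code

    def verify(self, email: str, submitted: str) -> bool:
        """Accept a code once, and only before it expires."""
        entry = self._pending.get(email)
        if entry is None:
            return False
        code, expires_at = entry
        if self._clock() > expires_at:
            del self._pending[email]  # expired, the user must request a new code
            return False
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(code.encode(), submitted.encode()):
            del self._pending[email]
            return True
        return False


store = OtpStore()
code = store.issue("practice@example.invalid")
accepted = store.verify("practice@example.invalid", code)
```

In the workflow described above, the code returned by `issue` would be sent to the practice's secure email address rather than returned to the caller.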

Discussion

This paper describes the design and development of the CARA infrastructure to help GPs explore and visualise their practice data and compare this to the combined data from all other practices using interactive data dashboards. CARAconnect allows seamless connectivity and compatibility with the main PMS and makes it possible to include other PMS in the future by specific identification and mapping of the relevant data fields.

A unique feature of the CARA infrastructure is the use of ELT instead of ‘Extract, Transform, Load’ (ETL). The advantages of ELT include faster loading times, more flexibility, the possibility of cloud computing to deliver data in real time and the separation of potential errors between the loading and the transformation stages.17 While the ETL approach has been implemented successfully in healthcare systems,24 using ELT is more effective in ensuring successful data integration.25 This implementation should consider factors relating to system quality (availability, responsiveness, reliability, usability, capability, compatibility, safety and maintenance) and data quality (perfection, validity, consistency, accuracy, integrity and timeliness).25 The CARA infrastructure will continue to integrate other data sources, and using ELT will help to avoid potential data problems (missing data, mismatched data, etc).

The CARA infrastructure ensures data security and confidentiality by extracting de-identified data, thereby addressing GDPR considerations concerning privacy and security.26 Data minimisation is applied as only limited data on prescribing and consultations is extracted at a practice level, avoiding the identification of individual GPs. Any personal patient data is de-identified and anonymised, and individual patients can request their data to be excluded from any extraction.

The CARA infrastructure is unique, in that it combines primary data sources (practice databases), a user-centred approach (involving GPs in the design and testing of the CARA dashboard throughout) and an extensive architecture. Other similar projects have included only administrative health databases as data sources,27 do not offer a description of either the architecture or the data integration used or omit a description of user involvement.28 29 This has allowed the CARA network to ensure the project addresses users’ needs and will be scalable and sustainable beyond the life of the project.

A limitation of the CARA infrastructure is that it was developed based on two PMS in Ireland. Other systems, such as different PMS or electronic health records, may require adapting the current mapping system, which could involve challenges around data integration, breadth of use, data complexity and statistical rigour.30 Another limitation is that the CARA infrastructure has yet to be implemented in real-world practice, posing a challenge in optimising its functionality. However, the innovation of the CARA infrastructure lies in its ability to offer a novel approach to overcoming data silos while providing a framework that can be universally applied across diverse systems. Patient consultations are typically documented in various PMS in general practice. While these systems facilitate the efficient administration of consultations, PMS often lack the capability for in-depth analysis of aggregated patient data or the ability to compare and benchmark with other general practices. Therefore, the CARA infrastructure serves as a pivotal initial step in tackling the issue of data fragmentation, enabling the visualisation of aggregated data to discern patterns and enhance treatment strategies and interventions.

Conclusion

CARA set out to provide a sustainable infrastructure to help GPs understand their patient population and management in addition to monitoring their prescribing through the use of data dashboards. Currently, two dashboards are presented, a practice overview and an exemplar dashboard focused on antibiotic prescribing, which includes an audit tool in addition to filters (within-practice) and between-practice comparisons. This provides a reproducible template to access and visualise general practice data in a setting where a centralised database does not exist. CARA supports evidence-based decision-making by providing GPs with valuable insights through interactive data dashboards to optimise patient care, identify potential areas for improvement and benchmark their performance against other practices.