User testing of a diagnostic decision support system with machine-assisted chart review to facilitate clinical genomic diagnosis
Abstract
Objectives There is a need in clinical genomics for systems that assist in clinical diagnosis, analysis of genomic information and periodic reanalysis of results, and can use information from the electronic health record to do so. Such systems should be built using the concepts of human-centred design, fit within clinical workflows and provide solutions to priority problems.
Methods We adapted a commercially available diagnostic decision support system (DDSS) to use extracted findings from a patient record and combine them with genomic variant information in the DDSS interface. Three representative patient cases were created in a simulated clinical environment for user testing. A semistructured interview guide was created to illuminate factors relevant to human factors in CDS design and organisational implementation.
Results Six individuals completed the user testing process. Tester responses were positive and noted good fit with real-world clinical genetics workflow. Technical issues related to interface, interaction and design were minor and fixable. Testers suggested solving issues related to terminology and usability through training and infobuttons. Time savings were estimated at 30%–50%, and additional uses such as in-house clinical variant analysis were suggested to increase fit with workflow and to further address priority problems.
Conclusion This study provides preliminary evidence for usability, workflow fit, acceptability and implementation potential of a modified DDSS that includes machine-assisted chart review. Continued development and testing using principles from human-centred design and implementation science are necessary to improve technical functionality and acceptability for multiple stakeholders and organisational implementation potential to improve the genomic diagnosis process.
Summary
What is already known?
There is a need in clinical genomics for tools that assist in analysis of genomic information and can do so using information from the electronic health record.
Such tools should be easy to use, fit within clinical workflows, and provide solutions to priority problems as defined by clinician end-users.
Natural language processing (NLP) is a useful tool to read patient records and extract findings.
What does this paper add?
We demonstrated the use of human-centred design and implementation science principles in a simulated environment for assessment of a new version of a decision support tool prior to large-scale implementation.
This study provides preliminary evidence that a clinical decision support tool with machine-assisted chart review is acceptable to clinical end-users, fits within the clinical workflow, and addresses perceived needs within the differential diagnosis process across all Mendelian genetic disorders.
Terminology codes for diagnostic decision support systems should have levels of granularity tuned to the sensitivity and specificity appropriate to their various functions, for example, NLP versus chart documentation.
Introduction
Clinical decision support (CDS) integrated into electronic health records (EHRs) has long been considered a promising way to improve patient outcomes and decrease inefficiencies.1–4 It is also recognised that CDS must be designed with the user in mind, fitting the concepts of human-centred design with computer interfaces at the individual clinician level.1 5 Design alone, however, is insufficient to facilitate implementation. For CDS to impact care and patient outcomes, it must fit within clinician workflow and provide a solution to a priority problem for the clinician and the healthcare system.4 6–8
Diagnostic decision support systems (DDSSs) are a key type of CDS needed in genomics to supplement a shortage of trained clinicians and address the inherent complexity of genomic diagnosis.9 10 This complexity arises from the heterogeneous nature of genetic diseases, the variable expression in patients and the degree of overlap in findings (ie, signs, symptoms and test results) among genetic conditions, sometimes differentiated only by onset age of individual findings.11 Position statements and a systematic review note two new functions needed for DDSSs in genomics: (1) a cost-effective, regular approach to re-evaluation of patient cases in light of new findings or genetic knowledge, when testing does not immediately yield a diagnosis; and (2) machine-assisted chart review.12 13 Most genomic patient records are extensive, with input from multiple clinicians, such that manual review is prohibitively time-consuming, resulting in added costs from repeated or unnecessary tests and increased risk of missed information that could have facilitated timely diagnosis. Because most of the relevant information is in unstructured clinical notes, approaches such as natural language processing (NLP) are needed to automate and assist this manual process.
To address both re-evaluation and automation, we adapted a commercially available DDSS already capable of incorporating genomic sequencing data to perform automated chart review and present the information to a clinician in the form of findings obtained through structured data mining and NLP of an EHR. We then created clinical case vignettes to simulate the real-world clinical diagnostic workflow for user testing. The goal was to provide preliminary evidence of usability, perceived fit with clinical need and workflow, and potential for implementation into the real-world clinical environment.
Methods
Setting
Development of the clinical case vignettes, simulated EHR environment, and user testing were conducted at Geisinger, a healthcare system in rural Pennsylvania.
Adapting a DDSS for machine-assisted chart review of clinical findings
We adapted SimulConsult’s Genome-Phenome Analyzer, as it is the one DDSS that allows for detailed analysis of clinical information, including pertinent negatives, findings onset information and frequency and treatability of diseases. It has also been shown to be accurate and helpful in clinical diagnosis, including interpreting genomic results.14–16 Described in detail elsewhere,11 14 15 SimulConsult correlates annotated variant call files (VCFs) with patient-specific clinical and family history information; the underlying algorithms include age-dependent Bayesian pattern-matching and computational metrics of usefulness and pertinence. SimulConsult also generates a Patient Summary for saving interim patient findings and a customisable genomic return of results (RoR) report shown in previous research to be effective for facilitating standardised communication for patients and referring clinicians.17–20 When clinicians enter findings, the DDSS returns a ranked list of candidate diseases and suggestions of other findings to check, ranked by usefulness in narrowing the differential diagnosis in a way that accounts for cost and treatability, thus facilitating the iterative approach of information gathering in diagnosis.21 22 For each finding, the presence (with onset age) or absence can be specified (figure 1).
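To make the idea of Bayesian ranking from present and absent findings concrete, the following is a minimal sketch only; it is not SimulConsult's algorithm. It assumes a naive Bayes model with independent findings and omits onset age, cost and treatability. The disease names, finding names, priors and probabilities are all invented for illustration.

```python
# Hypothetical toy knowledge base: a prior for each disease and
# P(finding present | disease). All names and numbers are invented.
DISEASES = {
    "disease_A": {"prior": 0.6, "findings": {"seizures": 0.8, "tall_stature": 0.1}},
    "disease_B": {"prior": 0.4, "findings": {"seizures": 0.3, "tall_stature": 0.9}},
}


def rank_diseases(observations, diseases=DISEASES, default=0.05):
    """Return (disease, posterior) pairs sorted from most to least probable.

    observations maps a finding name to True (present) or False
    (absent, ie, a pertinent negative). Findings not listed for a
    disease are given a small assumed default probability.
    """
    scores = {}
    for name, info in diseases.items():
        likelihood = 1.0
        for finding, present in observations.items():
            p = info["findings"].get(finding, default)
            # A present finding contributes p; an absent one contributes 1 - p.
            likelihood *= p if present else (1.0 - p)
        scores[name] = info["prior"] * likelihood
    total = sum(scores.values())
    posteriors = {name: score / total for name, score in scores.items()}
    return sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_diseases({"tall_stature": True, "seizures": False})
```

In this toy example the pertinent negative (absence of seizures) lowers the score of the disease in which seizures are common, which illustrates why a DDSS that accepts absence as well as presence can narrow a differential more quickly.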
SimulConsult main interface showing ranked list of candidate diseases and guidance for entering finding presence (or absence) with onset age.
We used the Logica platform to create a simulated EHR and the cTAKES tool with the Unified Medical Language System (UMLS) module23 for NLP of patient notes. Steps in adaption included (1) mapping DDSS findings to Human Phenotype Ontology (HPO) and UMLS codes, including creation of hundreds of new HPO terms resulting in creation of new UMLS concepts, (2) using results from NLP analysis of EHR notes to flag ‘Mentions’ of the findings used by the DDSS and (3) augmenting the DDSS’s interface to present the flagged findings with contextual information needed to clinically assess the information (table 1).
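Steps (1) and (2) above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the GPACSS implementation: the concept codes are placeholders rather than real UMLS or HPO identifiers, and the NLP output is assumed to arrive as a list of dictionaries with hypothetical `concept`, `text` and `note_date` keys rather than in the native cTAKES format.

```python
from collections import defaultdict

# Hypothetical mapping from a DDSS finding to several concept codes.
# The codes are placeholders, not real UMLS or HPO identifiers.
FINDING_TO_CONCEPTS = {
    "tall stature": {"CUI_TALL", "CUI_INCREASED_HEIGHT"},
    "seizures": {"CUI_SEIZURE", "CUI_EPILEPSY"},
}


def flag_mentions(nlp_hits, mapping=FINDING_TO_CONCEPTS):
    """Group NLP hits by the DDSS finding they should flag.

    nlp_hits is a list of dicts with an assumed 'concept' key plus any
    contextual fields (for example, the note text and date) to be shown
    to the clinician alongside the flag.
    """
    # Invert the mapping so each concept code points to its finding(s).
    concept_to_findings = defaultdict(set)
    for finding, concepts in mapping.items():
        for concept in concepts:
            concept_to_findings[concept].add(finding)

    flagged = defaultdict(list)
    for hit in nlp_hits:
        # Hits whose concept is not mapped to any finding are ignored.
        for finding in concept_to_findings.get(hit["concept"], ()):
            flagged[finding].append(hit)
    return dict(flagged)


hits = [
    {"concept": "CUI_EPILEPSY", "text": "history of epilepsy", "note_date": "2019-03-01"},
    {"concept": "CUI_UNMAPPED", "text": "seasonal allergies", "note_date": "2019-03-01"},
]
flags = flag_mentions(hits)
```

Keeping the whole hit, including its source text, with each flag is what would allow an interface to show the contextual information a clinician needs to judge the mention.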
Table 1
Adaptations made to existing DDSS to create GPACSS
Architecture of the Genotype-Phenotype Archiving and Communication System with SimulConsult (GPACSS). The key components are the coordination/archiving system, the DDSS and the NLP. DDSS, diagnostic decision support system; EHR, electronic health record; NLP, natural language processing.
The architecture of the resulting prototype, called the Genotype-Phenotype Archiving and Communication System with SimulConsult (GPACSS), is shown in figure 2.
Clinician review of the flagged findings created from the automated findings search using NLP is facilitated through flag icons (figure 3). Through this ‘machine-assisted’ chart review, the clinician reviews flagged findings and decides whether and how to specify presence (with a particular onset) or absence (or omit) as shown in figure 1. Each DDSS finding was mapped to multiple UMLS concepts to minimise false negatives in concept identification; false positives are minimised by relying on the user’s decisions about findings and on the limited set of UMLS concepts (table 2).
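The three outcomes of clinician review (present with an onset, absent, or omitted) could be represented as in the sketch below. The class and field names are hypothetical and are not taken from GPACSS; the sketch assumes onset is recorded in years and applies only to findings marked present.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    OMITTED = "omitted"  # clinician judged the flag a false positive or irrelevant


@dataclass(frozen=True)
class FindingDecision:
    """One clinician decision about a flagged finding (hypothetical structure)."""
    finding: str
    status: Status
    onset_age_years: Optional[float] = None

    def __post_init__(self):
        # Assumption: an onset age is meaningful only for present findings.
        if self.status is not Status.PRESENT and self.onset_age_years is not None:
            raise ValueError("onset age only applies to present findings")


def findings_for_ddss(decisions):
    """Keep only the decisions the clinician chose to pass on (not omitted)."""
    return [d for d in decisions if d.status is not Status.OMITTED]


decisions = [
    FindingDecision("seizures", Status.PRESENT, onset_age_years=2.0),
    FindingDecision("tall stature", Status.ABSENT),
    FindingDecision("hearing loss", Status.OMITTED),
]
passed_on = findings_for_ddss(decisions)
```

Because omitted flags never reach the diagnostic step, a false positive from a broad concept mapping costs the clinician one review but does not alter the differential.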
Flagged findings with EHR text display for DDSS. A finding having a flag icon indicates that information was found in the EHR. Clicking the flag shows the various mentions of the flagged finding. DDSS, diagnostic decision support system; EHR, electronic health record.
Table 2
Solutions for minimising false positives and negatives identified through NLP and DDSS by clinician review
Creating simulated cases
Three cases of increasing complexity were created using real but deidentified clinical phenotypic and time course data from medical notes of Geisinger patients with known genetic diagnoses (online supplemental table 1). Cases were selected for conditions of varying complexity yet relatively common in the context of rare disease and where diagnosis might be difficult using phenotype alone. Simulated cases were created by research assistants trained in capturing information from the EHR, supervised by a practicing Geisinger clinician certified in genetics and informatics. The three final cases were reviewed by a second Geisinger physician certified in genetics and informatics prior to user testing.
Case vignettes for the test scenarios assumed that some patient characterisation was previously noted by the clinician and genomic results were now available and could be interpreted with clinical information available in the EHR (online supplemental figure A). For the three cases, a total of five findings were used as initial information before the genomic results, with three (one per case) being flagged findings identified through NLP. This created a ‘near live’24 experience within the simulated EHR for user testing while limiting the expense and time of EHR integration during this preliminary phase.
User testing methods
Participants
GPACSS is both a DDSS and communication tool to facilitate utilisation of genomic and phenotypic information available in the EHR by all clinicians to improve patient care within a healthcare system. Therefore, we purposively selected primary testers from Geisinger staff representative of current end users of the genome-phenome analyzer. Because a limited number of individuals at Geisinger regularly engage in using genomic information for differential diagnosis, we followed guidance recommending 3–5 evaluators for preliminary usability testing.25 A group of secondary testers (inclusive of a pilot tester) with other roles in the genetic testing and interpretation process were purposively selected for potential broader utilisation in the healthcare system.
Testing sessions
At the beginning of each session, testers viewed a 4 min training video (https://simulconsult.com/videogpacss) that began from saved patient findings, then showed importing a VCF and reviewing flagged findings to make a diagnosis and create a customisable patient-friendly RoR report.
A semistructured interview guide (online supplemental file 3) was created to elucidate factors relevant to human factors in CDS design (information, interaction, interface)1 5 26 and organisational implementation (acceptability, perceived need, feasibility, workflow fit).27 We used a think aloud24 approach where testers were asked to verbalise thoughts while using the GPACSS prototype with the interviewer asking questions as needed and at key points in the testing to create a cognitive walkthrough with heuristic evaluation.25 28 Testers were invited via direct contact from study staff and provided a description of the study. At the beginning of each session, study staff reviewed a study information sheet and obtained verbal consent to participate. Test sessions lasted 2 hours and testers received a US$100 gift card.
An experienced interviewer (AKR) and observer (MAW) from Geisinger worked with each tester to imagine using GPACSS for each test scenario. The interview and process were piloted with a cancer genetic counsellor reviewing one test vignette. At the end of the session, testers were asked a series of study-specific questions using a 0–10 rating scale (hard to easy) to rate the overall usefulness, satisfaction, and navigation. Transcripts were created from the audio portion of each session and the computer screen was video recorded to capture tester movement through GPACSS.
Analysis
Two Geisinger coders (MAW and JCR) viewed each user test session recording, read transcripts and created a codebook of themes identified across sessions. Transcripts were coded and the corresponding quotes were organised into a matrix using the three categories of CDS components (information, interface and interaction) identified by Miller et al,1 and categories of acceptability, perceived need, feasibility and workflow fit according to Rogers’ Diffusion of Innovations in organisations constructs.27 Coders analysed transcripts independently and reviewed for agreement with discrepancies resolved by the primary author.
Results
Three clinicians currently using genomic information to diagnose patients participated as primary testers: a paediatric geneticist (orders exomes daily), an internal medicine physician (orders 4–5 exomes per month) and a paediatric genetic counsellor. Three additional clinicians participated as secondary testers, representing broader usability within the healthcare system: the pilot tester (cancer genetic counsellor), a laboratory director (conducts variant interpretation) and a laboratory genetic counsellor (conducts variant analysis).
GPACSS usability: human factors of CDS design
Overall impression of the prototype was positive. Testers raised general issues relevant to human factors in CDS design.1 5
Interface
Testers liked the flagged findings (figure 3), the contextual information for each mention in the EHR, and the rank ordering of flagged findings by usefulness. The visualisation of the evolving differential diagnosis and the automated RoR report for sharing with patients and referring clinicians, including the ability to save and access this report from the EHR, were also appreciated.
The interface was noted to be complex, but testers stated this was expected due to the inherent complexity of genetic diagnosis and that they anticipated a learning curve to develop proficiency. Placement, positioning and the multiple presentation layers (text and graphics in the interface)1 were well liked. In particular, the ‘Assess diagnosis’ display was noted as valuable because it made transparent the logic used by the DDSS in comparing patient findings to information about the disease. Of note, each tester interpreted the meaning of the graphical bars and shading differently; however, this did not hinder their ability to make the diagnosis, and the bar itself was appreciated as a design feature. To help with interpretation, more labelling was suggested (table 3).
Table 3
GPACSS usability: human factors of CDS design and organisational implementation factors through tester experiences*
Interaction
Testers were thoughtful and purposeful using GPACSS. Notably, in case 3 (the most complex case), one primary tester did not immediately choose the top diagnosis offered by GPACSS. Supported by the data displayed, the tester indicated that to make a definitive diagnosis they would next evaluate for the second-ranked disease—as that condition had a test that was easy and accurate and the condition was also more treatable—indicating utilisation of the DDSS as intended and consistent with clinical diagnostic decision making.
Testers initially expressed concern around ‘too many clicks’ and ‘click fatigue’ but noted as they progressed through the cases that the clicking was unavoidable and necessary. For example, they saw value in taking the time to correctly specify onset information (which requires clicking and cognitive load in the DDSS), as this is part of the genetic diagnostic process. ‘Cognitive Load’ in DDSS testing refers to additional thinking required to interact with the tool, and the general recommendation is to minimise this in CDS design.1 Testers who commented on the cognitive load required to review flagged findings and choose age of onset noted the cognitive load as similar to completing this task without GPACSS.
Information
Testers appreciated resources such as the hover feature that revealed synonyms for findings and requested even more hovers and infobuttons. Confusion over some terminology, notably ‘zygosity’ and ‘severity score’, occurred when reviewing the genomic variants, as only some testers located the explanatory resource for these terms.
The fact that the EHR ‘Mentions’ displayed in flagged findings were sometimes triggered by parent or by child concepts was noticed by all testers, and some stated the findings used in the DDSS were not as granular as they were expecting. Regardless, testers recognised and emphasised the importance of being able to review the ‘Mention’ information from the EHR and manually adjust for any false positives and false negatives from the NLP process.
For the primary testers, satisfaction averaged 8.5 out of 10 (range 8–9.5) and navigation ease averaged 8 out of 10 (range 7.5–9). All three felt GPACSS would save time throughout the clinical process, with one primary tester estimating it at 30%–50%. Specific value in time saved was noted for chart review by all testers.
Perceived need
The RoR report and detailed prognosis table20 generated in each scenario were highly valued for being standardised and for their ability to communicate complex genetic information to patients and other clinicians (table 3). The RoR report was also noted as an improvement over current laboratory reports, with one tester stating it was ‘where the most utility would be’ (Tester 4).
Testers exhibited learning and familiarity with GPACSS as they progressed through the testing session, appreciating the DDSS assistance as each vignette increased in complexity and noting ‘It takes it [clinical diagnosis and diagnostic thinking] to a higher level’ (Tester 2). Primary testers expressed readiness to adopt the tool in clinical practice, and one (paediatric geneticist) suggested GPACSS could also serve as a differential diagnosis training tool for medical students and residents in their clinic.
Two secondary testers (lab director and variant analyst) expressed enthusiasm that GPACSS could fill a need for in-house sequencing laboratories because full EHR data would be available during sequence interpretation. These testers also hypothesised that the ability to periodically re-analyse an existing VCF in minutes using GPACSS would improve the diagnosis rate over time.
Workflow fit
The three primary testers noted that the GPACSS process as tested fit with their clinical workflow diagnosing genetic conditions. As an added benefit, they described how using GPACSS also helped them learn about diseases and associated findings with which they were less familiar (table 3).
The three secondary testers questioned GPACSS fit with a clinical genetic testing workflow in which only a report with variants labelled as to pathogenicity and association with a condition (implying a clinical diagnosis) is received from an external lab. However, they did identify value and possible workflow fit for situations with uncertainty as to the diagnosis after sequencing or where flagged findings and the usefulness ranking would allow clinicians to review the EHR with flagged findings in light of the genomic information to make the diagnosis.
Discussion
We provide preliminary evidence through user testing in a simulated real-world clinical workflow that the combination of NLP with a CDS tool optimised to support the clinical process of differential diagnosis may address the needs of those involved in this complex task. Such assessment of fit is critical if CDS is to fulfil the promise of standardising and improving care.1 4 5 8
Technical issues related to the interface and interaction of CDS design were minor and fixable, as were issues with design layout. Despite initial remarks on the number of clicks and cognitive load, testers acknowledged these as necessary to the genetic diagnosis process and no different than without the DDSS. Other issues related to terminology and usability could be solved and evaluated in future usability studies through a combination of training, added infobuttons and experience using GPACSS. Some of the technical gaps noted and additions requested by testers are already addressed within GPACSS; however, the 4 min training video was created to provide only enough instruction to facilitate user testing. These results, therefore, provide direction for training and ongoing reference materials for future implementation.
For CDS to be acceptable and implemented by clinicians and organisations, it must fit with the real-world workflow and must present a solution to a perceived need.5 27 All primary testers identified ways GPACSS added such value and fit and noted ways GPACSS filled multiple needs in their diagnostic workflow. Workflow fit was highest among primary testers but opportunities for workflow fit were described by all testers. GPACSS was also noted as acceptable for implementation by all testers regardless of individual issues identified and suggestions for technical improvements.
Limitations
To facilitate user testing of GPACSS in the context of clinical workflow prior to full integration and implementation, simulations of the real world were required. Because this study used the Logica EHR simulation, benefits or drawbacks of GPACSS in a production EHR could not be directly observed. Also, full annotations for the causal variants were not included in the variant table for the simulated patients, limiting full assessment of the value of the DDSS in variant interpretation. This impacted the understanding of the ‘severity score’ by all testers, as the annotation information that would have been provided for a real patient was not included for the simulated cases. Finally, the generic cTAKES NLP using the UMLS concepts found only 20 of the 30 (67%) pertinent positive concepts within the test cases that a paediatric neurologist (MMS) identified manually. This was sufficient for GPACSS to generate the correct differential diagnosis for user testing, and further enrichment of the generic NLP to improve detection and avoid false positives was out of scope for this preliminary user testing.29 Subsequent automated search for UMLS terms for flagging and addition of a separate stage of text search enrichment for terms missed by the NLP such as ‘tall’ improved NLP yield to 30 of 30 (100%).
This simulated EHR and user testing were a necessary first step and provide data to guide implementation of GPACSS. NLP improvements and additional beta testing within an actual EHR, in real-world clinical workflows and with real patient results will be necessary to fully assess individual user-level and organisational-level facilitators and barriers to use, implementation and impact on clinical care. Such studies are currently in progress.
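The yield figures above are the fraction of manually identified concepts that the automated process also found (20/30 ≈ 67%, then 30/30 = 100%). The sketch below shows that calculation together with one simple way a text search stage might catch a term the NLP missed. The function names and the whole-word regular expression approach are assumptions made for illustration; the study does not describe how its text search enrichment was implemented, and this sketch does not handle negation (for example, ‘not tall’).

```python
import re


def recall(found, expected):
    """Fraction of expected concepts that were also found."""
    expected = set(expected)
    return len(expected & set(found)) / len(expected)


def text_search_enrichment(note_text, missed_terms):
    """Return the missed terms that appear as whole words in the note text.

    Whole-word, case-insensitive matching is an assumed design choice so
    that 'tall' does not match inside a longer word such as 'stalling'.
    """
    hits = []
    for term in missed_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, note_text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


# Placeholder concept labels standing in for the 30 manually identified concepts.
expected = [f"concept_{i}" for i in range(30)]
found_by_nlp = expected[:20]
nlp_yield = recall(found_by_nlp, expected)

extra = text_search_enrichment("Patient is Tall for age.", ["tall", "short"])
```

A plain text search of this kind trades precision for sensitivity, which is acceptable only because each resulting flag is still reviewed by a clinician before it is used.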
Conclusions
This study provides preliminary evidence for the usability, workflow fit, acceptability and implementation potential of a DDSS that includes machine-assisted chart review. Overall, responses suggest the GPACSS prototype is usable based on technical CDS and human-centred design criteria, addresses perceived clinical need, and has good fit within the real-world clinical workflow of genetic testing and diagnosis. Further development is needed to improve usability for multiple clinical stakeholders and organisational implementation.