Results
The search strategy yielded 17 included articles; these varied widely in quality, type, size of study population, methods and conflicts of interest (online supplementary table S2). It was difficult to distinguish clearly between online triage and other features like e-consultation, since systems like ‘eConsult’ also contain a built-in triage function where initial decisions are taken by a combination of the user/patient, doctor/nurse and algorithms.12 The larger observational studies were all multifunctional, with the triage function only one feature among others such as help for the self-management of various conditions and communication platforms with health professionals.9 13–15 All of these articles were retained provided they included some kind of digital triage tool as defined earlier.
Characteristics of the articles
Five articles described studies based on mixed methods that combined quantitative and qualitative data as well as retrospective and prospective data. Three articles were considered expert opinions. One article described a case report. Four articles described studies of accuracy outcomes. Two articles described observational studies. Two articles were reviews (online supplementary table S2). The articles originated from the USA (n=5), the UK (n=7), Australia (n=4), New Zealand (n=1) and The Netherlands (n=3) and were published between 2001 and 2018. Four articles described the use of clinical vignettes to test the triage tools. Six articles described studies that enrolled real patients. One article described its methods so poorly that no conclusions could be drawn; this article was considered an expert opinion. The studies enrolling real patients had very few subjects who actually used and evaluated the digital tools. One study reported only two e-consultations per 1000 patients per month.13 Another study enrolled 13 133 potential online patients but ended up with only 35 patients going through the complete follow-up, and only 20 patients who actually complied with the advice.15 In a third study that enrolled 80 546 patients, only 6.5% completed the evaluation during a 6-month period.14 At least three articles had clear conflicts of interest, since the authors had invested in the AI tool they were evaluating.16–18
Categorising the data
The included articles covered the three main aspects of the overall scope: (1) how to design a digital triage tool, (2) how to implement an existing tool and (3) how to evaluate the diagnostic accuracy of such tools. Three articles contributed critical views on the topic.19–21
The design
Four articles explored the optimal design of an implementable digital AI triage tool.16 22–24 One article pointed out that triage tools should be evaluated in realistic situations on a broad set of randomised cases, in contrast to testing symptom checkers using clinical vignettes, which do not reflect real-life complexity or the everyday language of a typical patient. Particular focus should be placed on the balance between correctly identifying a disease and the risk of missing a critical diagnosis.22 Symptom checkers were thought to have great potential for improving diagnosis, quality of care and health system performance worldwide. However, poorly designed systems could put patients at risk and could even increase the load on health systems if they are too risk averse.22 Implementation of evaluation guidelines specific to each symptom checker was found to be very important for facilitating the development and wide-scale use of such systems.22
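As a hedged illustration of the balance described above, the sketch below scores a triage tool on labelled vignettes against two complementary error rates: missed emergencies (safety failures) and over-triage (risk aversion that adds load on the health system). The vignettes, urgency levels and `triage` function are invented placeholders, not any system evaluated in the review.

```python
# Minimal sketch: quantifying the safety/risk-aversion trade-off on
# labelled vignettes. All data and logic here are hypothetical.

URGENCY_LEVELS = ["self_care", "non_emergency", "emergency"]

# Each vignette: (presenting complaint, gold-standard urgency)
vignettes = [
    ("crushing chest pain radiating to left arm", "emergency"),
    ("mild seasonal runny nose", "self_care"),
    ("painful urination for three days", "non_emergency"),
]

def triage(complaint: str) -> str:
    """Placeholder for a symptom checker's triage output."""
    return "emergency" if "chest pain" in complaint else "non_emergency"

def evaluate(cases):
    rank = {level: i for i, level in enumerate(URGENCY_LEVELS)}
    missed_emergencies = over_triaged = 0
    for complaint, gold in cases:
        advice = triage(complaint)
        if gold == "emergency" and advice != "emergency":
            missed_emergencies += 1  # critical miss (safety failure)
        elif rank[advice] > rank[gold]:
            over_triaged += 1        # risk aversion (adds system load)
    n = len(cases)
    return missed_emergencies / n, over_triaged / n

missed, excess = evaluate(vignettes)
print(f"missed emergencies: {missed:.0%}, over-triage: {excess:.0%}")
```

A tool tuned only to avoid missed emergencies will drive the over-triage rate up; reporting both rates, as the article recommends, makes that trade-off explicit.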
Another article found that implementing the system ‘Tele-Doc’ resulted in a redistribution of work from doctors to administrative staff and patients, with little evidence of any efficiency gains.16 This system appeared to implement a very low level of AI.
A third article focused on the design of an AI-powered decision support system for patients. The main finding was that much thought should be put into customising the delivery of the system, based on close consultation with the target users and an iterative development process, until the system is accessible and useful. The design of system content should go beyond the traditional emphasis on scientific evidence to establish patients’ perspectives on the available options.23
Other findings were that an intelligent triage system must be able to handle uncertainty and gaps in the data, as well as subjective descriptions and perceptions of symptoms, since the data are entered by the patients themselves. It was also stated that, for the system to work efficiently, it is initially more important to make the correct interpretation than to set the correct diagnosis.24
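A minimal sketch of this design principle follows: missing or unknown patient answers should widen the safety margin of the interpretation rather than be filled in with a guessed value. The field names, thresholds and advice categories below are hypothetical illustrations, not taken from any system in the review.

```python
# Hedged sketch: triage logic that degrades safely on incomplete,
# patient-entered data. All fields and rules are invented examples.

from typing import Optional

def interpret_severity(pain_score: Optional[int],
                       duration_days: Optional[int]) -> str:
    """Map possibly-missing patient answers to a triage interpretation.

    The goal is a safe *interpretation* (where to direct the patient),
    not a diagnosis: unknown answers escalate rather than reassure.
    """
    if pain_score is None:
        # Gap in the data: escalate instead of assuming a benign value.
        return "contact_doctor"
    if pain_score >= 8:
        return "urgent_care"
    if duration_days is not None and duration_days > 14:
        return "contact_doctor"  # persistent symptoms need human review
    return "self_care_advice"

print(interpret_severity(pain_score=None, duration_days=2))  # contact_doctor
print(interpret_severity(pain_score=3, duration_days=30))    # contact_doctor
```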
Implementation
The main focus of six of the articles was the large-scale implementation of an existing online digital tool, with some level of AI-powered triage, for a real-life population. The digital tools appeared to be multifunctional with a low level of AI and with access to online GP consultation.9 13–15 25 26 These studies were mostly multimethod observational studies and were designed to explore multiple facets of the overall scope.
A large multimethod study on the implementation of the digital tool eConsult in 11 practices in Scotland suggested that the workload was not decreased but, in general, that patients who used eConsult felt that they benefited from the service.14 Factors that would facilitate the implementation included: the presence of a superuser; the inclusion of innovative methods for promoting appropriate use of the tool; and the engagement of staff in all areas of the practice. Barriers to the implementation included: delays in system start-ups; marketing not being aligned with practice expectations; challenges in integrating eConsult with existing systems; and low numbers of e-consultations. Patients’ perceptions of eConsult were generally positive, particularly because of the ability to use it at any time and the option of having an alternative way of communicating with their GP.14
A similar study on eConsult in 36 practices in England found that its use was actually very low, particularly at weekends, with little effect on reducing staff workload. Additionally, e-consultations may be associated with increased costs and workloads in primary care. Patterns of use suggested that the design could be improved by channelling administrative requests and revisits separately.13
Another UK study evaluated the implementation of webGP in six practices. During the evaluation period, the actual use of the system was limited, and there was no noticeable impact on practice workloads. Introducing webGP appeared to be associated with shifts in responsibility and workloads between practice staff, and between practices and patients. Patients using e-consultations were somewhat younger and more likely to be employed than face-to-face respondents. The motivation for using webGP mostly concerned saving time.25
A large observational study from The Netherlands evaluated, at a population level, the effect of providing evidence-based online health information on healthcare usage. The study showed that, 2 years after the launch of an evidence-based health website, nationwide primary care usage had decreased by 12%. This effect was most prominent for phone consultations and was seen in all subgroups (sorted by sex, socioeconomic status and age) except for the youngest age group. This suggests that eHealth can be effective in improving self-management and reducing healthcare usage at a time of increasing healthcare costs.9
Another study from The Netherlands concluded that their web-based triage contributed to a more efficient primary care system because it facilitated the gatekeeper function.15 Over a period of 15 months, 13 133 individuals used the web-based triage system and 3812 patients followed the triage process to the end. Most commonly (85% of cases), the system advised contacting a doctor, but in the remaining 15% of cases it provided fully automated, problem-tailored self-care advice.15
The author had earlier reported that less well-educated patients, elderly patients and chronic users of medication were especially motivated to use e-consultation, but these patients also reported more barriers to using the system.26
Accuracy
In four articles, the main focus was on exploring the accuracy of AI-powered digital triage tools in diagnosing disorders from clinical vignettes (not real patients), compared with the diagnoses of real doctors or the known correct diagnosis.17 18 21 27 28
The development of the Australian online symptom checker ‘Quro’29 was described in a small study that used 30 clinical vignettes. The accuracy ranged between 66.6% and 83.3%, and 100% of the vignettes requiring emergency care were appropriately recalled.17 It was concluded that such chatbots could be greatly improved by adding support for more medical features, such as location, adverse events and recognition of more commonly used medical terms.17
A 2016 article from the USA described a direct comparison of diagnostic accuracy between 234 physicians and 23 digital symptom checkers. The physicians significantly outperformed the computer algorithms in diagnostic accuracy: 72.1% vs 34.0% (p<0.001) listed the correct diagnosis first, and 84.3% vs 51.2% (p<0.001) listed the correct diagnosis in the top three. In particular, the physicians were more likely to list the correct diagnosis first for high-acuity and uncommon vignettes, whereas the symptom checkers were more likely to list the correct diagnosis first for low-acuity and common vignettes.28
The same USA-based author reported in 2015 that 23 digital symptom checkers clearly had deficits in both the triage and diagnosis interpretations of clinical vignettes. The triage advice from the symptom checkers was generally more risk averse than necessary; users were encouraged to seek professional care for conditions where self-care was reasonable.27 The 23 symptom checkers provided the correct diagnosis first for 34% of the vignettes (95% CI 31% to 37%).27 Triage performance varied with the urgency of the condition, with appropriate triage advice provided in 80% of emergency cases, 55% of non-emergency cases and 33% of self-care cases. There were wide variations in performance between the algorithms.27
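For orientation, an interval of this width is consistent with a simple Wald confidence interval for a proportion. The worked example below is illustrative only; the denominator of roughly 770 standardised vignette evaluations is an assumption, not a figure reported in this section.

```latex
% Wald 95% CI for an observed proportion \hat{p} = 0.34, assuming
% n \approx 770 standardised vignette evaluations (assumed, illustrative):
\[
\hat{p} \pm z_{0.975}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
  = 0.34 \pm 1.96\sqrt{\frac{0.34 \times 0.66}{770}}
  \approx 0.34 \pm 0.033
  \;\Rightarrow\; (31\%,\ 37\%)
\]
```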
Analysis and thematic synthesis
We identified several advantages and disadvantages associated with the design and implementation of intelligent online triage tools in a primary care context. The results presented above were used to identify key areas of concern.
Features of an intelligent online triage tool
When designing intelligent online triage tools, it is necessary to test them in a realistic setting and to use an iterative process of development involving trial and adaptation, with the focus on customised delivery of the service.23 In order to enhance self-help and reduce the strain on the health system, the tool should not be overly risk averse.22 It would also be a major advantage if evaluation guidelines were formulated and implemented, since this would enhance the further development and evaluation of the tools.22 An intelligent triage system must be able to handle uncertainty and gaps in the data, as well as subjective descriptions and perceptions of symptoms, since the data are provided by patients. Also, for the system to work efficiently, it is initially more important that the correct interpretation is made than that a correct diagnosis is made.24
Large-scale implementation of existing online triage tools
The studies investigating the large-scale implementation of existing online tools found that some major expected advantages were not clearly realised. Several studies found that workloads were not decreased14 25 and that costs and workloads were sometimes increased.13 16 The numbers of users of the online tools were limited but, for some groups, such as patients with daytime work, access to primary care was improved.25 The nationwide introduction of eHealth in The Netherlands reduced primary care usage after 2 years9 and improved self-help by patients. However, 85% of users were advised to seek help from a doctor, even for common symptoms,15 which could have increased the pressure on primary healthcare systems. The main hindrances to use were delays and technical integration problems, which could lead to loss of engagement. The presence of a superuser and innovative methods for promoting appropriate use would facilitate implementation of the system.14 Elderly patients and patients with a low level of education could find that a lack of internet skills is a barrier to the use of online systems.26
Diagnostic accuracy of the triage tool
The four studies exploring diagnostic accuracy all used clinical vignettes, thus limiting conclusions on diagnostic accuracy in a real-life setting. The chatbot ‘Quro’ was more accurate in suggesting the correct response for emergency cases than in making correct diagnoses.17 The direct comparison of physicians and digital symptom checkers found that physicians outperformed algorithms in diagnostic accuracy. The symptom checkers had difficulty in interpreting the vignettes with respect to both triage and diagnosis. Triage advice from the symptom checkers was generally risk averse and inappropriate for many of the vignettes. It was suggested that physicians should be aware that patients may be using online symptom checkers, and that the algorithms could be improved by adding more features, such as location, adverse events and recognition of more commonly used medical terms.17 The evidence on diagnostic accuracy was considered sparse, since the studies were vulnerable to bias.19 22