Original Research

Performance of national COVID-19 ‘symptom checkers’: a comparative case simulation study

Abstract

Objectives Identifying those individuals requiring medical care is a basic tenet of the pandemic response. Here, we examine the COVID-19 community triage pathways employed by four nations, specifically comparing the safety and efficacy of national online ‘symptom checkers’ used within the triage pathway.

Methods A simulation study was conducted on current, nationwide, patient-led symptom checkers from four countries (Singapore, Japan, USA and UK). 52 cases were simulated to approximate typical COVID-19 presentations (mild, moderate, severe and critical) and COVID-19 mimickers (eg, sepsis and bacterial pneumonia). The same simulations were applied to each of the four countries’ symptom checkers, and the recommendations to refer on for medical care or to stay home were recorded and compared.

Results The symptom checkers from Singapore and Japan advised onward healthcare contact for the majority of simulations (88% and 77%, respectively). The USA and UK symptom checkers triaged 38% and 44% of cases to healthcare contact, respectively. Both the US and UK symptom checkers consistently failed to identify severe COVID-19, bacterial pneumonia and sepsis, triaging such cases to stay home.

Conclusion Our results suggest that whilst ‘symptom checkers’ may be of use to the healthcare COVID-19 response, there is the potential for such patient-led assessment tools to worsen outcomes by delaying appropriate clinical assessment. The key features of the well-performing symptom checkers are discussed.

Summary box

What is already known?

  • The availability and use of symptom checkers are increasing.

  • Symptom checkers are currently in use at a national level to help in the healthcare response to COVID-19.

  • There is limited evidence to support the effectiveness or safety of symptom checkers as triage tools during a pandemic response.

What does this paper add?

  • This study compares performance of symptom checkers across different countries, revealing marked variation between national symptom checkers.

  • The symptom checkers employed by Japan and Singapore are twice as likely to triage cases onward for clinical assessment as those of the USA or UK.

  • The US and UK symptom checkers frequently triaged simulated cases of sepsis, bacterial pneumonia and severe COVID-19 to stay home with no further healthcare contact.

  • We discuss the key aspects of the well-performing triage systems.

Introduction

COVID-19 is a new infection in humans. The symptom profile, disease progression and complication rates are still relatively unknown.1 From the available evidence, four broad categories of illness have been postulated. ‘Mild COVID-19’ makes up over 80% of cases and is typically a self-limiting infection similar to the common cold, resolving without intervention. ‘Moderate COVID-19’ typically has features of viral pneumonia in the absence of hypoxia, progressing to ‘severe COVID-19’ typically when patients require oxygen therapy. ‘Critical COVID-19’, where ventilatory support is typically required, occurs in less than 5% of cases.2 The rate of disease progression is not fixed: early intervention and various management strategies can reduce the rate of progression to critical illness and death.2–6

While the infection fatality rate is yet to be determined, COVID-19 is associated with substantial mortality. Over a period of 5 months, COVID-19 has led to more than 300 000 deaths, with more than half of these deaths occurring within the last month.7

The risk of mortality is affected by a number of risk factors. Coexisting health problems such as diabetes, heart disease and cancer have been implicated as conferring a higher risk of mortality in COVID-19.8 Age appears to be the most striking and consistent risk factor for COVID-19 related mortality.9 Based on current data, the mortality rate in patients under 50 years of age is thought to be less than 1.1%, rising to around 14% in those over 80 years of age.10

Variation in mortality also seems to exist between countries.11 Initially, this variation was thought to be predominantly related to the method of recording deaths and the total number of tests conducted (ie, the detection of milder cases).12 As the pandemic spreads across the globe, it is becoming increasingly clear that how a country responds to the pandemic affects the number of deaths it will experience.6 11

The national response to the COVID-19 pandemic has many important tenets. On the public health side, infection control initiatives attempt, in part, to mitigate the surge of infections that can accompany new pathogens where there is little circulating immunity. This reduces mortality by preventing the healthcare services from being overwhelmed, thus permitting improved access to medical management for those who need it.6 The clinical response to COVID-19 also centres on access to treatment. To successfully reduce the mortality rate, those patients who are developing more severe disease must be identified.3

Identifying those patients with COVID-19 that require treatment is challenging. First, COVID-19 has a broad range of presentations that can mimic common conditions that rarely require clinical assessment (eg, the common cold).1 Second, there are no clinical signs or symptoms that reliably predict who will progress to severe disease.3 As such, the clinical community is left with a large number of potential cases without any clear symptom indicators for: (1) who has the disease and (2) who is developing more severe disease. The problem is compounded further as more serious, life-threatening conditions (eg, bacterial pneumonia and sepsis) can mimic any stage of COVID-19 disease.13 14

National ‘Symptom Checkers’ have been implemented in many countries in the hope of reducing this burden faced by healthcare services. Symptom checkers are self-assessment tools. The individual—typically online or via computer application—enters their symptoms into a predetermined platform and from there a predetermined algorithm produces an outcome (usually advice). This is a form of self-led triage. It is hoped that such self-directed assessments will enable the identification of potential cases15 and will correctly triage those individuals who would benefit from clinical assessment and/or management into further care.16 For such a hope to be realised, symptom checkers must be able to determine mild conditions from severe conditions.

While self-triage has been used for some years in non-emergency conditions to varying degrees of success,17 self-triage has never before been used in a pandemic setting and as yet its efficacy and safety have not been formally studied. Caution must be exercised as, to date, studies examining symptom checkers have had mixed and generally disappointing results, demonstrating poor diagnostic performance (34%–58%) and questionable triage performance (55%–80%).18 The stakes are high, in that a failure to triage serious medical conditions (such as severe COVID-19, bacterial pneumonia or sepsis) in for further assessment will inevitably lead to delayed treatment and higher mortality.19–22

Here, we test the performance of four nationwide symptom checkers from four nations to ascertain how safe and efficient each symptom checker is in differentiating mild from severe COVID-19 cases, and how well they detect time-sensitive COVID-19 mimickers such as bacterial pneumonia and sepsis.

Methodology

Five countries were initially selected for analysis. Three (Singapore, Japan and Norway) were selected as they maintained low case fatality rates (CFRs) despite a demonstrable surge of cases in the preceding 2 months. Two countries (the UK and the USA) were selected due to concern regarding high CFRs.

Public health guidelines from each country were reviewed. Access was obtained to any available government-sponsored online patient-led triage system (Singapore: ‘Singapore COVID-19 Symptom Checker’,23 Japan: ‘Stop COVID-19 Symptom Checker’,24 USA: ‘CDC Coronavirus Symptom Checker’25 and the UK: ‘111 COVID-19 Symptom Checker’26). Whereas the NHS ‘111’ COVID-19 Symptom Checker was and continues to be heavily used (with over 500 000 assessments completed on average each month27), there were no available data as to the usage of the other symptom checkers.

For the purpose of this analysis, data were extracted only from those countries with symptom checkers (Singapore, Japan, UK and USA), in an effort to compare the performance of symptom checkers specifically.

Case scenarios

Fifty-two standardised cases were designed simulating common COVID-19 related presentations with varying severity or risk factors.

Case scenarios included four distinct presentations: (1) cough and fever; (2) comorbidity, cough and fever; (3) immunosuppression, cough and fever and (4) shortness of breath and fever. These distinct presentations were then varied in relation to one or more of the following: (1) duration of symptoms; (2) age of patient and (3) severity of symptoms. The symptoms chosen for analysis are considered common in COVID-19: history of fever (50%–90%), dry cough (60%–86%) and shortness of breath (53%–80%).3 28

‘Fever’ was chosen as a core symptom of COVID-19 due to its high discriminatory value for infection. Even though it may only be present in less than half of COVID-19 cases at presentation,28 the presence of fever permits greater focus on infective causes in relation to shortness of breath and cough. Fever also presents commonly in sepsis and pneumonia,29 which are two of the key diagnoses that triage systems need to detect to prevent excess mortality. Fever has also been shown to relate to disease severity and mortality outcomes in COVID-19.30

‘Cough’ is a non-specific symptom covering a wide range of conditions. Combined with fever, cough raises the possibility of chest infection, including COVID-19 and bacterial pneumonia (one of the critical differential diagnoses in COVID-19). Detecting possible bacterial pneumonia is a prerequisite to a functioning triage system given the time critical need for antibiotic initiation to prevent unnecessary deaths.30

‘Shortness of breath’ is generally accepted as a marker of COVID-19 disease progression,31 although there are other reasons for shortness of breath, and specifically in COVID-19, patients may not experience shortness of breath despite being hypoxic—so called silent hypoxia.32

‘Duration’ was chosen as a severity marker as the prolongation of fever, cough and/or shortness of breath within the context of COVID-19 or a COVID-19 mimicker (pneumonia, sepsis and so on) carries a worse prognosis. In particular, an unremitting, persistent fever warrants further assessment in regard to COVID-1930 but also in relation to sepsis.29

‘Age’ is a well-defined risk factor for severe complications of COVID-19.9 10 As such, it was deemed useful to include age as a variable in the case simulations to test whether the symptom checker accounted for age when determining risk.

‘Severity’ of symptoms relates to duration of fever, cough and shortness of breath. Shortness of breath had its own severity scale and was crucial for staging level of complicated COVID-19, severity of pneumonia and sepsis.29 30 Mild shortness of breath was defined as shortness of breath during activities that did not stop one completing the activity. Moderate shortness of breath was defined differently depending on age. That is, respiratory reserve was considered to be less in adults aged >70 years in comparison with the younger age groups, and as such, we defined moderate shortness of breath in those aged >70 years as preventing the completion of most tasks, while for younger cases, moderate shortness of breath would still permit most tasks to be completed. Severe shortness of breath was defined as shortness of breath at rest.

The immunosuppression case simulations related to the development of cough and fever 4 days after chemotherapy, simulating potential neutropaenic sepsis. Neutropaenic sepsis is a medical emergency requiring immediate medical attention and early antibiotic therapy: door-to-needle time for sepsis should be less than 1 hour, and for neutropaenic sepsis less than 30 min.33 34

Except for the paediatric case, hypertension was chosen as the comorbidity due to its discriminatory value between mild and severe comorbidities. There is evidence that hypertension may be an independent risk factor for poorer outcomes in COVID-19; however, it remains, as do many of the proposed ‘high-risk’ comorbidities, unproven.8 Differentiating symptom checkers that account for milder comorbidities or make allowances for the uncertainty that remains in the evidence base for at-risk groups was deemed useful in regard to symptom checkers’ safety performance.

Where equivocal answers existed, such as for breathlessness: ‘yes’, ‘I’m not sure’ or ‘no’, the equivocal answer (‘I’m not sure’) was interpreted as mild symptoms. Unless stated in the specific case scenario, any question pertaining to comorbidity was answered as ‘no’. All other variations were as described for each case scenario (online supplemental data).

The combination of symptoms, duration and other severity markers was varied to simulate many of the common presentations of COVID-19 and COVID-19 mimickers. Upper respiratory tract infection (URTI) and mild COVID-19 were represented in scenario 1; moderate COVID-19, bacterial pneumonia and sepsis were represented in scenarios 1 and 4; severe COVID-19, septic shock and critical COVID-19 were represented in scenario 4; and neutropaenic sepsis in scenario 3 (see online supplemental tables 1–4).

Statistical analysis

The primary outcome was the total number of cases referred onward for further clinical assessment, which was converted into a percentage of all simulated cases and then compared between countries.
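The conversion from referral counts to percentages can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper named `referral_percentage`; the counts used are placeholders chosen for round numbers and are not the study's results.

```python
def referral_percentage(referred: int, total: int) -> float:
    """Return the share of simulated cases referred onward, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= referred <= total:
        raise ValueError("referred must lie between 0 and total")
    return 100.0 * referred / total


# Each symptom checker received the same 52 simulated cases.
TOTAL_CASES = 52

# Hypothetical referral counts for illustration only (not taken from table 2).
example_counts = {"checker_a": 39, "checker_b": 13}

percentages = {
    name: referral_percentage(count, TOTAL_CASES)
    for name, count in example_counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct:.0f}% of cases referred onward")
```

Because every checker was tested against the same 52 cases, the percentages share a denominator and can be compared directly between countries.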

Results

The key baseline population and testing data are presented in table 1. Notably, the highest rate of testing for COVID-19 was by Singapore with the lowest being Japan. The UK had the highest reported physicians per capita, while Japan and Singapore had the lowest. Cases per thousand inhabitants varied greatly, with Singapore and the UK maintaining similar rates. From the available statistics, Singapore had the lowest CFR (<0.1%) and the UK had the highest CFR (13.6%). All population and testing data were extracted from the WHO as of 26 April 2020.

Table 1
Key population and COVID-19 testing data from each of the four countries

Fifty-two case scenarios were applied to each country’s patient-led triage systems. The results for each scenario are presented in tabulated format (online supplemental data). Singapore had the highest overall referral rate at 88%, and the USA had the lowest at 38% (table 2).

Table 2
Total number (percentage) of case simulations referred on by country

Of the cases not referred, the USA and UK triaged a significant number of cases to ‘stay home’ that would typically have required early clinical assessment. The US triage system (CDC Coronavirus Symptom Checker) frequently triaged case simulations with possible severe COVID-19, bacterial pneumonia and sepsis to stay home, and triaged possible neutropaenic sepsis to healthcare contact within 24 hours. The UK’s 111 COVID-19 Symptom Checker frequently triaged possible severe COVID-19 and bacterial pneumonia to stay at home with no follow-up and is likely to have delayed treatment for sepsis, severe COVID-19 and neutropaenic sepsis. It is of note that while Japan’s symptom checker generally performed well, our simulation revealed a potential delay to treatment for neutropaenic sepsis. Indeed, all four symptom checkers failed to triage the simulation for neutropaenic sepsis into the ‘emergency department’ (table 3).

Table 3
Tabulated view of likely triage outcome of specific diagnosis in each country

High CFR versus low CFR countries

The main differences in triage criteria extrapolated from the national symptom checkers relating to COVID-19 between the low CFR countries and the high CFR countries are presented in table 4.

Table 4
Differences in triage criteria between low and high case fatality countries

Discussion

This case simulation study examined the symptom checkers from four countries. Following application of 52 standardised case simulations to each country’s symptom checker, the percentage of onward referrals was calculated. The low case fatality nations’ (Singapore and Japan) symptom checkers triaged in twice as many cases for direct clinical assessment as those of the higher case fatality nations (the USA and UK). Of clinical concern was the failure of both the US and UK symptom checkers to triage cases simulating bacterial pneumonia, sepsis and severe COVID-19 on to any healthcare contact.
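The "twice as many" comparison can be checked against the referral percentages reported in the Results. The sketch below assumes that averaging the two percentages within each group is a fair summary, which holds here only because every checker received the same 52 cases; the variable names are illustrative.

```python
# Referral percentages as reported in the Results section.
reported_referral_pct = {"Singapore": 88, "Japan": 77, "USA": 38, "UK": 44}

low_cfr_countries = ["Singapore", "Japan"]
high_cfr_countries = ["USA", "UK"]


def mean_referral_pct(countries: list[str]) -> float:
    """Average the reported referral percentages over a group of countries."""
    return sum(reported_referral_pct[c] for c in countries) / len(countries)


low_cfr_mean = mean_referral_pct(low_cfr_countries)
high_cfr_mean = mean_referral_pct(high_cfr_countries)
ratio = low_cfr_mean / high_cfr_mean

print(f"Low CFR group mean referral:  {low_cfr_mean:.1f}%")
print(f"High CFR group mean referral: {high_cfr_mean:.1f}%")
print(f"Ratio (low CFR / high CFR):   {ratio:.2f}")
```

The ratio is descriptive only: it summarises four symptom checkers and carries no inferential weight.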

The upside of symptom checkers, particularly during a pandemic, is difficult to ignore. By reducing physical patient contacts, symptom checkers can potentially save valuable resources and avoid further viral transmission. While telephone and telemedicine triaging also protect staff and reduce transmission, such services require more healthcare staff than symptom checkers and hence carry a greater financial and human resource burden.

Evidence to date also suggests the majority of cases of COVID-19 resolve after a short, self-limiting viral illness.1 There are, though, no discriminatory signs or symptoms.2 COVID-19 can present like the common cold or influenza or indeed bacterial pneumonia. COVID-19 can also progress quickly6 35 and can even present with asymptomatic hypoxia.32 Sifting through the mild colds and self-limiting flus and trying to determine who will have a mild course of COVID-19 and also trying not to miss bacterial pneumonia, sepsis and signs of COVID-19 pneumonia is a challenge for even trained clinicians let alone an automated system.

It is here that Singapore’s symptom checker performs well. The checker is presented on a single webpage, more akin to an online risk calculator. There are six inputs required from the patient and one of three outputs generated. The algorithm powering the symptom checker is not complicated. Age over 65 years, or the presence of any health condition, or duration of symptoms over 4 days triggers the advice to seek medical assessment. Any degree of shortness of breath is triaged directly to the emergency department. The Singapore COVID-19 Symptom Checker, if used by the public, is likely to reduce healthcare contacts by the young, fit patients who are early on in the illness, thus off-loading the healthcare burden to some degree while maintaining a relatively low risk to the public.
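The rules described above can be sketched as a short decision function. This is an illustration of the logic as we describe it, not the checker's actual implementation: it models only the four inputs named in the text (the checker takes six), and the function name, field names, output labels and rule ordering are assumptions.

```python
from dataclasses import dataclass

# Output labels are assumed wording, not the checker's own text.
EMERGENCY = "go to emergency department"
SEEK_ASSESSMENT = "seek medical assessment"
STAY_HOME = "stay home"


@dataclass
class Presentation:
    """Hypothetical subset of the patient-entered inputs."""
    age_years: int
    has_health_condition: bool
    symptom_days: int
    short_of_breath: bool


def triage(p: Presentation) -> str:
    """Apply the rules described in the text, most urgent first."""
    # Any degree of shortness of breath goes straight to emergency care.
    if p.short_of_breath:
        return EMERGENCY
    # Age over 65, any health condition, or symptoms for more than 4 days
    # each trigger advice to seek medical assessment.
    if p.age_years > 65 or p.has_health_condition or p.symptom_days > 4:
        return SEEK_ASSESSMENT
    # Remaining cases: young, no comorbidity, early in the illness.
    return STAY_HOME


# Example: a 72-year-old with 7 days of cough and fever, no breathlessness.
print(triage(Presentation(72, False, 7, False)))
```

The thresholds are deliberately low and each is objective, so a patient does not need to grade their own severity in order to be referred.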

The UK ‘111’ symptom checker performs poorly in this regard. The algorithm is complex, attempting to quantify symptoms such as shortness of breath and the overall severity of illness by asking subjective, qualitative questions with multiple choices. The ‘111’ symptom checker seems to take on a much broader clinical role and attempts to triage out cases that would typically be triaged in or out of care based on an actual clinical assessment. For example, a 72-year-old person who presents with a 7-day history of fever and cough is triaged by the ‘111’ symptom checker to stay at home with no clinical, nursing or healthcare contact. Faced with such a clinical scenario, clinicians would typically insist on at least basic clinical observations (pulse, temperature, oxygen levels and so on) before considering triaging such a patient to stay at home. The differential in this case includes sepsis, bacterial pneumonia and COVID-19 pneumonia, and while it remains possible that fever can persist for 7 days in mild/moderate COVID-19, complications or alternative diagnoses are much more likely.

The qualifying questions used by the ‘111’ symptom checker to discriminate between severity will have insufficient discriminatory value in such cases. Furthermore, the wording of the question encourages the self-reporting towards lower categories of illness:

Are you so ill that you have stopped doing all of your usual daily activities?

  1. ‘Yes - I’ve stopped doing everything I usually do’.

  2. ‘I feel ill but can do some of my usual activities’.

  3. ‘No - I feel well enough to do most of my usual activities’.

(Extracted question from ‘111’ Coronavirus Symptom Checker).

It is the use of absolute and equivocal qualifiers that prevents the severity-qualifying question from yielding any useable clinical triage information: the use of ‘all’ in the question, ‘everything’ in the affirmative answer, and even the negative answer stipulates ‘most’. Our case simulation demonstrated that choosing the second option, the moderately severe answer, still triages patients to self-isolate with no healthcare contact. As such, patients with cough and fever for 7 days would have to be so severely unwell that they are unable to do anything they usually do to be triaged to any clinical contact.

Our case simulation study indicates that both the ‘111 COVID-19 Symptom Checker’ and the ‘CDC Coronavirus Symptom Checker’, if used as the sole initial point of healthcare contact, are likely to delay presentations of serious medical conditions to appropriate care, and as such, are likely to confer an increased risk of morbidity and mortality. Both symptom checkers maintain a high threshold for referring onward to clinical contact, triaging the majority of patients to stay home with no clinical contact. Again, beyond the mortality impact, there is no evidence that such an approach actually reduces healthcare burden. Indeed, beyond the established evidence in pneumonia generally,19–22 there is direct evidence that early correction of hypoxia in COVID-19 prevents progression to mechanical ventilation,5 consistent with basic medical principles. Programming symptom checkers to aggressively triage patients to stay home may well lead to patients presenting to healthcare later, requiring more intensive healthcare to recover, and as such, symptom checkers ‘set’ to keep patients at home may actually increase the burden on intensive care facilities and perpetuate a healthcare crisis.

Symptom checkers are currently being used in the pandemic for two purposes: (1) identifying potential cases for testing/surveillance and (2) identifying ‘unwell’ patients who require medical attention. Both functions are likely to be enhanced by the use of symptom checkers when the intention is to ‘catch’ more patients or reach more cases. That is, when symptom checkers are used to identify more cases than would otherwise be detected and to direct more patients to medical care than would otherwise make healthcare contact, then symptom checkers are merely providing an additional ‘safety-net’, and therefore, in such a healthcare support role, the risk of harm from their use is expected to be relatively minimal. Conversely, if symptom checkers are being used to replace the assessment of patients by trained personnel and are programmed to try and prevent further healthcare contact, then, as our case simulation study highlights, there are real concerns about the potential risk of harm from such an unproven approach.

Considering that the efficacy of symptom checkers has not been established,17 caution would be advisable. Delay in the correction of hypoxia, failure to commence thromboprophylaxis and missing the opportunity for earlier initiation of steroids in the hypoxic patient with COVID-19 are all likely to carry a considerable morbidity and mortality cost.

If we are to accept the lesser option of an automated, self-directed triage system over the standard of care offered by the dynamic, experienced clinical assessment, then we must be mindful of what we are asking of the ‘symptom checker’. Based on our independent case simulation study, symptom checkers do not appear advanced enough to fulfil the ‘stay home’ intent with a sufficient level of safety. They may, though, be sufficient to assist in the improved identification of at-risk patients requiring further clinical assessment, and some form of symptom checker may even be able to contribute to the increased ongoing vigilance required for all patients diagnosed with COVID-19. Evidence, though, should be provided before replacing actual clinical contact with an online self-directed triage system.

Strengths and limitations

This case simulation study was conducted using 52 standardised simulated cases. The cases were designed to test specific COVID-19 related scenarios and as such were symptom-based without the need for subjective interpretation. Nonetheless, there remains a risk of bias, particularly when facing subjective questions. The majority of simulation inputs were, however, quantitative (for example, duration, age and symptoms) and unlikely to be affected meaningfully by any bias.

The UK data are pooled from all four nations (England, Wales, Scotland and Northern Ireland). England (making up 90% of the total UK population) uses the same ‘111’ COVID-19 patient-led triage system analysed here, whereas Wales, Scotland and Northern Ireland have implemented their own individual patient-led triage systems. It was beyond the scope of this initial investigation to examine each triage system separately. A similar situation applies to the USA, where some individual states have implemented their own triage systems.

Conclusion

In this case simulation study, the UK and USA patient-led triage systems (COVID-19 Symptom Checkers) maintained a high disease-severity threshold for onward referral to healthcare assessment. Of particular concern was the advice of no clinical contact for elderly patients with COVID-19 related symptoms, patients who had developed shortness of breath or any patient with persistent fever. The low CFR countries (Singapore and Japan) used symptom checkers to reduce clinical demand while maintaining a lower health risk to patients. Our study indicates that while symptom checkers may be of use in the healthcare response to COVID-19, the ‘CDC Coronavirus Symptom Checker’ and the ‘111 COVID-19 Symptom Checker’, if used as the sole point of initial healthcare contact, are likely to confer a tangible risk of delaying the presentation of time-critical acute illnesses. Our results support the recommendation that symptom checkers should be subjected to the same level of evidence-based quality assurance as other diagnostic tests prior to implementation.