Clinician and computer: a study on patient perceptions of artificial intelligence in skeletal radiography
Abstract
Background Up to half of all musculoskeletal injuries are investigated with plain radiographs. However, high rates of image interpretation error mean that novel solutions such as artificial intelligence (AI) are being explored.
Objectives To determine patient confidence in clinician-led radiograph interpretation, the perception of AI-assisted interpretation and management, and to identify factors which might influence these views.
Methods A novel questionnaire was distributed to patients attending fracture clinic in a large inner-city teaching hospital. Categorical and Likert scale questions were used to assess participant demographics, daily electronics use, pain score and perceptions towards AI used to assist in interpretation of their radiographs, and guide management.
Results 216 questionnaires were included (M=126, F=90). Significantly higher confidence was observed in clinician-led than in AI-assisted interpretation (clinician=9.20, SD=1.27 vs AI=7.06, SD=2.13), and 95.4% of participants reported favouring clinician over AI-performed interpretation in the event of disagreement.
Small positive correlations were observed between younger age/educational achievement and confidence in AI-assistance. Students demonstrated similarly increased confidence (8.43, SD 1.80), and were over-represented in the minority who indicated a preference for AI-assessment over that of their clinician (50%).
Conclusions Participants held the clinician’s assessment in the highest regard and expressed a clear preference for it over the hypothetical AI assessment. However, robust confidence scores for the role of AI-assistance in interpreting skeletal imaging suggest patients view the technology favourably.
Findings indicate that younger, more educated patients are potentially more comfortable with a role for AI-assistance; however, further research is needed to overcome the small number of responses on which these observations are based.
Introduction
Presentations relating to the musculoskeletal system account for more than 60% of emergency department primary diagnoses in the UK,1 and as many as 50% are investigated by means of a plain radiograph (X-ray).2
In the UK, this ubiquitous imaging is typically interpreted by junior doctors and nurse practitioners. Despite the introduction of safety netting measures (eg, virtual review clinics and remote reporting3 4), concerns remain regarding diagnostic inaccuracy and its sequelae.5 Fractures represent the majority of missed diagnoses,6 exemplified by the mid-foot where they are not recognised in 33%–40% of cases.7
In this context it is unsurprising that there is significant interest in the development of artificial intelligence (AI) algorithms capable of delivering point of care radiographic interpretation.
Early clinical studies have demonstrated significant successes for AI driven interpretation of mammograms8 and chest radiographs.9 Algorithms for skeletal radiology are less mature, but effective viability and concept studies have now been reported10 11 amid growing interest from healthcare providers and industry.12 13
Despite this, the integration of AI into healthcare systems has not been without controversy, and research has raised concerns over how such technology may lead to a deterioration of human clinical skills as we are called on to use them less.14 The interface between AI outputs and the human response to them is also an area of concern; so-called ‘automation bias’, the propensity to over-rely on autonomous processes, has led to multiple deaths in aviation15 and automotive engineering.14
With an increasing likelihood that patients may soon have their imaging, at least in part, reviewed by AI, it is crucial to understand their attitudes towards this technology so that software can be developed in a patient-centric manner. Much of the existing understanding of the public’s attitude to this technology is founded in consumer research,16 17 and while this has some relevance to healthcare, it does not reflect the importance individuals place on their health, nor does it address the unique relationship of trust between patient and clinician.
The work conducted into patients’ views of AI indicates significant variation by application. One study showed that wearable biometric monitoring devices (that integrate AI technology) were seen as greatly beneficial by 20% of patients; however, 35% would refuse to use them.18 In screening for diabetic retinopathy, much higher favourability was observed, with 96% being satisfied with an AI-led assessment and 78% preferring it to the manual alternative.19
This study set out to explore patient attitudes to the potential use of AI in assisting clinicians with the interpretation, and subsequent management, of injuries identified through skeletal radiography. It also sought to investigate whether patients would prioritise their clinician’s opinion over that of an AI in the event of disagreement. Demographic factors, pain and technology use were explored as factors potentially influencing these attitudes.
Methods
Questionnaire
The study was prospectively approved by Imperial College Healthcare National Health Service (NHS) Trust. A questionnaire was developed (figure 1), based on a scoping review of the literature20–22 and according to guidance issued by NHS England with regards to question order, structure and response formats.23 Plain English was used throughout as per the National Council for Voluntary Organisations’ guidelines.24 Data were handled in accordance with General Data Protection Regulations.
Figure 1. Participant questionnaire (page 1 of 2). NHS, National Health Service.
The questionnaire included a total of 14 questions, with patients asked to select the most appropriate answer from a list (figure 2). Given the complexity of AI and the variability of its understanding, the questionnaire began with a definition of the concept, including an example designed to be relevant to the widest possible audience (figure 1). Demographic data, employment status, educational level and smartphone/laptop computer use were recorded. Participants were also asked to rate their pain, using a 10-point Likert scale, both at its worst and at the time of questionnaire completion. The final four questions assessed participants’ confidence in a clinician’s capacity to interpret their radiographs, their feelings towards the use of AI to assist with diagnosis or management, and whom they would trust in the event of a disagreement between clinician and algorithm. Pilot testing of the questionnaire was performed with subsequent review and finalisation by the authors.
The questionnaire was distributed in the fracture clinics of a large London university hospital (~1.3 million annual patient contacts25) from 1 July 2020 to 1 August 2020. On arrival, reception staff issued a single paper copy of the questionnaire to every patient over the age of 16, with instructions to place completed forms in a secure ballot box prior to attending their scheduled appointment. Completed questionnaires were collected at the end of the study period and the responses collated. IBM SPSS Version 26 software was used for statistical analysis. T-tests were used to detect differences between groups, and correlation was assessed using the Pearson correlation coefficient. A p value of ≤0.05 was considered significant.
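The two tests named above can be sketched as follows. This is a minimal illustration using only the Python standard library rather than SPSS. The ratings are invented for the example and are not study data, the function names are hypothetical, and the paired form of the t-test is an assumption, since the Methods do not state which variant was used.

```python
import math
import statistics

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for two equal-length lists."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical 1-10 confidence ratings from six respondents (not study data).
clinician = [9, 10, 8, 9, 10, 9]
ai_assisted = [7, 8, 5, 7, 9, 6]

t_stat, df = paired_t(clinician, ai_assisted)
r = pearson_r(clinician, ai_assisted)
print(f"t({df}) = {t_stat:.2f}, r = {r:.2f}")
```

A paired test compares each respondent's two ratings directly, which fits a design in which every participant rated both the clinician and the AI.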
Results
A total of 300 questionnaires were produced, of which 218 were completed and returned (72.7%). Two incomplete forms were excluded, leaving 216 questionnaires with a legible, interpretable answer to each question.
Confidence in clinician and AI-assisted interpretation
The mean confidence of participants in their clinician’s ability to correctly interpret their radiographs was 9.20, where 10=extremely confident, 1=not confident at all. Mean confidence in AI-assisted interpretation was 7.06, significantly lower (t(215)=−14.34, p<0.001). Across all participants, a positive correlation was observed between confidence in the clinician’s interpretation and confidence in AI-assisted interpretation (r(215)=0.21, p=0.002). When asked to identify the opinion they would favour in the event of disagreement between the clinician and AI, 95.4% (206) of participants selected the clinician, and only 4.6% (10) the AI.
AI-assisted interpretation and AI-assisted management
Participants rated their confidence in AI-assisted interpretation (7.06) significantly higher than their confidence in AI-assisted management (4.86), t(215)=11.03, p<0.001. A strong positive correlation was observed between confidence in AI-assisted interpretation and confidence in AI-assisted management (r(215)=0.693, p<0.001).
Age and gender
The mean age for participants was 40.20 years. More men (126) than women (90) completed the questionnaire (M:F=1.4:1), and on average female participants were 10.3 years older.
Small negative correlations were observed between age and both confidence in AI-assisted radiographic interpretation (r(215)=−0.170, p=0.0123) and confidence in AI-assisted patient management (r(215)=−0.244, p<0.001).
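For readers wishing to see how a correlation of this size relates to its p value, the conventional approach converts r to a t statistic with n − 2 degrees of freedom. The sketch below does this for the reported r=−0.170 and n=216. It is an illustration only: the function names are hypothetical, and the p value uses a normal approximation (an assumption made to keep the example within the standard library), whereas statistical packages use the exact t distribution.

```python
import math
from statistics import NormalDist

def r_to_t(r, n):
    """Convert a Pearson r from n paired observations to a t statistic (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def approx_two_tailed_p(t):
    """Two-tailed p value using a normal approximation to the t distribution.

    Reasonable only for large samples; the exact t distribution gives a slightly larger p.
    """
    return 2 * (1 - NormalDist().cdf(abs(t)))

r, n = -0.170, 216  # reported values for age vs confidence in AI-assisted interpretation
t = r_to_t(r, n)
p = approx_two_tailed_p(t)
print(f"t = {t:.2f}, approximate p = {p:.3f}")
```

With these inputs the approximation lands near the reported p=0.0123, which illustrates why even a small correlation can reach significance in a sample of this size.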
Of the 10 participants who indicated a preference for the assessment of an AI over that of their clinician, the mean age, at 24.5 years, was significantly lower than that of the wider cohort (t(215)=−3.05, p=0.00125). Of these 10 participants, 70% were men and 30% were women (table 1).
Table 1. Gender
Female participants showed significantly higher confidence in their clinician’s assessment than their male counterparts, t(215)=3.42, p<0.001. Conversely, female participants displayed lower confidence in the idea of AI-assisted management, t(215)=−4.51, p<0.001. No significant gender difference was observed with regard to AI-assisted interpretation.
Employment and educational achievement
A similar distribution of employment status was observed across both genders. The majority of the participants were in some form of work or study (69.4%), with the largest single demographic being those in full time employment (41.2%).
Those participants who were unable to work indicated a significantly lower confidence in their clinician’s assessment (t(215)=−3.22, p<0.001), AI-assisted interpretation (t(215)=−2.49, p=0.0067) and AI-assisted management (t(215)=−1.98, p=0.024).
Students reported significantly higher confidence in AI-assisted interpretation (t(215)=3.36, p<0.005) and in AI-assisted management (t(215)=3.68, p<0.005). Despite accounting for only 10.6% of the total, 50.0% of participants identifying a preference for an AI assessment over that of their clinician were students (table 2).
Table 2. Employment
A-levels/I-Bacc (30.5%) and bachelor’s degrees (30.1%) were the most widely held level of highest educational achievement (table 3). A small positive correlation was observed between increasing educational achievement and confidence in AI-assisted interpretation of radiographs (r(215)=0.137, p=0.045).
Table 3. Educational achievement
Laptop computer and smartphone use
One hundred and eighty-seven (86.6%) participants owned or used a smart phone, with 110 (58.8%) using it for greater than 2 hours each day (table 4). A small positive correlation was observed between duration of smartphone use and confidence in AI-assisted interpretation (r(215)=0.156, p=0.0201).
Table 4. Smartphone use
One hundred and sixty-five participants (76.4%) had access to a laptop, with 84 (38.9%) using it for greater than 2 hours each day (table 5). No significant correlation was found between duration of laptop computer use and confidence in AI assistance for either interpretation, or management.
Table 5. Laptop computer use
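The percentages reported for device use can be re-derived from the counts given above. The sketch below is a simple arithmetic check, with a hypothetical helper `pct`; it suggests that the heavy-use figure for smartphones is expressed as a share of smartphone users, whereas the figure for laptops appears to be a share of all 216 participants, so the two are not directly comparable.

```python
TOTAL = 216  # included questionnaires

def pct(count, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * count / denominator, 1)

smartphone_users, smartphone_heavy = 187, 110
laptop_users, laptop_heavy = 165, 84

print(pct(smartphone_users, TOTAL))             # smartphone users as a share of all participants
print(pct(smartphone_heavy, smartphone_users))  # >2 hours/day as a share of smartphone users
print(pct(laptop_users, TOTAL))                 # laptop users as a share of all participants
print(pct(laptop_heavy, TOTAL))                 # >2 hours/day as a share of all participants
print(pct(laptop_heavy, laptop_users))          # the same count as a share of laptop users
```

Expressed against laptop users only, the heavy-use share would be roughly half rather than 38.9%.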
Pain
The mean maximum pain experienced post-injury was 6.81 (SD=1.70) where 10=worst pain imaginable, 1=no pain at all. Pain was found to be significantly lower at the time of questionnaire completion (3.88, SD=2.01) (mean difference=2.93, p<0.001). No correlation was observed between pain and confidence in the clinician or AI.
Discussion
Key findings
The most striking finding of this study is the widely held esteem for the reporting skills of clinicians. With an average 10-point Likert rating of 9.2, participants demonstrated extreme confidence in their clinicians’ capacity to correctly interpret radiographs. Although objective evidence of reporting errors suggests that this confidence may be overstated,10 26 these observations are consistent with the enduring strength of the doctor–patient relationship, a rapport which continually places doctors among the most trusted of professions both in the UK27 and internationally.28 29 Consequently, the significantly lower rating for AI interpretation may reflect the robustness of this relationship rather than any inherent distrust in AI. Indeed, a mean rating of 7.06 suggests substantial confidence does exist in the technology, even at this early stage.
More broadly though, evidence indicates that people possess a strong preference for human advice over that received from an automated process. This inclination is only overcome when the perceived expertise (pedigree) of the automated adviser is made significantly greater than that of its human counterpart.30 The evidence of this study strongly supports such a conclusion, with all but 10 participants indicating a preference for the assessment of their clinician over that of an AI.
The great majority of participants owned or regularly used smartphones (86.6%) and laptop computers (76.4%), reflecting a broad societal awareness of novel technologies and potentially their application to new domains.31 Polling in the UK found 63% of people reported that ‘they knew something about AI’32 and that, after improvements in home energy efficiency, healthcare was the area to which they felt it would be of most benefit.33 With younger, more technologically minded patients becoming a greater proportion of the patient population as they age, it would be reasonable to predict that confidence in the diagnostic use of AI will only continue to grow with time. Such a trend would be consistent with the results of literature in consumer and employment research, which indicates a growing familiarity with AI.33 34
While this study did detect a significant correlation between younger participants and greater confidence in AI-assisted interpretation and management, the magnitude of this effect was small. As a demographic, students showed higher confidence in these same measures and were statistically over-represented in the cohort who favoured the interpretation of an AI over that of their clinician. However, these findings should be interpreted with caution due to the limited number of responses on which they are based.
Limitations
The design of this study was dependent on patients presenting to an in-person appointment at the fracture clinic of a central London teaching hospital during a time when attendance was purposely and observably reduced due to COVID-19.35 Given these measures, and strict guidelines for ‘shielding’ of vulnerable patients,36 it is likely that participant numbers were reduced and skewed towards individuals with limited comorbidities and generally better health. However, the wide range of participants included in this study (age in particular) suggests this effect is unlikely to have significantly impacted the results.
While a questionnaire was central to this study, there is a scarcity of validated methods for questionnaire reporting.37 To mitigate the potential for error, validated question designs (such as the Likert scale38) were used to minimise the complexity of this interpretation. The extremely positive perception of clinicians held by participants (9.20 out of 10 on average) may be indicative of response bias whereby participants answered in a way which they perceived as desirable for the clinicians. However, various efforts were made to mitigate this bias: questions were written in readily comprehensible language, implicit communication in the questioning was avoided26–29 and completed questionnaires were anonymous and returned to a locked ballot box.
Conclusions
Ultimately, this study indicates a significant preference among its participants for the assessment of their clinician over that of an AI. This was set against a backdrop of high confidence in the capacity of clinicians to correctly interpret the skeletal radiographs of their patients. Despite this, the prospect of AI-assisted interpretation was widely supported, with significantly lower but still robust confidence scores.
Demographic factors were identified which may suggest greater patient support for the use of AI in skeletal radiography, in particular among students and those of younger age or higher educational achievement. These findings should be interpreted cautiously due to the limited number of responses from which they are drawn.
This study indicates a basal level of patient approval for the use of AI as an assistant to their clinician. When coupled with a population growing more accustomed to the technology’s role in wider society, the continued development of AI algorithms to target skeletal radiography is clearly justified. Further research into the demographics of attitudes to AI is needed to expand on these findings; in particular to clarify the role of patient knowledgeability in their willingness for such technology to be involved in their care.
Although not yet ready to prioritise its assessment above that of their clinician, patients do appear content for clinicians to look to AI for support in the diagnosis and management of skeletal injuries.