Discussion
In this study, we assessed the performance of ChatGPT, an AI-based language model, in providing treatment recommendations for glioma patients. To the best of our knowledge, this is the first study aiming to evaluate this novel chatbot within the framework of CNS tumour multidisciplinary decision-making. While ChatGPT demonstrated proficiency in accurately identifying cases as gliomas, it displayed limited precision in identifying specific tumour subtypes. Furthermore, the tool’s recommendations regarding treatment strategy and regimen were rated as good, while its ability to incorporate functional status in its decision-making process was rated as moderate.
Rationale for CNS TB
Oncological patients discussed in the multidisciplinary CNS TB are more likely to benefit from preoperative and postoperative staging and to receive the optimal adjuvant treatment.25 26 Barbaro et al presented the foundations of neuro-oncology and the need for multidisciplinary expertise in order to embrace the multiple disease aspects in CNS tumour-affected patients.14 The authors highlighted the prerogatives and missions of a CNS TB: (1) neuro-oncology, neurosurgery, radiation oncology, neuropathology, neurology and radiology are the specialties necessary to compose the CNS TB; (2) the expert consortium’s main goal is to propose a collaborative treatment plan; (3) a further mission is the development of novel clinical trials. Furthermore, a single-centre prospective evaluation of a CNS TB showed that the experts’ consortium influences the clinical management of patients suffering from a brain tumour through high-impact decisions.27 However, the organisation of CNS TB is limited by economic costs, time expenditure, resource availability and the limited presence of TB across geographic and socioeconomic strata.26 New AI-based tools with underlying deep learning, such as ChatGPT, might represent a valuable complement or at least offer some help to centres lacking expertise or resources.
ChatGPT ready to assume the role of the doctor?
Two questions were posed to ChatGPT that corresponded to the main aim of a CNS TB discussion: ‘what is the best adjuvant treatment?’, and ‘what would be the regimen of radiotherapy and chemotherapy for this patient?’. ChatGPT scored well on both parameters, but its responses were less accurate on other parameters, such as incorporating the functional status of the patient and glioma subtype diagnostic accuracy. Regarding the latter, the output provided by the chatbot was often incorrect (eg, pleiomorphic astrocytoma instead of glioblastoma in one case), or not detailed enough (eg, no distinction between grade II and grade III astrocytoma). On the other hand, the adjuvant treatment suggestion and its regimen were rated as good. In future studies, it may be worth exploring alternative questioning methods that align better with how chatbots process information. This approach could potentially lead to more accurate results.
In this cohort, 80% of the included patients were diagnosed with glioblastoma (WHO grade IV). In the literature, the treatment of glioblastoma WHO IV has been extensively studied.15–17 19 23 28 AI models used by ChatGPT are trained on a large dataset of information found online, including websites, journals and digitalised books. It is thus understandable that ChatGPT’s output regarding the adjuvant treatment and its regimen for glioblastoma is of better quality, because the underlying knowledge base is well documented. By the same reasoning, ChatGPT’s performance is weaker for recommendations that rest on a less extensive knowledge base. The consideration of patient functional status was rated as moderate, even though the clinical preoperative and postoperative state of the included cases was presented to ChatGPT. This consideration is much less documented in the literature, as only a few clinical trials have studied adjuvant therapy for glioblastoma in patients with impaired functional status or in older adults.17
Strengths and limitations
Our results provide valuable information on the potential of human-AI interfaces in medical decision-making. To test the chatbot’s performance, we used glioma cases, which represent a homogeneous sample of tumour cases. This allowed us to test the performance in this setting but limits the generalisability of our findings to other tumour types. Of note, ChatGPT’s recommendations were consistently qualified with disclaimers that it was not designed to provide medical advice, which presents another limitation in a medical setting. Nevertheless, this might be seen as an opportunity if similar algorithms were designed specifically for this purpose. Given this, at the moment we cannot appreciate the full potential of ChatGPT in CNS TB. Notwithstanding this limitation, one could imagine that AI chatbots, with continued development in the medical field, could hold great promise to complement the classic CNS TB workflow. Another limitation lies in the fact that the chatbot’s knowledge relies on content from the internet limited to 2021. Although information on more recent research developments in neuro-oncology was not accessible to the chatbot, this should not have impacted its recommendations for standard clinical care. If the chatbot had access to information on new clinical trials, it could greatly aid the therapeutic discussion and potentially lead to new development directions.
Furthermore, ChatGPT recommendations cannot be taken at face value without specialist verification, since it is not uncommon for the chatbot to provide erroneous information.13 In language models such as ChatGPT, a phenomenon known as ‘hallucinations’ frequently occurs and can range from the rather benign, for example, providing plausible but non-existent scientific references, to very dangerous medical scenarios, such as recommending an ineffective or harmful treatment.2 Therefore, whether used to inform medical or other high-stakes decisions, at this stage it is indispensable that the output is verified by a human professional. Finally, our study relied on textual neuroimaging information and did not involve a quantitative AI imaging analysis, which could be a potential area of development.
Further developments
Six of the seven experts considered that ChatGPT would be useful if the system could learn and improve. This notion is supported by the medical community, as AI is growing and holds immense promise in medicine.2 6 29–31 However, since its launch in November 2022, ChatGPT has raised scepticism in the scientific community regarding threats to the originality of scientific work.10 11 32–35 Another consideration is the risk that AI chatbots may be prone to bias or commit omissions and errors in the interpretation of medical information. Due to these shortcomings, AI-based systems in medicine should be used with a human-in-the-loop approach.
Even if our results suggest a reserved rating for ChatGPT’s performance on glioma subtype diagnosis and multi-modal information integration, AI-based chatbots may be a promising supplement in TB decision-making. Future studies could explore ways to refine ChatGPT’s functionality, such as incorporating more patient-specific data and refining its ability to provide nuanced recommendations based on the clinical context. Furthermore, future developments in the ChatGPT interface could introduce the ability to read medical imaging, such as preoperative and postoperative brain MRI, which could enormously improve its diagnostic ability and treatment recommendations.
Nonetheless, our results highlight the potential utility of ChatGPT in facilitating clinical decision-making. Chatbots could be used to quickly provide information related to a patient’s medical history, differential diagnosis, relevant diagnostic tests, experimental treatment options and potential side effects. Furthermore, we intentionally provided the chatbot with only one conversation log. It is therefore possible that further interaction and additional discussion with the chatbot might have yielded better performance.
However, ChatGPT’s ability to provide medical information was restricted, as it did not have access to the latest clinical trial findings. This was because it lacks live internet access and access to research databases.28 Overcoming these barriers and facilitating AI access to the newest scientific information could be a potential direction of future development, as novel clinical trials are a crucial part of a CNS TB discussion.14 AI-based chatbots could have the potential to integrate the newest trial and bench science information into multidisciplinary decision-making and help the TB direct patients to potentially applicable treatments.
AI language models are evolving at a tremendous speed, and by the time of the publication of this manuscript, a newer version, ChatGPT V.4.0, had been introduced, offering a more versatile conversational tool. It is possible that future updates may include a neuroimaging analysis tool, which would greatly enhance the capabilities of AI tools available for the medical field.