Statistics from Altmetric.com
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.
I read the article by Haemmerli et al on the performance of ChatGPT-3.5 in generating treatment recommendations for central nervous system (CNS) tumours, which were then evaluated by tumour board (TB) experts. While the study did illuminate promising aspects of the Artificial Intelligence (AI) model, the design of the prompt used to interact with ChatGPT warrants further consideration.
In the study, the prompt employed was a brief patient history, followed by two questions, which appears to have limited the model’s performance. As a sophisticated large language model (LLM), GPT-3.5 relies heavily on the context and specificity of the provided prompt.1 2 Based on cited literature, an alternative prompt structure could have included context, specific intent, a question and an expected response format. Moreover, pretraining the LLM with examples of the expected answer significantly improves the quality of the answer.2 3 Finally, the introduction of GPT-4 in early March 2023 has shown considerable improvement in understanding and generating responses when compared with ChatGPT-3.5.4 5
With the application of these techniques, researchers could have guided the predictive capabilities of the LLM to generate more relevant and contextually nuanced responses. This could have particularly helped in areas where the model underperformed, such as precision in glioma subtypes and considerations of patient functional status.
As an illustration, both ChatGPT-3.5 and GPT-4 were pretrained with eight examples (patients 1–8, patient history followed by TB response) from online supplemental material of the study. A more context-specific prompt was then used with the history of patients 9 and 10. Table 1 displays main output obtained using this technique, revealing enhanced precision in oncological diagnosis, treatment discussions and patient functional status from ChatGPT-3.5 compared with what was presented in the paper. GPT-4 seemed to align even more closely with the board’s opinion, which was defined as the gold standard. Full discussion with the chatbot is available in online supplemental material 1.
It is critical to acknowledge that the efficiency of LLMs applications heavily depends on the prompt used and the quality of the data given. Future research needs to employ a refined, context-driven approach in interacting with these models and the development and sharing of prompt engineering techniques should continue to be prioritised.
In conclusion, the exploration of LLM in CNS oncology research is commendable, but it is essential to optimise the methodology to fully unlock the true potential of AI tools in such a complex and challenging clinical landscape.
Patient consent for publication
The author would like to thank Dr. Marie-Claude Blatter, PhD, for raising interest in the use of LLMs in medicine, and GPT-4 for writing and proofreading assistance.
Contributors DG created the concept of the letter, reviewed the literature and wrote the manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.