Short Report

Generative artificial intelligence and non-pharmacological bias: an experimental study on cancer patient sexual health communications

Abstract

Objectives The objective of this study was to explore the characteristics of generative artificial intelligence (AI) in responding to questions about sexual health among cancer survivors, a topic that is often challenging for patients to discuss.

Methods We employed the Generative Pre-trained Transformer-3.5 (GPT) as the generative AI platform and used DocsBot for citation retrieval (June 2023). A structured prompt was devised to generate 100 questions from the AI, based on epidemiological survey data regarding sexual difficulties among cancer survivors. These questions were submitted to Bot1 (standard GPT) and Bot2 (sourced from two clinical guidelines).

Results No censorship of sexual expressions or medical terms occurred. Although the responses did not fully reflect the guideline recommendations, ‘consultation’ was significantly more prevalent in both bots’ responses than pharmacological interventions, with odds ratios (ORs) of 47.3 (p<0.001) in Bot1 and 97.2 (p<0.001) in Bot2.

Discussion Generative AI can serve to provide health information on sensitive topics such as sexual health, despite the potential for policy-restricted content. Responses were biased towards non-pharmacological interventions, probably because the GPT model is designed around its provider’s policy restricting replies on medical topics. This bias warrants attention, as it could raise patients’ expectations of non-pharmacological interventions.

Introduction

With the recent development of generative artificial intelligence (AI), particularly large language models that use billions of parameters, there is growing discussion about its usefulness and risks as a healthcare tool.1 Generative AI is expected to facilitate cross-cultural communication between patients with real-life experiences and medical professionals with rich medical knowledge. However, disadvantages of using AI-generated information in healthcare have been pointed out, such as bias in training data, a proliferation of false or harmful responses and ambiguous reasoning behind responses.1

Although sexual problems are common among cancer survivors, they are particularly hard for patients and healthcare providers to discuss.2 Clinical guidelines provide practical ways to deal with sexual problems, and the first step is to connect the patient to a medical consultation.3 4 However, it is difficult for patients to disclose their sexual problems to the doctor in front of them, and we hypothesised that patients would initially consult AI about this difficult-to-convey issue. For the Generative Pre-trained Transformer (GPT), the policy of its provider (OpenAI) states that the model is not fine-tuned to provide medical information or adult content.5 We performed a generative experiment with a hypothetical cancer survivor to examine the characteristics of medical and sexual consultations, two areas not covered in this fine-tuning.

Methods

We conducted a dialogue generation experiment and performed an exploratory analysis. We used GPT-3.5 (OpenAI) as the generative AI and DocsBot (docsbot.ai) to refer to specific documents (the latest versions as of June 2023, in Japanese). The prompt ‘I am a cancer survivor. Please create a question about a problem that is hard to consult’ was used to generate 100 questions from DocsBot, which had been trained on a survey of sexual problems among cancer survivors.6 The generated questions were categorised into seven topics based on the symptom categories specified in the clinical guidelines: sexual response, body image, intimacy, sexual functioning, vasomotor symptoms, genital symptoms and others. These questions were then presented to Bot1 (standard GPT) and Bot2 (sourced from two clinical guidelines3 4).
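As an illustration of the dialogue generation step, the minimal sketch below submits the generated questions to a plain GPT-3.5 chatbot (corresponding to Bot1) via the OpenAI Python client; it is not the study’s published code (available in the repository linked below), and the file name and client version are assumptions. Bot2 was built on DocsBot with the two clinical guidelines as sources and is accessed through that service, so it is not shown.

```python
# Illustrative sketch only (assumed file name and OpenAI Python client >= 1.0);
# the study's actual code is in the linked GitHub repository.
import csv
from openai import OpenAI

client = OpenAI()  # API key read from the OPENAI_API_KEY environment variable


def ask_bot1(question: str) -> str:
    """Return the standard GPT-3.5 response (Bot1) to one generated question."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
        temperature=0,  # keep responses as reproducible as the API allows
    )
    return response.choices[0].message.content


# Load the 100 DocsBot-generated questions and collect Bot1's answers.
with open("generated_questions.csv", newline="", encoding="utf-8") as f:
    questions = [row["question"] for row in csv.DictReader(f)]

bot1_answers = [ask_bot1(q) for q in questions]
```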

The collected conversational data from Bot1 and Bot2 were tokenised into individual words, and linguistic features were extracted from the text data, including lemmatised and stop-word-removed text, noun phrases as keywords and verb lemmas. We then calculated a similarity score between the responses from Bot1 and Bot2 using word vectors to measure semantic similarity. Word frequency and sentiment were also analysed. Fisher’s exact test was used to compare the rates of non-pharmacological and pharmacological interventions, as defined by GPT. We used Python V.3.11 (scikit-learn and spaCy packages) for the analyses. All data and code are available at https://github.com/AkikoHanai/LLM_CancerConsul_Trial
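A minimal sketch of this feature extraction and comparison is shown below. It assumes an English spaCy pipeline with word vectors (en_core_web_md) and placeholder counts in the contingency table, whereas the study processed Japanese text; it illustrates the approach rather than reproducing the published analysis code.

```python
# Minimal sketch, not the published analysis code: spaCy-based feature
# extraction and similarity, plus Fisher's exact test on intervention counts.
# The pipeline name and the counts in `bot1_table` are illustrative assumptions.
import spacy
from scipy.stats import fisher_exact

nlp = spacy.load("en_core_web_md")  # model with word vectors (the study used Japanese text)


def linguistic_features(text: str) -> dict:
    """Lemmatised, stop-word-removed tokens, noun-phrase keywords and verb lemmas."""
    doc = nlp(text)
    return {
        "lemmas": [t.lemma_ for t in doc if not (t.is_stop or t.is_punct)],
        "keywords": [chunk.text for chunk in doc.noun_chunks],
        "verbs": [t.lemma_ for t in doc if t.pos_ == "VERB"],
    }


def response_similarity(bot1_answer: str, bot2_answer: str) -> float:
    """Semantic similarity between paired responses via averaged word vectors."""
    return nlp(bot1_answer).similarity(nlp(bot2_answer))


# Per-bot 2x2 table: rows = intervention type, columns = mentioned / not mentioned
# across the relevant responses (placeholder counts, not study data).
bot1_table = [[35, 2],   # non-pharmacological: mentioned, not mentioned
              [3, 34]]   # pharmacological:     mentioned, not mentioned
odds_ratio, p_value = fisher_exact(bot1_table)
```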

Results

The topics of the generated questions were, in order of frequency, sexual functioning (N=24), body image (N=17), sexual response (N=13) and intimacy (N=8), with the remaining questions categorised as others (N=38), including general lifestyle or health check-ups in cancer survivorship (online supplemental file 1). The mean similarity score between Bot1 and Bot2 responses was 0.93 (range 0.77 to 0.98); the less a topic was mentioned in the guidelines, the lower the concordance. For sexual response and sexual functioning, although the guidelines recommend both pharmacological and non-pharmacological interventions, non-pharmacological intervention (counselling) was recommended significantly more frequently than pharmacological intervention (OR=47.3 in Bot1 (95% CI 9.55 to 233.81, p<0.001) and 97.2 in Bot2 (95% CI 11.72 to 806.04, p<0.001)). Sentiment analysis showed a slightly positive polarity (Bot1 mean=0.18 (SD=0.12), Bot2 mean=0.19 (SD=0.15)).
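For reference, an odds ratio and 95% CI of the kind reported above can be obtained from a per-bot 2×2 table as sketched below; the counts are placeholders rather than the study data, and statsmodels is an assumed choice of library rather than necessarily the one used.

```python
# Illustration only: odds ratio with a 95% CI from a 2x2 table of
# non-pharmacological vs pharmacological mentions (placeholder counts).
import numpy as np
from statsmodels.stats.contingency_tables import Table2x2

table = Table2x2(np.array([[35, 2],    # non-pharmacological: mentioned, not mentioned
                           [3, 34]]))  # pharmacological:     mentioned, not mentioned
print(round(table.oddsratio, 1))            # point estimate of the OR
print(table.oddsratio_confint(alpha=0.05))  # 95% CI on the OR
```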

Discussion

When disseminating information about cancer treatment and the sexual health issues faced by cancer survivors, the generated chatbots functioned without refusing to answer, whether or not they were trained on medical guidelines. GPT responses have been noted to be as reliable as web searches and closer to clinical guidelines, making it a promising tool to support medical communication.7 8 In this study, GPT returned useful results comparable to the guidelines, without calling for excessive pessimism or optimism. However, GPT-based questions and answers tended to favour counselling over pharmacological treatment options, with many responses encouraging consultation with medical staff. The advertising policies of consumer search engines and the usage policy of OpenAI limit the accessibility of information about medical content or specific drugs, depending on the legal restrictions in each country. As a result, the generated responses may have been biased towards recommending medical consultation rather than providing ‘specific medical information’ subject to such legal restrictions.

Given the potential use of generative AI to address issues that patients may hesitate to discuss with medical staff, such as sexual issues, generative AI may help patients clarify their concerns and facilitate shared decision-making. The limitations of this study include the adjustment of prompts and the absence of a trial with actual patients or providers to establish reliability or validity; however, the situation in which bias arises from regulations on medical information is likely to be universal. Healthcare providers should consider the possibility that patients who use consumer web tools, including generative AI, may develop expectations of non-pharmacological interventions such as counselling.