10 Development and evaluation of a machine learning model to predict positive urine cultures in the outpatient setting and minimize the use of antibiotics
  1. Farah E Shamout,
  2. Phillip Wang,
  3. Nasir Hayat,
  4. Vee Nis Ling,
  5. Terrence Lee St John,
  6. Ghadeer Ghosheh,
  7. Lelan Orquiola,
  8. Vansh Gadhia and
  9. Zaki Almallah
  1. New York University Abu Dhabi


Objective Excessive prescription of antibiotics is amongst the principal drivers of antibiotic resistance, which is considered a growing threat to global health. The most frequent resistant pathogens are usually linked with urinary tract diseases, such as urinary tract infections (UTIs). Studies have shown that clinicians may prescribe antibiotics based on presenting symptoms alone, because the final results of urine bacterial cultures take a long time to obtain. While many current approaches to improving prescribing behavior are educational or regulatory, here we develop and evaluate a logistic regression model that estimates the risk of a positive urine culture from the patient’s history and presenting physiological data extracted from electronic health records, to help clinicians make informed antibiotic prescription decisions without waiting for urine culture results.

Methods We used an anonymized dataset collected between 2015 and 2021 in a large multi-specialty hospital with primary, secondary and tertiary care facilities. The retrospective study was approved by the Institutional Review Boards (IRBs) of both the research institution and the hospital (IRB references: HRPP-2020-173 & A-2019-054, respectively). We included adult outpatient encounters associated with at least one urine culture test. For the input features, we extracted and pre-processed each patient’s demographics (age, sex), comorbidities (diabetes mellitus, hypertension, cancer and hyperlipidemia), vital signs (pulse, respiratory rate, oxygen saturation, temperature, systolic blood pressure, diastolic blood pressure and fraction of inspired oxygen) and instant urine dipstick test results, all collected prior to the acquisition of the urine culture, as well as diagnosis codes (ICD-10 codes) and procedure codes (hospital custom codes) from the patient’s previous hospital encounter. We defined the output as a binary label indicating a positive or negative urine culture result, derived by processing textual data within laboratory test results. We considered a urine culture positive if the concentration of urine pathogen was higher than 100,000 colony-forming units per milliliter (CFU/ml). We split the dataset randomly into a training set (70%) and a test set (30%). We optimized a logistic regression model on the training set using stratified k-fold cross-validation, and evaluated it on the test set with 95% confidence intervals computed using bootstrapping with 1000 iterations.
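The bootstrapped confidence intervals mentioned above can be illustrated with a minimal sketch. It assumes a percentile bootstrap that resamples test-set encounters with replacement; the abstract does not state which bootstrap variant was used, and the function names `auroc` and `bootstrap_ci` are hypothetical.

```python
import random


def auroc(y_true, y_score):
    """AUROC via the rank-sum (Mann-Whitney U) identity; tied scores get average ranks."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(y_true, y_score, metric=auroc, n_iter=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap (lower, upper) bounds.

    Assumption: encounters are resampled with replacement; resamples that
    contain a single class are redrawn because AUROC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(metric(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[round((alpha / 2) * n_iter)]
    hi = stats[min(n_iter - 1, round((1 - alpha / 2) * n_iter) - 1)]
    return metric(y_true, y_score), lo, hi
```

Calling `bootstrap_ci(labels, scores)` on the held-out labels and predicted probabilities returns a triple that can be reported as AUROC (lower, upper 95% CI). Note that several encounters may belong to the same patient, so resampling at the patient level would be an alternative design choice.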

Results After applying the inclusion criteria, the overall dataset consisted of 11,388 patients with 17,452 unique encounters (56.1% females; mean age 49.1 years, standard deviation 17.5 years). Amongst all encounters, 2,431 (13.9%) were associated with a positive label. We evaluated the model on the held-out test set consisting of 5,236 encounters (14.2% of encounters had a positive urine culture). The logistic regression model achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.851 (95% CI 0.837, 0.865) and an Area Under the Precision Recall Curve (AUPRC) of 0.584 (95% CI 0.546, 0.618). Amongst the female population, the logistic regression model achieved a 0.806 AUROC compared to a 0.905 AUROC amongst males. When investigating different patient age groups, the model achieved a 0.84 AUROC amongst patients younger than 40 years, compared to a 0.848 AUROC amongst patients who are 40 years or older.
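An AUPRC is easiest to read against the prevalence of the positive class, since a model that scores every encounter identically obtains an AUPRC equal to that prevalence (about 0.142 on this test set). The sketch below shows one common estimator, step-wise average precision; this is an assumption, as the abstract does not say how the AUPRC was computed, and the function name `average_precision` is hypothetical.

```python
def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (average precision).

    Encounters are visited in descending score order; tied scores are
    processed as one group so that the result does not depend on tie order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every encounter sharing the current score
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision  # rectangle under the PR step
        prev_recall = recall
        i = j
    return ap
```

With a constant score for all encounters, the function returns the fraction of positives, which is the baseline against which a reported AUPRC can be compared.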

We binarized the predictions by adjusting the threshold to achieve approximately 80% sensitivity on the test set, which is a clinically acceptable level of sensitivity. Amongst the 4,460 encounters associated with a negative urine culture in the test set, 351 were prescribed a UTI-related antibiotic during the encounter. At this fixed threshold, our model correctly classified 59.0% (207/351) of these encounters as negative, that is, as not requiring an antibiotic.
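The thresholding step can be sketched as follows, assuming that an encounter is predicted positive when its score is at or above the threshold. The function names, the `on_antibiotic` flag and the toy inputs in the usage note are hypothetical and are not taken from the study.

```python
import math


def threshold_for_sensitivity(y_true, y_score, target=0.80):
    """Highest threshold at which at least `target` of the positives score >= threshold."""
    pos_scores = sorted((s for s, y in zip(y_score, y_true) if y == 1), reverse=True)
    if not pos_scores:
        raise ValueError("at least one positive label is required")
    # Number of positives that must be flagged; the small epsilon guards
    # against floating-point error in target * n before taking the ceiling
    k = max(1, math.ceil(target * len(pos_scores) - 1e-9))
    return pos_scores[k - 1]


def avoidable_prescriptions(y_true, y_score, on_antibiotic, threshold):
    """Among culture-negative encounters that received an antibiotic,
    count those the model classifies as negative (score < threshold).

    Returns (classified_negative, total_treated_negatives).
    """
    treated_neg = [s for s, y, a in zip(y_score, y_true, on_antibiotic) if y == 0 and a]
    classified_negative = sum(1 for s in treated_neg if s < threshold)
    return classified_negative, len(treated_neg)
```

For example, with labels `[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]` and scores `[0.9, 0.8, 0.7, 0.6, 0.2, 0.75, 0.5, 0.3, 0.1, 0.65]`, the threshold is 0.6 (4 of 5 positives flagged). The second function corresponds to the 207/351 figure reported above when applied to the study's test set.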

Conclusions In this study, we developed and evaluated a machine learning model for predicting positive urine cultures, which are associated with UTIs, amongst outpatients using a real-world dataset. Our results demonstrate that the optimized model has the potential to decrease false positives and, as a result, minimize unnecessary antibiotic prescription. In future work, we are interested in further improving the model by leveraging temporal sequences of the input features, extensively fine-tuning the hyperparameters of the model, and decreasing the performance gap across different patient subgroups. While our study uses a dataset collected from a single cohort, the approach could be transferred to other settings through external validation or by fine-tuning the model. Overall, our novel application is of high relevance to the clinical informatics community considering the global threat of antibiotic resistance, especially in the context of managing urinary tract infections.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: