TY  - JOUR
T1  - Development of a customised programme to standardise comorbidity diagnosis codes in a large-scale database
JF  - BMJ Health & Care Informatics
JO  - BMJ Health Care Inform
DO  - 10.1136/bmjhci-2021-100532
VL  - 29
IS  - 1
SP  - e100532
AU  - Robert C Osorio
AU  - Kunal P Raygor
AU  - Adib A Abla
Y1  - 2022/04/01
UR  - http://informatics.bmj.com/content/29/1/e100532.abstract
N2  - Objectives The transition from ICD-9 to ICD-10 coding creates a data standardisation challenge for large-scale longitudinal research. We sought to develop a programme that automated this standardisation process. Methods A programme was developed to standardise ICD-9 and ICD-10 terminology into one system. Code was improved to reduce runtime, and two iterations were tested on a joint ICD-9/ICD-10 database of 15.8 million patients. Results Both programmes successfully standardised diagnostic terminology in the database. While the original programme updated 100 000 cells in 12.5 hours, the improved programme translated 3.1 million cells in 38 min. Discussion While both programmes successfully translated ICD-related data into a standardised format, the original programme suffered from excessive runtimes. Code improvement with hash tables and parallelisation exponentially reduced these runtimes. Conclusion Databases with ICD-9 and ICD-10 codes require terminology standardisation for analysis. By sharing our programme’s implementation, we hope to assist other researchers in standardising their own databases. Data may be obtained from a third party and are not publicly available. Data are available through the Healthcare Cost and Utilisation Project.
ER  -