Introduction
Surveillance is the cornerstone of public health practice as well as research.1 Public health surveillance involves the systematic collection, analysis, interpretation and dissemination of health-related data to inform population health policies, programmes and interventions. The most prevalent type of data used by public health authorities to identify population health trends, examine healthcare access and assess population level outcomes is administrative data.
The term ‘administrative data’ in healthcare refers to data generated during routine healthcare delivery processes,2 including but not limited to outpatient encounters, hospital admissions and pharmacy dispensing events. These data are used for a range of administrative purposes in healthcare, such as billing payers for healthcare services and measuring the efficiency of healthcare delivery. Administrative data include patient demographic information (eg, age, race), insurance plan enrolment (eg, Medicare, Medicaid, Blue Cross/Blue Shield), hospital discharge information (eg, reason for admission, discharge disposition), procedures delivered during an outpatient encounter and pharmaceutical claims (eg, medication dispensed, route of administration). Administrative data are typically structured using information coding standards to enable interpretation and analysis. As a result, they represent accessible, available data suited for secondary purposes such as population health research.
Accessibility and availability are two important dimensions of data quality,3 yet they are only part of the quality equation with respect to population health research.4 To be of high quality, administrative data must also be accurate.5 Otherwise, health policies, programmes and interventions derived from observational research will be developed on false premises and may therefore fail to prevent disease, prolong life or promote health.
In the context of public health surveillance at the national level, scientists use diagnostic codes from administrative data in the form of International Classification of Diseases (ICD) codes to identify disease-specific cohorts for assessment of population level trends and outcomes. While other coding systems, such as the Logical Observation Identifiers Names and Codes, are used to electronically report positive disease cases to local and state health authorities,6 7 national surveillance systems lack these data as they are often removed when cases are reported to federal information systems. Moreover, scientists at federal health agencies are unable to measure positivity, adherence to testing guidelines and other population indicators when data reported to state health authorities come only from individuals with a disease. Robust public health surveillance and research require accessible, available and accurate data for population numerators and denominators. Therefore, national public health scientists leverage large population health data sets that consist primarily of administrative data.
Existing literature on the accuracy of ICD codes in administrative data is mixed. Whereas a recent Canadian study found that ICD codes had a low positive predictive value (PPV) of 16% with respect to the identification of pertussis cases,8 an earlier study in Canada found that these data possessed high sensitivity (96.2%) and specificity (99.6%) with respect to the identification of HIV infection.9 The ability of ICD codes from administrative data to reliably identify true cases of disease therefore appears to vary, suggesting that validation of administrative data is important for all diseases of public health importance.
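As a rough illustration of the accuracy measures cited above, the following Python sketch computes sensitivity, specificity and PPV from a 2×2 validation table that compares ICD-coded cases against a reference standard (eg, laboratory confirmation). The class name, function names and counts are hypothetical and are not drawn from the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Cells of a 2x2 table: ICD code present/absent vs reference standard."""
    true_pos: int   # ICD code present, disease confirmed
    false_pos: int  # ICD code present, disease not confirmed
    false_neg: int  # ICD code absent, disease confirmed
    true_neg: int   # ICD code absent, disease not confirmed


def sensitivity(c: ValidationCounts) -> float:
    """Share of confirmed cases that carry the ICD code."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ValidationCounts) -> float:
    """Share of non-cases that do not carry the ICD code."""
    return c.true_neg / (c.true_neg + c.false_pos)


def positive_predictive_value(c: ValidationCounts) -> float:
    """Share of ICD-coded records that are confirmed cases."""
    return c.true_pos / (c.true_pos + c.false_pos)


# Hypothetical counts chosen only to exercise the functions.
# Empty rows or columns (zero denominators) are not handled in this sketch.
counts = ValidationCounts(true_pos=80, false_pos=20, false_neg=20, true_neg=880)

print(f"sensitivity: {sensitivity(counts):.3f}")
print(f"specificity: {specificity(counts):.3f}")
print(f"PPV:         {positive_predictive_value(counts):.3f}")
```

Because PPV depends on the number of false positives relative to true positives, it falls as a condition becomes rarer in the population studied, even when sensitivity and specificity remain high. This is one reason the three measures are usually reported together in validation studies.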
Sexually transmitted infections (STIs) are an important public health challenge and a focus of the Healthy People 2020 goals.10 Undiagnosed and untreated STIs are associated with adverse outcomes such as infertility, pelvic inflammatory disease, chronic pelvic pain, HIV acquisition, neurologic disease and adverse pregnancy outcomes. Several STI health services are recommended by the WHO to protect the reproductive and sexual health of men and women, yet referral to and performance of these services require accurate identification of STI incidence, prevalence and outcomes. Chlamydia, gonorrhoea and syphilis are the most prevalent and curable STIs reportable under public health laws in the USA,11 and more than 1 million STIs are acquired every day worldwide.12 The accuracy of administrative data for identifying cases of chlamydia, gonorrhoea and syphilis is unknown.
Objective
The purpose of this study is to review the extant literature for evidence on using ICD codes from administrative data to reliably identify populations with chlamydia, gonorrhoea and syphilis. Our findings will provide insights into the interpretation of results from STI studies that use administrative data for cohort identification. Further, these insights may inform future population health research.