Original Research

Reliability of comorbidity scores derived from administrative data in the tertiary hospital intensive care setting: a cross-sectional study

Abstract

Background Hospital reporting systems commonly use administrative data to calculate comorbidity scores in order to provide risk-adjustment to outcome indicators.

Objective We aimed to elucidate the level of agreement between administrative coding data and medical chart review for extraction of comorbidities included in the Charlson Comorbidity Index (CCI) and Elixhauser Index (EI) for patients admitted to the intensive care unit of a university-affiliated hospital.

Method We conducted an examination of a random cross-section of 100 patient episodes over 12 months (July 2012 to June 2013) for the 19 CCI and 30 EI comorbidities reported in administrative data and the manual medical record system. CCI and EI comorbidities were collected in order to ascertain the difference in mean indices, detect any systematic bias, and ascertain inter-rater agreement.

Results We found reasonable inter-rater agreement (kappa (κ) coefficient ≥0.4) for cardiorespiratory and oncological comorbidities, but little agreement (κ<0.4) for other comorbidities. Comorbidity indices derived from administrative data were significantly lower than those derived from chart review: −0.81 (95% CI −1.29 to −0.33; p=0.001) for CCI, and −2.57 (95% CI −4.46 to −0.68; p=0.008) for EI.

Conclusion While cardiorespiratory and oncological comorbidities were reliably coded in administrative data, most other comorbidities were under-reported, making administrative data an unreliable source for estimation of CCI or EI in intensive care patients. Further examination of a large multicentre population is required to confirm our findings.

Summary

  • The reliability of intensive care coded comorbidity data has not been previously studied.

  • Administrative coding of comorbidities is less reliable when more comorbidities are present.

  • When monitoring quality and safety of care in the intensive care unit, the level of comorbidity present is likely to be underestimated.

Introduction

Administrative data have traditionally been collected to assist funding, policy formulation, and epidemiological research.1 One example is the derivation of comorbidity scores as a measure of pre-morbid status. Such scores are useful for describing casemix and complexity, and are frequently included in risk-adjustment scores.

In Australia, the National Core Hospital-based Outcome Indicators (CHBOI)2 and the Health Round Table reports are derived from coding data, include comorbidity indices, and provide risk-adjustment for outcome indicators such as hospital mortality. Administrative data, while cost-effective, may have limitations in outcomes-based research.3–5

Two commonly used comorbidity scores are the Charlson Comorbidity Index (CCI)6 and the Elixhauser Index (EI).7 Both comorbidity indices can be derived from hospital administrative data.7 The CCI comprises 19 comorbidities and was developed 45 years ago from 559 cancer patients to predict long-term survival. It can be calculated from administrative data and has been validated for use as a predictor of mortality and morbidity since its inception.6 8–10 The EI comprises 30 comorbidities derived from the coding data of 1.8 million hospital patients in 1992. It has been found to be superior to CCI at predicting outcomes.11–15 Importantly, the EI attempts to avoid contamination by conditions recorded on or after admission, such as the primary diagnosis and complications, respectively. Summary comorbidity scores condense a large number of comorbidities into a single value to aid in mortality risk prediction. However, a major limitation in both research and practical use of these scores is the way in which such data are collected and presented.16

Evidence examining the reliability of coding data is variable, with a systematic review by Prins et al suggesting that just over 50% of comorbidities are coded in hospital discharge data.17 This is supported by one local18 and a number of international reports.19–25 Some studies of specific patient populations, however, suggest that coding data are reasonably reliable when compared with chart review by health professionals.26–29 Nevertheless, the use and interpretation of routinely collected hospital administrative data to assess patient complexity and performance indicators remain contentious.30–32

There are limited data for intensive care settings where complex patients may have a higher number of comorbidities.33 Chong et al suggest that the reliability of coding data may be inversely proportional to the number of comorbidities per episode.34 To our knowledge, the reliability of comorbidity scores in the Australian intensive care population has not yet been examined.

Medical language processing, which automatically extracts keywords from medical charts, has been shown to perform similarly to manual chart review of medical records,35 which remains the gold standard when assessing the reliability of coding data.36 Thus, we aimed to assess the reliability of routinely collected hospital data for deriving the CCI and EI scores, using manual chart review as the comparator.

Methods

Design

A cross-sectional study design was used. One hundred independent patient episodes were randomly selected from a total of 828 admissions to a university-affiliated, tertiary, adult, 10-bed intensive care unit (ICU) over a 12-month period from 1 July 2012 to 30 June 2013. The episodes were stratified into two equal subgroups: those requiring mechanical ventilation and those not requiring ventilation. This was done to ensure that the population with a higher comorbidity burden was captured, as we theorised that patients requiring mechanical ventilation would be more likely to have a higher CCI or EI. Episodes belonging to each group were randomly sampled one at a time, alternately, until each group was populated with 50 episodes. Both groups were examined for repeat episodes, which were removed, and the alternating process of random selection was continued until a total of 100 patients was reached.
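The alternating stratified selection described above can be sketched as follows. This is a minimal illustration, not the procedure actually run for the study: the function name, the `(episode_id, patient_id)` tuple layout, and the fixed seed are all assumptions, and repeat episodes are skipped as they are drawn rather than removed in a separate pass.

```python
import random


def sample_episodes(ventilated, non_ventilated, per_group=50, seed=0):
    """Alternately draw one random episode from each stratum.

    Each episode is a hypothetical (episode_id, patient_id) tuple.
    An episode is skipped if its patient has already been sampled,
    so every selected episode belongs to a distinct patient.
    """
    rng = random.Random(seed)
    pools = [list(ventilated), list(non_ventilated)]
    for pool in pools:
        rng.shuffle(pool)  # random order, so pop() is a random draw

    chosen = [[], []]
    seen_patients = set()
    turn = 0
    # Continue while at least one stratum is incomplete and has episodes left.
    while any(len(chosen[g]) < per_group and pools[g] for g in (0, 1)):
        g = turn % 2  # alternate between the two strata
        turn += 1
        if len(chosen[g]) >= per_group or not pools[g]:
            continue
        episode = pools[g].pop()
        if episode[1] in seen_patients:
            continue  # repeat episode for an already-sampled patient
        seen_patients.add(episode[1])
        chosen[g].append(episode)
    return chosen
```

Calling `sample_episodes(vent, non_vent)` with the two lists of admissions would return two lists of up to 50 episodes each, with no patient appearing twice.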

All manual and electronic medical records (EMR) relating to the episode of care were routinely scanned for storage. Administrative coding data were derived from these charts and stored electronically. The researchers were not involved in any of the following duties: medical record keeping, scanning of records, and administrative data coding.

Scanned medical records for these episodes were audited by two medically-trained investigators blinded to the results of the administrative coding data. To ensure consistency and minimise investigator bias, the investigators cross-checked five episodes not included in the sample. Data collected from medical records were limited to the comorbidities included in the EI and CCI. Administrative coding diagnoses with a ‘c’ prefix, indicating a complication or diagnosis not present on admission, were excluded.

An independent analyst, blinded to the medical review and the data coding process, then extracted International Statistical Classification of Diseases and Related Health Problems, 10th revision, Australian Modification (ICD-10-AM) codes relating to all CCI and EI comorbidity diagnoses using previously validated coding algorithms37 before comparing agreement with retrospective chart analysis.

Hospital ethics approval was provided by the institutional ethics committee before commencement of the study. Data for the investigation were de-identified and patient consent was deemed unnecessary.

Analysis

Frequency of each specific CCI and EI comorbidity was recorded, and CCI and EI scores were obtained.6 7 Paired t-tests were used to compare frequency of comorbidities from chart review audit and administrative data coding, with 95% confidence intervals (95% CI) of the mean reported. A Bland-Altman plot was prepared for the CCI and EI scores to determine systematic bias. The reliability between the Health Information Systems administrative coding staff and medically-trained coders was assessed by calculating kappa (κ) statistics for multiple raters.38 39 A κ≥0.4 was considered to indicate at least moderate agreement.40 Analysis was conducted using Stata version 14 (Stata Corporation, College Station, Texas). A value of p≤0.05 was considered to be statistically significant.
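For readers unfamiliar with the kappa statistic, the sketch below shows how agreement between two sources can be computed for a single comorbidity recorded as present (1) or absent (0) in each episode. It uses the two-rater Cohen's kappa as an illustrative stand-in; the study itself used Stata, and the function name and example data here are hypothetical.

```python
def cohen_kappa(coded, chart):
    """Cohen's kappa for two binary ratings of the same episodes.

    coded, chart: equal-length sequences of 0/1 flags indicating whether
    a comorbidity was recorded by each source (illustrative inputs).
    """
    n = len(coded)
    if n == 0 or n != len(chart):
        raise ValueError("inputs must be non-empty and of equal length")

    # Observed proportion of episodes where both sources agree.
    observed = sum(a == b for a, b in zip(coded, chart)) / n

    # Agreement expected by chance, from each source's marginal prevalence.
    p_coded = sum(coded) / n
    p_chart = sum(chart) / n
    expected = p_coded * p_chart + (1 - p_coded) * (1 - p_chart)

    if expected == 1:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


# Hypothetical flags for one comorbidity across 10 episodes.
coded_flags = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
chart_flags = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
kappa = cohen_kappa(coded_flags, chart_flags)  # approximately 0.6
```

In this made-up example the sources agree on 8 of 10 episodes, chance agreement is 0.5, and κ is about 0.6, which would fall above the 0.4 threshold used in this study.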

Results

From 1 July 2012 to 30 June 2013, 828 patients were admitted to the ICU, of whom 257 (31.0%) received mechanical ventilation. The study population comprised 49 (19.1%) of the episodes requiring ventilation and 51 (8.9%) of the episodes not requiring ventilation. A total of 100 (12.1%) records were audited.

The characteristics of the study population and the two subgroups are presented in table 1. Study patients had a median of 8 (IQR 5.0–12.5) coded general comorbidities.

Table 1 Demographics

The number of Charlson comorbidities identified by chart review (mean 2.26±1.82) was significantly greater (p<0.001) than the number of Charlson comorbidities derived from administrative coding data (mean 1.39±1.19).

The mean CCI derived from the administrative data (cCCI) was 2.52 (95% CI 1.95 to 3.09). The mean CCI derived from a chart review audit (aCCI) was 3.33 (95% CI 2.77 to 3.99). There was a significant difference of −0.81 (95% CI −1.29 to −0.33; p=0.001) between the two methods.

The Bland-Altman plot (figure 1) did not reveal evidence of any systematic bias as the CCI score increased (taken as the average between the two methods of Charlson scores extraction).

Figure 1 Results. Upper left panel: scatter plot of Charlson Comorbidity Index (CCI). Lower left panel: scatter plot for Elixhauser Index (EI). Upper right panel: Bland-Altman plot for CCI. Lower right panel: Bland-Altman plot for EI.

As expected, the number of EI comorbidities identified was greater than the number of CCI comorbidities in each record. The number of Elixhauser comorbidities identified by chart review (mean 4.15±2.75) was significantly greater (p<0.001) than the number of comorbidities derived from administrative coding data (mean 2.67±1.66).

The mean EI derived from the administrative data (cEI) was 7.96 (95% CI 6.55 to 9.37) and the mean EI derived from a chart review audit (aEI) was 10.53 (95% CI 8.42 to 12.64). Thus, there was a significant difference of −2.57 (95% CI −4.46 to −0.68; p=0.008) between the two EI scores.

Unlike the CCI, the Bland-Altman plot (figure 1) for EI did indicate a bias in the difference between coded and audited EI scores. For low range EI scores, the administrative (coding) data produced a greater score than chart review audit scores, whereas the reverse applied for high range EI scores.
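The pattern described for the EI, where the sign of the difference changes with the size of the score, is what a Bland-Altman analysis is designed to reveal. The sketch below computes the mean difference, the conventional 95% limits of agreement, and the slope of the differences against the pairwise means as a simple indicator of proportional bias. The function name and the input scores are hypothetical, and the slope is an informal check rather than the method reported in this study.

```python
from statistics import mean, stdev


def bland_altman(coded, chart):
    """Summarise agreement between two sets of paired scores.

    Returns the mean difference (coded minus chart), the 95% limits of
    agreement, and the least-squares slope of difference against mean.
    A slope far from zero suggests the bias varies with score magnitude.
    """
    if len(coded) != len(chart) or len(coded) < 2:
        raise ValueError("need at least two paired scores")

    diffs = [c - a for c, a in zip(coded, chart)]
    means = [(c + a) / 2 for c, a in zip(coded, chart)]

    bias = mean(diffs)
    sd = stdev(diffs)

    # Ordinary least-squares slope of the differences on the pairwise means.
    mean_x = mean(means)
    sxx = sum((x - mean_x) ** 2 for x in means)
    sxy = sum((x - mean_x) * (d - bias) for x, d in zip(means, diffs))
    slope = sxy / sxx if sxx else float("nan")

    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
        "slope": slope,
    }


# Hypothetical scores: coded exceeds chart at low values, and is lower at high values.
summary = bland_altman([2, 4, 6, 8], [1, 4, 7, 10])
```

With these made-up scores the mean difference is −0.5 and the slope is negative, mirroring the direction of the bias described for the EI.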

The kappa statistic revealed a moderate to high (κ≥0.4) level of inter-rater agreement in only seven (37%) of the CCI comorbidities: congestive heart failure (CHF), myocardial infarction (MI), diabetes mellitus with complications (DMC), chronic kidney disease (CKD), metastatic cancer, solid-organ cancer, and peripheral vascular disease (PVD) (table 2). The kappa statistic for EI comorbidities showed a moderate to high level of inter-rater agreement for the same group of comorbidities (except MI and PVD), and also for hypertension, chronic pulmonary disease (COPD), anaemia, and drug abuse (table 3).

Table 2 Charlson Comorbidity Index inter-rater agreement

Table 3 Elixhauser Index inter-rater agreement

The remaining comorbidities, 12 (63%) of the CCI and 21 (70%) of the EI comorbidities, had a lower level of inter-rater agreement (κ<0.4).

Discussion

We undertook a retrospective cross-sectional review of patient records (chart review) and administrative coding data for comorbidities in 100 patients admitted to an adult general intensive care ward. We found that administrative data significantly under-reported comorbidities present in the patient records in the majority of cases. Our findings are, in general, consistent with several previous reports.17–25

In contrast to our overall findings, we found a small number of comorbidities that were reliably reported (κ≥0.4) in the administrative (coding) data. These were CHF, CKD, DMC, solid-organ cancer, and metastatic cancer.

In 1999, Kieszak et al examined the CCI of carotid endarterectomy cases at a single health service.25 Coded data obtained from an administrative database were compared with a medical chart review, and the authors concluded that medical chart review was superior to coded data. A few years later, Quan et al conducted a similar study of all inpatients in a large health service and showed that, overall, coded data tended to under-report comorbidities.41 Youssef et al examined data for general medical inpatients in Saudi Arabia and drew a similar conclusion.29 Recently, this has been confirmed in a Norwegian general intensive care population by Stavem et al42 (table 4). In addition to the comorbidities that were more reliably reported in our study, Stavem et al found that cerebrovascular disease, dementia, and mild liver disease were also more reliably coded. As our institution has a similar casemix and size to theirs, such differences could be accounted for by differences in coding methodology. Nevertheless, from two studies in separate countries, it is clear that certain comorbidities are more reliably coded than others, and this may provide guidance regarding data that should be included in risk-prediction models when comparing health services in different geographical locations.

Table 4 Audited and coded inter-rater agreement

We selected adult admissions to an intensive care setting at a tertiary hospital with a high proportion of mechanically-ventilated patients because we expected these patients to be more likely to have comorbidities, and expected that these comorbidities would influence the level of casemix funding and thus be more reliably coded. It is not unexpected for chronic conditions that do not require intervention and do not affect funding to be excluded during the coding process. We did not ascertain the effect of coding on funding of patient episodes since this was not our primary aim; however, it has been suggested that the CCI is an inadequate predictor of resource utilisation.43

While there were no statistically significant differences in CCI and EI scores between the ventilated and non-ventilated patients, non-ventilated patients had a higher number of coded comorbidities, were more likely to stay in intensive care for longer, and had an increased incidence of mortality in the same admission. This may be explained by the possibility that the non-ventilated patient group included a sizeable population in which ventilation was not deemed to be of therapeutic benefit, either because of a lack of a clear indication or because of a poor prognosis due to a number of other comorbidities that might not have been captured by the CCI or EI. Overall, our observation of lower inter-rater agreement compared with other hospital settings25 29 41 is consistent with the hypothesis that coding reliability may be inversely proportional to the number of comorbidities.34

The Charlson methodology is more commonly used in risk-adjustment than the Elixhauser methodology,7 even though it was derived from a small and specific cancer population using chart review. The EI, which was derived from administrative data from a large population and broad casemix, identifies a higher number of comorbidities. Our results suggest that the under-reporting of EI is comparable with the CCI, and that administrative data may not be reliable in generating either CCI or EI scores for intensive care patients.

There are several practical implications of our findings. The use of administrative data in ICUs to predict mortality through use of the CCI and EI should be viewed with a great degree of caution. The Charlson comorbidities and the derived CCI score are commonly used for risk-adjustment in several mortality prediction models constructed from administrative data. In Victoria, these include the Health Round Table Reports,44 the National CHBOI mortality index,2 and the Dr Foster methodology.45 This is in contrast to models such as the Critical Care Outcome Prediction Equation (COPE) and Hospital Outcome Prediction Equation (HOPE), which do not include comorbidities.46 47 Based on our study, predicted morbidity and mortality in ICUs are likely to be underestimated when such models are based on administrative coding data.

If the CCI or EI is included in a mortality prediction model, such as a hospital standardised mortality ratio (HSMR) that is derived from administrative data, then several errors may result. First, a systematic bias due to under-reporting will be incorporated into the model. Reliance on administrative data for CCI may result in under-reporting of comorbidities and incomplete assessment of patient risk.

Second, any variation in reporting of comorbidities between institutions will lead to misleading comparative results. A health service that under-reports comorbidities will have lower CCI and EI scores, resulting in its patients appearing to be healthier than they are. This will reduce the expected number of deaths (the denominator of the HSMR) and produce a higher HSMR. Such an institution will misleadingly appear to be a poor performer.
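The arithmetic behind this effect can be illustrated with a small sketch. It assumes the HSMR is expressed as observed deaths divided by expected deaths, multiplied by 100, where expected deaths are the sum of per-patient predicted risks. The function name and all numbers are hypothetical and are not drawn from this study.

```python
def hsmr(observed_deaths, predicted_risks):
    """Hospital standardised mortality ratio, scaled so that 100 means
    observed deaths equal expected deaths.

    predicted_risks: per-patient predicted probabilities of death from
    a risk-adjustment model (illustrative inputs).
    """
    expected_deaths = sum(predicted_risks)
    if expected_deaths <= 0:
        raise ValueError("expected deaths must be positive")
    return 100 * observed_deaths / expected_deaths


# Hypothetical ward: 100 patients, 10 observed deaths.
# Complete comorbidity capture gives each patient a predicted risk of 0.10.
fully_coded = hsmr(10, [0.10] * 100)   # about 100

# Under-coded comorbidities lower each predicted risk to 0.08,
# so expected deaths fall from 10 to 8 while observed deaths are unchanged.
under_coded = hsmr(10, [0.08] * 100)   # about 125
```

In this made-up case the same ward, with identical patients and outcomes, moves from an HSMR of about 100 to about 125 purely because its comorbidities were under-recorded.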

Third, chart review of a random selection of patients may aid a ‘poor performing’ health service in identifying this as a potential source of bias in their report card. A better solution is for prediction models to identify and incorporate only those comorbidities that are reliably coded (CHF, CKD, COPD, cancer), rather than rely on the less accurate index scores (such as CCI and EI) that incorporate comorbidities that are unreliably reported. The optimal source from which the most accurate and most efficient CCI can be obtained also warrants further investigation.48 The increasing prevalence of EMR provides a potential for capturing large volumes of data more uniformly.36 With this, questions are raised regarding which types of algorithms are more effective and whether medical language processing can be standardised across different practice settings and health services.43 Furthermore, the widespread use of EMR for national safety and quality purposes requires standardisation of data management processes and compliance with regulatory requirements.49

Our study had a number of limitations. The sample size was relatively small and limited to a single site, reducing the precision of the estimates and the power to detect differences for some conditions. We conducted our study in the intensive care setting, and our results may not be generalisable to other patient groups, departmental settings, or hospital sites. We found evidence of systematic bias in the EI score that may reflect local coding rules. Our results should be viewed with caution and require validation in a larger cohort.

Conclusions

Our findings suggest that the comorbidities needed to calculate the CCI and the EI are under-reported in administrative data for seriously ill patients, such as those admitted to the intensive care ward. Derived total (CCI and EI) scores may produce misleading results. Consideration should be given to limiting and validating a revised CCI, using an alternative comorbidity model, or omitting comorbidities entirely when calculating the HSMR, as has been done by the COPE and HOPE models. Further studies are required to establish the reliability of the CCI and EI in other patient groups.