Discussion
Differences between the most severely deranged laboratory results of patients admitted to the ICU and locally used normal reference ranges were observed in every database studied. These differences persisted even when comparing those patients with the best outcomes against the normal reference range.
In addition, among the databases, differences in the degree of overlap between best and worst group laboratory distributions were observed, which may represent variability in case mix and therapies applied, and/or imply variable discriminatory function among laboratory values based on region. Our findings build on the single-centre work by Tyler et al14 by replicating similar observations across different contexts and geographies. They further support the need to consider context in reacting to abnormal laboratory values, as correcting abnormal values may not always be beneficial or benign.5 11
The differences observed between the reference range and selected ICU values across all five databases suggest that normal reference ranges are not useful in managing critically unwell patients. For instance, the haemoglobin results of most ICU patients fell outside normal reference ranges, irrespective of whether they had the best or worst outcome (figure 1). Patients with the best outcomes (here, ICU survival and shortest LOS) would be expected to have laboratory results that more closely align with the reference range, while those who die in the ICU would be expected to have results that are significantly worse. This expectation rests on the assumption that the further patients’ results deviate from the reference range, the more severely deranged their physiology and the more likely they are to have a poor clinical outcome. However, given the difference observed between the reference range and the best outcome group, it is clear that normal reference ranges are not meaningful in ICU contexts. This discrepancy likely reflects how reference ranges are formulated: although they are used to define normal and abnormal for both healthy individuals and critically unwell patients, they are typically derived from samples of healthy outpatients.1
As expected, patients with the best and worst clinical outcomes had differing laboratory results across databases. However, between databases, we found that the extent to which these groups differed was variable, as the degree of overlap in distributions changed across investigations and the context in which they were utilised. For example, in the UCIHJ23db from Spain, the creatinine of patients with the best and worst clinical outcomes had substantial overlap (OVL=0.67), suggesting a decreased ability for creatinine to differentiate between patients with good and bad outcomes in this context. By comparison, creatinine results in the PLAGH-ICUdb from China demonstrated a lower overlap between groups (OVL=0.48). Consequently, creatinine may serve as a better prognosticator in this database, as it better discriminates between those with good and bad outcomes.
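To make the OVL figures above concrete, the following is a minimal sketch of one common way to estimate an overlapping coefficient between two samples: bin both groups on a shared grid and sum the smaller of the two bin proportions. This section does not state which estimator was used in the study, so the histogram approach, the function name overlap_coefficient, the bins parameter and the creatinine values are all illustrative assumptions rather than the study's method or data.

```python
def overlap_coefficient(group_a, group_b, bins=30):
    """Histogram estimate of the overlapping coefficient (OVL).

    Returns a value between 0 (no shared bins) and 1 (identical
    binned distributions). Hypothetical helper, not the study's code.
    """
    lo = min(min(group_a), min(group_b))
    hi = max(max(group_a), max(group_b))
    if hi == lo:
        # All values identical: the distributions coincide.
        return 1.0
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value falls in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    p_a = proportions(group_a)
    p_b = proportions(group_b)
    # OVL is the shared mass: the smaller proportion in each bin.
    return sum(min(a, b) for a, b in zip(p_a, p_b))


# Synthetic creatinine values (umol/L), invented for illustration only.
best_outcome = [62, 70, 75, 81, 88, 90, 95, 102]
worst_outcome = [85, 98, 120, 150, 176, 210, 260, 310]
ovl = overlap_coefficient(best_outcome, worst_outcome, bins=10)
print(f"Estimated OVL: {ovl:.2f}")
```

A value closer to 1 means the two outcome groups are harder to tell apart on that laboratory result, while a value closer to 0 means the result separates them well. Histogram estimates depend on the number of bins, so a kernel density estimate may be preferable for small samples.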
Variation in the overlap of laboratory results between patients with the best and worst outcomes was also seen to different extents in the measurement of haemoglobin, lactate and sodium. These results imply that different investigations may represent good prognosticators in one context but not another and question the value of attempting to return these results to a healthy patient’s reference range. As with currently utilised reference ranges, context-specific reference ranges developed from a heterogeneous cohort of patients would need to be interpreted with an understanding of individual patient factors and how their acute pathology may alter the significance of specific results.
Although a mechanistic analysis was outside the scope of our project, we included data regarding patient demographics and common critical care interventions (renal replacement therapy (RRT), PRBC transfusion and crystalloid vs colloid resuscitation), which may indicate mechanisms contributing to these variations in overlap. For instance, PLAGH-ICUdb was seen to have a narrower distribution of creatinine results. While RRT data were not available from this database, intravenous fluid administration was similar to that in other databases, so it is unlikely to explain variations in renal function. However, the overall younger age of patients in PLAGH-ICUdb may have contributed to their lower creatinine. Furthermore, PLAGH-ICUdb also displayed the lowest overlap in haemoglobin results between best and worst outcome patients. While this database also had the lowest PRBC administration rate, whether this represents a causative relationship is unknown. More broadly, we have demonstrated that substantial variability in the case mix and therapies provided existed across the databases, which may have contributed to differences in results across countries. Considering the differences between centres worldwide, the concept of context-specific reference ranges may prove even more useful by guiding practice with the goal of improving patient outcomes rather than unnecessarily normalising pathology results. Future research must consider the impact of case mix and clinical practices when developing new reference ranges, which would then require prospective validation to confirm them as appropriate treatment targets.
As such, our current study forms the foundation for several avenues of future enquiry. First, we intend to analyse and compare more homogeneous subgroups of patients (eg, cardiac surgery) and define context-specific laboratory result ranges that are associated with the best clinical outcomes and may therefore represent ‘normality’ for these groups of critically ill patients. Such reference ranges may then be prospectively validated to determine whether they represent appropriate treatment targets and whether deviation from these ranges is associated with poorer outcomes. Furthermore, prospective studies will allow for the collection of data regarding the therapies provided to patients and thereby an investigation of the mechanism through which context-specific variations may arise. In addition, among databases that have collected data over a greater length of time (eg, PLAGH-ICUdb from 2008 until 2019), we intend to investigate whether the association between laboratory results and outcomes varies over time, which would suggest that the prognostic value of results and their corresponding reference ranges require periodic review.
Strengths and limitations
Our study has several strengths. It is an analysis of an extremely large dataset, including more than 250 000 patients from three continents. Moreover, the ICUs included are varied in their case mixes and the corresponding severity of illness of their patients.
However, several limitations exist within this study. As with all retrospective research involving multiple large databases, the design, collection and coding of variables may vary across datasets, creating inaccuracy in results. In our study, this is mitigated through the use of objective variables including laboratory results, ICU LOS and ICU mortality. Retrospective research of this nature is also inherently limited by missing data. Notably, in our study this included missing data regarding lactate and interventions in PLAGH-ICUdb, and regarding intravenous fluid therapy and transfusions in eICU-CRD. However, other than lactate, these variables were used purely for hypothesis-generating purposes and do not alter our primary findings. The included databases collected information from ICU admissions across varying years, so differences in results may reflect changes in global practices over time rather than differences between centres or countries. Dichotomising patients into those with best and worst outcomes using ICU LOS and mortality does not reflect patient outcomes beyond ICU discharge. This includes the possibility that patients classified as having the ‘best’ outcome may have been discharged quickly from the ICU to receive end-of-life care. However, these definitions improved interpretability of our results and are consistent with those used previously.14 Further, our study includes descriptive analyses without adjustment for potential confounders. Therefore, the associations between individual laboratory results and patient outcomes do not indicate independent causative relationships and should not be interpreted as such. Finally, comparing heterogeneous patient populations comprising varied case mixes is problematic. It is possible that context-specific reference ranges would also need to vary based on patient factors or specific conditions, though this could not be concisely investigated in our present work.