DISCUSSION
This study examined the data quality of anthropometric measurements extracted from primary care EMRs in Ontario. Overall, data completeness was 66.2% and accuracy was 97.3%. Incompleteness was predominantly due to the high proportion of missing height data (32%). When we examined data completeness for measurements collected only at well-child visits, the proportion of complete records increased to 89%.
These results were similar to previous work on data completeness in EMRs. A study from Kaiser Permanente Colorado examined EMR data on children 3–17 years of age and reported that 64% of patients had a BMI measurement at any primary care visit and >95% at well-child visits.25 The accuracy of data recorded in EMRs was also consistent with the previous literature. One study in children 3–5 years of age used a similar multi-step data cleaning strategy and found only 2% of data to be erroneous.22 Another study replicated 11 different methods for identifying potential errors and found that the prevalence of data errors ranged from 0.3% to 2.1%.16
The main factor that influenced data completeness and accuracy was child age. The direction and magnitude of the effect of age on data completeness at well-child visits changed when children were examined separately by age group. This is likely due to the high number of well-child visits that occur in the first 2 years of life.23 Primary care providers who see young infants often in the early years may not record both a length and a weight if the child has been seen recently. Moreover, measuring the length of a child less than 2 years of age requires appropriate equipment, such as a length board, which may be a barrier to a complete growth assessment.26 Older children attending well-child visits in the 5–19 year age group had marginally higher data completeness. One reason may be that, because older children are less likely to attend well-child visits, height and weight are measured more often when the primary care provider has not seen the child for a longer interval. Until recently, the recommendations for growth monitoring applied to well-child visits only.5,6 In 2015, the Canadian Task Force on Preventive Health Care changed the recommendation to performing both height and weight measurements at all visits for primary and secondary prevention of obesity.27 Future analyses of EMR data will be able to assess the uptake of this recommendation.
Similar to the findings on data completeness, the effect of age on data accuracy was highly significant and differed by age group. One possible reason for this discrepancy is the tendency to weigh infants in pounds and ounces instead of kilograms. Identifying age as a determinant of data accuracy is important for future uses of EMR data, in particular for developing new data cleaning algorithms that capture multiple unit conversions, especially for the youngest age group where data is most abundant. Despite the differences by age, the high proportion of accurate data was encouraging. One advantage of using EMR data is the ability to examine multiple measurements on the same child.16,28 This not only aids the data cleaning process, by allowing measurements before and after a suspect value to be examined, but also allows researchers to examine how the same population of children changes over time, including into adulthood.
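As an illustration of the kind of unit check such an algorithm might include, the following is a minimal sketch and not the method used in this study. It assumes a weight was recorded without units and tests which unit interpretation places it inside a plausible range. The function names and the range passed in are hypothetical; in practice the range would have to come from the child's neighbouring measurements or an age-specific reference.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (international definition)
OZ_PER_LB = 16


def pounds_ounces_to_kg(pounds, ounces=0.0):
    """Convert a weight given in pounds and ounces to kilograms."""
    return (pounds + ounces / OZ_PER_LB) * LB_TO_KG


def candidate_weights_kg(value):
    """Express a unit-less recorded value in kg under each assumed unit."""
    return {
        "kg": value,
        "lb": value * LB_TO_KG,
    }


def plausible_units(value, low_kg, high_kg):
    """List the unit interpretations that place the value in [low_kg, high_kg].

    low_kg and high_kg are hypothetical bounds supplied by the caller.
    """
    return [
        unit
        for unit, kg in candidate_weights_kg(value).items()
        if low_kg <= kg <= high_kg
    ]


# Hypothetical example: a recorded value of 15.4 for an infant whose
# neighbouring weights suggest roughly 6 to 8 kg.
units_for_suspect_value = plausible_units(15.4, 6.0, 8.0)
```

In this hypothetical case only the pounds interpretation falls inside the supplied range, which would suggest converting the value rather than discarding it. A value that fits no interpretation, or more than one, would still need manual review.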
There were several limitations to this study. There may have been misclassification of the data accuracy outcome for several reasons. The lack of units for each numeric value for height and weight was problematic. Although most imperial system values were excluded in the assessment of BIVs, the prevalence of invalid inliers for subjects contributing only one or two measurements is unknown. In the WHO computer program, weight-for-age is not calculated beyond 10 years of age, making it harder to determine which outliers in zBMI data arise from weight data in adolescents. The BIV cut-offs suggested by the WHO for calculation of zBMI may be too conservative and may incorrectly exclude patients with extremely high zBMIs.29 One previous study demonstrated that the BIV cut-offs from the WHO underestimated obesity prevalence30 and, recently, the Centers for Disease Control and Prevention (CDC) changed its upper limit for BIVs from >+5 to >+8.31 To the best of our knowledge, there are no validated or standardised rules on plausible changes over time to differentiate true errors from correct values.32 More research is required to determine valid BIV cut-offs that can be used for the large data sets that are becoming more available with improved health information databases. Lastly, the clinic size variable may have been underestimated for 13% of observations because in eight clinics not all physicians contribute data to the EMRALD network.
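The sensitivity of exclusions to the choice of upper BIV cut-off can be illustrated with a short sketch. This is not the WHO or CDC program: it simply applies fixed limits to z-scores that are assumed to have been computed already. The function name, the lower limit of -5 and the example z-scores are all assumptions made for illustration; the example values are invented and are not study data.

```python
def flag_bivs(zbmi_values, lower=-5.0, upper=5.0):
    """Return True for each zBMI that falls outside [lower, upper].

    The default limits are illustrative assumptions; the upper limit is
    the parameter discussed in the text (+5 versus +8).
    """
    return [z < lower or z > upper for z in zbmi_values]


# Invented z-scores, for illustration only.
example_zbmi = [-5.6, -1.2, 0.3, 2.9, 5.4, 7.1, 9.3]

flags_upper_5 = flag_bivs(example_zbmi, upper=5.0)
flags_upper_8 = flag_bivs(example_zbmi, upper=8.0)

# Values excluded at +5 that would be retained at +8: these are the
# records most likely to belong to children with severe obesity.
retained_at_8 = [
    z
    for z, strict, relaxed in zip(example_zbmi, flags_upper_5, flags_upper_8)
    if strict and not relaxed
]
```

With these invented values, raising the upper limit from +5 to +8 retains two records that the stricter rule would have excluded, which is the mechanism by which a conservative cut-off could lower an obesity prevalence estimate.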
The results from this study raise important considerations about the feasibility of using growth data from EMRs for public health and surveillance purposes. Visit type and age were important determinants of whether measurements, specifically height, were complete. Previous research has shown a difference in zBMI between well-child and sick visits15; we found mean zBMI from sick visits to be significantly higher than that from well-child visits. It is also likely that children who attend regular well-child care are systematically different from those who attend only when sick. Therefore, it is important to acknowledge possible selection biases that may be introduced when using data collected in routine primary care. Finally, our study population was skewed younger than Ontario rostered patients because we examined visits with growth data, which are concentrated in children 0–4 years of age.
The next step in improving the quality of these EMR data should be to develop more sophisticated data cleaning methods that make efficient use of as much of the available data as possible. This includes determining visit type through machine learning text classification of physicians’ common ‘short-hand’ for indicating well-child visits in patient progress notes, validating correct BIVs for patients with severe obesity and determining implausible changes in height and weight over specific time intervals. Future research should develop and validate these data cleaning algorithms in large study populations so that researchers can standardise techniques. However, despite the need for continuous evaluation of data quality, the growth data in their current state were highly complete and accurate. EMRs are a good data source to characterise weight status in a large population of young children and may be useful in assessing uptake of recommendations or interventions related to childhood growth monitoring or obesity.