Article Text

Download PDFPDF

Methods to describe referral patterns in a Canadian primary care electronic medical record database: modelling multi-level count data
  1. Bridget L. Ryan,
  2. Joshua Shadd,
  3. Heather Maddocks,
  4. Moira Stewart,
  5. Amardeep Thind and
  6. Amanda L. Terry
  1. Centre for Studies in Family Medicine, Department of Family Medicine, Schulich School of Medicine & Dentistry, Western University, London, ON, Canada, and Department of Epidemiology and Biostatistics, Schulich School of Medicine & Dentistry, Western University, London, ON, Canada
  2. Department of Family Medicine, McMaster University, Hamilton, ON, Canada
  3. Centre for Studies in Family Medicine, Department of Family Medicine, Schulich School of Medicine & Dentistry, Western University, London, ON, Canada
  4. Department of Family Medicine, Schulich School of Medicine & Dentistry, Western University, London, ON, Canada, Department of Epidemiology and Biostatistics, Schulich School of Medicine & Dentistry, Western University, London, ON, Canada, and Schulich Interfaculty Program in Public Health, Schulich School of Medicine & Dentistry, Western University, London, ON, Canada
  1. Author address for correspondence: Bridget L. Ryan 1151 Richmond Street Centre for Studies in Family Medicine Western Centre for Public Health and Family Medicine Western University London, ON N6A 3K7, Canada bryan{at}


Background A referral from a family physician (FP) to a specialist is an inflection point in the patient journey, with potential implications for clinical outcomes and health policy. Primary care electronic medical record (EMR) databases offer opportunities to examine referral patterns. Until recently, software techniques were not available to model these kinds of multi-level count data.

Objective To establish methodology for determining referral rates from FPs to medical specialists using the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) EMR database.

Method Retrospective cohort study, mixed effects and multi-level negative binomial regression modelling with 87,258 eligible patients between 2007 and 2012. Mean referrals compared by patient sex, age, chronic conditions, FP visits, and urban/rural practice location. Proportion of variance in referral rates attributable to the patient and practice levels.

Results On average, males had 0.26 and females had 0.31 referrals in a 12-month period. Referrals were significantly higher for females, increased with age, FP visits and the number of chronic conditions (p < 0.0001). Overall, 14% of the variance in referrals could be attributed to the practice level, and 86% to patient level characteristics.

Conclusions Both the patient and practice characteristics influenced referral patterns. The methodologic insights gained from this study have relevance to future studies on many research questions that utilise count data, both within primary care and broader health services research. The utility of the CPCSSN database will continue to increase in tandem with data quality improvements, providing a valuable resource to study Canadian referral patterns over time.

  • primary health care
  • referral and consultation
  • electronic medical records
  • multi-level negative binomial regression modelling

Commons license

Statistics from

Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.


This retrospective cohort study described referral patterns from family physicians (FPs) to other medical specialties. Patterns of referrals reflect standards of care, physician practice scope and patient expectations, and are influenced by policy,1,2 geography,3 physician4 and patient characteristics.3, 5, 6 Most variability in referral rates arises from the patient.3, 5, 6 Clinical factors such as chronic conditions are of particular importance.4,7 Primary care electronic medical record (EMR) databases are ideally suited to explore these clinical influences. Unlike registries or health administrative databases, primary care EMRs are clinically comprehensive and contain patient data unavailable elsewhere.

In order to take advantage of the rich data available in EMRs, appropriate statistical modelling must be employed. There are many outcomes in primary care research that take the form of counts; for example, physician visits, referrals to other providers, chronic conditions, medications and diagnostic tests. Logistic regression, where counts are dichotomised, is often used to model these data. While not incorrect, dichotomizing always results in the loss of valuable information.8 An alternative that maintains the variation in the outcome is to use a multi-variable technique such as Poisson regression which can model count data. One assumption of the Poisson distribution is that the mean and variance are the same. However, primary care data are often over-dispersed (with a large number of zero counts in the data), meaning that this assumption is not met. For example, in modelling the number of FP visits made by a population in a year, there will be many people who do not visit at all, some who visit only once or twice, and progressively smaller numbers of people with more visits. Poisson regression is not appropriate in this situation. The negative binomial distribution is more flexible and is well suited to handle over-dispersed data.9

Further complicating the study of primary care count variables is the fact that much of primary care data are collected about patients within practice settings. This is especially true in the growing area of EMR database research, where patient level data are collected for many practices. This clustering of the data must be accounted for, using for example, multi-level modelling techniques which allow for the apportioning of variance between patient and practice levels. Until recently, there was no readily available software that could perform multi-level negative binomial regression, a technique that can both properly model over-dispersed count data and account for the clustering of individual patient level data within practice settings. With the recent inclusion of multi-level negative binomial regression in statistical software packages, its use has grown in popularity.10

This paper provides an illustration of multi-level negative binomial regression which models over-dispersed health care count data (the number of referrals) and accounts for the clustering of patients within practices. The methodologic insights gained from this study have relevance to future studies on many research questions that utilise count data, both within primary care and broader health services research.


Setting and sample

This study used data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN), a national database of de-identified primary care EMRs. Eleven practice-based primary care networks contribute patient data from seven provinces and one territory which is merged into a single structured database. For this study, a five-year period of data, from 1 July 2007 to 30 June 2012, was extracted and contained the patient EMRs from five provinces, nine networks, and 57 practice sites with a total of 177,093 patients.

At the time of the extract, not all practice sites were contributing referral data. As well, some sites were not contributing useable data for the variables of interest for this study. In order to exclude the practice sites with missing or incomplete data, a conservative criterion was set. Practice sites with more than 10% of their referrals of an unknown type were excluded from the analysis. At some practice sites, the EMR referral field is used to record both outgoing referrals and incoming consultations. Therefore, criteria described elsewhere were applied to distinguish outgoing FP-generated referrals from incoming medical specialist consultant letters.11

Within the eligible practice sites, a cohort of patients was selected with at least two in-office visits at least 12 months apart, and complete sex and age information. Within this patient cohort, medical referrals were identified.


Analyses were conducted in Stata 13.12 For the descriptive analysis, the mean annual number of referrals per patient per year was described across patient-level factors: sex, age group, the total number of the eight CPCSSN validated chronic conditions (diabetes, hypertension, chronic obstructive pulmonary disease, depression, osteoarthritis, epilepsy, Parkinson’s disease and dementia),13 and the number of FP visits. The mean annual number of referrals per patient per year was also described for urban and rural practices, defined as the forward sortation area (first three digits of the postal code)14 which was the only practice-level factor available in the data.

For the multi-variable analyses, the outcome was the total number of referrals for each patient and was modelled as a count variable. Sex was a categorical factor and the remaining patient factors were modelled as continuous: age, the total number of conditions and the total number of FP visits. The urban/rural practice site factor was modelled as a categorical variable. Some patients had visits more than 12 months apart, increasing their exposure to receive a referral. To account for the unequal length of time patients appeared in the EMR data, exposure time in months between first and last visit in the time period was included as a variable in the model. The multi-level model was built as a series of steps that explored the clustering of patients (level 1) within practice sites (level 2) and examined the fixed effects of the patient factors.

Step 1. Testing the variance in referral rates across practice sites

An empty (that is, with no explanatory factors) mixed effects model was run to determine the proportion of variance in the overall referral rate accounted for by the patient and practice levels. This model represented the total variance in the number of referrals between the practice sites and was expressed as the intra-class correlation coefficient of 0.14. This indicated that 14% of the variance in referrals could be attributed to the practice level.

Step 2. Selecting a modelling distribution

The mean number of referrals per patient was 0.9, and the variance was 1.83 across practice sites, violating the assumption of the Poisson distribution that the mean and the variance are equal. To test the improvement of the negative binomial over the Poisson, empty models (controlling only for the unequal exposure times of the patients) using each distribution were compared using a likelihood ratio (LR) test with one degree of freedom and a critical chi square value of 11.96. This test was found to be significant (LR = 5024.59 and p < 0.0001). In addition, the extra parameter (/ln alpha) modelling the dispersion of the outcome variable in the negative binomial model was significant (p-value = 0.0001), indicating the appropriateness of the negative binomial distribution for this study.

Step 3. Modelling individual fixed effects of patient level factors

The individual fixed effects of each of the patient level factors were tested separately, and then included in a full model with all the factors. Coefficients were expressed as incidence rate ratios, and a 95% confidence interval was constructed around all parameters. LR test results found each factor to be a significant improvement over the empty model (results not shown).

A secondary objective at this step in the modelling was to assess whether the count of chronic conditions appeared to be a reasonable representation of the morbidity burden for these patients. Each of the eight CPCSSN-validated conditions was added into the model. All were found to be highly significant predictors of an increase in the incidence rate ratio of referrals (results not shown), suggesting that no one chronic condition was more influential in driving referrals. Therefore, to preserve the parsimony of the model, the total number of conditions (0 through 8), rather than individual conditions, was used as a patient-level factor.

Step 4. Modelling practice-level factors

To determine whether the urban/rural practice-level factor improved the model fit, it was included in the full model at level 2. The LR test compared the full model with and without this characteristic, and the inclusion of this level 2 factor was not a significant improvement over the full model. Thus, the final model did not include this additional practice-level factor.


Table 1 shows the location of the eligible practice sites across the provinces and networks and the number of patients within each.

Table 1 Networks and practice sites in the Canadian Primary Care Sentinel Surveillance Network (1 July 2007–30 June 2012)

There were 78,731 medical referrals during the study period for 87,258 patients from 28 eligible practice sites. The mean exposure time (first visit to the end of data period) was 39.6 months [Standard Deviation (SD) = 13.5]. Referral rates ranged across practice sites from 0.06 to 0.71 referrals/patient/year. The mean number of referrals was 0.29 referrals/patient/year (SD = 0.46). Table 2 reports the mean referral rates across patient-level variables. Patients had a mean of 14.6 FP visits (SD = 13.3).

Table 2 Mean number of referrals per 12-month period in the DELPHI database (1 July 2007–30 June 2012), n = 87258 patients

Table 3 reports the independent effects of each patient-level variable and the full model. Males were less likely to have as many referrals as females, and the probability of referrals increased with age, the total number of chronic conditions and the number of FP visits.

Table 3 Multi-level negative binomial regression models showing the association of patient-level characteristics with the total number of referrals in the CPCSSN database between 1 July 2007–30 June 2012 (n = 28 practices and n = 87258 patients)


Using multi-level negative binomial regression to account for over-dispersion of data, this study demonstrated statistical modelling that will allow for a more refined understanding of the influence of patient, physician, practice and jurisdictional levels on referrals. While this study extends our statistical modelling for primary care count data, there were limitations in the application of our model.

The number of level 2 groups (practice sites) was not large enough to guarantee unbiased estimates in a multi-level model. Several researchers have used Monte Carlo simulations to investigate the effects of level 1 and level 2 sample sizes on the precision of variance components, estimates and standard errors.15 While there is some disagreement over the robustness of fixed and random effects,16 there are similar findings that the standard errors are generally underestimated when the number of groups is below 50 and the model is complex, with additional parameters and a non-linear distribution (such as the negative binomial distribution we have used).17 For a more thorough explanation of cluster robust inference, the reader is referred to Cameron and Miller.18

Regarding representativeness of the data, the current age and sex of the CPCSSN database have been found to be somewhat representative of the Canadian population as measured by the Canadian census.19 Further, for this study in particular, missing or incomplete referrals data for half of the sites prevented modelling at the regional level in the current study and resulted in over-representation of one region, limiting generalizability. Therefore, improvements in the completeness of EMR data are needed in order to model additional levels in the analysis of count data.

In addition, the distribution of practice locations was unbalanced with over 90% of the sites urban. This may have limited the improvement in the multi-level model when we included this practice-level factor. Other practice-level factors were not measured, including an analysis by province, as several sites within provinces were ineligible for inclusion in the study, and the remaining sample was predominantly from Ontario. Improvements in our ability to capture accurate practice-level characteristics are needed to model levels beyond the individual patient level.

The overall referral rate (0.29 referrals/year/patient) was lower than two regional Canadian studies (0.563 and 0.466). While this may be attributable to time effects, differences in referral practice by jurisdiction, and/or differences in patient populations, it is likely that this reflects the effect of missing and/or incomplete data from some practice sites. Despite the lower referral rate, the model was consistent with previous research,35 where referral rates increased with patient age, female sex and more exposure to the physician through FP visits where a referral might occur. To our knowledge, this is the first referrals study to account for multiple chronic conditions. Unsurprisingly, the number of medical referrals increased with morbidity level.

Despite its limitations, the analysis conducted in this paper is a major step forward in the methods used to understand EMR data reported in the primary care literature.6,20 This multi-level negative binomial analysis can serve as an illustration for the modelling of myriad count outcomes that are important in primary and other healthcare research. In particular, researchers using pooled EMR data from several practice sites should employ multi-level modelling to account for the clustering of these data. As the accuracy and completeness of EMR data improve over time, the power of these analytic techniques will further increase.


  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.