## Introduction

This retrospective cohort study described referral patterns from family physicians (FPs) to other medical specialties. Patterns of referrals reflect standards of care, physician practice scope and patient expectations, and are influenced by policy,1,2 geography,3 physician4 and patient characteristics.3,5,6 Most variability in referral rates arises from the patient.3,5,6 Clinical factors such as chronic conditions are of particular importance.4,7 Primary care electronic medical record (EMR) databases are ideally suited to explore these clinical influences. Unlike registries or health administrative databases, primary care EMRs are clinically comprehensive and contain patient data unavailable elsewhere.

To take advantage of the rich data available in EMRs, appropriate statistical modelling must be employed. Many outcomes in primary care research take the form of counts, for example physician visits, referrals to other providers, chronic conditions, medications and diagnostic tests. Logistic regression, in which counts are dichotomised, is often used to model these data. While not incorrect, dichotomising results in the loss of valuable information.8 An alternative that retains the variation in the outcome is a multivariable technique designed for count data, such as Poisson regression. One assumption of the Poisson distribution is that the mean and variance are equal. However, primary care data are often over-dispersed, meaning that the variance exceeds the mean, frequently because the data contain a large number of zero counts together with a long right tail. For example, in modelling the number of FP visits made by a population in a year, there will be many people who do not visit at all, some who visit only once or twice, and progressively smaller numbers of people with more visits. Applied to such data, Poisson regression tends to underestimate standard errors and is therefore not appropriate. The negative binomial distribution is more flexible: it includes an additional dispersion parameter that allows the variance to exceed the mean, and it is well suited to over-dispersed data.9
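
The contrast between the two distributions can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not the analysis used in this study: the mean count, the dispersion value and the helper `summarise` are hypothetical choices made only for illustration. Negative binomial counts are generated as a gamma-Poisson mixture, so that the variance is the mean plus the dispersion times the squared mean.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 100_000
mean_count = 2.0   # hypothetical mean number of referrals per patient
dispersion = 1.5   # hypothetical dispersion: variance = mean + dispersion * mean**2

# Poisson counts: the variance is constrained to equal the mean.
poisson_counts = rng.poisson(lam=mean_count, size=n)

# Negative binomial counts as a gamma-Poisson mixture: each patient has an
# individual rate drawn from a gamma distribution whose mean is mean_count.
rates = rng.gamma(shape=1.0 / dispersion, scale=dispersion * mean_count, size=n)
negbin_counts = rng.poisson(lam=rates)


def summarise(counts):
    """Return the mean, variance and proportion of zeros of a count vector."""
    return {
        "mean": float(counts.mean()),
        "variance": float(counts.var()),
        "prop_zero": float((counts == 0).mean()),
    }


print("Poisson:          ", summarise(poisson_counts))
print("Negative binomial:", summarise(negbin_counts))
```

Under these assumptions both samples have approximately the same mean, but the negative binomial sample has a much larger variance and a larger share of zero counts, which is the pattern described above for FP visits.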

Further complicating the study of primary care count variables is the fact that primary care data are typically collected on patients nested within practice settings. This is especially true in the growing area of EMR database research, where patient-level data are collected from many practices. This clustering must be accounted for, for example by using multi-level modelling techniques, which apportion variance between the patient and practice levels. Until recently, there was no readily available software that could perform multi-level negative binomial regression, a technique that can both properly model over-dispersed count data and account for the clustering of individual patient-level data within practice settings. With the recent inclusion of multi-level negative binomial regression in statistical software packages, its use has grown in popularity.10
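
To make the clustering concrete, the sketch below simulates the kind of data-generating process that a multi-level negative binomial model assumes: a practice-level random intercept on the log scale, with over-dispersed counts for patients within each practice. It is an illustration on synthetic data only. The number of practices, the practice-level standard deviation, the dispersion and all variable names are hypothetical, and no model is fitted here.

```python
import numpy as np

rng = np.random.default_rng(7)

n_practices = 50              # hypothetical number of practices
patients_per_practice = 400   # hypothetical patients per practice
baseline_log_mean = np.log(1.5)
practice_sd = 0.5             # hypothetical SD of the practice random intercept
dispersion = 1.0              # hypothetical negative binomial dispersion

# Level 2: one random intercept per practice, on the log scale.
practice_effects = rng.normal(0.0, practice_sd, size=n_practices)

# Level 1: each patient's expected count depends on their practice.
practice_ids = np.repeat(np.arange(n_practices), patients_per_practice)
patient_means = np.exp(baseline_log_mean + practice_effects[practice_ids])

# Over-dispersed counts via a gamma-Poisson mixture around each patient's mean.
rates = rng.gamma(shape=1.0 / dispersion, scale=dispersion * patient_means)
referrals = rng.poisson(rates)

# Compare the observed variance of practice means with the variance expected
# if patients were independent, i.e. if there were no practice effect.
practice_means = np.array(
    [referrals[practice_ids == j].mean() for j in range(n_practices)]
)
between_var = practice_means.var(ddof=1)
expected_if_independent = referrals.var(ddof=1) / patients_per_practice

print("Variance of practice means:      ", between_var)
print("Expected without practice effect:", expected_if_independent)
```

When the practice effect is present, the practice means vary far more than independent sampling would produce. A model that ignores this structure treats correlated patients as independent observations, which is the problem multi-level techniques are intended to address.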

This paper provides an illustration of multi-level negative binomial regression, which models over-dispersed health care count data (the number of referrals) and accounts for the clustering of patients within practices. The methodological insights gained from this study are relevant to future studies of the many research questions that utilise count data, both within primary care and in broader health services research.