METHODS
This study was conducted at a Federally Qualified Health Center (FQHC) in New York City. The FQHC uses an EpicCare EHR system (Epic Systems, Verona, WI) with an associated patient portal called MyChart that allows patients to exchange electronic messages with their health care providers. A retrospective cross-sectional study was performed using data recorded in the FQHC’s EHR system between 1 July 2011 and 6 June 2012.
Initial analyses included 42,317 patients who resided in New York City and had at least one clinical visit at the FQHC within the defined time period. The data set was restricted to patients aged 10 years or older because that is the minimum age at which one can sign up for MyChart at this FQHC. A patient was deemed to be a New York City resident if their recorded ZIP code was in a New York City neighbourhood as defined by the New York State Department of Health.18 From this population, we defined a subpopulation of 7653 patients (18.08% of the sample) who had either activated a MyChart account before 1 January 2012 or deactivated their account after that date, meaning that they had an active account for six contiguous months during the period. These patients were considered to be ‘MyChart users’. The outcome of interest was the number of messages a patient sent within the defined time frame.
We considered a number of demographic, socioeconomic and clinical variables that may predict messaging usage, based on previous research on patient portal usage.6,16 From the EHR we extracted age, gender, race and ethnicity, language preference, insurance status, number of office visits during the year, ZIP code and clinical diagnoses for the following chronic conditions: diabetes, hypertension, hyperlipidaemia, depression, congestive heart failure, asthma, drug abuse, alcoholism and HIV. Age was split into categories based on generation, an approach that has previously been used to characterise Internet usage.9,19,20 Patients were categorised into the G.I. Generation (born before 1937), Silent Generation (born 1937–1945), Older Boomers (born 1946–1954), Younger Boomers (born 1955–1964), Generation X (born 1965–1976), Millennials (born 1977–1992) and the youngest generation (born after 1992).19 Race and ethnicity were routinely collected in the EHR using the categories introduced in the 2000 US Census and were analysed both separately and as aggregate race/ethnicity categories. Insurance status was categorised as ‘private’, ‘public’ or ‘uninsured’. The number of office visits was categorised into tertiles. The diagnoses listed above have been investigated in previous studies of messaging usage5,6 and were analysed both separately and as an aggregate count of the total number of diagnoses a patient had.
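The generation cut-offs listed above translate directly into a lookup on birth year. The following is a minimal sketch of that mapping, not the code used in the study; the function name `generation` and the label strings are illustrative assumptions, and only the year boundaries come from the text.

```python
from bisect import bisect_right

# First birth year of each cohort that follows the G.I. Generation,
# taken from the boundaries stated in the text.
_CUTOFFS = [1937, 1946, 1955, 1965, 1977, 1993]

# Labels in chronological order; there is one more label than cut-offs.
_LABELS = [
    "G.I. Generation",      # born before 1937
    "Silent Generation",    # born 1937-1945
    "Older Boomers",        # born 1946-1954
    "Younger Boomers",      # born 1955-1964
    "Generation X",         # born 1965-1976
    "Millennials",          # born 1977-1992
    "Youngest generation",  # born after 1992
]


def generation(birth_year: int) -> str:
    """Return the generational cohort for a birth year (hypothetical helper)."""
    # bisect_right counts how many cut-offs are <= birth_year,
    # which is exactly the index of the matching label.
    return _LABELS[bisect_right(_CUTOFFS, birth_year)]
```

Keeping the boundaries in a single sorted list means each cut-off is stated once, so adjacent categories cannot overlap or leave a gap.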
We used the five-year estimates from the 2011 American Community Survey (ACS) to classify patients based on socioeconomic factors.21 Neighbourhood data, which the ACS collects at the ZIP code tabulation area (ZCTA) level as defined by the US Census Bureau, were mapped to patients using a ZCTA-to-ZIP code crosswalk provided by the UDS Mapper.22 We considered median household income, the percentage of people below the poverty level, the percentage of people who are high school graduates, the percentage of people who have bachelor’s degrees and the percentage of people who are unemployed.
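The linkage described above is a two-step lookup: patient ZIP code to ZCTA, then ZCTA to ACS estimates. A minimal sketch follows, assuming the crosswalk and the ACS table have already been loaded into dictionaries; the function name, the field names and the data layout are hypothetical and do not reflect the actual UDS Mapper or ACS file formats.

```python
def attach_neighbourhood_data(patients, zip_to_zcta, acs_by_zcta):
    """Link each patient to ACS estimates via their ZIP code (illustrative).

    patients    : list of dicts, each with a "zip" key
    zip_to_zcta : dict mapping ZIP code -> ZCTA (the crosswalk)
    acs_by_zcta : dict mapping ZCTA -> dict of neighbourhood estimates

    Returns (linked, unlinked). Patients whose ZIP code has no crosswalk
    entry, or whose ZCTA has no ACS record, are returned separately so
    that missing neighbourhood data stays visible.
    """
    linked, unlinked = [], []
    for patient in patients:
        zcta = zip_to_zcta.get(patient["zip"])
        acs = acs_by_zcta.get(zcta)
        if acs is None:
            unlinked.append(patient)
        else:
            # Copy the patient record and add the ZCTA-level estimates.
            linked.append({**patient, "zcta": zcta, **acs})
    return linked, unlinked
```

Returning the unlinked patients separately, rather than dropping them silently, makes it easy to count how many records lack neighbourhood data.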
For each provider at the FQHC, we recorded two measures of the provider’s affinity for using MyChart: whether or not they sent any messages during the one-year study period, and the percentage of patients they saw during that period who had MyChart accounts (the ‘MyChart patient ratio’), which was categorised into tertiles. These provider characteristics were attributed to each patient based on the provider the patient saw most often during the study period (the ‘most-seen provider’).
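The provider-level measures above can be derived from a simple list of visits. The sketch below is illustrative rather than the study's implementation and rests on several assumptions the text does not settle: visits are (patient, provider) pairs, ties for the most-seen provider go to the provider encountered first, the ratio is computed over distinct patients, and a value equal to a tertile cut point falls in the lower tertile. All function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter, defaultdict
from statistics import quantiles


def most_seen_provider(visits):
    """Map each patient to the provider they saw most often.

    visits: iterable of (patient_id, provider_id) pairs.
    Ties are broken in favour of the provider seen first (an assumption).
    """
    counts = defaultdict(Counter)
    for patient, provider in visits:
        counts[patient][provider] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}


def mychart_patient_ratio(visits, mychart_users):
    """Fraction of each provider's distinct patients who have an account."""
    seen = defaultdict(set)
    for patient, provider in visits:
        seen[provider].add(patient)
    return {prov: len(pats & mychart_users) / len(pats)
            for prov, pats in seen.items()}


def tertile(value, values):
    """Return 1, 2 or 3: the tertile of `value` within `values`."""
    cuts = quantiles(values, n=3)  # the two cut points between tertiles
    return bisect_left(cuts, value) + 1
```

A patient's provider-level covariates are then those of `most_seen_provider(visits)[patient_id]`.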
Initial analyses used Pearson χ² tests to compare demographic characteristics between people who did and did not use MyChart. Subsequent analyses focused on the subgroup of MyChart users and investigated the number of messages sent by these patients. Of these 7653 patients, 2031 (26.54%) were missing values for one or more of the predictor variables and were thus excluded from analysis, leaving 5622 (73.46%) patients in the final analysis. The flow of patients through the study is shown in Figure 1.
Due to over-dispersion in the count data for the number of messages sent and the preponderance of zero counts, a zero-inflated negative binomial (ZINB) model was fitted to determine the patient characteristics that were associated with sending messages. ZINB models assume that observed zero counts could be attributed either to random chance or to a structural reason, such as not having access to a computer, that arises from the nature of the data.23,24 The ZINB model contains two separate models: a logistic model for the odds of having excess zeroes beyond random chance and a negative binomial model for the messaging counts of patients who send messages. The model produces an odds ratio (OR) for sending zero messages, as well as an incidence rate ratio (IRR) for the messaging count.
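The two-part structure described above can be written as a mixture probability mass function. The sketch below is for illustration only and is not the SAS procedure used in the study. It assumes the common NB2 parameterisation (mean `mu`, dispersion `alpha`, variance `mu + alpha * mu**2`) and a single excess-zero probability `pi`; the function names are hypothetical.

```python
from math import exp, lgamma, log


def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of count y; requires mu, alpha > 0."""
    r = 1.0 / alpha  # "size" parameter
    log_p = (lgamma(y + r) - lgamma(r) - lgamma(y + 1)
             + r * log(r / (r + mu))
             + y * log(mu / (r + mu)))
    return exp(log_p)


def zinb_pmf(y, mu, alpha, pi):
    """Zero-inflated NB: a structural zero with probability pi, else NB2."""
    count_part = (1.0 - pi) * nb_pmf(y, mu, alpha)
    # A zero can arise structurally (pi) or by chance from the count part.
    return pi + count_part if y == 0 else count_part
```

In a regression setting, `pi` is typically linked to covariates through a logit and `mu` through a log link, so exponentiating the fitted coefficients yields the OR for an excess zero and the IRR for the count, respectively.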
Bivariate regression analyses were initially performed to determine which covariates were significantly associated (p < 0.05) with sending messages. To avoid collinearity, Spearman correlations were calculated between all possible pairs of covariates, and if two covariates were correlated (ρ > 0.4), the variable with the smaller effect size was dropped from modelling. Pearson χ² tests were used to compare messaging count values across levels of covariates, and categories were collapsed when differences between levels were not statistically significant. All significant covariates were entered into the full models. Likelihood ratio tests and comparisons of Akaike information criterion (AIC) and Bayesian information criterion (BIC) values were used to compare the full model with nested models to determine the best-fitting model. With the given sample size, we could detect an effect size of 0.8% with a significance level of 1% and 99% power. All statistical analyses were performed with SAS 9.3.