This issue pulls together a valuable assortment of ideas and observations from research teams working with a variety of measurement feedback systems (MFS) to guide clinical care, while confronting and studying both the human and technological implementation challenges. Their collective insights paint a complicated picture that documents both promises and challenges associated with MFS. In our own efforts to improve service delivery in children’s mental health systems over the past 15 years, we have both used and developed such systems, and have thus encountered both the good and the bad firsthand. In the context of these papers, we would like to offer some general insights for consideration as the field moves forward.

Many of these insights concern the basic relation between technology and decision making. Technology generally does what we tell it to do, and thus not surprisingly, early developments in health information technology (HIT) managed the most essential and best-understood processes in health care systems, including utilization, billing, and documentation. These were the questions the field had to ask in order to function: who was seen, when, for what reason, and how was it paid for? More recently, however, the focus has shifted to the more complex questions we want to ask to guide clinical care: What treatment approaches are we using? Are they helping? What should we do when they are not? In hindsight, many of these questions were naively ambitious on our part, which may explain why they were often met with such equivocal answers as “22 sessions of individual therapy” or “30 days of residential,” along with the associated billing codes.

Thanks to a growing vanguard of thinkers, answering the clinically interesting questions is getting closer to a common reality. But precisely because technology does what we tell it to do, this function of HIT will not fully mature until we have better articulated the complex information models that underlie it. We can think of these latter questions as belonging to the domain of clinical decision support, as shown in Fig. 1. Note that there are many strategies to support decision making that don’t involve HIT (e.g., supervisor recommendations, use of a treatment manual), and there are many functions of HIT that don’t involve supporting clinical decision making (e.g., documentation, service authorization, scheduling).

Fig. 1 Measurement feedback systems in the context of health information technology and clinical decision support. MFS measurement feedback systems

At this intersection lies a tremendous set of possibilities and opportunities, within which MFS serves a specific function. As the figure implies, although we believe there is much to explore within the context of MFS development—which entails a human-technology interaction defined by a user receiving measurement information in the form of a report or an alert (e.g., Bickman 2008)—there is even more work to be done within the broader context defined by the intersection of technology and clinical decision making (Fig. 1, shaded region). In this regard, we prefer a metaphor of telecommunication (or an interactive workspace), which suggests collaboration, communication, reasoning, interaction, and even design. Such systems can, of course, feed back information, but they should also be able to feed forward information to guide action (e.g., setting expectations for what should happen next, exploring “what ifs”).

As Bickman et al. (2014) suggest in their “final coda,” although the technology may be important, it should operate in the service of decision support. In our metaphor, we know that a telephone conference call must not drop participants and must be free of background noise, but its participants will also benefit from having an agenda, speaking a common language, knowing who else is on the call, having their thoughts organized, and sharing similar goals for their meeting. Thus, better articulation of the work processes to be supported on the “decision side” of the figure is prerequisite to any technology that could ultimately better serve those processes.

There is another matter, not represented in Fig. 1, involving the human implementation challenges that arise once appropriate technologies are developed. For example, Gleacher et al. (2015) highlighted the key facilitating role of organizational leadership in achieving widespread use of the contextualized feedback system. Without such support, providers may mute the phone calls, or agencies may disconnect the phone service altogether, metaphorically speaking. This is but one of many examples that involve technology implementation issues rather than design issues (of course the two are inevitably connected in the real world; e.g., Higa-McMillan et al. 2011; Lyon et al. 2015; and in treatment contexts, design alone has been associated with implementation success; e.g., Palinkas et al. 2013; Southam-Gerow et al. 2014). Nevertheless, in the measurement feedback context, this issue’s authors provide considerable discussion of implementation, so we focus on a brief list of ideas relevant to clinical decision support and its broad intersection with technology, followed by a simple illustration. These ideas include considerations regarding the sources of evidence to be displayed; the value of exposing discrepancies between what has happened and what should or could happen; the types and configuration of evidence to be displayed (e.g., beyond progress rating alerts or plots) to allow users to impose a logic model on interpretation; the importance of automated translation across relevant ontologies (e.g., DSM diagnoses vs. elevated scales on a standardized symptom measure); and the creation of structures to facilitate communication and collaboration.

Multiple Evidence Bases

Daleiden and Chorpita (2005) outlined a model to coordinate and inform service delivery, which among other things described four core evidence bases relevant to decision making: case-specific historical information, local aggregate evidence, general services research, and causal mechanism research. These are outlined in Table 1, with reference to corresponding traditions or schools of thought as well as example questions addressed by each. In the current context of MFSs, there is a predominant emphasis on only one of these four sources of evidence—the case-specific history (but see Steinfeld et al. 2015, for examples of using local aggregate evidence in the form of departmental and system level reports). We feel much can be gained from coordinating a fuller set of relevant information from all four of the evidence bases, which gives us the ability to detect, consider, and act on knowledge that might otherwise remain out of view, lost in our decision-making “blind spots.”

Table 1 Four evidence bases relevant to clinical decision making (“Evidence-Based Services System Model”)

This task is not possible without a significant amount of “background complexity,” given that each evidence base can have multiple indicators, which can even disagree (e.g., two independent randomized trials with discrepant findings; improvement on one measure of depression with deterioration on another). Determining which knowledge is “best” is likely an impossible pursuit, but fortunately, perhaps not a necessary one. Rather, a sufficient knowledge management function may be for systems to prioritize “better” knowledge from a number of sources through a series of rules or knowledge filters (e.g., psychometric validation for case-specific measures; strength of evidence models for literature review; e.g., Chorpita et al. 2011). This is possible because MFSs inherently create a self-correcting context, in which the real validation of the knowledge used is accomplished by observing whether the desired outcomes were achieved. This notion reflects our fundamental belief that a legitimate function of these systems can be to provide multiple promising contextualized ideas to a decision maker, rather than merely to provide a “right answer” or single prescribed action. In other words, ultimately an agent must prioritize, act, and test results based on the information made available, and the technology should support that process rather than replace it.
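To make the idea of knowledge filters a bit more concrete, the minimal sketch below shows one way such prioritization might be expressed in code. The sources, strength ratings, threshold, and candidate ideas are hypothetical assumptions of our own for illustration, not features of any existing MFS.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One piece of evidence from one of the four evidence bases."""
    source: str      # "case_history", "local_aggregate",
                     # "general_services_research", or "causal_mechanisms"
    suggestion: str  # candidate idea for the decision maker
    strength: int    # assumed strength-of-evidence rating (1 = strongest)

def prioritize(findings, max_ideas=3):
    """Apply simple knowledge filters: drop findings below an assumed
    strength threshold, then rank by strength and an arbitrary source
    ordering. The goal is several promising ideas, not one right answer."""
    source_order = {"case_history": 0, "local_aggregate": 1,
                    "general_services_research": 2, "causal_mechanisms": 3}
    passing = [f for f in findings if f.strength <= 2]  # knowledge filter
    ranked = sorted(passing, key=lambda f: (f.strength, source_order[f.source]))
    return ranked[:max_ideas]

ideas = prioritize([
    Finding("general_services_research", "CBT protocol for adolescent depression", 1),
    Finding("local_aggregate", "similar youth improved with behavioral activation", 2),
    Finding("case_history", "prior response to family engagement strategies", 2),
    Finding("causal_mechanisms", "untested mechanism hypothesis", 3),
])
for idea in ideas:
    print(f"{idea.source}: {idea.suggestion}")
```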

Expected and Observed Values

In clinical care, expected values (e.g., Chorpita et al. and the Research Network on Youth Mental Health 2008) refer to information that represents our best guess about what should happen. For instance, if one needed to select a treatment that might work for a given youth, one might consider research trials involving similar youth to inform that decision (as would be consistent with the evidence-based treatment paradigm); the treatment with the best research support is thus the expected value for the treatment to be delivered. Expected values can be contrasted with observed values, which represent information about what has happened or is happening now. Staying with this example, if the youth is receiving a treatment that differs from the one indicated by strong research support, then there is a discrepancy between the observed value and the expected value. Such discrepancies can motivate and guide action, perhaps in this case toward consideration of a different treatment, and we expect that a core function of MFS is to assist with making these discrepancies known.
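As a minimal sketch of this discrepancy logic, assuming T-scored measures and an arbitrary tolerance of our own choosing (the function name and values below are hypothetical):

```python
def discrepancy(observed, expected, tolerance=5.0):
    """Return the signed gap between an observed score and an expected
    (benchmark) value, plus a flag for whether the gap is large enough
    to warrant attention. Units here are assumed to be T-scores."""
    gap = observed - expected
    return gap, abs(gap) > tolerance

# Hypothetical values: an observed depression T-score compared with an
# expected post-treatment benchmark drawn from a research trial.
gap, needs_review = discrepancy(observed=68.0, expected=55.0)
if needs_review:
    print(f"Observed score exceeds the expected value by {gap:.0f} T-score points.")
```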

Several papers in this issue provide examples of using expected values. For example, Steinfeld et al. (2015) describe reporting related to expected measure completion at each encounter. Similarly, Bickman et al. (2014) illustrate alerts related to the expectation that feedback reports be viewed by practitioners. Likewise, the model-specific implementations (e.g., Bruns et al. 2015; Nadeem et al. 2015) communicate an expectation that particular service activities occur, simply by incorporating descriptions of those activities into the workflow and visual displays. Although there is evidence that observed-value-only feedback offers advantages over no feedback (e.g., Lambert et al. 2005), it is worth considering the additional decision support value afforded by contextualizing these observed values with expected values.

Multiple Domains

In the same manner that MFS often prioritize the case-specific evidence base and observed values, many systems to date have also placed a heavy emphasis on the progress domain (e.g., visualizing symptom change over time). However, to make technology relevant to clinical decision making, we think MFS platforms should help examine any type of information (not just progress) that fits within the larger decision model used to guide care. For instance, if one believes practice is related to progress, then organizing information about practices delivered over time and configuring that information to be synchronized with progress measurements might be of considerable value. More generally, any events (e.g., change of medication, change of placement, stakeholder participation, session no-shows, or end of school year) deemed relevant to progress interpretation or strategy selection may be useful to include in a decision support display. This special issue provides two illustrations of strategies for extending beyond progress measurement: Nadeem et al. (2015) show how practices delivered can be tracked directly on the feedback reports, whereas Bruns et al. (2015) describe integration of service activities into the workflow of the system itself.
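One minimal way to represent such a display is a shared time axis onto which both progress measurements and other clinically relevant events are keyed; the sketch below uses hypothetical dates, measures, and events for illustration only.

```python
from datetime import date

# A hypothetical shared timeline: progress scores and clinically relevant
# events keyed to the same time axis so they can be displayed together.
timeline = {
    "progress": [  # (date, measure, score)
        (date(2015, 1, 5), "RCADS-P Depression", 72),
        (date(2015, 2, 2), "RCADS-P Depression", 70),
    ],
    "events": [    # (date, domain, description)
        (date(2015, 1, 12), "practice", "Psychoeducation"),
        (date(2015, 1, 26), "medication", "Dose change"),
        (date(2015, 2, 9), "attendance", "Session no-show"),
    ],
}

def merged_view(timeline):
    """Interleave scores and events chronologically: the raw material
    for a synchronized progress-and-practice display."""
    rows = [(d, "progress", f"{m} = {s}") for d, m, s in timeline["progress"]]
    rows += list(timeline["events"])
    return sorted(rows)

for day, kind, detail in merged_view(timeline):
    print(day, kind, detail)
```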

Multiple Languages

In keeping with our telecommunication metaphor, in addition to a shared channel, senders and receivers must share a language and concept system to transfer knowledge. Accordingly, we see a need for translating across the diverse ontologies that are found in mental health research and services (e.g., Diagnostic and Statistical Manual, Research Domain Criteria, Standardized Instrument Scores, Evidence-Based Practices, Practice Elements, etc.; see Chorpita and Daleiden 2014). Constraints that require a single common language (e.g., a fixed set of measures; a single clinical model) are less likely to generalize to diverse contexts and to facilitate communication in the language of the local jurisdiction or system. The papers in this special issue clearly illustrate the underlying dilemma. Several of the systems describe a capacity to support multiple outcome measures (e.g., Bickman et al. 2014; Bruns et al. 2015; Nadeem et al. 2015), whereas Steinfeld et al. (2015) highlight some benefits of committing to a single measurement model even though their electronic medical record could potentially support many. On the practice metric side, Bruns et al. (2015) illustrate the construction and use of a model-specific system, whereas Nadeem et al. (2015) illustrate a model-specific configuration of a generalized platform for progress and event (practice) representation. In our work with systems, we have found tremendous value in the ability to support diversity (of models, measures, display preferences, etc.) within a single platform, but we think that diversity is best supported in the context of a strong default configuration designed to bias users initially toward “best practices,” while allowing extension and adaptation as user expertise develops. For this to happen, “translator” functions (e.g., is an elevated score on the Children’s Depression Inventory sufficiently similar to a DSM-III-R diagnosis of Major Depression to draw an inference for this youth?) as well as diverse ontological libraries (e.g., configurable lists of practice elements, evidence-based protocols, or other metrics to represent practice delivery) are necessary infrastructure for MFS.
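As a hypothetical sketch of what a “translator” function might look like, the mapping table, measure names, and clinical cutoff below are illustrative assumptions rather than an actual MFS ontology:

```python
# Hypothetical translation table linking elevated scales on standardized
# measures to broader clinical concepts (e.g., diagnostic categories).
SCALE_TO_CONCEPT = {
    ("CDI", "Total"): "depression",
    ("RCADS", "Major Depression"): "depression",
    ("RCADS", "Social Phobia"): "anxiety",
}

def translate(measure, scale, t_score, clinical_cutoff=65):
    """Return the broader concept implied by an elevated scale score,
    or None if the score is below the assumed cutoff or unmapped."""
    if t_score < clinical_cutoff:
        return None
    return SCALE_TO_CONCEPT.get((measure, scale))

# An elevated CDI Total score is treated here as evidence for "depression",
# allowing it to be matched against diagnosis-based treatment literature.
print(translate("CDI", "Total", t_score=70))    # -> depression
print(translate("RCADS", "Social Phobia", 58))  # -> None
```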

Collaboration

Another fundamental premise is that MFS should both foster and structure collaboration. Rather than serving as a substitute for human decision makers, we believe a key role of these systems is to organize and inform those involved in care. Collaboration can be an implicit feature, for example, by requiring treatment team members to select targets and measures as well as benchmarks (i.e., choosing expected values from various evidence bases); or it can be a more explicit feature, for example, by contiguously displaying scores from multiple informants or practices delivered by different members of the treatment team, allowing a full view of team activities and perspectives. Dynamic configuration, such as being able to toggle elements on and off, extends this capability, allowing different views for different users (e.g., sharing progress and practice history with a family member). Bruns et al. (2015) describe features built into the workflow of the TMS-WrapLogic system that prompt the type of collaboration that is central to the wraparound service model, and Lyon et al. (2015) found in their contextual assessment that communication, both internal and external, was a key function of service providers.
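A minimal sketch of such dynamic configuration, assuming hypothetical roles and panel names, might look like the following; the idea is simply that each participant’s view is a filtered version of a shared default.

```python
# Hypothetical default display configuration and role-specific overrides.
DEFAULT_VIEW = {"progress": True, "practices": True,
                "expected_values": True, "team_notes": True}

ROLE_OVERRIDES = {
    "caregiver": {"team_notes": False},  # share progress and practice history only
    "supervisor": {},                    # full default view
}

def view_for(role):
    """Merge the default configuration with any role-specific toggles
    and return the list of panels that role would see."""
    view = dict(DEFAULT_VIEW)
    view.update(ROLE_OVERRIDES.get(role, {}))
    return [panel for panel, shown in view.items() if shown]

print(view_for("caregiver"))  # ['progress', 'practices', 'expected_values']
```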

An Illustration

Nathan is a 17-year-old Asian American male receiving treatment for depression. Figure 2 shows parent- and self-reported scores on a standardized depression measure over time in days (plots a and b). In terms of the concepts above, this panel represents the case-specific evidence base, using observed values, in a single domain (progress), in a single language (T-scores on a standardized measure).

Fig. 2 A progress panel with observed values for depression scores over time. RCADS revised child anxiety and depression scale, RCADS-P revised child anxiety and depression scale-parent version

In Fig. 3, we enrich the display in a number of ways. First, the progress panel now displays two additional plots (c and d), corresponding to expected values. Both can therefore be thought of as scores representing goal states. When selecting expected values, it helps to consider all four evidence bases outlined in Table 1, keeping in mind that one or more expected values could be derived from each evidence base. For example, a case-specific expected value might be a discharge score from a previous successful treatment for Nathan. In this example, plot c is derived from the local aggregate evidence base and represents the average post-treatment score for youth in the system who had received treatment for depression. Plot d is taken from the general services research evidence base, and represents pre-post scores from a randomized clinical trial for depression that included youth of the same age and ethnicity as Nathan. Of note is that simply adding expected values for treatment progress can have a dramatic effect on interpretation of progress by creating context. That is, it is quite possible to infer satisfactory progress when examining Fig. 2, but less possible when inspecting the top panel of Fig. 3, given the discrepancies between observed and expected values.

Fig. 3 A progress and practice panel showing observed and expected values over time. CG indicates a caregiver-directed practice, RCADS revised child anxiety and depression scale, RCADS-P revised child anxiety and depression scale-parent version, Psychoed psychoeducation. The shaded region of the practice panel indicates a hypothetical illustration of practices supported by relevant research trials (i.e., expected values for practice)

This discrepancy may effectively indicate when to act, but without additional information, it may say less about how to act. Given a basic logic model that practices affect outcomes, inclusion of a practice panel can help in this regard. The bottom of Fig. 3 uses white circles plotted on the same time axis as the progress ratings to indicate that Nathan has had 15 separate treatment sessions involving 9 different clinical procedures. This panel is one place to explore to determine why he has lagged behind the expected rate of progress. Once again, expected values can help, and plotting practices coded from evidence-based treatments (which is, in essence, a translation exercise, as noted above) indicates that the 12 practices in the “focus” region of the practice panel are part of an evidence-based protocol for depression in adolescents. Comparing these expected practices with those observed, we find that three sessions involved “off focus” practices targeting anxiety (possible errors of commission), and only one of five caregiver-directed practices has been delivered (communication skills), occurring 150 days into treatment (possible errors of omission). One other notable observation is the increasing latency between sessions starting at about day 60.
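To suggest how a system might surface such observations automatically, the sketch below compares hypothetical observed and expected practice sets and session dates loosely patterned on this example; the specific practices and dates are our own assumptions, not data from the case.

```python
from datetime import date

# Hypothetical expected practices (from an evidence-based protocol for
# adolescent depression) versus practices actually delivered.
expected = {"Psychoeducation", "Activity Selection", "Cognitive Restructuring",
            "Problem Solving", "CG: Communication Skills", "CG: Praise"}
observed = {"Psychoeducation", "Activity Selection", "Exposure",
            "Relaxation", "CG: Communication Skills"}

commission = observed - expected  # delivered but off focus (possible errors of commission)
omission = expected - observed    # indicated but not delivered (possible errors of omission)

# Session latency: growing gaps between sessions can flag engagement problems.
sessions = [date(2015, 1, 5), date(2015, 1, 12), date(2015, 2, 2),
            date(2015, 3, 9), date(2015, 4, 27)]
latencies = [(later - earlier).days for earlier, later in zip(sessions, sessions[1:])]

print("Possible errors of commission:", sorted(commission))
print("Possible errors of omission:", sorted(omission))
print("Days between sessions:", latencies)
```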

A treatment team may thus hypothesize that caregiver engagement is an issue and begin deeper inquiry, which could include adding measures to the progress panel (e.g., a caregiver assessment of barriers to treatment or treatment expectancy) or otherwise enhancing caregiver services. The system remains interactive, collaborative, and exploratory. Once the system has indicated how to act, it should ideally support the treatment team in taking the next steps, including referencing the provider analogue of the case-specific history: does this provider have experience and expertise with promoting caregiver engagement? If not, are there written materials or learning resources that could be launched from the display to facilitate action?

Of course, this is but one example, restricted to two domain panels and two of the evidence bases, often with only a single expected value from each. The possibilities, however, can be as varied as conversations, consistent with our telecommunication metaphor. When everything aligns neatly, these conversations can be swift and clear, indicating next steps; but even when information does not align (e.g., the research literature indicates one set of expected values for practice, whereas an expert supervisor recommends a different set of practices), the communication and collaboration are biased toward investigating and resolving the discrepancy prospectively using evidence. More complex examples and illustrations are available elsewhere in the literature, specifically regarding multiple provider teams (e.g., Bruns et al. 2014), observed and expected values and model integrity (Chorpita et al. 2008; Regan et al. 2013), coordination of multiple evidence bases (e.g., Daleiden et al. 2005), and complex collaboration structures (Chorpita and Daleiden 2014).

Conclusion

This group of authors is to be commended for their efforts to implement and evaluate MFS in a variety of real-world contexts. The promises of such systems are clear, and the design challenges, although considerable, can and will be resolved. For technology to serve our will, however, our field must continue to wrestle with models for how we wish to select and organize health information. Thus, we disagree with the notion that this burden lies with HIT developers. There is, for the moment, a significant underspecification of the general information and decision models needed to dictate the functional requirements of promising new technologies. Technology will do what we tell it to, and thus, the burden, for now, lies primarily with mental health experts.

That said, if we do our jobs right, the complexity of these fully articulated models may soon become a constraint. Our theories of psychopathology have moved to multifactorial risk and protective factor models, and our intervention research has identified a multitude of interventions that, although effective when measured at the group level, involve uncertainty at the case level. To help manage that uncertainty, current technologies struggle, with only modest success, to display a “human readable” form of a “progress only” model, much less a basic practice-yields-progress logic model. Although we encourage practice-progress type models for current applications to help those using the technology of today, we await the truly disruptive technology needed to support the highly elaborated models necessary to help humans contextually detect and respond to abstract phenomena (e.g., behavior, interactions) in the service of their personal pursuits and in their duty to help others.