Introduction
While apps have the potential to deliver great benefits, they also have the potential to cause physical, mental, reputational or financial harm to patients, healthcare professionals and their organisations if they are not evaluated for clinical safety. For example, an app may miscalculate a drug dose or give incorrect medical advice to a consumer or patient. National Health Service (NHS) Digital highlights that apps to be used by the NHS cannot be endorsed unless they have been evaluated for potential harm,1 while Public Health England’s (PHE’s) health app assessment process requires developers to outline plans and policies to limit and mitigate potential risks associated with their apps. This paper is concerned with web and smartphone apps designed to offer treatment or support with the common mental health problems of depression, anxiety and stress—collectively herein termed ‘e-therapies’, which are used or recommended by the NHS in England. It is important to note that the landscape of e-therapies shifts rapidly, and indeed has done so since the data presented here were collected 4 years ago.
Regulatory approval provides patients and healthcare professionals with the assurance that an app is of high quality, safe and ethical.1 There are two types of regulation presently available in the UK: the Medicines and Healthcare products Regulatory Agency (MHRA) Medical Device Registration and Care Quality Commission (CQC) Registration. Both are relevant to e-therapies. MHRA provides a ‘device determination’ flowchart that enables developers to check whether their app is defined as a medical device. The two main questions in determining this are whether the app has a medical purpose and whether it works directly with data obtained in vivo. At the time of writing, developers of apps that meet these criteria and who want to market them to the public are required by UK regulation to register the app with the MHRA and to obtain for it a Conformité Européenne (CE) marking, indicating conformity with health, safety and environmental protection standards for products sold within the European Economic Area.2
The CQC sets out 14 regulated activities: personal care; accommodation for people who require nursing or personal care; accommodation for people who require treatment for substance misuse; treatment of disease, disorder or injury; assessment or medical treatment for persons detained under the Mental Health Act 1983; surgical procedures; diagnostic and screening procedures; management of supply of blood and blood-derived products; transport services; triage and medical advice provided remotely; maternity and midwifery services; termination of pregnancies; services in slimming clinics; nursing care and family planning services. If an app provides a health or social care service that fits one of these activities, the developers are required by PHE to register with the CQC before the app can be accessed via the PHE app assessment process.
It is essential for the public to know whether an e-therapy is effective. Some have argued that many apps have no evidence to support their effectiveness,3 but deciding what constitutes ‘evidence’ for apps is not straightforward. Within healthcare research, there is a hierarchical structure depicting the strength of evidence.4 The higher the level, the greater the internal validity and hence the more persuasive and trustworthy the evidence is. The randomised controlled trial (RCT) is currently the gold standard for providing evidence of clinical efficacy.4 However, RCTs take time to design, implement and publish and thus are poorly matched to the pace at which technologies and tools are evolving. This means that there is presently no clear consensus on how best to evaluate apps, although policymaker and researcher efforts are being directed to the issue, as outlined below.
A European Union Working Group was set up in February 2016 to create mHealth assessment guidelines but unfortunately failed to reach a conclusion.5 A report by the group highlighted that building the guidelines had proven a much more complex exercise than initially expected, and that the work required went far beyond the original mandate of the group.6
Separately, a toolkit for appraising e-therapies was developed and released by MindTech in October 2017. The toolkit offers a standard set of criteria for evaluating existing digital mental health tools (apps and mobile websites), and a final report discussing the framework was published.7 Other examples of app assessment methods developed in recent years include the Mobile App Rating Scale (MARS), developed by an Australian research team in 2015.8 MARS is a scale that aims to provide researchers, clinicians and developers with a way to score digital tools against a list of evaluation criteria.8 Item 19 of the scale, which concerns clinical evidence, was omitted because the researchers had not yet tested the impact of the mental health apps included in the study.8 Similarly, the British Standards Institution, in conjunction with Innovate UK, has developed the PAS 277:2015 code of practice.9 The PAS recommends that, during the preliminary stages of app development, developers read academic research to ensure that their app is built on clinical evidence. It also recommends that app publishers/developers collect data during testing with users to validate any clinical benefits that the app’s intended use delivers9; such an exercise would likely require the involvement of academic researchers to ensure the evidence was of a sufficiently high standard. As well as helping to design new tools, the code can be used to evaluate existing ones.9
In October 2017, PHE released a health app assessment process developed to encourage the creation of effective health apps and to enable health professionals to consider health apps for use in General Practice.10 The process covered eight different areas: evidence of effectiveness; regulatory approval; clinical safety; privacy and confidentiality; security; usability and accessibility; interoperability; and finally, technical stability.
More recently, NHS Digital has introduced a Beta Digital Assessment Questionnaire (DAQ) 1.2 for the assessment of mobile apps11 and the National Institute for Health and Care Excellence (NICE), along with its partner organisations, has published a set of evidence standards for digital health technologies, which includes apps.12 The evidence standards have been developed to ensure new technologies are clinically effective and represent value for money to the NHS, while also aiming to make it easier for innovators and commissioners to understand what good levels of effectiveness for digital technologies should look like. NHS Digital is working closely with NICE to incorporate these standards into future versions of the DAQ.11
Current study
It is apparent that the landscape for assessing apps is complex and ever-changing. The aim of the present study was to examine the quality of apps in use by the NHS by examining the manner in which they have been developed. At the time of this study, the majority of existing app review methods focused on the technical rules and regulations of app design and overlooked (often by necessity, as with MARS8) the question of effectiveness, that is, whether the app actually does what it says and meets the claims its developers make for it. While this has been rectified in the DAQ,11 the current research precedes this. In an ideal world, before being released for general use, every app would have undergone rigorous user trials that demonstrated its effectiveness. However, rigorous trials are a costly and challenging business, and current models of app development and publishing seem to encourage less rigorous approaches. Here, in an attempt to gauge the quality of existing apps as providers of therapy without performing user trials on each and every one, we have adopted the approach of probing more deeply into the processes employed in their development. More specifically, we are interested in the psychological model, theories or therapies used, the extent of clinical and academic involvement, and any published (or otherwise) evidence in support of each app. The developers of each of the apps identified in our previous study13 were contacted and asked to provide the relevant details. It is worth noting that the quality assessment frameworks detailed above did not precede the development of many of the apps reviewed here, so developers were likely operating in a quality assessment vacuum at the time of creating their products.
Important indicators of quality
The purpose of this study was to evaluate the apps previously identified as being used or recommended by the NHS.13 We were specifically interested in the following four indicators of quality: clinician involvement, academic involvement, research or other evidence, and use of a specific psychological approach or theory. These indicators were selected because they build on the premise that effective digital psychotherapy interventions come about as a result of rigorous theoretical and empirical work by experienced clinicians and academics, utilising a known psychological approach. We discuss the advantages of each below.
Clinician involvement
Healthcare staff routinely use apps to perform their roles.14 This makes it essential that the information given in these apps be grounded in the best and most up-to-date knowledge, derived from research, clinical experience and patient preference.1 Unfortunately, many app stores do not carry out rigorous reviews regarding the accuracy of app content before publication, meaning some apps potentially have inaccurate information.15 Other publications have highlighted that when assessing digital mental health apps, it is important to assess whether clinicians have been involved in the development process.16 17 This is because clinician involvement can help to ensure that any established modes of treatment are appropriately deployed within the app. For instance, an app based on cognitive behaviour therapy (CBT) but made by someone who is not qualified to deliver CBT may fail to give an accurate implementation. The involvement of a clinician who specialises in CBT would improve the quality of app content by ensuring treatment fidelity.
Academic involvement
Academic involvement in the process of developing an app can help to ensure the implementation of empirically supported interventions and principles, providing a foundation for an app’s use in clinical practice. Responsible academics strive to bring neutrality and remove bias, to expose the app to peer review, and to publish evidence of an app’s feasibility, acceptability and clinical effectiveness.
As mentioned previously, it is essential that an app can show evidence of its effectiveness. In PHE’s app assessment process, developers must provide evidence that their app improves outcomes for patients and users; provides value for money; meets user needs and is stable and simple to use; and that people use it. Independent research is weighted highly in the assessment criteria, and apps that have a high level of clinical evidence are considered by NICE for ‘NICE evaluated’ status. This status is considered to represent the gold standard for NHS health apps. In addition, all apps are required to show that they meet the criteria set out by NHS Digital covering: clarity of purpose and intended use; their evidence basis; the data that form the basis of their evidence and findings; and any published academic studies.1 The involvement of academics in the development of an app can be helpful in ensuring that data are collected in a manner that makes it possible to evaluate effectiveness, although it is also important that evaluation is conducted by researchers independent of the app, without a personal interest in the results.
Research evidence/other evidence
While RCTs are the gold standard, it is not expected that all apps will have published research evidence at the time of writing, in part due to the rapid pace of change and the unwieldy nature of conducting RCTs. However, there might be other forms of evidence that can indicate whether an app may be beneficial to a patient. This evidence may take different forms, such as practice-based evidence methodologies (eg, detailed case series) that assess the acceptability, feasibility and initial effectiveness of an app, and may also include early pilot trials.
Specific psychological approach or theory/set of techniques/therapy
Apps claiming to help with mental health problems such as depression, anxiety or stress would be expected to use established approaches to treatment that have been found to be effective through high-quality research studies. The psychological therapies that are recommended by NICE are, regardless of disorder, all underpinned by a clinical theory. The risk of not having an organising theoretical framework for an app is that the change techniques used may be cherry-picked by developers on the basis of inappropriate criteria (eg, selecting techniques that can be gamified easily rather than those that are most effective) and so lack theoretical coherence and consistency.
Using indicators of quality to evaluate apps
If we accept the premise that effective psychotherapeutic interventions only come about as the result of rigorous theoretical and empirical work by experienced clinicians and academics, it follows that apps need clinician and academic involvement, psychological theory and research/other data to support their effectiveness. We have previously collated a list of NHS-endorsed e-therapies (meaning therapeutic apps (both phone and web) that are used or recommended in NHS settings in England)13 designed to target stress, depression or anxiety. In the current study, we evaluate these NHS e-therapies for compliance with the indicators of quality described above. To do so, we surveyed the developers of all the apps identified in our previous study13 regarding the key indicators of quality described, and examined whether there were any differences between web and phone apps.