Review

Definitions of digital biomarkers: a systematic mapping of the biomedical literature

Abstract

Background Technological devices such as smartphones, wearables and virtual assistants enable health data collection, serving as digital alternatives to conventional biomarkers. We aimed to provide a systematic overview of emerging literature on ‘digital biomarkers,’ covering definitions, features and citations in biomedical research.

Methods We analysed all articles in PubMed that used ‘digital biomarker(s)’ in title or abstract, considering any study involving humans and any review, editorial, perspective or opinion-based article up to 8 March 2023. We systematically extracted characteristics of publications and research studies, and any definitions and features of ‘digital biomarkers’ mentioned. We described the most influential literature on digital biomarkers and their definitions, categorised the definitions thematically using the Food and Drug Administration Biomarkers, EndpointS and other Tools framework (ie, data type, data collection method, purpose of biomarker), and performed text and citation analyses, including an analysis of the structural similarity of definitions.

Results We identified 415 articles using ‘digital biomarker’ between 2014 and 2023 (median 2021). The majority (283 articles; 68%) were primary research. Notably, 287 articles (69%) did not provide a definition of digital biomarkers. Among the 128 articles with definitions, there were 127 different ones. Of these, 78 considered data collection, 56 data type, 50 purpose and 23 included all three components. Those 128 articles with a definition had a median of 6 citations, with the top 10 each presenting distinct definitions.

Conclusions The definitions of digital biomarkers vary significantly, indicating a lack of consensus in this emerging field. Our overview highlights key defining characteristics, which could guide the development of a harmonised and more widely accepted definition.

Introduction

Biomarkers are defined as a set of characteristics that are objectively measured and used as indicators of normal biological processes, pathogenic processes or biological responses that appear due to exposure or therapeutic interventions.1 This comprises physiological, molecular, histologic and radiographic measurements.2 The US Food and Drug Administration (FDA) subclassifies susceptibility/risk, diagnostic, monitoring, prognostic, predictive, response and safety biomarkers.1 It highlights that a full biomarker description must include the source or matrix, the measurable characteristic(s) and the methods used to measure the biomarker.1 The digitalisation of our world, which affects daily living and healthcare, broadens the spectrum of possible sources and methods used to measure biomarkers and introduces a novel dimension of measurable characteristics. This allows digital devices used daily, such as smartphones, wearable devices, sensors and smart home devices, to provide a new category of biomarkers, often called ‘digital biomarkers’. In recent years, digital biomarkers have become increasingly present in routine care and in research in many areas of medicine, such as cardiology, oncology or COVID-19. For example, smartphone-recorded cough sounds have been used as a digital biomarker to detect asthma and respiratory infections in clinical trials,3 4 and deep learning has been applied to data from a three-axis accelerometer to predict sleep/wake patterns.4 5 Moreover, digital biomarkers have spread in the field of neurology, which has a large unmet need for non-invasive and objective biomarkers reflecting cognitive and motor functions that are traditionally assessed with specific tests performed by neurologists.6 Beyond monitoring health and disease status, predicting the occurrence and development of diseases is a promising application of such novel approaches.7

Thus, digital biomarkers have the potential to offer valuable insights on the health of patients. They usually have high temporal resolution (up to (quasi-)continuous), are usually objective (and not subject to interobserver variability) and can have high external validity as they may be applied in the patient’s routine environment (as opposed to, eg, the clinic or a research environment).8

Many everyday digital tools used mainly for entertainment/leisure purposes (eg, fitness trackers) are increasingly considered as a source of helpful information that may be transformed into digital biomarkers. Yet, with all this diversity in application and complex interaction with rapidly evolving technology, it becomes necessary to provide a clear and precise definition of the fundamental underlying concepts to facilitate research and decision-making with and on these novel approaches.

One of the first definitions of this novel type of biomarker was provided by Dorsey et al, who defined digital biomarkers as ‘the use of a biosensor to collect objective data on a biological (eg, blood glucose, serum sodium), anatomical (eg, mole size) or physiological (eg, heart rate, blood pressure) parameter obtained using sensors followed by algorithms to transform these data into interpretable outcome measures, helping to address many of the shortcomings in current measures.’ Furthermore, they stated that these new measures ‘include portable (eg, smartphones), wearable, and implantable devices, and are by their nature largely independent of raters.’9 A later definition, given in 2020 by the European Medicines Agency (EMA), was based on ‘digital measures’ (‘measured through digital tools’) and did not include the requirement of algorithms as a defining feature: ‘a digital biomarker is an objective, quantifiable measure of physiology and/or behaviour used as an indicator of biological, pathological process or response to an exposure or an intervention that is derived from a digital measure. […]’10

Others gave broader definitions including further defining features, for example, defining digital biomarkers as ‘objective, quantifiable, quantitative, physiological and behavioural data that are collected and measured by means of digital devices such as portables, wearables, implantables or digestibles. The data collected are used to explain, influence and/or predict health-related outcomes’.2 6 11

Overall, such disagreement between the definitions used by regulators and those in articles published in high-impact biomedical journals raises concerns that no clear consensus exists among researchers and users of this novel approach and terminology, increasing the risk of miscommunication. There are numerous examples where differences in definitions have been recognised as a critical cause of inefficiencies and delay in health research and of avoidable controversy, uncertainty and potential harm in clinical care and public health.12–15 The Biomarkers, EndpointS and other Tools (BEST) framework, developed by the FDA and the US National Institutes of Health with ‘the goals of improving communication, aligning expectations, and improving scientific understanding’, highlights that ‘unclear definitions and inconsistent use of key terms can hinder the evaluation and interpretation of scientific evidence and may pose significant obstacles to medical product development programmes’.1

We aimed to provide a systematic overview of the emerging literature on digital biomarkers and a characterisation of the definitions of digital biomarkers provided in biomedical journal articles, by performing a systematic mapping and citation analysis of all articles that prominently used the term ‘digital biomarker’. We sought to determine differences in the characteristics of common definitions to provide a foundation for subsequent efforts to develop clearer and more consistent definitions that improve the application of digital biomarkers in research and healthcare decision-making.

Methods

Design

We analysed all articles published at any time in PubMed that prominently used the term ‘digital biomarker’, that is, either in title or abstract.

We systematically explored definitions of digital biomarkers that are provided and/or referred to in the biomedical literature, that is, journal articles indexed in PubMed, in a mapping review without a formal assessment of included studies.16 We structured our review report according to the ‘Preferred Reporting Items for Systematic Reviews and Meta-Analyses’ guidance, where applicable.17 We did not use a prespecified protocol.

Eligibility criteria, information source and search strategy

We searched PubMed and included all articles mentioning ‘digital biomarker’ or ‘digital biomarkers’ in their title or abstract (by searching PubMed for ‘digital biomarker*[tiab]’; date of last search: 8 March 2023). We excluded animal research.
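The eligibility rule can be approximated offline once records have been retrieved. The following is a minimal Python sketch, not the authors' pipeline. It assumes records are dictionaries with hypothetical `title`, `abstract` and `animal_study` fields, and it approximates PubMed's truncation `biomarker*` with a regular expression. PubMed's own matching (phrase indexing, automatic term mapping) can differ, so the sketch only illustrates the criterion.

```python
import re

# Approximates PubMed truncation of "digital biomarker*": the phrase followed by
# any word characters (eg, "biomarkers"). Matching is case-insensitive.
TERM = re.compile(r"\bdigital biomarker\w*", re.IGNORECASE)


def matches_tiab(record: dict) -> bool:
    """True if the (hypothetical) title or abstract field contains the term."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}"
    return TERM.search(text) is not None


def eligible(records: list) -> list:
    """Keep matching records that are not flagged as animal research.

    'animal_study' is a hypothetical field; in the review, animal research
    was excluded during manual screening.
    """
    return [r for r in records if matches_tiab(r) and not r.get("animal_study", False)]


if __name__ == "__main__":
    demo = [
        {"pmid": "1", "title": "Digital biomarkers of gait", "abstract": ""},
        {"pmid": "2", "title": "Mice", "abstract": "A digital biomarker.", "animal_study": True},
        {"pmid": "3", "title": "Serum biomarkers", "abstract": "Blood tests."},
    ]
    print([r["pmid"] for r in eligible(demo)])  # ['1']
```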

Study selection

One reviewer (AKMA) screened titles, abstracts and full texts for eligibility. Confirmation by a second reviewer (JH or LGH) was planned for situations in which the reviewer was unsure, but this never occurred, given the clear and objective selection criteria.

Data extraction

We developed a spreadsheet to structure the data extraction process. One reviewer (AKMA) extracted data with confirmation by a second reviewer (JH or LGH) in case of any uncertainty.

From every article, we extracted the author(s), publication year, title, journal, corresponding author and country of correspondence, article type (ie, primary research, review or other type (eg, editorial, comment, opinion-based letter)) and any definitions of digital biomarkers that were provided and/or referred to (identified by a semantic search for indicators of definition such as ‘digital biomarkers are’, ‘… are defined as’, ‘… can be defined’, ‘the definition of … is’). From primary research articles, we additionally extracted the medical context and whether the article was about the development and/or validation of a digital biomarker. The number of global citations was obtained from OpenAlex metadata,18 accessed via the Local Citation Network19 (as of 26 June 2023).
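A semantic search for definition indicators could be sketched with regular expressions. The patterns below are hypothetical approximations of the indicator phrases listed above, and the ‘…’ gaps are modelled only loosely. The output is a list of candidate sentences for a reviewer to confirm, because phrases such as ‘digital biomarkers are’ also occur in sentences that are not definitions.

```python
import re

# Hypothetical patterns approximating the indicator phrases in the text.
INDICATORS = [
    re.compile(r"\bdigital biomarkers? (?:is|are)\b", re.IGNORECASE),
    re.compile(r"\b(?:is|are) defined as\b", re.IGNORECASE),
    re.compile(r"\bcan be defined\b", re.IGNORECASE),
    # "the definition of ... is": allow a short gap before "is".
    re.compile(r"\bthe definition of\b.{0,80}?\bis\b", re.IGNORECASE),
]

# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def candidate_definitions(full_text: str) -> list:
    """Sentences mentioning 'digital biomarker' that contain an indicator phrase."""
    hits = []
    for sentence in SENTENCE_SPLIT.split(full_text):
        if "digital biomarker" not in sentence.lower():
            continue
        if any(pattern.search(sentence) for pattern in INDICATORS):
            hits.append(sentence.strip())
    return hits
```

Flagged sentences are candidates only; the decision that a sentence is a definition stays with the reviewer.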

Data analysis and categorisation of definition components

We used the BEST framework to derive components of definitions for digital biomarkers.1 We analysed the identified digital biomarker definitions by assessing whether they contained descriptions that fall within three key components: (1) the type of data that is measured (eg, whether data were measured objectively, continuously or quantitatively), (2) the data collection method (eg, whether sensors, computers, portables, wearables, implantables or digestibles were used to collect data) and (3) the purpose of the digital biomarker (eg, whether a biomarker was used as a measure of disease progression or to predict health-related outcomes). We considered definitions to be duplicates when they used the same sequence of words. We illustrate the frequency of the terms used in all provided definitions with a word cloud.20 We analysed the structural similarity of definitions that were provided without a reference by performing hierarchical clustering on the distance matrix containing pairwise ‘Indel’ distances, that is, ‘the minimum number of insertions and deletions required to change one (definition) into the other’.21 Since we aimed to explore how digital biomarkers are defined in the biomedical literature, we did not critically assess the included articles and studies. For the analysis of citations, we divided the number of global citations of each article (retrieved via the Local Citation Network19) by the number of years since its publication. To create a citation network of citing and cited relationships between the articles, we used the Local Citation Network with the OpenAlex scholarly index.19 22
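The Indel distance and clustering step can be illustrated with a small, dependency-free Python sketch. It assumes character-level comparison and average linkage. The text does not state the unit of comparison, the linkage method or the software package used, so these are assumptions. The Indel distance follows from the longest common subsequence (LCS). Every character outside the LCS must be deleted from one string or inserted into the other, which gives `len(a) + len(b) - 2 * LCS(a, b)`.

```python
from itertools import combinations


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (two-row dynamic programme)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Minimum number of insertions plus deletions turning a into b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def distance_matrix(texts: list) -> list:
    """Symmetric matrix of pairwise Indel distances (assumed character-level)."""
    n = len(texts)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = indel_distance(texts[i], texts[j])
    return d


def average_linkage(d: list, n_clusters: int) -> list:
    """Naive agglomerative clustering with average linkage (assumed method).

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until n_clusters remain; returns lists of item indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(d))]

    def avg(c1, c2):
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: avg(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters
```

For real data, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` (given a condensed matrix from `scipy.spatial.distance.squareform`) would scale better than this naive loop.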

We used descriptive statistics by reporting numbers and percentages. For all analyses, we used R (V.4.2.2) or Python (V.3.11.4).

Results

We identified 415 articles that had ‘digital biomarker’ in their title or abstract (online supplemental S1). The first article was published in 2014 (median publication year 2021; figure 1; online supplemental S2). Most articles described primary studies (n=283; 68%). The most frequent journals were digital medicine specialty journals, including Digital Biomarkers (n=35; 8%), Journal of Medical Internet Research (n=21; 5%) and npj Digital Medicine (n=19; 4%; table 1). Of the 415 articles, 128 (31%) provided at least 1 definition of a digital biomarker.

Figure 1

Annual number of published articles referring to digital biomarkers, by article type, as of 8 March 2023 (n=415).

Table 1
Characteristics of all 415 articles in PubMed using ‘digital biomarker’ in title or abstract

Characteristics of articles providing a definition of digital biomarker

The 128 articles with a definition of digital biomarker were published between 2015 and 2023 (median: 2021). Of them, 59 articles were primary studies, 50 were reviews and 19 were other types of articles (table 1).

Almost all primary studies described the development of one or more digital biomarkers (53 of 59 articles), and many described a validation process of biomarkers (35 of 59 articles). The most frequent medical field of the primary research articles that described the development of one or more digital biomarkers was neurology (25 of 53), while the spectrum of medical fields was overall very wide (table 1). The most frequent diseases were dementia and related disorders (16 of 53 articles, ie, (mild) cognitive impairment or Alzheimer’s disease), Parkinson’s disease (5 of 53 articles) and diabetes (3 of 53 articles), with numerous other conditions addressed in one or two studies (eg, atrial fibrillation, cervical cancer, depression, heart failure and muscular dystrophy; online supplemental S2).

The corresponding authors were mostly from the USA (69 of 128 articles), Switzerland (22 of 128 articles), Germany (16 of 128 articles) and the UK (16 of 128 articles; table 1).

The articles were cited a median of 6 times (range 0–517, IQR 2–20, overall 2,705); on average two times per year (range 0–86, IQR 1–5; online supplemental S2). We show the citation network (ie, citing and cited relationships within the sample of these 128 articles) online (https://LocalCitationNetwork.github.io/?fromJSON=Digital-Biomarker-Definitions.json).
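The per-year citation rate and the summary statistics reported above can be reproduced with a short sketch. Two conventions in it are assumptions. First, ‘years since publication’ is taken to include the publication year itself, so that 2023 articles are divided by 1 rather than 0. Second, quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics and, we believe, matches R's default quantile type 7.

```python
import statistics

REFERENCE_YEAR = 2023  # citation counts were retrieved in June 2023


def citations_per_year(citations: int, pub_year: int, ref_year: int = REFERENCE_YEAR) -> float:
    """Global citations divided by years since publication.

    Assumption: the publication year counts as one year, which avoids
    division by zero for articles published in the reference year.
    """
    years = ref_year - pub_year + 1
    return citations / years


def summarise(values: list) -> dict:
    """Median, interquartile range (inclusive method) and range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": statistics.median(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }
```

For example, hypothetical citation counts of 0, 2, 6, 20 and 40 give a median of 6 and an IQR of 2 to 20 under this method.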

Definitions of digital biomarkers

Overall, the 128 articles reported between 1 and 7 definitions each (median 1, IQR 1–2). In 91 articles, the authors provided at least 1 reference for their definitions (median 1, range 1–13, IQR 1–2, overall 274 references); for 37 articles with 51 definitions, no reference was provided (online supplemental S2).

The most frequently used references to support the definitions were Coravos et al4 (referenced by 51 of 91 articles), Dorsey et al9 (11 articles), Califf23 (9 articles), Piau et al24 (9 articles), Babrak et al6 (8 articles) and Coravos et al25 (8 articles). All of these articles were among the 415 articles analysed here. The original definitions in these top-cited articles can be found in table 2. Each of the other references was used by fewer than five articles.
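Counting how often sampled articles are cited for their definitions, with edges restricted to articles inside the sample, can be sketched as a small directed graph. The article identifiers and reference sets used with it are hypothetical. In the study, citation data came from OpenAlex via the Local Citation Network, not from this code.

```python
from collections import Counter


def local_edges(references: dict) -> list:
    """Citing -> cited edges restricted to articles inside the sample.

    `references` maps a (hypothetical) article ID to the set of IDs it cites
    for its definition; cited IDs outside the sample are dropped.
    """
    sample = set(references)
    return [
        (citing, cited)
        for citing, refs in sorted(references.items())
        for cited in sorted(refs)
        if cited in sample and cited != citing
    ]


def in_sample_citations(references: dict) -> Counter:
    """Number of sampled articles citing each sampled article (in-degree)."""
    return Counter(cited for _, cited in local_edges(references))
```

Sorting the items and reference sets keeps the edge list deterministic, which makes the network easier to compare across runs.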

Table 2
The most cited definitions of digital biomarkers within the 415 articles

In total, the 128 articles reported 202 definitions; 75 of which were duplicates. Hence, we identified 127 unique definitions across the 128 articles.
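Duplicate detection by identical word sequence can be sketched as follows. The text does not specify which normalisation counts as ‘the same sequence of words’, so lower-casing and ignoring punctuation are assumptions here. The reported count is a simple subtraction: 202 - 75 = 127 unique definitions.

```python
import re

# Assumed tokenisation: lower-cased alphanumeric words, keeping internal
# hyphens/apostrophes (eg, "sensor-derived"); punctuation and spacing ignored.
WORD = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def word_sequence(definition: str) -> tuple:
    """Normalised word sequence used as the duplicate key."""
    return tuple(WORD.findall(definition.lower()))


def unique_definitions(definitions: list) -> list:
    """Keep the first occurrence of each distinct word sequence."""
    seen = set()
    unique = []
    for definition in definitions:
        key = word_sequence(definition)
        if key not in seen:
            seen.add(key)
            unique.append(definition)
    return unique
```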

The 10 terms most frequently contained in the 127 unique definitions were ‘digital’ (125 of 127 definitions; 98%), ‘biomarkers’ (109 of 127 definitions; 85%), ‘data’ (62 of 127 definitions; 48%), ‘collected’ (55 of 127 definitions; 43%), ‘devices’ (50 of 127 definitions; 39%), ‘health’ (42 of 127 definitions; 33%), ‘physiological’ (37 of 127 definitions; 29%), ‘objective’ (37 of 127 definitions; 29%), ‘wearable’ (34 of 127 definitions; 26%) and ‘behavioural’ (33 of 127 definitions; 25%; figure 2).
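The shares above are document frequencies: the number of definitions that contain a term at least once, divided by the 127 unique definitions. The minimal sketch below assumes simple lower-cased word tokens with no stemming and a hypothetical stop-word list. A word-cloud tool would typically apply its own tokenisation and stop words.

```python
import re
from collections import Counter


def document_frequency(definitions: list) -> Counter:
    """Number of definitions containing each word at least once."""
    counts = Counter()
    for definition in definitions:
        # A set, so repeated words within one definition count once.
        counts.update(set(re.findall(r"[a-z]+", definition.lower())))
    return counts


def top_terms(definitions: list, k: int, stop_words=frozenset()) -> list:
    """Top-k (term, count, share) tuples; ties broken alphabetically."""
    counts = document_frequency(definitions)
    n = len(definitions)
    ranked = sorted(
        ((term, c) for term, c in counts.items() if term not in stop_words),
        key=lambda tc: (-tc[1], tc[0]),
    )
    return [(term, c, c / n) for term, c in ranked[:k]]
```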

Figure 2

Word cloud with the most frequently used terms in the analysed digital biomarker(s) definitions.

Of the 127 unique definitions, 56 definitions refer to the type of data that are collected, 78 definitions contain information on the data collection method, and 50 definitions provide information on the purpose of the digital biomarker. Only 23 of 127 definitions involve all 3 components and 26 contain none of these components (table 3; online supplemental S3; online supplemental S2).
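One definition can address several components, so the component counts overlap (56 + 78 + 50 = 184, which exceeds 127). For the same reason, the ‘all three’ and ‘none’ groups must be counted per definition rather than derived from the totals. The sketch below assumes that each definition has been hand-coded into a set of hypothetical component labels.

```python
from collections import Counter

# Hypothetical labels for the three BEST-derived components.
COMPONENTS = ("data_type", "collection_method", "purpose")


def tally_components(coded: list) -> dict:
    """Summarise hand-coded definitions (one set of labels per definition).

    Per-component counts overlap because a definition can carry several labels.
    """
    wanted = set(COMPONENTS)
    per_component = Counter(c for codes in coded for c in codes if c in wanted)
    all_three = sum(1 for codes in coded if wanted <= codes)
    none = sum(1 for codes in coded if not codes & wanted)
    return {"per_component": dict(per_component), "all_three": all_three, "none": none}
```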

Table 3
Definitions of digital biomarkers that include three key components: type of data, data collection method and purpose of a digital biomarker (n=23)

There were almost no structural similarities between the 51 identified definitions in 37 articles without a reference (for those with a reference, similarities such as paraphrasing are expected; online supplemental S4).

Discussion

We systematically searched and characterised the biomedical literature that used the term digital biomarker and analysed the provided definitions of the concept. We identified 415 articles using ‘digital biomarker’ in title and/or abstract that were published between 2014 and 2023. Of them, 128 articles provided 127 different definitions. By comparing the defining features, we aimed to better understand what those who use this term in the context of biomedical research or healthcare mean by ‘digital biomarker’ and which components are deemed the essence of it.26

The first definition of a digital biomarker dates from 2015.27 Within 8 years, at least 127 different definitions have been used, with none of them clearly the most widely used, indicating high heterogeneity in the concept of digital biomarkers. The definitions often cover different aspects of the definitional components that are traditionally used to describe more conventional biomarkers. Authors have created their own concepts and given an identity to this type of biomarker. The variation in these definitions, together with the fact that only 23 of them provide a full description containing all components of the FDA's BEST framework, shows how broad the current understanding of this fundamental concept is.

Digital biomarkers emerged as a concept in the medical and technological domains, albeit with diverse terminology across different academic journals. In the medical field, digital biomarkers are often referred to as biomarkers of health or disease obtained through digital health technologies. In the technical field, these biomarkers are viewed as data-driven indicators collected from sensors, wearables and other portable digital technologies that provide an assessment of health status. These diverse terminologies and definitions reflect the interdisciplinary nature of digital biomarkers and their application across a broad spectrum of biomedicine, which underlines the importance of unified concepts to enhance communication and cross-disciplinary collaboration in this evolving field.

Regulatory perspectives

The EMA defined digital biomarkers in 2020 in its draft guidance ‘Questions and answers: Qualification of digital technology-based methodologies to support approval of medicinal products’, stating that their ‘clinical meaning is established by a reliable relationship to an existing, validated endpoint’.10 EMA clearly distinguishes them from electronic clinical outcome assessments (eCOA), whose ‘clinical meaning is established de novo’. According to EMA’s terminology, both digital biomarkers and eCOA are derived from ‘digital measures’ and can be used as ‘digital endpoints’.10

On the other hand, the term ‘digital biomarker’ cannot be found in the FDA draft guidance ‘Digital Health Technologies for Remote Data Acquisition in Clinical Investigations’, which instead features eCOA as examples of digital health technologies.28 Figure 3 contains our semantic interpretation of the terminology used by EMA and FDA.

Figure 3

Semantic overview of terminology used by EMA and FDA. Digital health technologies obtain digital measures, which include digital biomarkers and electronic clinical outcome assessment (eCOA). Digital biomarkers and eCOAs both can provide digital endpoints. EMA, European Medicines Agency; FDA, Food and Drug Administration.

This distinction is rarely observed in the medical literature: we found the term eCOA in 8 of the 415 articles analysed, and a PubMed search for ‘electronic clinical outcome assessment*’ also returned only 8 articles mentioning it in title or abstract (as of 31 August 2023), compared with 415 for our search term ‘digital biomarker*’. As Vasudevan et al stated in 2022: ‘There are currently multiple definitions of the term digital biomarker reported in the scientific literature, and some seem to conflate established definitions of a biomarker and a clinical outcomes assessment (COA)’.11

This divergence in the terminology of digital biomarkers between the academic literature and regulatory language creates challenges and ambiguity. Consequently, a more cohesive and comprehensive framework within the digital biomarker field is needed to improve clarity and to realise the potential that these data could bring for health.

The development of a substantive and unified definition of digital biomarkers would be an important step in shaping a conceptual framework for the development, assessment and reporting of digital biomarkers. Our results may inform this process by using the existing understanding of digital biomarkers, systematically analysed in this study, as a basis. To achieve a common and more unified understanding of what digital biomarkers are (and are not), a Delphi study could be useful.29 30 Such a study would aim to combine multiple views and expectations on the existing definitions of digital biomarkers and their components until a consensus is reached. Ideally, this would be achieved by an international panel with experts representative of all relevant stakeholders, covering a range of medical fields (eg, cardiology, neurology), professional backgrounds (eg, clinical care/rehabilitation/nursing, software developers, device manufacturers, editors, guideline developers) and professional perspectives (eg, academia, regulatory, industry/technology, publishing), and involving patients.

Limitations

There are some limitations to our study.

First, we used a limited search in a single database using the single term ‘digital biomarker*’, which may have missed other relevant studies. PubMed was chosen as the literature database given its outstanding role and its coverage of the most impactful journals in biomedicine.31 We focused on this single term because we assume it to be the most central and widely used term describing the concept of ‘digital biomarker’. It is very unlikely that the definitions would be much more uniform in potentially overlooked studies, or had we included other related concepts; it is quite possible that many more different definitions would emerge, especially from digital biomarker developments indexed in technical literature databases (such as IEEE Xplore or ACM Digital Library). Therefore, we may even have underestimated the number of different definitions.

Second, the screening and data extraction were performed by a single reviewer only. This may have resulted in some studies that were overlooked and some misclassifications, but it is unlikely that our main interpretation would change. Third, we developed a simple framework with three key elements of definitions based on a well-established framework (BEST), but the categorisation of elements is subjective to some degree. However, we employed a structured analysis that confirmed the observed heterogeneity across definitions.

Conclusions

Clear and unambiguous communication and research reporting are essential for the effective implementation of scientific innovations and developments. This requires clear definitions and consistent use and understanding of key terms and concepts. A lack of clarity and consistency can lead to research waste, delay or even misdirection of promising developments and potential. Digital biomarkers offer the opportunity to collect objective, meaningful, patient-relevant data cost-effectively with unprecedented granularity. An exact understanding of what they are and how they are described in the biomedical literature is essential to let them shape the future of clinical research and to enable them to provide the most useful evidence for research and care. Our study can inform the development of a harmonised and more widely accepted definition, for example, with a Delphi study.