Ontologies: a key concept in informatics
Ontologies are a key concept in informatics, and the leading article in this issue addresses their importance.1 Ontologies describe key concepts within a domain and their relationships. This leading article describes how to use an ontological approach to identify data sources and combine data.
We advocate that the approach to developing datasets and coding lists should also be ontological.2 This assertion is based on a realist review of the literature3 and an exploration of how this approach might lead to more explicitly defined datasets when using routine data for chronic disease management,4 integrated care5,6 and vaccine benefit–risk research.7
Creating an ontology should be an explicit process so that it is clear how a case, an intervention or exposure, or an outcome measure is derived from routine data. We are adding papers describing ontologies to the types of paper we will accept in the Journal of Innovation in Health Informatics. Such papers should describe the ontology and its parts, following the process we set out below (Figure 1).
We recommend the three-step process for creating an ontology shown in Figure 1. The first step is to construct the ontology per se; the second is to select codes relevant to the data being studied. The granularity of the ontology will need to reflect the nature of the coding and classification used in a given health care system8 and the quality of data recording,9 as only very rarely are all possible codes used. The final step is to test whether usable data can be extracted using the planned approach. If not, the ontology and coding list are revised until a usable outcome is produced. Creating a high-quality ontology is an iterative process.
Step 1: Constructing the ontology
The ontological layer defines the relevant concepts. For an ontology that defines a diagnosis, this might include aetiology, diagnosis and other clinical features of the condition and its therapy. The ontology reflects the requirements and purpose of the investigation. An example of how an ontology might be created to define a case of diabetes is set out in Box 1.
Box 1. An example of how an ontological approach might improve case finding in diabetes
An ontology for diabetes would explicitly set out the criteria used in a study so that it is possible to understand how a particular prevalence might be defined. It might be restricted to one or more categories of data or require a combination (e.g. a case of Type 1 diabetes must have a Type 1 diabetes diagnostic code AND currently prescribed insulin).
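The combined criterion in this example (a Type 1 diabetes diagnostic code AND currently prescribed insulin) could be made explicit as a small rule. The following is only an illustrative sketch: the record structure, the function name and every code value are hypothetical placeholders, not entries from any real terminology.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical placeholder codes, invented for illustration only.
T1DM_DIAGNOSIS_CODES = {"DX_T1DM_A", "DX_T1DM_B"}
INSULIN_CODES = {"RX_INSULIN_A", "RX_INSULIN_B"}


@dataclass
class PatientRecord:
    """Assumed minimal view of one patient's routine data."""
    patient_id: str
    diagnosis_codes: set[str] = field(default_factory=set)
    current_prescription_codes: set[str] = field(default_factory=set)


def is_type1_case(record: PatientRecord) -> bool:
    """Apply the explicit case definition: diagnosis code AND current insulin."""
    has_diagnosis = bool(record.diagnosis_codes & T1DM_DIAGNOSIS_CODES)
    on_insulin = bool(record.current_prescription_codes & INSULIN_CODES)
    return has_diagnosis and on_insulin


# A record with a diagnostic code but no current insulin is not counted.
diagnosis_only = PatientRecord("p2", {"DX_T1DM_A"}, set())
print(is_type1_case(diagnosis_only))
```

Writing the rule down in this way makes it possible to see why two studies using the same data might report different prevalences: a definition based on the diagnostic code alone would count the record above, whereas the combined definition would not.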
Step 2: Coding layer – creating a coding list from the ontology
Each of the types of information included in the ontology should be included in the coding list. If you restrict your ontology to one or more categories of information (e.g. simply to diagnosis), then the same will apply to the coding list (in this example, it would just comprise diagnostic codes).
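One way to keep the coding list aligned with the ontology is to derive it directly from the ontology's categories. The sketch below assumes a very simple representation, a mapping from category name to a list of codes; the category names follow the examples given in Step 1, while the function name and all code values are hypothetical placeholders.

```python
# Hypothetical ontology: each category of information maps to placeholder codes.
DIABETES_ONTOLOGY = {
    "diagnosis": ["DX_DM_A", "DX_DM_B"],
    "clinical_features": ["CF_DM_A"],
    "therapy": ["RX_INSULIN_A", "RX_ORAL_A"],
}


def build_coding_list(ontology, categories):
    """Return the codes for the chosen categories only.

    Raises KeyError if a requested category is not part of the ontology,
    so the coding list can never contain a category the ontology lacks.
    """
    unknown = set(categories) - set(ontology)
    if unknown:
        raise KeyError(f"Categories not in ontology: {sorted(unknown)}")
    return {category: list(ontology[category]) for category in categories}


# Restricting the ontology to diagnosis yields a list of diagnostic codes only.
diagnosis_only_list = build_coding_list(DIABETES_ONTOLOGY, ["diagnosis"])
print(diagnosis_only_list)
```

Deriving the list from the ontology, rather than maintaining the two separately, means that a restriction applied to one is automatically applied to the other.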
Step 3: Logical data extract model
The third step in using this ontological approach is to check that it is possible to extract the data you anticipate. Sometimes codes do not have sufficient granularity. Just because a code exists within a terminology, do not expect that clinicians or those involved in data entry will necessarily use it! Literature reviews, pilot searches of data sources and speaking to practitioners in the field about their data recording all help to indicate whether your first-pass model is likely to achieve its goals.
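A pilot search of this kind can be as simple as counting how often each code in the coding list actually appears in a sample extract. The following sketch assumes the extract is available as a flat list of recorded codes; the function names and code values are hypothetical placeholders.

```python
from collections import Counter


def code_usage(coding_list, extracted_codes):
    """Count how many times each code in the coding list occurs in the extract."""
    counts = Counter(extracted_codes)
    return {code: counts[code] for code in coding_list}


def unused_codes(coding_list, extracted_codes):
    """List the codes that exist in the coding list but never appear in the data."""
    usage = code_usage(coding_list, extracted_codes)
    return [code for code, n in usage.items() if n == 0]


# Hypothetical pilot extract: one planned code is never recorded in practice.
planned = ["DX_DM_A", "DX_DM_B", "DX_DM_C"]
pilot_extract = ["DX_DM_A", "DX_DM_A", "DX_DM_C", "OTHER_X"]
print(unused_codes(planned, pilot_extract))
```

Codes that are never used in the pilot data are candidates for review: the ontology or coding list may need to be revised, or a less granular code may be carrying the information instead.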
In summary, an ontological process should enable code lists used in research based on routine data to be constructed in a logical and open way. This process will enable others to use the ontology as is, update or modify it, or apply it to other coding systems.