Abstract
Objective
Identification of significant problems encountered, and solutions adopted, while implementing SNOMED CT to replace legacy coding schemes in a busy research and surveillance unit that uses patient-level coded General Practice data, held in a database populated by extraction from a subset of English General Practices:
Setting up a full SNOMED CT database from scratch
Changing data extraction/search processes throughout the unit away from the use of legacy Read version 2 and Clinical Terms Version 3 codelists to reusable SNOMED CT ‘variables’ held in a library
Establishing a robust process for curating, storing and maintaining SNOMED CT ‘variables’
Methods
Retrospective review of an implementation project.
Setting up a full SNOMED CT database. Research was required to find clear instructions on how the release files available from TRUD should be processed to build a fully functional database while avoiding pitfalls. Further research was needed to understand SNOMED CT concept inactivation and how to mitigate its effects
Collation of legacy codelists into consistent format to pass through cross mapping tables
Design and implementation of infrastructure to hold reusable SNOMED CT ‘variables’ taking into account naming, provenance, metadata to be included, handling of inactive concepts
Development of robust and time efficient SNOMED CT variable curation process
o Development of supporting tools
o Training of clinicians to curate
Explaining to researchers the concept of reusable ‘variables’ and the need for them to modify practices in order to match research and surveillance data needs to an existing library of ‘variables’ and to seek curation of new variables to fill gaps
o Consideration of problems with defining research/surveillance data requirements
Providing the means to search the library
Explanation of the implications of inactivations
Version controls
Consideration of how best to convey the coverage and definition of ‘variables’ to others
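As a minimal sketch of the database-building step, assuming the release files are supplied in the standard SNOMED CT Release Format 2 (RF2) as tab-delimited snapshot files, the active ‘is a’ relationships can be read into a child-to-parents map. The function name and the sample rows are hypothetical and the sample identifiers are made up; the column names follow the RF2 relationship file header, and 116680003 is the identifier of the ‘Is a’ relationship type.

```python
import csv
import io
from collections import defaultdict

IS_A = "116680003"  # SNOMED CT identifier for the |Is a| relationship type


def load_is_a(rf2_text):
    """Return {child_id: set(parent_ids)} from RF2 relationship snapshot text.

    Only rows that are flagged active and typed |Is a| are kept.
    """
    parents = defaultdict(set)
    reader = csv.DictReader(
        io.StringIO(rf2_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if row["active"] == "1" and row["typeId"] == IS_A:
            parents[row["sourceId"]].add(row["destinationId"])
    return dict(parents)


# Hypothetical sample rows: made-up identifiers, not real SNOMED CT content.
HEADER = "\t".join([
    "id", "effectiveTime", "active", "moduleId", "sourceId",
    "destinationId", "relationshipGroup", "typeId",
    "characteristicTypeId", "modifierId",
])
SAMPLE = "\n".join([
    HEADER,
    "r1\t20200101\t1\tm\t200\t100\t0\t116680003\tc\tx",  # active is-a: kept
    "r2\t20200101\t0\tm\t300\t100\t0\t116680003\tc\tx",  # inactive: skipped
    "r3\t20200101\t1\tm\t200\t900\t0\t999\tc\tx",        # not an is-a: skipped
])

IS_A_MAP = load_is_a(SAMPLE)
```

Filtering on the `active` column at load time is one of the points where inactivation has to be handled deliberately: inactive rows remain in the release files and are easy to include by accident.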
Results
SNOMED CT database successfully set up through a combination of experimentation, outdated advice found in grey literature and informal help from a terminology expert colleague
Legacy codelists: 350 found, in multiple formats, with little or no provenance or definition and idiosyncratic naming. All translated in batch via cross mapping tables. Resulting outputs used as substrate for full curation. Only 154 of these taken forward. Full curation typically added many extra active and inactive concepts
Infrastructure developed: Supporting:
Unique naming and numbering of ‘variables’
o Agreed editorial principles for naming
o Recording of dates and names of curator and checker
Agreed metadata including output type, option for free text comment
Storage of ‘variables’ in supertype/subtype format
Generation of concept flatlists for searches on demand
Agreed curation process, making best use of supertypes that can be added or subtracted. ‘SNOMED CT helper tool’ developed. Curating team trained in its use. All ‘variables’ checked by second team member
Interaction with researchers.
Difficulties with:
Shifting thinking away from fixed code lists
Obtaining plain English definitions of requirements
Matching requirements to existing ‘variables’/to identify gaps; help needed from curation team
Explaining implications of inactivations
Scepticism about re-usability
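The supertype/subtype storage and on-demand flatlist generation reported above can be sketched as follows. This is an illustration under stated assumptions, not the unit's implementation: the function names, the `CHILDREN` map and all identifiers are hypothetical, and a ‘variable’ is assumed to be held as a set of supertypes to include and a set to exclude.

```python
def descendants_or_self(concept, children):
    """Return the concept plus everything below it in the is-a hierarchy."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def flatlist(include, exclude, children):
    """Expand a supertype/subtype 'variable' into a sorted flat concept list."""
    included = set()
    for concept in include:
        included |= descendants_or_self(concept, children)
    excluded = set()
    for concept in exclude:
        excluded |= descendants_or_self(concept, children)
    return sorted(included - excluded)


# Hypothetical parent-to-children map with made-up identifiers.
CHILDREN = {
    "100": ["200", "300"],
    "200": ["210", "220"],
    "300": ["310"],
}

# Include everything under 100, then subtract the branch under 300.
RESULT = flatlist(include=["100"], exclude=["300"], children=CHILDREN)
```

Because the flat list is regenerated from the stored supertypes each time, rerunning the expansion against a newer hierarchy picks up concepts added under those supertypes without editing the ‘variable’ itself.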
Conclusions
SNOMED CT database implementation hampered by poor-quality, inaccessible guidance
Cross mapping of legacy codelists was of limited value. Significant time was wasted in inferring definition/purpose. Curation against full SNOMED CT led to richer, more complete concept lists and to rejection of some original concepts as erroneous. Fewer than half of the legacy codelists were fully processed into the library. Better to start afresh and apply a clear definition directly to SNOMED CT
Infrastructure
o ‘Variables’ stored in supertype/subtype formulation are easily exportable as an Expression Constraint Language (ECL) statement, which is human readable and computable. Built-in mitigation for inactivations occurring over time
Easy to overlook resources required to design and implement fit for purpose supporting infrastructure
No agreed standards for:
▪ Naming ‘variables’
▪ Associated metadata
Curation process:
Good support tooling essential to achieving major savings in time and increased efficacy.
Curators should
▪ Have clinical knowledge
▪ Work as a team
▪ Check each other’s work
Interaction with researchers
o Reproducibility of ‘variables’ still dependent on code lists whereas SNOMED CT version plus ECL formulation might be more robust and meaningful
Data requirements evolve as projects develop, leading to variable mapping changes. Version control of documentation essential
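To illustrate the point about exporting supertype/subtype ‘variables’ as ECL, the sketch below builds an ECL string from included and excluded supertypes using the standard `<<` (descendant or self), `OR` and `MINUS` operators. The function name and the identifiers are hypothetical; in practice each identifier would normally be followed by its term between pipe characters to aid human reading.

```python
def to_ecl(include, exclude=()):
    """Build an ECL statement from included and excluded supertypes."""
    if not include:
        raise ValueError("at least one included supertype is required")

    def clause(ids):
        # Each supertype is expanded with '<<' (descendant or self).
        parts = ["<< " + concept_id for concept_id in ids]
        if len(parts) == 1:
            return parts[0]
        # Brackets keep OR groups unambiguous when combined with MINUS.
        return "(" + " OR ".join(parts) + ")"

    ecl = clause(list(include))
    if exclude:
        ecl += " MINUS " + clause(list(exclude))
    return ecl


# Made-up identifiers, for illustration only.
SIMPLE = to_ecl(["100"])
COMBINED = to_ecl(["100", "400"], ["300"])
```

Recording this string together with the SNOMED CT release version it was evaluated against is one way of documenting a ‘variable’ without depending on a fixed code list.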