Table showing the number of interview respondents commenting on each item from the original framework, to gain an understanding of end users' views of the initial framework categories
Category from initial framework | Dimension | Definition | Times mentioned in interviews |
Description | Metadata completeness | Level of metadata completed | 2 |
Description | Metadata quality | Richness of metadata completion, including within required formats and quality of qualitative fields | 5 |
Characteristics and Service | Data source | The modality or source of data (eg, Electronic Health Record, study specific) | 14 |
Characteristics and Service | Data model | The data model or schema used by the dataset (eg, Observational Medical Outcomes Partnership (OMOP), Informatics for Integrating Biology and the Bedside (i2b2)) | 16 |
Characteristics and Service | Data dictionary | Provided documented data dictionary and terminologies | 9 |
Characteristics and Service | Provenance | The original source or jurisdiction of the dataset | |
Characteristics and Service | Usage restrictions | The freedom to use the data for different purposes (eg, commercial licences, consent, expiry) | 6 |
Characteristics and Service | Format | The technical presentation of the data format (eg, Digital Imaging and Communications in Medicine (DICOM) images vs Portable Network Graphics (PNG)) | 3 |
Characteristics and Service | Timeliness | How quickly the data can be provided, that is, within a useful timescale | 7 |
Characteristics and Service | FAIRness of the data | Extent to which the data are findable, accessible, interoperable and reusable (FAIR) | 0 |
Characteristics and Service | Phenome | Extent and description of included patients/conditions (links with Phenome work on standards) | 2 |
Scale | Coverage | Number of individuals, data points, lab tests, images, etc included in the dataset | 9 |
Scale | Duration | Length of time to which the data relate | 3 |
Scale | Depth | Amount of information available per individual (eg, number of fields/records, types of data) | 3 |
Quality | Completeness | The proportion of data entries that should be populated and are populated (and the inverse: the proportion that should not be populated and are not) | 11 |
Quality | Missing data handling | Description of missing value handling and default values | 4 |
Quality | Consistency/uniformity | Data are presented in the required format and in a similar way, for example, field types, date formats | 1 |
Quality | Uniqueness | Lack of duplication | 3 |
Quality | Validity | Data are valid based on acceptable ‘rules’, for example, age between 0 and 120, no pregnancy in male patients, physiological readings within normal ranges | 7 |
Quality | Accuracy/verification | The extent to which the data reflect the ‘real world’, for example, level of certainty that fields are accurate | 6 |
Quality | ‘Usefulness’ | Qualitative, subjective measure by user (eg, Net Promoter Score/star rating) | 12 |
Added value | Linkage/mapping | Ability to link with other datasets | 10 |
Added value | Transformations/derivations | Level of derived data and descriptions, manual versus Natural Language Processing, etc | 1 |
Added value | Accuracy/verification | Level of manual verification/sampling | 0 |
Added value | Annotation | Additional fields added to provide further information, including phenotyping | 1 |
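
Several of the Quality dimensions defined in the table (completeness, uniqueness, validity) can be expressed as simple proportions over a dataset. The following is a minimal sketch only, not part of the framework: the record fields (`patient_id`, `age`, `sex`, `pregnant`), the example data and the function names are all assumptions made for illustration, and the validity rules are limited to two of the examples given in the table.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Optional[Any]]

# Hypothetical records; field names and values are assumptions for illustration.
RECORDS: List[Record] = [
    {"patient_id": "p1", "age": 34, "sex": "F", "pregnant": True},
    {"patient_id": "p2", "age": 130, "sex": "M", "pregnant": False},
    {"patient_id": "p2", "age": 130, "sex": "M", "pregnant": False},  # duplicate
    {"patient_id": "p3", "age": None, "sex": "M", "pregnant": True},
]


def completeness(records: List[Record], field: str) -> float:
    """Proportion of records in which `field` is populated (not None)."""
    if not records:
        return 0.0
    return sum(r.get(field) is not None for r in records) / len(records)


def uniqueness(records: List[Record]) -> float:
    """Proportion of records that are distinct (1.0 means no duplication)."""
    if not records:
        return 1.0
    # Sort by field name only, so values of mixed types are never compared.
    distinct = {tuple(sorted(r.items(), key=lambda kv: kv[0])) for r in records}
    return len(distinct) / len(records)


def is_valid(record: Record) -> bool:
    """Check two example rules: age within 0-120, no pregnancy in male patients."""
    age = record.get("age")
    if age is not None and not (0 <= age <= 120):
        return False
    if record.get("sex") == "M" and record.get("pregnant"):
        return False
    return True


def validity(records: List[Record]) -> float:
    """Proportion of records that pass every rule in `is_valid`."""
    if not records:
        return 1.0
    return sum(is_valid(r) for r in records) / len(records)


if __name__ == "__main__":
    print("completeness(age):", completeness(RECORDS, "age"))
    print("uniqueness:", uniqueness(RECORDS))
    print("validity:", validity(RECORDS))
```

In this sketch a missing age counts against completeness but not against validity, so the two dimensions are kept separate, as they are in the table.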