Table 1

Number of interview respondents who commented on each item in the initial framework, collected to understand end users' views of the initial framework categories

| Category from initial framework | Dimension | Definition | Times mentioned in interviews |
| --- | --- | --- | --- |
| Description | Metadata completeness | Level of metadata completed | 2 |
| | Metadata quality | Richness of metadata completion, including within required formats and quality of qualitative fields | 5 |
| Characteristics and Service | Data source | The modality or source of data (eg, Electronic Health Record, study specific) | 14 |
| | Data model | The data model or schema used by the dataset (eg, Observational Medical Outcomes Partnership (OMOP), Informatics for Integrating Biology and the Bedside (i2b2)) | 16 |
| | Data dictionary | Provided documented data dictionary and terminologies | 9 |
| | Provenance | The original source or jurisdiction of the dataset | |
| | Usage restrictions | The ability to use the data for different purposes (eg, commercial licences, consent, expiry) | 6 |
| | Format | The technical presentation of the data format (eg, Digital Imaging and Communications in Medicine (DICOM) images vs Portable Network Graphics (PNG)) | 3 |
| | Timeliness | How quickly the data can be provided, within a useful timescale | 7 |
| | FAIRness of the data | Extent to which the data are findable, accessible, interoperable and reusable | 0 |
| | Phenome | Extent and description of included patients/conditions (links with Phenome work re standards) | 2 |
| Scale | Coverage | Number of individuals, data points, lab tests, images, etc included in the dataset | 9 |
| | Duration | Length of time to which the data relates | 3 |
| | Depth | Amount of information available per individual (eg, number of fields/records, types of data) | 3 |
| Quality | Completeness | The proportion of data entries that should be populated and are populated (and the inverse: the proportion that should not be populated and are not) | 11 |
| | Missing data handling | Description of missing value handling and default values | 4 |
| | Consistency/uniformity | Data are presented in the required format and in a similar way, for example, field types, date formats | 1 |
| | Uniqueness | Lack of duplication | 3 |
| | Validity | Data are valid based on acceptable 'rules', for example, age between 0 and 120, no pregnancy recorded for male patients, physiological readings within normal ranges | 7 |
| | Accuracy/verification | The extent to which the data reflect the 'real world', for example, level of certainty that fields are accurate | 6 |
| | 'Usefulness' | Qualitative, subjective measure by user (eg, Net Promoter Score/star rating) | 12 |
| Added value | Linkage/mapping | Ability to link with other datasets | 10 |
| | Transformations/derivations | Level of derived data and descriptions, manual versus Natural Language Processing, etc | 1 |
| | Accuracy/verification | Level of manual verification/sampling | 0 |
| | Annotation | Additional fields added to provide further information, including phenotyping | 1 |
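
Three of the Quality dimensions in Table 1 (completeness, uniqueness and validity) are defined as proportions or rules, so they lend themselves to a small worked illustration. The sketch below is not part of the framework: the function names, the record layout and the `patient_id` and `age` fields are hypothetical, and the only rule taken from the table is the example that age should lie between 0 and 120.

```python
from typing import Any, Callable

# Hypothetical record layout: one dict per dataset row.
Record = dict[str, Any]


def is_populated(value: Any) -> bool:
    """Treat None and the empty string as missing; 0 counts as populated."""
    return value is not None and value != ""


def completeness(records: list[Record], field: str) -> float:
    """Proportion of records in which `field` is populated."""
    if not records:
        return 0.0
    return sum(is_populated(r.get(field)) for r in records) / len(records)


def uniqueness(records: list[Record], key: str) -> float:
    """Proportion of records carrying a distinct value of `key` (1.0 = no duplicates)."""
    if not records:
        return 0.0
    return len({r.get(key) for r in records}) / len(records)


def validity(records: list[Record], field: str, rule: Callable[[Any], bool]) -> float:
    """Proportion of populated values of `field` that satisfy `rule`."""
    values = [r[field] for r in records if is_populated(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)


# Illustrative data only, not drawn from the study.
records = [
    {"patient_id": "p1", "age": 34},
    {"patient_id": "p2", "age": 150},   # breaks the age rule
    {"patient_id": "p2", "age": None},  # duplicate id, missing age
    {"patient_id": "p3", "age": 67},
]

age_completeness = completeness(records, "age")
id_uniqueness = uniqueness(records, "patient_id")
age_validity = validity(records, "age", lambda v: 0 <= v <= 120)
```

On this illustrative data, three of four ages are populated, three of four identifiers are distinct, and two of the three populated ages pass the rule. How such proportions would be banded into framework ratings is not specified in the table and is left open here.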