Informatics for Health 2017: Advancing both science and practice

Introduction The Informatics for Health congress, 24-26 April 2017, in Manchester, UK, brought together the Medical Informatics Europe (MIE) conference and the Farr Institute International Conference. This special issue of the Journal of Innovation in Health Informatics contains 113 presentation abstracts and 149 poster abstracts from the congress. Discussion The twin programmes of “Big Data” and “Digital Health” are not always joined up by coherent policy and investment priorities. Substantial global investment in health IT and data science has led to sound progress but highly variable outcomes. Society needs an approach that brings together the science and the practice of health informatics. The goal is multi-level Learning Health Systems that consume and intelligently act upon both patient data and organizational intervention outcomes. Conclusions Informatics for Health 2017 demonstrated the art of the possible, seen in the breadth and depth of our contributions. We call upon policy makers, research funders and programme leaders to learn from this joined-up approach.

Introduction Clinical performance is increasingly quantified through analysis of structured electronic health record (EHR) data. This is especially true in UK primary care, which routinely captures clinician-coded data at the point-of-care. Cochrane reviews show that feedback of such analyses can be effective at improving care quality. However, the mechanisms by which this works are poorly understood. Consequently, the delivery of clinical performance intelligence is often suboptimal. Hence, there is a need for stronger theoretical underpinnings of such systems. We report findings from an ongoing study that is filling this evidence gap by: 1. Developing a detailed clinical performance feedback theory (CPFT). 2. Testing CPFT by using it to inform prototype software for UK primary care (the performance improvement plan generator (PINGR)), and evaluating its usability and potential impact on patients.
Methods Objective 1. Meta-synthesis of findings from qualitative studies of clinical performance feedback interventions (PROSPERO CRD42015017541). Qualitative studies tend to generate theory, though no previous attempt has been made to review this literature. Studies were synthesised through line-by-line coding. Framework analysis identified causal pathways in intervention effectiveness. Objective 2. We used CPFT to design and implement PINGR and performed a multi-stage evaluation. Methods entailed: a) usability inspection studies conducted with experienced software evaluators employing heuristic evaluation and cognitive walkthrough, b) laboratory-based mixed method usability tests carried out with primary care clinicians, assessing task performance accuracy, visual search behaviour and user satisfaction and c) ongoing field tests in primary care practices in Salford UK, consisting of usage pattern and EHR data analysis, user interviews and observations. Results Objective 1. From 16,413 screened papers, we synthesised findings from 65. CPFT posits that effective clinical performance feedback is a cyclical process consisting of goal setting; data collection and analysis; feedback message production; perception and acceptance of feedback; desire and intention to respond; and action. This process is influenced by a number of moderating variables, such as feedback message design.
Objective 2. Informed by CPFT, a defining feature of PINGR is that it recommends improvement actions tailored for both the practitioner/clinic and individual patients ('decision-supported feedback'). Usability inspection studies (n=8) and usability

Introduction Adverse childhood experiences (ACEs) were first explored in the US with adults who had medical insurance with Kaiser Permanente. Participants were asked about childhood experiences covering psychological, physical and sexual abuse and household dysfunction. Half experienced at least one ACE. More recently, a review of ACEs in England also suggested half of adults experienced 1+ ACE. ACEs have been linked to a range of adverse physical and mental health outcomes in childhood and adulthood. Despite the current interest in ACEs and their seeming importance in relation to future outcomes, a recent Scottish report concluded that 'Although data exists on various aspects of household dysfunction in Scotland, no published studies exist to date of the prevalence specifically of ACEs in the general population of Scotland'.

Introduction
The electronic nature of modern audit and feedback interventions creates opportunities to build a fine-grained picture of their effects on clinical decision making, by analysing interaction data that are a by-product of their use. The Salford MedicAtion Safety dasHboard (SMASH) intervention uses an electronic dashboard and trained clinical pharmacists to improve medication safety in primary care. The objective of this study was to assess how the dashboard was used, and how this was associated with improvements in medication safety.

Method
The SMASH intervention was rolled out in 11 general practices in Salford, UK. The dashboard interrogates electronic health records using a set of 13 medication safety indicators and presents the resulting information to its users in both aggregated form and as lists of individual patients with potential safety hazards. Clinical pharmacists were assigned to participating practices to assist practice staff in resolving safety hazards identified by the dashboard for a period of 12 weeks (the intervention period), after which users were free to continue using the dashboard (the follow-up period). We analysed the database of identified safety hazards and log files of user interactions with the dashboard during the first 6 months of its deployment.
Results Eleven general practices had used SMASH for a mean period of 17 weeks (range 4 to 25) at the time of analysis. During the intervention period, 729 potential medication safety hazards in 677 unique patients were identified by the dashboard. The dashboard was used 1.6 (SD, 0.6) times per week by pharmacists and 0.2 (SD, 0.2) times per week by practice staff during the intervention period, and 1.0 (SD, 0.7) and 0.4 (SD, 0.3) times per week, respectively, in the follow-up period. Use by pharmacists decreased over time (−0.025 times per week; 95% CI, −0.500 to −0.001), whereas the use by practice staff remained constant. Users viewed a page listing one or more patients with potential safety hazards in 56% (n = 217) of interactions. 50% of hazards had been viewed after 7 days and 90% after 59 days. Hazards had been resolved after a median time of 102 days. At the end of the study period, 97% of hazards had been viewed at least once and 72.4% of identified hazards had been resolved. Higher interaction frequency was associated with faster resolution of hazards (36.7 days faster for each additional interaction per week; 95% CI, 7.7 to 65.6).
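As an aside on method, the per-week usage rates above are the kind of quantity that drops out of a simple aggregation over the interaction log. A minimal sketch in Python/pandas, assuming a hypothetical log schema with practice, user_role and timestamp columns (not the actual SMASH schema):

```python
# Hypothetical sketch: weekly dashboard-usage rates per practice from an
# interaction log. Column names (practice, user_role, timestamp) are our
# assumptions, not the actual SMASH log schema.
import pandas as pd

logs = pd.read_csv("smash_interactions.csv", parse_dates=["timestamp"])

weekly = (logs
          .assign(week=logs["timestamp"].dt.to_period("W"))
          .groupby(["practice", "user_role", "week"])
          .size()
          .rename("n_interactions")
          .reset_index())

# mean (and SD of) interactions per week for each practice and role,
# the kind of rate reported above
rates = weekly.groupby(["practice", "user_role"])["n_interactions"].agg(["mean", "std"])
print(rates)
```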

Discussion
This study illustrates how user interaction logs can be used to evaluate the use of health informatics interventions in clinical practice. We may have overestimated the time it took for hazards to be resolved because several indicators relied on 3-month follow-up data to ensure that prescriptions were not reissued. However, this is unlikely to have affected the identified relationship between interaction frequency and hazard resolution; follow-up research should establish whether this relationship is causal. If so, increased efforts to ensure that participants use the dashboard would benefit patient safety.
Conclusion More frequent use of an electronic medication safety dashboard was associated with quicker resolution of medication safety hazards.
Introduction Stroke survivors are at high risk of a recurrent stroke, which is likely to be more disabling and more often fatal than a first stroke. Secondary prevention requires health professionals to offer interventions to monitor and manage risk factors (e.g. blood pressure and antithrombotic treatment) and patients to change health-related behaviours, such as smoking and diet, and adhere to preventative medications. Currently, vascular risk factors tend to be neither well managed nor controlled. In this study, we engaged stakeholders to iteratively design a decision aid informed by an integrated clinical and research dataset, aiming to facilitate shared decision making (SDM) on effective treatments for secondary stroke prevention and to motivate the patient to adhere to the selected treatments, thereby reducing the risk of recurrence.

Methods
We used a range of methods to engage stakeholders (n=37), including service users (n=11), general practitioners (GPs, n=6), other health and social care professionals (n=10), commissioners, service managers, policy makers, the third sector (n=6) and researchers (n=4). This engagement process involved: 1) initial exploration of priorities in long-term stroke care and intervention solutions through stakeholder engagement meetings, focus groups, nominal group techniques (priority setting and consensus building) and face-to-face interviews, 2) group discussions with stakeholder representatives (service users, GPs, health care professionals and commissioners), as part of a core stakeholder group, to discuss preliminary intervention designs and to reach consensus on and prioritise the development of a decision support aid targeting secondary prevention after stroke and 3) subsequent iterative review and design of the intervention with stakeholder representatives and a stroke service user research group. All qualitative data were analysed thematically.
Results The final design of the decision aid: • enables the patient to indicate his/her perceived risk of having a recurrent stroke • calculates the patient's predicted stroke risk based on rules generated from the South London Stroke Register (SLSR) using risk prediction algorithms • displays the most effective treatments and their relative benefit • presents common concerns for each treatment to elicit preferences • allows the GP and patient to decide on a management plan while identifying desired clinical and patient outcomes.

Discussion
The stroke decision aid is a personalised multifaceted tool to be used by the GP and stroke patient during the consultation to facilitate SDM on effective treatments for secondary stroke prevention and to motivate patients to adopt healthier behaviours, thereby reducing the risk of recurrence. The tool contains several unique features that may not have been identified in a researcher-led design process. These include prioritising treatments, communicating risk in an understandable way and incorporating patients' desired outcomes into the management plan; these features first need to be evaluated and could then be adapted to other decision aids supporting complex clinical conditions.

Conclusion The design of the tool has the potential to improve secondary prevention among stroke survivors by helping physicians to propose the most effective patient-centred treatments and allowing patients to decide on the treatments that best suit their preferences and desired outcomes. The evaluation is currently ongoing and initial findings will be reported.
Introduction Allergic sensitizations can be assessed with high resolution through component-resolved diagnostics (CRD), which measures specific IgE antibodies to a large number of individual allergenic proteins (components) from multiple sources. We hypothesize that there are distinct longitudinal developmental patterns of component-specific IgE responses that are associated with different clinical presentations of allergic diseases (such as asthma and rhinitis) and that we can use the pattern of responses in early childhood to predict later clinical outcomes.
Methods In a population-based birth cohort study, we measured sIgE to 112 components using ISAC multiplex allergen chip at ages 1, 3, 5, 8, 11 and 16 years. At each age, we clustered allergen components based on their sIgE response profiles across participants to identify sets of closely associated components. We developed a Bayesian method to estimate a mixture of Bernoulli distributions from the binary data and used it to discover the number and composition of clusters at each age. Each participant's IgE response profile was reduced based on their responses to each of these clusters. We assessed clinical outcomes at age 16 years (current wheeze, asthma and rhinitis) and investigated the associations of clusters at age 5 with clinical outcomes at age 16 years.
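The clustering step can be illustrated with a maximum-likelihood analogue of the model. The sketch below fits a mixture of Bernoulli distributions by EM over a binary components-by-participants matrix; the study's actual estimator is Bayesian, and the data here are synthetic placeholders.

```python
# Minimal maximum-likelihood sketch of the model: EM for a mixture of
# Bernoulli distributions over a binary matrix (rows = allergen components,
# columns = participants). The study's actual estimator is Bayesian; the
# data below are synthetic placeholders.
import numpy as np

def bernoulli_mixture_em(X, k, n_iter=200, seed=0, eps=1e-9):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)                   # mixing weights
    theta = rng.uniform(0.25, 0.75, (k, d))    # per-cluster Bernoulli means
    for _ in range(n_iter):
        # E-step: responsibilities from stabilised log-posteriors
        log_p = (X @ np.log(theta + eps).T
                 + (1 - X) @ np.log(1 - theta + eps).T
                 + np.log(pi + eps))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and means from responsibilities
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = (resp.T @ X) / (nk[:, None] + eps)
    return pi, theta, resp

# 112 components x 922 children, ~10% positive responses (fake data)
X = (np.random.default_rng(1).random((112, 922)) < 0.1).astype(float)
pi, theta, resp = bernoulli_mixture_em(X, k=4)
cluster_of_component = resp.argmax(axis=1)
```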

Results
After testing on synthetic data, we applied our clustering method to CRD data available for 922 children. One sensitization cluster was identified at age 1 year, three at age 3, four at ages 5 and 8, five at age 11 and six at age 16 years. We qualitatively labelled clusters based on the profile of allergen components to which sensitization occurred. For each time point, the 'broad' cluster comprised components originating from multiple sources and was the only cluster identified at every time point. From age 3, the 'House Dust Mite' cluster (consisting of four mite components) formed and remained unchanged to age 16. At age 3, a single-component 'grass' cluster emerged. This cluster absorbed an additional three grass components and one cat component, Fel.d.1, to form the 'grass/cat' cluster at age 5. Two new clusters formed at age 11: the 'cat' cluster (comprising Fel.d.1) and the 'PR-10/profilin' cluster. The latter cluster divided at age 16 into the 'PR-10' and 'Profilin' clusters. Cluster membership at age 5 predicted clinical outcomes at age 16 years. Asthma and wheeze were strongly associated with the 'grass/cat' cluster (ORs 9.97 [95% CI, 4.58-21.70], P<0.001 and 5.68 [95% CI, 2.82-9.60], P<0.001, respectively), while rhinitis was associated with sensitization to the 'broad' cluster (OR 7.40 [95% CI, 4.35-11.48], P<0.001).

Discussion and Conclusion
Different patterns of sIgE responses to multiple allergen components evolve throughout childhood and can be uncovered using our clustering method. Sensitization patterns at early ages are predictive of disease status at age 16. Recent NICE guidelines do not recommend the use of CRD in the diagnosis or management of asthma, citing a lack of evidence. Our results provide the first evidence of the clinical utility of CRD data in predicting allergic disease throughout childhood.
Introduction The information contained within medical data is often used to make new medical discoveries. However, the most common way to use such data has been to query the data to answer very specific questions. For example, does having diabetes cause some patients to experience falls? If researchers have good questions, then the data can provide good answers. However, are there any other equally important questions that could be asked of the data that people have not yet thought to ask?
We are exploring a new strategy that we have developed to look for unusual and interesting patterns about falls in the elderly at the subgroup level, to see the different risks associated with different groups. Some of these risks will be associated with questions that are already well known, but some should point to new and important questions that have not yet been asked. This opens up a better opportunity to identify patients at risk of falls, helping guide policy so as to reduce falls.

Methods
We mapped patient records into a low-dimensional space using the notions of semantic similarity (Resnik node-based) and machine learning (principal component analysis) to provide a good representation of the data. This representation was used for clustering and visualisation through the DBSCAN algorithm. To look for enrichment in the resultant clusters, we analysed each cluster separately and looked at the sets of patients defined in these clusters. Then, classic data mining techniques were used in order to generate hypotheses. The associations found were then tested using more traditional comorbidity measures, such as relative risk and its confidence intervals.
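A minimal sketch of the reduction-then-clustering pipeline, with plain PCA standing in for the semantic-similarity step and a placeholder patient matrix (feature construction from coded records is elided):

```python
# Minimal sketch of the pipeline: dimensionality reduction followed by
# density-based clustering. Plain PCA stands in for the semantic-similarity
# step, and `X` is a placeholder patients-by-features matrix.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

X = np.random.default_rng(0).random((1000, 50))   # placeholder patient matrix

X_low = PCA(n_components=10).fit_transform(X)     # low-dimensional embedding
labels = DBSCAN(eps=0.5, min_samples=25).fit_predict(X_low)

# label -1 marks noise; the remaining labels are candidate subgroups whose
# enrichment would then be tested (e.g. via relative risk)
for c in sorted(set(labels) - {-1}):
    print(f"cluster {c}: {np.sum(labels == c)} patients")
```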

Results and discussion
We demonstrated the methodology on 589,169 older adults from the Clinical Practice Research Datalink. We successfully identified six distinct subgroups within the elderly population, each associated with different fall-related risks. Some of the associations found are well defined in the literature; for example, depression and musculoskeletal conditions are significantly associated with falls. However, a number of associations are not reported in the clinical literature. Such hypotheses need further exploration by epidemiologists.
Introduction Around the world, we are seeing an increasing number of people suffering from mental health conditions. The World Health Organisation estimates that there are over 350 million people with depression globally. In the UK, nearly 25% of the population suffers from at least one mental illness. The global economic burden of mental health problems reached U.S. $2.5 trillion in 2010 and is expected to grow to U.S. $6.0 trillion by 2030. Thus, new approaches are needed to deal with the scale of the problem. With the advent of social networks, people tend to disclose their emotions, feelings and thoughts on platforms such as Facebook and Twitter. This has resulted in growing interest in detecting early stages of mental illness from user-generated content on these platforms, using machine learning to find representative symptom patterns and construct predictive models.

Method
We have surveyed the relevant sources to establish the current state of mental health research using social network data. The search was conducted using PRISMA methodology on PubMed, IEEE, ACM, Web of Science and Scopus.
Results In total, 4,606 articles matched our search keywords and were further screened according to defined inclusion criteria, giving a final set of 39 papers. The important processes of predicting mental health based on social network data were categorised into data collection, pre-processing, feature extraction, feature selection, model construction and model validation. The standard machine learning techniques focused on textual posts by the users, with the Linguistic Inquiry and Word Count extraction tool,1,2 while some predictive models were partly based on image analysis3 and social graph analysis.4 To build predictive models, support vector machines, regression, decision trees and deep learning techniques5 were trained on features extracted by those methods. We also focused on the ethical concerns surrounding use of social network data for research.
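By way of illustration only, pipelines of the kind surveyed typically chain a text feature extractor to a classifier and validate by cross-validation; the sketch below uses TF-IDF in place of the proprietary LIWC tool, with toy posts rather than data from any reviewed study.

```python
# Illustrative pipeline of the kind surveyed: text features from user posts
# feeding a classifier, evaluated by cross-validation. TF-IDF stands in for
# the proprietary LIWC tool; posts and labels are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

posts = ["I can't sleep and everything feels hopeless",
         "great day out with friends",
         "no energy to get out of bed again",
         "excited to start the new job"]
labels = [1, 0, 1, 0]   # 1 = screened positive (toy annotation)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
scores = cross_val_score(model, posts, labels, cv=2)  # model validation step
print(scores)
```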

Discussion
Predictive models were successfully used to detect users with mental health problems in several studies; however, the framework for conducting this type of research is still in its infancy, e.g. there is no consensus on the ethical requirements, with some studies obtaining full Institutional Ethics Board approval, while others assumed that data scraped from social networks can be considered public.
Conclusion Based on the reviewed articles, we found that despite promising technical achievements, successful automated mental health interventions based on these technologies are still lacking. This is largely due to a methodological gap that prevents these ideas from being evaluated in standard clinical studies.
Introduction Mappings between codes in different medical terminologies, such as ICD-10, SNOMED CT and the Read clinical terminology, are a routine part of medical data analysis. Looking up terminologies through the Unified Medical Language System Terminology Service (UTS) is one way of achieving this. However, the service provided by the US National Library of Medicine is too general for users to be able to filter out unwanted information. Another route is to use the NCBO BioPortal, via an online ontology browsing facility. Terms collected in each ontology have been organised in a structured tree for better visualisation. However, multiple ontologies cannot be used simultaneously, and there is a need to facilitate this task for medical data researchers who do this on a regular basis.
Introduction Learning health systems (LHSs) rely on routine extraction, aggregation and transformation of medical data from a variety of sources into actionable clinical knowledge. Diagnostic decision support systems (DDSSs) are tools that are suitable for delivery using the LHS model; however, their acceptance has been hampered by perceived usability problems that hinder clinicians' workflow. A significant challenge identified is the need for DDSSs to be fully integrated into EHRs. Before addressing the semantic integration and data privacy issues involved in the communication between a DDSS and an EHR, there is a need to agree on a standard dialogue of messages exchanged in the process of generating a DSS recommendation. To that end, we present an abstract protocol based on a service-oriented architecture for integrating a DDSS with an EHR, describing the messages and data content required at each step of the task. Methods Abstract integration model: We assume a general DSS is split into three logical units. The evidence service (ES) is the diagnostic knowledge base; the decision support mediator (DSM) coordinates communication between the EHR, the ES and the decision support interface (DSI), which is a graphical front-end embedded into the EHR. The sequence of interactions comprises three phases.
Initialisation and data extraction: The diagnostic consultation starts by extracting patient EHR data, while the DSI captures the main presentation reason. All data are then passed to the ES as a diagnostic question. The ES response is an initial ranked list of diagnoses to consider, each accompanied by a list of cues and examinations pertinent to each diagnosis.
Data capture: Further diagnostic cues are captured in a structured manner by the DSI. In each iteration, every newly captured cue is sent via the DSM to the ES to obtain an updated ranked differential diagnosis list for display to the clinician, followed by an optional capture of a working diagnosis. Data storage: The final step is to write back the captured diagnostic cue data into the patient record using an EHR-compatible format.
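Taken together, the three phases suggest a simple mediator interface. The Python sketch below is purely illustrative: the class and message names are our own shorthand for the roles the protocol defines, not part of the TRANSFoRm implementation.

```python
# Hypothetical sketch of the three-phase dialogue; names are invented here,
# the abstract defines the roles (ES, DSM, DSI), not this API.
from dataclasses import dataclass, field

@dataclass
class DiagnosticQuestion:            # DSI/DSM -> ES
    patient_data: dict               # extracted EHR data
    presentation_reason: str         # main reason for encounter
    cues: list = field(default_factory=list)

class DecisionSupportMediator:
    def __init__(self, ehr, evidence_service):
        self.ehr, self.es = ehr, evidence_service

    def initialise(self, patient_id, reason):
        """Phase 1: extract EHR data and obtain the initial ranked differential."""
        question = DiagnosticQuestion(self.ehr.extract(patient_id), reason)
        return question, self.es.ask(question)

    def add_cue(self, question, cue):
        """Phase 2: forward each newly captured cue for an updated ranking."""
        question.cues.append(cue)
        return self.es.ask(question)

    def store(self, patient_id, question, working_diagnosis):
        """Phase 3: write captured cues back in an EHR-compatible format."""
        self.ehr.write_back(patient_id, question.cues, working_diagnosis)
```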
Results The resulting interaction protocol has been successfully implemented in a prototype DDSS, supporting both data extraction from and recording to the EHR. The tool, developed as part of the TRANSFoRm project, has been integrated with Vision v3, a leading UK EHR system for general practice. The interaction with Vision is through a specialised API that requires XML formatting of data. Data extracted involve risk factors, lifestyle activities and demographics. The communication with the ES uses REST service calls and the XML format for data exchange. The usability evaluation of this prototype has shown that clinical decision-making improved by 8%.1 Discussion and conclusion This work presents a step towards the standardisation of the integration between decision support tools and electronic health record systems. We have outlined the main interaction steps that are needed to perform such integration, and we have explained how it was applied with a leading EHR system in the UK.
publications, let alone their construction process. This process is the key, as the explicit methodology could be peer reviewed, and reused by others to improve their research. We review methods for managing (constructing, sharing, revising and reusing) clinical code sets reported in the literature and develop recommendations.
Method A PubMed literature search was performed to exhaustively search for methodological papers on code set management published until August 2016. This included papers whose title/abstract contained any of 'code set', 'set of codes', 'code list', 'list of codes' or 'value set'. The list was supplemented with papers identified by searching citations of relevant material, via snowball sampling. In total, 659 papers were screened, with 629 rejected for lack of relevance (544 on title, 46 on abstract and 39 on full text). This review is based on 30 papers. Results Although differences existed between the methods described, common themes emerged. A popular approach was to reuse an existing code set (n = 21): from a previous study (n = 5), from a national clinical quality management scheme (n = 11) or both (n = 5). The reused set was often updated or extended (n = 20). Authors reported some specific strategies: exploiting the hierarchical nature of coding terminologies (n = 23), preparing a synonym list to search for (n = 20) and employing an iterative approach after preliminary searches (n = 13). The putative sets were usually reviewed (n = 26), mostly by clinicians (n = 20), before definitive use. There were frequent calls for openness and sharing of code sets and code set management methods (n = 14), with some giving actual suggestions or platforms for sharing (n = 8). The need for sensitivity analysis (n = 19) and caution due to the temporal and dynamic nature of code sets (n = 13) were also mentioned. Seven papers described software to support the selection of code sets and a further two suggested features for such tools.

Discussion
The process of constructing clinical code sets is time consuming and error prone. This review has identified and analysed the code selection methods that are commonly reported, which probably reflect better practice than in those studies where methods are unreported. However, despite the existence of relevant software tools, their use is seldom reported, suggesting that they are underused. Potential barriers to their uptake might be lack of awareness of their existence, ignorance of their necessity or deficiencies in the tools themselves, either in functionality or in being time consuming to use. To facilitate the widespread adoption of software tools for code set construction, they should be quick and easy to use, have minimal setup and facilitate the reuse, validation and sharing of code sets, not simply their construction.
Conclusion Research using healthcare databases could be improved through the further development, more widespread use and routine reporting of the methods by which clinical codes were selected.
On the MAASTRO data, the classifier was trained on 272 training instances from 30 patient records and applied to 1116 instances across 133 patients. Measured over instances, F1 = 0.85 in a three-way classification (metastasis confirmed/suspected/ruled out). Over patients, the tool labelled 32 as requiring manual review, 65 as ineligible (fallout 6.0%) and 36 patients as likely eligible (precision 89%). A baseline system using regular expressions gave 32% fallout.

Discussion
The i2b2 replication indicates that, for practical applications, the much-reduced system complexity might outweigh the limited drop in performance. The MAASTRO application indicates that transfer of the classifier to a new domain is readily feasible, giving reasonable output with limited training. This classifier reduces manual screening time by 75%.
Conclusion A versatile and efficient machine classifier is presented, which deliberately avoids language-specific, domain-specific or computationally expensive resources such as stemmers, parsers, thesauri, ontologies and word embedding algorithms. It is demonstrated to be a compelling candidate for practical application.
Introduction Selective serotonin reuptake inhibitors (SSRIs) have been associated with reduced risk of seizure-related death in murine models of epilepsy. We therefore sought to examine the relationship between SSRI use and mortality in patients with epilepsy using electronic health records (EHRs).
Methods A published case definition for epilepsy was used to extract a cohort of patients from the CALIBER resource (which contains national linked structured EHR data from primary care, hospital care and a cause-specific mortality registry) between 1 January 1997 and 31 March 2010. We selected only those patients with active epilepsy, defined by the failure to achieve 12-month seizure freedom over the duration of follow-up. The primary outcome was all-cause mortality, treating SSRI use as a time-varying covariate in a Cox proportional hazards regression model. Patients were considered exposed following their second SSRI prescription. We also evaluated the temporal association between SSRI use and mortality by dividing follow-up into 6- and 12-month epochs. We then used competing and cause-specific risk models with Firth correction to evaluate the association between SSRI use and possible seizure-related death. All regression models controlled for age, sex, depression status (past or current, as defined by a previously published case definition), Charlson comorbidity index and Townsend index (a measure of social deprivation).
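A model of this shape can be sketched with the lifelines library, using its time-varying Cox fitter on long-format data in which the exposure flag switches after the second SSRI prescription; all column names and data below are illustrative, not drawn from CALIBER.

```python
# Hedged sketch of a Cox model with a time-varying exposure, via lifelines.
# Long format: one row per patient-interval; 'ssri' switches to 1 after the
# (simulated) second prescription. Data are synthetic placeholders.
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(0)
rows = []
for i in range(200):
    switch = rng.uniform(100, 800)       # time of second SSRI prescription
    end = rng.uniform(900, 1500)         # end of follow-up
    died = int(rng.random() < 0.2)
    if rng.random() < 0.4:               # exposed patient: two intervals
        rows.append((i, 0.0, switch, 0, 0))
        rows.append((i, switch, end, 1, died))
    else:                                # never exposed: one interval
        rows.append((i, 0.0, end, 0, died))

df = pd.DataFrame(rows, columns=["id", "start", "stop", "ssri", "event"])

ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
ctv.print_summary()   # hazard ratio for the time-varying SSRI exposure
```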

Discussion
Patients with active epilepsy exposed to an SSRI appear to be at increased risk of all-cause mortality, especially over timespans greater than 1 year, though the risk is lower than that for a matched population without epilepsy. The influence of SSRI use on possible-seizure related death appears negligible at best and not robust enough to offset the risk of all-cause mortality.
Introduction Data science is a burgeoning field that offers the opportunity to extract knowledge and new insights from the explosion of health care data, such as electronic medical and health records, genomic, biomarker and imaging studies. Yet, which aspects of data science are most relevant to the field of health services and policy research remains unclear. The Institute for Clinical Evaluative Sciences (ICES) explored this question in order to develop priorities in data science for health services research at ICES.
Methods A committee of scientists and staff from our institute, led by a statistician/health services researcher, undertook a review of published and grey literature, and consulted external experts and key informants. We reviewed trends and innovations in health services research, including evolution in the kinds of data being used, novel methodologies and new approaches to distributed data analyses. The report and recommendations were presented to the institute's international scientific advisory committee for their input and approval.
Results Approximately 62 reports and studies were reviewed and 12 external experts interviewed over 1 year. The report's final recommendations were as follows: 1) expanding partnerships to pursue novel types of analyses rather than developing this capacity entirely in-house, such as with computer science for machine learning and text mining on unstructured EMR data, and linking genetic or biomarker data to clinical and administrative phenotypic data for gene-association studies, 2) strengthening our data quality framework in areas such as de-identification and linkage, assessing the validity of study-specific data elements, and ensuring robust data quality tools, audit and oversight processes are fully integrated into studies, 3) creating, with computer science partners, a data safe haven infrastructure to allow external researchers to securely store and link their data to ICES data or that of other researchers, conduct advanced analytics with access to an existing high-performance computing environment, and provide efficient, privacy-preserving and secure data access, 4) focusing on exploiting existing biomedical big data at ICES, extracting meaning from large, messy structured and semi-structured data with deep clinical information (e.g. population-level electronic laboratory results) and unstructured data (e.g. primary care EMRs) and making them research ready, rather than focusing on acquiring novel biomedical big data, 5) identifying gaps and opportunities to train staff and scientists in modern data science methods and the appropriate statistical software to implement them, such as machine learning and data visualization and 6) supporting expanded multi-jurisdiction research through the development of distributed data research networks and the necessary 'build-once, use many times' infrastructure, such as common data models, harmonized algorithms and analytic protocols and associated methodologies.
Conclusion The rapidly increasing availability of health data combined with the expanding field of data science presents not only opportunities for health services researchers and institutes but also challenges to determine priorities for exploration and investment. Through a thoughtful and deliberate process, ICES identified six priority activities that will guide our institute's approach to building data science at ICES and its integration into the more traditional health services research undertaken at ICES.
Introduction Clinical trials are conducted over populations within a defined time period in order to illuminate certain characteristics of a health issue or disease process. These cross-sectional studies provide a 'snapshot' of disease heterogeneity across populations but do not provide an explicit means of examining the temporal nature of the disease. Longitudinal studies can be used to explore these properties but are expensive and time consuming to conduct. As a result, machine learning approaches have been developed to produce algorithms that can infer reliable time series models from large amounts of historical cross-sectional data.
Interestingly, in the field of single cell genomics, similar methodological ideas have also been developed. Recent advances in high-throughput genomic technologies have enabled experimentalists to capture thousands of cells and to interrogate each using molecular profiling. In temporally evolving biological systems, such as cellular differentiation processes, the data represents a cross-sectional profile of the cellular population. As a result, considerable advances have been made in extracting temporal information in order to better understand cellular heterogeneity and function over time using what are commonly referred to as 'pseudotime ordering' algorithms.
Method Recognising the potential synergies in both domains, we have developed a generalised pseudotime approach arising from our single cell genomics research, which we call 'phenotime', that can operate across a range of data modalities and applications. The algorithm uses a novel covariate-adjusted Gaussian process latent variable model (c-GPLVM) to order input objects obtained from a cross-sectional study in terms of latent pseudotemporal progression, using phenotype covariates to constrain and guide the pseudotemporal assignments. Bayesian inference for the model allows for full characterisation of statistical uncertainty. Briefly, the c-GPLVM performs nonlinear dimensionality reduction, placing objects with similar high-dimensional data profiles close together in a latent one-dimensional pseudotemporal space. If there are certain phenotypic covariates of interest, these can be used to modulate the pseudotime assignment process, allowing objects with distinct phenotypic traits to form alternate pseudotemporal trajectories.
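As we read this description, the generative model can be sketched as follows; the kernel form and priors shown are our assumptions, not specified in the abstract.

```latex
% Sketch (assumed form): object n has observations y_{nd}, covariates x_n
% and a latent pseudotime z_n; each output dimension d is a GP-distributed
% function over the joint (pseudotime, covariate) space.
\begin{aligned}
  z_n &\sim \mathcal{N}(0, 1), \\
  f_d &\sim \mathcal{GP}\!\left(0,\; k\big((z, x), (z', x')\big)\right), \\
  y_{nd} &= f_d(z_n, x_n) + \varepsilon_{nd}, \qquad
  \varepsilon_{nd} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```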
Results To demonstrate the utility of 'phenotime', we applied the algorithm in an integrative analysis of high-dimensional molecular profiles (gene expression) and metastatic phenotypes of cancer patients from The Cancer Genome Atlas. This enabled us to perform a pseudotemporally adjusted expression quantitative trait loci analysis to identify gene expression behaviour that differed between patients with and without metastatic disease, taking into account that patients may be at different stages of disease progression. Our analysis identified novel associations of progressive alterations in cancer-related lipid metabolic pathways and immunosuppressive processes driven by T-cell regulators with metastatic status; these are not detected by conventional analyses, which do not account for latent disease progression.

Introduction
There is a large amount of data collected routinely by hospitals, but it is held in different locations, in different formats, on different timescales and by different stakeholders. There have been few attempts to integrate these data to inform the assessment and management of risk by taking proactive action to avoid or reduce the risk in situations known to be associated with adverse outcomes. Little is known about the basic determinants of organisational performance, such as the effects of increased patient numbers, or high patient acuity on outcomes such as adverse incidents, patient complaints and patient experience.
The CARE model of organisational performance in healthcare proposes that outcomes emerge from misalignments between demand and capacity and the adaptations that are required to successfully manage those misalignments. Adjustments to functioning occur often in healthcare in response to surges in demand, staffing shortfalls, equipment unavailability or novel problems that have not been encountered before. In this project, we are empirically testing the CARE model by integrating multiple hospital metrics and indicators and developing predictive models of performance. The aim is to test the feasibility of integrating hospital administrative data and to develop tools to assist organisational planning and decision-making.

Method
The study site is a large University teaching hospital in London. Data were gathered from a wide range of sources across the hospital and include data on patient spells, length of stay, diagnoses, treatments, equipment availability, adverse incidents, patient complaints, patient experience, staff experience, trust bed availability, staffing levels, locum cover, staff sickness and escalations. These data presented challenges due to their differing formats and timescales and their interpretation and reliability. Exploratory workshops were held with key informants to assist in interpretation. Extensive work was required to clean and transform the data in preparation for analysis. A longitudinal time series database was built and analysed using appropriate statistical techniques.
Introduction High profile initiatives and reports highlight the potential benefits that could be realised by increasing linkage of, and access to, health data. The question is do members of the general public support the proposed uses of what can be considered "their" data? The objective of this study was to gain insight into the general public's attitudes toward users and uses of linked administrative health data in Ontario, Canada.
Method From 2015 to 2017, a series of nine 2-hour focus groups was held in Ontario, Canada, including sessions in downtown Toronto and in Thunder Bay in northern Ontario. All sessions began with a brief overview of the process used by the Institute for Clinical Evaluative Sciences to remove or code identifying personal information prior to making linked health datasets available for research. Focus group participants were asked to discuss and respond to written information, such as exemplar research studies, options for data access and case studies designed to highlight potential benefits and risks from the general public's perspective. The research team identified themes across the series of focus groups.
Results For some types of studies (e.g. a study of the safety of a prescription drug product), many members of the public assumed that research based on linked administrative data is happening more broadly than it actually is and had no expectation of being asked for their approval or consent. For other studies (e.g. use of public data to inform marketing efforts by a commercial organisation), focus group participants disagreed with the use of public data and/or stated that consent should be obtained before public data were used. When presented with options for how analyses for the private sector might be performed, participants preferred models that had independent analysts performing the analyses versus providing private sector employees with access to data.

Discussion
There was no blanket approval of research based on linked administrative health data. Public views depended on the purposes for which data would be used. Because of security and trust concerns, which extend beyond health data, participants preferred models that limit the number of individuals or organizations accessing data.
Conclusions Members of the general public were generally supportive of research based on linked administrative health data for specific purposes, but there were limits to this support. Ensuring that research based on public administrative data is aligned with public values and preferences will require consultation and solicitation of feedback on different types of studies.
Introduction Potentially preventable hospitalisations (PPHs) have drawn increased international attention in recent years. Such hospitalisations are characterised by being potentially avoidable through the provision of appropriate non-hospital health services. It is of interest to health policy planners to be able to accurately predict areas that are expected to have persistently higher-than-average rates of PPHs in future time periods. There is scope for the development of improved statistical methodology to make such predictions.
Method Using linked admissions data and census information for small geographic areas in Western Australia (WA), we developed validated prediction models to identify areas expected to have persistently high rates of PPHs throughout a three-year future time period. Potential predictors consisted of the age, sex and ethnicity distributions within each area; socioeconomic indicators; measures of accessibility to emergency departments and general practice; past trends of persistently high rates of PPHs; and rurality. We developed state-wide and metropolitan area prediction models for four exemplar PPHs, namely high-risk foot, COPD, heart failure and Type II diabetes mellitus.
Our methods used a combination of standard logistic regression, repeated fivefold cross-validation and exhaustive model selection. Approximately 4,500 candidate models were repeatedly cross-validated 500 times in order to identify stable optimal model structures for prediction. This process required efficient utilisation of high-capacity parallel processing.
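The selection scheme can be sketched as scoring every candidate predictor subset under repeated cross-validation, in parallel. In the sketch below, precision stands in for validation PPV, the candidate pool and repeat count are scaled down, and the data are placeholders.

```python
# Hedged sketch: exhaustive subset selection scored by repeated five-fold
# cross-validation, run in parallel. Precision stands in for validation PPV;
# subset sizes, repeat counts and data are scaled-down placeholders.
from itertools import combinations
import numpy as np
from joblib import Parallel, delayed
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((500, 12))            # area-level predictors
y = rng.integers(0, 2, 500)          # persistently high PPH rate (0/1)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)

def score_subset(cols):
    model = LogisticRegression(max_iter=1000)
    ppv = cross_val_score(model, X[:, list(cols)], y, cv=cv,
                          scoring="precision").mean()
    return cols, ppv

# exhaustive pool of candidate models (all subsets of up to 3 predictors here)
candidates = [c for r in range(1, 4) for c in combinations(range(12), r)]
results = Parallel(n_jobs=-1)(delayed(score_subset)(c) for c in candidates)
best_cols, best_ppv = max(results, key=lambda t: t[1])
```

A sensitivity (recall) constraint of the kind described could be added by scoring each subset on both metrics and filtering before ranking.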
Results Up to 200 cross-validation repeats were required to stabilise the model selection process. The optimal prediction models achieved a mean validation positive predictive value (PPV) of between 65% and 95% while maintaining sensitivity of at least 50%. These models identified a number of both rural and metropolitan priority areas across WA.

Discussion
Health interventions require sufficient time to develop and implement. Therefore, long-range forecasting of areas expected to have persistently high rates of PPHs allows for appropriate interventions to be implemented within a realistic time frame. We have described the application of complex statistical techniques to make such predictions; these methods utilise high-capacity parallel processing to optimise the validation sensitivity and PPV among candidate models. Consideration of PPV, together with associated intervention costs and potential savings, allows for the estimation of the return on investment associated with intervention. Our models have identified some areas in WA that were predicted to have persistently high admission rates for multiple different PPHs; these areas represent high-priority areas for non-hospital interventions aimed at reducing health inequality.
Conclusion To our knowledge, this study is the first to focus on developing validated prediction models to identify geographic areas expected to have persistently high rates of PPHs in long-term future time periods. Our models performed well when applied to multiple exemplar PPHs. We suggest that these methods can assist in the development of appropriate non-hospital interventions targeting PPH-related health inequality.

Abstract no. 234: Predictive validity of measured obesity versus obesity ascertained from administrative health data for osteoporotic fractures

Lisa Lix, Shuman Yang, Lin Yan, Aynslie Hinds, and William Leslie, University of Manitoba, Winnipeg, Canada
Introduction Obesity is a risk factor for many chronic health conditions, but is reportedly protective for osteoporosis and most osteoporosis-related fractures. While administrative health data have been used extensively for predicting risk of chronic conditions within populations, most predictive models lack information about obesity because it is infrequently coded in administrative data. Our purpose was to compare the validity of obesity defined from administrative data with measured obesity from clinical registry data for predicting osteoporosis-related fracture risk.

Methods
We identified 36,372 individuals (50+ years) in a clinical registry database for bone mineral density (BMD) from the province of Manitoba, Canada, with body mass index (BMI) measured between 2001 and 2015. Measured obesity (MOB) was defined as BMI ≥ 30 kg/m². Linked administrative data were used to ascertain obesity and fracture diagnoses. Obesity was defined from administrative data as (a) at least one hospital or physician International Classification of Diseases (ICD) code within three years prior to the BMD test date (DOB1) or (b) at least one hospital or physician ICD code, laparoscopic surgery procedure code or appetite-suppressing prescription medication within three years prior to the BMD test date (DOB2). Cox proportional hazards models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for incident osteoporosis-related fractures (hip, forearm, clinical spine and humerus fractures) associated with MOB, DOB1 and DOB2, before and after adjusting for confounding covariates (e.g. age, sex, income quintile, prior fractures and comorbid conditions). Measures of discriminative performance (i.e. c-statistics) were compared for models with and without obesity variables.

Discussion
Obesity ascertained from administrative health data can be used as a proxy for measured obesity in fracture risk prediction models. Further research is warranted to test whether obesity defined from administrative data can be used to predict risk for other chronic conditions.
Conclusions Consistent with measured obesity, obesity ascertained from administrative data was associated with reduced osteoporosis-related fracture risk. However, obesity ascertained from administrative data was slightly poorer than measured obesity for discriminating between individuals with and without osteoporosis-related fractures.
Introduction In healthcare structures, operational applications routinely record huge volumes of data and allow users to exploit them for the primary objectives for which they were developed (i.e. medico-legal and administrative purposes). As these operational applications have now been implemented for several years, they may provide significant volumes of data of interest, for example, for research. Although initiatives have been undertaken to reuse these data, such reuse faces various difficulties: e.g. operational databases contain errors due to input mistakes or poor documentation by users, and monitoring artefacts from monitors. Moreover, the way data are stored is not optimised to make their reuse easier. In order to improve data quality, it is necessary to identify precisely the difficulties faced and provide adapted recommendations to fix them. The first step is to define an illustrated taxonomy of data quality problems. This presentation deals with the design of this taxonomy and its usefulness for assessing the overall quality of data recorded within an anaesthesia database.
Method An exhaustive list of data quality problems was identified from existing published works in scientific literature or based on our experience in designing data warehouses. These items were then ordered in a taxonomy so that it was as operational as possible for easily identifying data quality issues. Finally, this taxonomy was used to assess the quality of data recorded between 2010 and 2016 in the Anaesthesia Information Management System implemented in Lille University Hospital.
Results We identified 100 items of data quality problems from eight papers. After selection of relevant items and deduplication, we added new types of issues not yet reported that we had met during our previous experiences (n = 6), leading to a total of 50 different data quality problems. Those items were classified in the taxonomy according to the level of granularity of the databases they relate to: single field of a single record, single field of multiple records, multiple records, etc. Based on this taxonomy, the quality of data recorded for 388,026 interventions was assessed. The main problems were 'imprecise values' (e.g. values outside normal ranges) and 'missing data' (data not documented or not linked between systems). We noticed that the quality of data evolved over the years, e.g. 'missing data' about the patient's weight concerned 25.8% of the interventions in 2010 and 1.0% in 2015. Data quality was also perceived as department-dependent.
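Checks for the two problem types highlighted here lend themselves to straightforward scripted audits. A sketch with invented column names (not the Lille schema):

```python
# Illustrative checks for two of the taxonomy's problem types, 'missing data'
# and 'imprecise values', on an anaesthesia-style table (columns invented).
import pandas as pd

df = pd.read_csv("interventions.csv")

# 'Missing data': proportion of interventions without a documented weight,
# broken down by year to track evolution over time
missing_weight = (df["weight_kg"].isna()
                  .groupby(df["year"])
                  .mean()
                  .mul(100).round(1))

# 'Imprecise values': weights outside a plausible physiological range
out_of_range = df[(df["weight_kg"] < 1) | (df["weight_kg"] > 350)]

print(missing_weight)
print(f"{len(out_of_range)} interventions with out-of-range weight")
```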

Discussion
Through an exhaustive and intelligible taxonomy, our work provides an operational tool to identify and describe data quality issues in a healthcare database. Its impact on the improvement of data quality, with the aim of data reuse, has to be assessed in further work.
Introduction Existing data linkage projects in Wales have primarily focused on health datasets. To build a complete picture of care service provision, there is a need to broaden the linked data available to include health data, social service provision by local authorities and support provided by third sector organisations. The aims of this pilot project are to (i) test the feasibility of linking datasets from a local authority, the NHS and third sector organisations, (ii) build a more complete picture of service provision for adults who have been referred to social services in order to avoid admission to hospital or to facilitate their discharge from hospital and (iii) assess the range and quality of data available in each of the organisations providing services to those individuals.
Method Bangor University led a research team partnered with Gwynedd Local Authority to explore the governance issues and practicalities of providing an anonymised dataset to the SAIL databank (Swansea University). Two third sector agencies were also approached. With the various required service level agreements in place, data were put through the SAIL process for analysis.
Results Details of 20,373 referrals generated by 12,228 social services clients and 21,220 service delivery records provided to 6,278 clients of Gwynedd Local Authority from the period 2008 to 2015 were transferred into the SAIL databank. The personal data relating to people known to Social Services were anonymised through the NHS Information Service from a table of 24,431 Gwynedd records. After removing duplication, 17,431 (96%) individuals were linked and 744 (4%) remained unmatched. Examination of the referral and service records related to the unmatched clients revealed no pattern or bias in the data related to these individuals. Of the 12,802 clients having a referral, service or both, we were able to link 84% to GP recorded events, 94% to hospital day case or in-patient spells, 71% to accident and emergency attendances and 95% to outpatient activity. A cohort of 162,831 Gwynedd county residents has been constructed from the SAIL core datasets, of which Social Services clients with referrals and/or service records form an 8% sample. Matched controls are being constructed to make health and service utilisation comparisons between clients and non-clients. We will present results from these comparisons.

Discussion
This has been a slow-moving project due to its pathfinder nature: getting the right permissions in place, having agreements created and signed off, securing staff availability to prepare data and overcoming technical problems with the downloads. Attempts to bring in and link third sector data are still underway. Now that the data are linked, many of the steps developed will allow more rapid progress for similar linkages in the future.
Introduction Neurological conditions are a major cause of mortality and morbidity among children and young people (CYP), accounting for around 10% of their hospital admissions. Some conditions, for example epilepsy, can be managed through scheduled outpatient appointments. This study aimed to quantify any association between non-attendance at outpatient appointments and incidence of accident and emergency (A&E) attendance and emergency inpatient admission.
Method A cohort of CYP with neurological conditions was identified using an ICD-10 coding framework applied to inpatient admissions in England from 1 April 2003 to 30 March 2015. CYP were included in the cohort in financial years in which they had at least one inpatient admission with at least one framework diagnostic code recorded in any diagnostic field. The inpatient data were linked with A&E and outpatient data. Analysis of A&E attendance was restricted to 1 April 2007 to 30 March 2015, as A&E data were only available for this period. In each year, individuals were categorised as either attending all available appointments or missing one or more appointments (appointments missed during inpatient stays were excluded). This was used as a predictor in multilevel (reflecting dependence across years for single individuals) negative binomial regression models, one with the count of A&E attendances in the year as the dependent variable and one with the count of emergency inpatient admissions in the year as the dependent variable. Other predictors were ethnic group, age group, deprivation category, Government Office Region of residence and primary diagnostic group. Time at risk was included in the models.
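A single-level sketch of such a model, using a negative binomial GLM with time at risk as the exposure term; statsmodels is used here for illustration, the multilevel structure is omitted and all variables are synthetic.

```python
# Single-level sketch of a negative binomial count model with time at risk
# as exposure, via statsmodels; the multilevel structure described in the
# abstract is omitted and all variable names/data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ae_attendances": rng.poisson(1.3, 1000),        # outcome count
    "missed_appt": rng.integers(0, 2, 1000),         # missed >=1 appointment
    "age_group": rng.integers(0, 4, 1000),
    "time_at_risk": rng.uniform(0.5, 1.0, 1000),     # fraction of the year
})

model = smf.glm("ae_attendances ~ missed_appt + C(age_group)",
                data=df,
                family=sm.families.NegativeBinomial(),
                exposure=df["time_at_risk"]).fit()

# exponentiated coefficient = incidence rate ratio (IRR)
print(np.exp(model.params["missed_appt"]))
```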
Results There were 524,958 individuals in the cohort overall (420,062 in the years for which A&E data were available), with a mean of 1.3 A&E attendances and 1.2 emergency inpatient admissions per person per year. Individuals who had missed one or more outpatient appointments had significantly more A&E attendances (incidence rate ratio (IRR) 1.205, 95% CI 1.197-1.214) and emergency inpatient admissions (IRR 1.187, 95% CI 1.179-1.195) than individuals who attended all appointments. When only missed neurology-related outpatient appointments and emergency inpatient admissions with a primary neurology diagnosis were considered, a smaller association was observed: IRR 1.037, 95% CI 1.015-1.059. Ethnic group, age group, deprivation category, Government Office Region and primary diagnostic group were also significant predictors of A&E attendance and emergency inpatient admission.

Discussion
The association between missed outpatient appointments and increased A&E attendances and emergency inpatient admissions may be indicative of missed outpatient appointments increasing the need for subsequent unplanned care, impairing management of conditions and preventing early response to deterioration through planned treatment. The lower IRR for emergency inpatient admissions with primary neurology diagnoses related to missed neurology outpatient appointments may be due to diagnostic coding issues. There are confounding factors, for example, individuals with more severe illness may miss more outpatient appointments and also have more A&E attendances and emergency inpatient admissions due to illness.
Conclusion Non-attendance at outpatient appointments is associated with higher incidence of A&E attendance and emergency inpatient admission.
Introduction Electroencephalography (EEG) is a non-invasive method for measuring electrical activity in the brain and is crucial for the detection and evaluation of neurological and psychological disorders. However, it traditionally requires complex hardware and computing set-ups, restricting its application to clinical environments. Classical EEG systems therefore provide low accessibility, which is particularly disadvantageous for health care in regions with limited infrastructure. Furthermore, traditional EEG restricts subjects' movement, limiting research.
We present BrainLab, a mobile system for brain research consisting of an Android application combined with a wireless commercial consumer EEG device. Since the chosen EEG device is essentially a headset, it is easily put on, while data acquisition is enabled via radio/Bluetooth. At the same time, visual and auditory experiments can be conducted on the mobile device, measuring the brain's reaction to predefined events. Experiments can be implemented without prior programming knowledge using a simple structured text file. The recorded data, containing brain reactions and occurrences of the stimuli, can be exported in the common EDF format or, in the future, analyzed directly on the mobile device without an Internet connection. For this purpose, we embedded a visual programming interface into BrainLab, allowing loading and plotting of records as well as basic processing. Further functions are in development. To evaluate BrainLab, several experiments regarding usability and functionality have been conducted to date, including the replication of a scientific EEG experiment.
Results In the course of the evaluation, an expert confirmed that brain activity recorded with BrainLab appeared valid, and known brain reactions to specific stimuli were detectable in the data. Furthermore, the average cross-correlation between the signals at the electrodes and the recorded signals was calculated to be 0.9875, indicating that no significant distortion is induced by the recording process.
The evaluation results show that our system already allows fully mobile, feasible EEG-based experiments in arbitrary environments, while being well suited to inexperienced users.
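As a rough sketch of the distortion check described above, the per-channel correlation between a reference signal and the recorded signal can be averaged as follows; the arrays here are synthetic stand-ins for real electrode data, and the channel count is an illustrative assumption.

```python
# Sketch: average per-channel correlation between a reference signal and
# the signal recorded through the mobile pipeline, as a distortion check.
import numpy as np

rng = np.random.default_rng(0)
reference = rng.standard_normal((14, 1000))                   # channels x samples
recorded = reference + 0.1 * rng.standard_normal((14, 1000))  # small added noise

corrs = [np.corrcoef(reference[ch], recorded[ch])[0, 1]
         for ch in range(reference.shape[0])]
print(f"mean correlation: {np.mean(corrs):.4f}")
```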
Conclusions BrainLab is a promising starting point for mobile brain research. It allows custom experiments to be created and conducted while recording brain reactions to specific stimuli. Enabling analysis and interpretation of the resulting data on the mobile device without additional infrastructure is currently in progress. Even though measurement quality is limited in comparison with clinical EEG systems, our research indicates the feasibility of mobile screening for neurological disorders, which could increase access to neurological examination for immobile patients or resource-limited areas. Furthermore, it could enable situational neurofeedback training in rehabilitation and novel, unrestricted EEG experiments. Thus, BrainLab holds the potential to improve both research and health care.
Introduction Hearing loss is one of the most prominent health burdens, with over 360 million sufferers worldwide. Medical apps can become key drivers of pervasive, effective hearing health care (HHC) for very large populations with hearing problems; however, there is still a lack of methods for the informed adoption of apps. The use of specific models for app assessment is crucial to identify the relevant attributes and to highlight emerging needs in the HHC field. Moreover, given the usually large set of features needed to characterize apps, it is important to devise informative methods of data analysis so as to extract focused and relevant information from a given study sample.
Method We here develop a novel approach to the characterization of mobile apps for HHC that combines the ALFA4Hearing model (At-a-glance Labelling for Features of Apps for Hearing health care) with data visualization techniques. The ALFA4Hearing model is a recently developed model that characterizes HHC apps against a core set of 29 features grouped into five components (promoters, services, implementation, users and descriptive information). The model is used here to provide descriptive pictures of a sample of 137 apps (iOS and Android) covering the whole spectrum of HHC services. Data visualization techniques were used to analyze the relationships between the model components and features in our sample of apps, as well as in specific subsets, using different weighted graph algorithms and network layouts with varying numbers of layers and nodes.
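A minimal sketch of the kind of weighted feature network described here is given below, using networkx; the apps and feature names are illustrative placeholders, with edge weights counting how often two features co-occur in the same app.

```python
# Sketch: a two-dimensional weighted network of app features, where an edge
# weight counts how often two features co-occur in the same app.
import itertools
import networkx as nx

apps = {
    "AppA": {"screening", "education", "tinnitus_relief"},
    "AppB": {"screening", "education"},
    "AppC": {"education", "tinnitus_relief"},
}

G = nx.Graph()
for features in apps.values():
    for f1, f2 in itertools.combinations(sorted(features), 2):
        w = G.get_edge_data(f1, f2, {"weight": 0})["weight"]
        G.add_edge(f1, f2, weight=w + 1)

pos = nx.spring_layout(G, weight="weight", seed=1)  # force-directed 2D layout
for u, v, d in G.edges(data=True):
    print(u, v, d["weight"])
```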
Results We found that, among the several data visualization approaches tested, three methods showed the greatest potential: (i) a two-dimensional cluster graph was helpful for describing the relevance of the different components in the model and the distribution of features within each component; (ii) a three-dimensional, three-layer layout explained the relationships between the apps (layer 1), the features (layer 2) and the model components (layer 3), and made it easy to extract information about specific apps, app subsets or feature clusters; and (iii) a two-dimensional network effectively highlighted the relationships among features within and between domains in any given app sample, explaining the differences in feature patterns among samples of apps (e.g. different target groups, different services).

Discussion and conclusions
In this study, we identified three graph layouts that effectively highlight the relationships among app features; the relevance of these features and the role of the different model components in any sample of apps were clearly represented. Our combined approach provides a promising tool for representing a large amount of information from multiple perspectives to study current trends in the field. Moreover, this approach could be of great value in identifying emerging research questions and potential opportunities for developers, stakeholders and clinicians, and in guiding their directions for research, professional training, clinical use of apps and technical development.
Abstract no. 263 Investigating educational attainment at age 16 years in adolescents who are looked after or in need using record linkage and a birth cohort study Alison Teyhan, Andy Boyd, and John Macleod, University of Bristol, Bristol, UK Introduction In the UK, 'children in need' (CIN) have social services involvement, and 'children looked after' (CLA) are in the care of their local authority. Through record linkage, we identified adolescents who were CIN or CLA in a population-based birth cohort study and examined their educational attainment at age 16 years.

Methods
The Avon Longitudinal Study of Parents and Children (ALSPAC) has been linked to the National Pupil Database (NPD), a repository of educational data for pupils in England. The NPD has been linked to the Department for Education's CLA data return (since 2006) and CIN census data (since 2009). ALSPAC participants (n ~12,000) were coded as being CLA during school Years 10 and 11 (aged approximately 14-16 years) if they had a CLA record while in Year 10 or Year 11, and as being CIN if they had a CIN record in 2009 stating they had been registered as in need before the age of 16 years. A 3-category exposure variable was derived: 'not CIN/CLA', 'CIN (not CLA)', and 'CLA'. GCSE exams are taken at the end of Year 11. Two measures of attainment were examined: capped GCSE percentage score (summed score of eight best subjects, converted into a percentage of the maximum possible score) and attainment of five GCSEs at grades A*-C including English and Maths. Covariates included maternal characteristics (e.g. age and mental health), and measures of socio-economic position reported by the mother during pregnancy (e.g. maternal education and financial difficulties) and from the NPD (free school meals and area-based deprivation).
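The 3-category exposure can be derived straightforwardly from the linkage flags; the sketch below uses hypothetical flag names for the linked NPD/ALSPAC extract, with CLA taking precedence so the categories are mutually exclusive, as in the abstract.

```python
# Sketch: deriving the 3-category CIN/CLA exposure from linkage flags.
# Flag columns are hypothetical names, not the real extract's fields.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cla_y10_y11": [0, 1, 0, 0],   # any CLA record in Year 10 or 11
    "cin_pre16":   [0, 1, 1, 0],   # CIN record registered before age 16
})

df["exposure"] = np.select(
    [df["cla_y10_y11"] == 1, df["cin_pre16"] == 1],
    ["CLA", "CIN (not CLA)"],
    default="not CIN/CLA",
)
print(df)
```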
Results 81 adolescents were identified as being CIN (0.67%) and 81 as CLA (0.67%). The main reasons for being CIN were child disability (36%) and abuse/neglect (25%). Most CLA were living with a foster family (72%), and the main reasons for being CLA were abuse/neglect (36%) or acute family stress (23%). Compared with their peers, CIN and CLA individuals were more likely to have a young mother and to be from a disadvantaged family. GCSE results data were available for all of the CLA and 53 of the CIN. Their attainment was low relative to peers: for CIN, the mean GCSE percentage score was 28.0 and 7.6% passed 5+ GCSEs including English and Maths; for CLA, the results were 31.2 and 6.2%, respectively, and for 'not CIN/CLA', 68.7 and 51.8%. After adjustment for a range of maternal and socio-economic position covariates, relative to 'not CIN/CLA', CIN had a mean GCSE percentage score 34.3 points lower (95% CI, −40.4 to −28.2) and CLA 30.3 points lower (−35.4 to −25.2), and they remained substantially less likely to achieve 5+ GCSEs including English and Maths: CIN, OR 0.17 (0.05 to 0.52); CLA, OR 0.08 (0.02 to 0.26).

Discussion and conclusions
Although CLA and CIN statuses were strongly socially patterned, the low educational attainment of these individuals relative to their peers was not explained by the maternal and socio-economic position factors considered. Future work will also examine the influence of school-level factors on the academic achievement of adolescents who are CIN or CLA.
Introduction Routinely collected datasets, including electronic health records, are increasingly widely used in healthcare research. Among other advantages, they allow research to be done on a much wider scale, using hundreds of thousands of patient records to answer questions about what is happening in the 'real world'. Although routine datasets provide exciting opportunities for research, their size and variability can make them difficult to analyse and might discourage some researchers.
Here we present statistical methods that facilitate the analysis of large routine datasets. We focus on the aim of investigating the association between patient characteristics and an outcome of interest, while allowing for between-practice variation.

Methods
We fit mixed effects regression models to outcome data, including predictors of interest and confounding factors as covariates. To allow for variation in outcome among general practices, we include random intercepts. We implement the analysis using weighted regression or meta-analysis of estimated regression coefficients from each practice. These methods are likely to be familiar to applied healthcare researchers, though not for the purpose of analysing large datasets. Both methods reduce the size of the dataset, thus decreasing the amount of time required for statistical analysis. We compare the methods to an existing subsampling approach.
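A rough sketch of the weighted-regression idea is given below: with discrete covariates, the person-level file can be collapsed to one row per combination of outcome and covariates and fitted with frequency weights, which reproduces the full-data fit (the abstract's equivalence claim). The practice random intercepts are omitted for brevity, and the file and column names are illustrative assumptions.

```python
# Sketch of weighted regression: collapse the person-level file to unique
# combinations of outcome and discrete covariates, then fit a logistic
# model with frequency weights; estimates match the full-data fit.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

full = pd.read_csv("cprd_extract.csv")  # hypothetical person-level file

agg = (full.groupby(["cholesterol_recorded", "incentive_period", "age_band"])
           .size().reset_index(name="n"))

model = smf.glm(
    "cholesterol_recorded ~ incentive_period + C(age_band)",
    data=agg,
    family=sm.families.Binomial(),
    freq_weights=agg["n"],
)
print(model.fit().summary())
```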
Results All methods yield similar effect estimates. Weighted regression and meta-analysis give similar precision to analysis of the entire dataset, but subsampling is substantially less precise. We compare the methods through application to two contrasting primary care datasets from the Clinical Practice Research Datalink (CPRD). For example, in a dataset comprising 2,116,948 observations recorded by 674 general practices between 1996 and 2014, we investigate the impact of financial incentives in UK primary care on the recording of elevated cholesterol in patients with severe mental illness. In conventional mixed effects logistic regression of the entire dataset, the estimated immediate effect of an incentive introduced in 2011 has odds ratio 1.80 (95% CI: 1.70 to 1.92), showing increased recording of cholesterol. Weighted regression reduces the size of the dataset by 98.5% and gives identical results. Meta-analysis and analysis of a representative 10% sample of the dataset give similar estimates of 1.67 (95% CI: 1.54 to 1.80) and 1.63 (95% CI: 1.32 to 2.01), respectively.

Discussion
Fitting mixed effects regression models that allow for multiple sources of variation has become relatively straightforward in standard statistical software. However, it is time consuming and not very practical to fit such models to data comprising hundreds of thousands of observations nested within hundreds of general practices. An existing subsampling approach allows us to extract the data required to answer the research question and has performed well in example applications, but it may be preferable to make use of all the data. Where all data are discrete, weighted regression is equivalent to fitting the model to the entire dataset. In the presence of a continuous covariate, meta-analysis is useful.
Conclusion We have identified scalable statistical methods for analysing large routine clinical datasets. These methods would be accessible to applied researchers and are easily implemented using standard statistical software.
Introduction Given an ageing population, the importance of patient frailty in clinical decision making is increasingly recognised for both treatment and care. The electronic Frailty Index (eFI) is a tool developed at the University of Leeds and implemented in GP systems that is based on a cumulative deficit model of frailty, consisting of 36 'deficits' constructed using around 1,500 clinical codes from a patient's primary care record. There is much debate on whether frailty is unidimensional or multidimensional. Unidimensional models, like the eFI, generate a single frailty score. Multidimensional models produce scores on frailty sub-domains, such as physical, cognitive and psychosocial. The aims of this study are to identify the multidimensional models of frailty that best fit actual patient data on health deficits and then use these models to generate frailty profiles for individual patients and explore subdomains. The work uses two large UK primary care databases of electronic health records, the Clinical Practice Research Datalink (CPRD) and ResearchOne, and is funded by the School for Primary Care Research.

Methods
The main theoretical multidimensional models of frailty and their associated sub-domains will be identified from the literature. The deficits from the eFI will then be mapped to each sub-domain, independently by a group of GPs and a group of lay PPI contributors, with disagreements resolved through discussion. Each theoretical model will be assessed for goodness-of-fit to the deficit data using confirmatory factor analysis on samples of CPRD patients aged >=60 years from 2006 and 2015. Interpretation of the models will be determined by consensus among the research team, and any modifications made will take into account the theoretical basis. Exploratory factor analysis will also be conducted to find the best-fitting purely empirical model of deficit groupings. An expert clinician panel will make a final choice of one or more well-fitting multidimensional models. Different models may be suited to different purposes, e.g. a many-dimensional model to aid clinical decision making and a broader model with fewer dimensions for epidemiological work. External validity will be assessed by applying the model(s) to equivalent samples of patients from the ResearchOne database. Predictive validity in relation to mortality will be assessed and compared with that of the original eFI. Multidimensional frailty score profiles for individual patients will be generated, and patients clustered according to their profiles to identify homogeneous subgroups.
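As one possible tooling choice (an assumption, not the study's stated software), a confirmatory factor analysis of deficit indicators against a hypothesised domain structure could be sketched with the factor_analyzer package; the deficit names, the two-domain mapping and the synthetic data below are all illustrative.

```python
# Sketch: confirmatory factor analysis of binary deficit indicators against
# a hypothesised two-domain frailty model. Data are synthetic placeholders,
# so the fit itself is not meaningful here.
import numpy as np
import pandas as pd
from factor_analyzer import (ConfirmatoryFactorAnalyzer,
                             ModelSpecificationParser)

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 2, size=(500, 4)).astype(float),
                 columns=["mobility", "falls", "memory", "cognition"])

model_dict = {
    "physical": ["mobility", "falls"],
    "cognitive": ["memory", "cognition"],
}
spec = ModelSpecificationParser.parse_model_specification_from_dict(X, model_dict)

cfa = ConfirmatoryFactorAnalyzer(spec, disp=False)
cfa.fit(X.values)
print(cfa.loadings_)  # estimated factor loadings per sub-domain
```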

Results/Discussion
This work has just begun and we are currently mapping deficits to theoretical models of frailty. The work will be completed in time for presentation at the conference.
Conclusion Frailty profiling could aid clinical decision making in primary care, including identifying individuals who would benefit from longer appointments, supported self-management, care and support planning, or a comprehensive geriatric assessment. Knowledge of the numbers and types of frail patients at national, local and GP levels can help predict future healthcare needs. It could also inform the design of trials of healthcare interventions that use frailty measures to stratify patients, to better target care for frail people.

Introduction
The effect of the wider social environment on physical and emotional health is a longstanding area of study. The impact of an individual's social residence type, such as living alone or living in a care setting, adds a layer of complexity to health and social research and policy. For example, infection transmission in student accommodation could lead to improved sanitation measures. Mental and physical health outcomes of chronic disease patients living alone, compared with living in a community care setting, could highlight the need for additional social care support. Surveys, including the UK Census, collect these data. However, surveys are expensive and time consuming and usually cover a subsection of the population. Large-scale linked databanks holding administrative datasets, including health data, such as the Secure Anonymised Information Linkage (SAIL) Databank at Swansea University, are broader in scope, both in the nature of the data held and in the population covered. Despite known data quality issues, a method for anonymously modelling household types across the population as a whole, using routinely collected administrative data, would be beneficial to linked data research.
SAIL contains anonymised linked data on over 3 million Welsh residents. Address data are not stored; instead, postcodes are anonymised by a Trusted Third Party to create the residential anonymised linkage field (RALF). RALFs can be mapped to lower super output area (LSOA) codes for geographical analysis. Using anonymised demographic and address data, individuals can be grouped into likely households. This study aims to classify these households into types for use in further research.
Method Residences containing more than 10 individuals were extracted from administrative datasets. Age ranges and moving dates for each residence were calculated from the minimum and maximum ages at which residence members moved into that residence. K-means clustering of the residence demographics was performed. The resulting clusters were validated against existing data from sub-populations.
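A minimal sketch of this clustering step is shown below with scikit-learn; the feature columns mirror the abstract (age range, move-in timing, household size), but the values are synthetic placeholders.

```python
# Sketch: k-means clustering of residence-level demographic summaries.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# columns: min age, max age, modal move-in month, household size (synthetic)
residences = rng.random((8259, 4)) * [90, 90, 12, 50]

X = StandardScaler().fit_transform(residences)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
print(np.bincount(kmeans.labels_))  # residences per cluster
```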
Results 8,259 residences containing more than 10 individuals were clustered into five distinct types of household:
• One group of elderly people (mean age range = 63-94 years), suggesting care homes.
• Two groups contained young adults (mean age ranges = 17-25 years and 16-37 years); one contained much larger households whose members tended to move in during September, suggesting student halls of residence.
• Two further groups contained varying ages (mean age ranges = 5-48 years and 7-74 years) with no obvious pattern of moving, suggesting blocks of flats.

Discussion
Classification of residence type (e.g. student accommodation, flats, elderly communities) is possible using administrative data. However, household units exist within these residence types and previous work has found that it is difficult to extract individual households from, for example, blocks of flats.
Conclusion This project has created residence type classifiers using administrative data, adding complexity and detail to potential research projects. Further work is needed to identify households and household types (e.g. single-parent families) within the residence types.
Introduction Maternal health behaviours and psychosocial factors are associated with poor birth outcomes and increased hospitalisations throughout childhood. We examined birth and infant outcomes according to maternal age and adversity indicators using population-level healthcare data.

Methods
We extracted linked maternity-baby hospital data from Hospital Episode Statistics for births between 2009 and 2012 in England. Indicators of maternal adversity were identified using ICD-10 diagnosis codes for drug/alcohol abuse, violence or self-harm from any hospital admission up to two years before delivery. Mothers were categorised as current young (<21 years at current birth), prior young (<21 at first birth but not current birth) or never young (>=21 at first birth). Multivariable regression (adjusting for additional maternal/baby risk factors including deprivation) was used to determine the association between maternal exposures and preterm birth (<37 weeks gestation), low birthweight (<2500g), and unplanned readmissions and deaths within 12 months post-discharge from the birth episode.

My Diabetes My Way (MDMW) contains multimedia resources aimed at improving self-management, including traditional information leaflets, interactive educational tools and videos. MDMW also offers users access to their clinical data via its novel electronic personal health record (ePHR). The ePHR sources data from primary care, secondary care, specialist screening services and laboratory systems, including diagnostic information, demographics, process outcomes, screening results, medication and correspondence. These data provide a more complete overview of diabetes than is available from any single data source. Patients can use MDMW to share information with their healthcare teams through automatic data upload, secure messaging and online discussion forums, further enhancing communication. They can also set and record their own realistic goals and receive highly tailored advice and guidance based on the current status of their results. We aimed to evaluate and analyse user experience of the records access service.

Method
In January 2015, an online evaluation survey was emailed to 3,979 active users of the MDMW ePHR to assess their experiences and perceived benefits. SurveyMonkey was used to capture the results for each unique respondent, using a series of open and closed questions. The responses detail the impact the system had on patient satisfaction and how it enhanced their ability to self-manage. Results 1,095 (27.5%) active users completed the survey. Patients report that MDMW improves their knowledge of diabetes (90.3%) and their motivation to manage it better (89.3%). They believe it allows them to make better use of consultation time (89.6%) and means that they do not need to keep paper records (84.4%) or phone their doctor for results (85.2%). Users found graphs helpful for monitoring changes (95.9%), and 83.5% said the system helped them meet their diabetes goals. 92.1% said the system contained all expected features; 87% that it reminded them of information discussed at consultations; 85.7% that the system was up to date; 89.1% that it was easy to use; 94.2% that supporting information helped them understand their results; 93.7% that it helped them find information tailored to their own diabetes; 96.7% that they were confident the system was secure; 88.2% that it helped them manage their diabetes better; 89.8% that it helps them set goals; and 96.4% that the system will significantly improve diabetes self-care in Scotland. Anecdotal feedback included: 'Immensely satisfied with the system. Really amazing to see my results online. Real motivator' and 'I don't have to make appointments to see GP or nurse which is good as I work and don't like taking time off work'.
Discussion and conclusion MDMW is a highly effective intervention in the pursuit of supported self-management. Patients report enhanced knowledge and understanding of diabetes and improved motivation to make positive changes. MDMW is a key resource to engage patients in order to achieve strategic aims for the diabetes population in Scotland. We are actively pursuing opportunities to extend the service into other parts of the NHS.

Introduction
Learning practical and motor skills is essential in physiotherapy education and is traditionally taught in a classroom setting through observation and repetition. Immediate feedback during the learning process is important in teaching generally, but especially for physiotherapy students, in order to correct mistakes and optimize competence. The teacher's limited time for demonstration and individual supervision is therefore a challenge in the classroom setting. A preliminary study conducted at the Physiotherapy School of RWTH Aachen University Hospital has shown that learners have a strong need for repeated demonstration, immediate feedback and means of tracking individual learning progress.
Method In this work, the wearable device Myo (Thalmic Labs) is used as an inexpensive consumer-grade human activity recognition device. It contains an eight-channel surface electromyograph and a nine-axis inertial measurement unit. The signals can be evaluated with supervised machine learning techniques to recognize different gestures.
The Media Didactics Meets Wearable Computing (MediWeCo) digital assistant based on the Myo device gives continuous individual direct feedback to the learners on the executed therapeutic techniques, enabling them to assess and self-govern their skill level outside of the traditional classroom setting. Individual learning, self-learning and individual repetition are thus intensified, independent of time and location. Furthermore, the digital assistant allows for individual and objective assessment of learners in examinations.
The planned processing pipeline of MediWeCo includes a human-to-technology module, which establishes communication between the learning application and the Myo wearables. The module includes a gesture recorder for the acquisition of movement patterns and a model generator to build gesture models. Evaluation algorithms utilize machine learning to compare incoming signal data against the defined movement models. Here, the main challenge is optimum feature selection, both from a learning aspect (what features should be taught, e.g. speed, timing and movement) and from a technical aspect (i.e. what characteristics of the data should be extracted).
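As a sketch of this kind of pipeline, windowed summary features from the EMG/IMU channels can feed a neural-network classifier; the feature choices, channel count and synthetic data below are illustrative assumptions, not the project's actual configuration.

```python
# Sketch: per-channel feature extraction from EMG/IMU trials followed by a
# neural-network gesture classifier.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
signals = rng.standard_normal((200, 17, 100))  # trials x channels x samples
labels = rng.integers(0, 3, size=200)          # three gesture classes

# Simple per-channel summary features: mean absolute value and variance.
feats = np.concatenate(
    [np.abs(signals).mean(axis=2), signals.var(axis=2)], axis=1)

Xtr, Xte, ytr, yte = train_test_split(feats, labels, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(Xtr, ytr)
print(f"recognition rate: {clf.score(Xte, yte):.2%}")
```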
Results In prior work, we employed the Myo armbands to evaluate performance of clinical hand hygiene technique using artificial neural networks and hidden Markov models. The system achieved a recognition rate of 98.30% and should therefore be effective in the analysis of physiotherapy techniques as well.

Discussion
To cater for these needs, the MediWeCo project was initiated with the support of the German Federal Ministry of Education and Research. During the project, a novel mobile blended learning solution for the teaching and learning of physiotherapeutic skills, and for the examination of acquired techniques, is being implemented. A learning application provides theoretical training on virtual patients using various eLearning approaches. Additionally, interactive and multi-camera videos give an optimal view of movement and technique and allow decision processes to be followed during physiotherapy sessions. In addition to the didactical concept, individual manual performance of physiotherapy techniques can be assessed with wearable sensors.
academic and commercial research on the genetics of health, disease and quantitative traits of current and projected public health importance. Features include the family-based recruitment and the breadth and depth of phenotype information, with detailed data on cognition, personality and mental health. Genome-wide association and exome genotype data are available on most of the cohort. These features maximise the power of the resource to identify, replicate or control for genetic factors associated with a wide spectrum of illnesses and risk factors. By linkage to routine NHS hospital, maternity, laboratory test, prescribing and dental records, using the Scottish Community Health Index (CHI), this has become a longitudinal dataset. Mortality data can be obtained from the General Register Office for Scotland.
Results Researchers are now able to use the linked datasets to find prevalent and incident disease cases and healthy controls to test research hypotheses in a stratified population. They can also carry out targeted recruitment of participants to new studies, utilising the NHS CHI register for up-to-date contact details. There are six published papers on a variety of conditions and currently around 10 ongoing studies based on our record linkage capabilities. Expert working groups have been set up to further annotate data and co-ordinate research in the fields of genomics, cognition, mental health and chronic pain.
Discussion GS has now established and validated EHR linkage, overcoming technical and governance issues in the process. There are current or planned collaborations looking into heart disease, diabetes, breast and colon cancers, depression, neuropathic pain, Alzheimer's disease and dementia. Generation Scotland is also a contributor to major international consortia, with academic and commercial collaborators from many institutions in the UK and worldwide. Some GS:SFHS participants can also be linked to other studies, such as the Aberdeen Children of the 1950s and the Walker birth cohort (Tayside), to obtain further longitudinal data, and the SHARE project can be used to obtain new biological samples.
Conclusion Generation Scotland has thoroughly tested the linkage process and plans to extend it to include primary care data (GP records) in the next year. There are plans to extend the cohort and collect more samples and data. The GS resources are available to academic and commercial researchers through a managed access process (www.generationscotland.org).

Archie Campbell, Laura Boekel, Caroline Hayward, and David Porteous, University of Edinburgh, Edinburgh, UK
Introduction The Generation Scotland: Scottish Family Health Study (GS) is a family-based genetic epidemiology study of ~24,000 volunteers from ~7000 families recruited across Scotland between 2006 and 2011 with the capacity for follow-up through record linkage and re-contact. Here we present linkage to routine NHS laboratory test results.
Method Participants completed a demographic, health and lifestyle questionnaire and provided biological samples including DNA; most underwent detailed clinical assessment, including anthropometric, cardiovascular, respiratory, cognitive and mental health measures. They also gave written informed consent for linkage to their medical records. Using the Community Health Index, access has been obtained to routine NHS laboratory tests from the Tayside and Glasgow area health boards. This covers ~90% of the GS cohort.
Results More than 20,000 GS participants were genotyped using a genome-wide SNP array. We extracted uric acid (urate) test results from the routine biochemistry dataset and performed genome-wide association study (GWAS) analyses with genotyped and imputed data. This showed the expected significant result for the SLC2A9 uric acid transporter gene, providing proof of concept that the NHS laboratory data can be used for further GWAS (and other genetic) analyses of additional biomarkers, which are currently under way.
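The core of such an analysis is one regression of the biomarker on allele dosage per SNP; a toy sketch is shown below on synthetic data. Real analyses use dedicated tools (e.g. PLINK) and many more covariates, so this is only a conceptual illustration.

```python
# Sketch: per-SNP linear regression of a quantitative trait (e.g. urate)
# on allele dosage, with one covariate. Data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, n_snps = 1000, 50
dosage = rng.integers(0, 3, size=(n, n_snps)).astype(float)
age = rng.normal(50, 10, n)
urate = 0.3 * dosage[:, 7] + 0.01 * age + rng.standard_normal(n)

pvals = []
for j in range(n_snps):
    X = sm.add_constant(np.column_stack([dosage[:, j], age]))
    pvals.append(sm.OLS(urate, X).fit().pvalues[1])  # p-value of SNP term

print("top SNP:", int(np.argmin(pvals)), "p =", min(pvals))
```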

Discussion
The Genetic Annotation Expert Working Group of Generation Scotland has now established and validated linkage to routine laboratory data, overcoming technical and governance issues in the process. There is no national dataset, with source data being held in different locations, although it is all part of the Scottish Care Information database (SCI Store). Linked data is released into accredited safe havens, with access limited to approved researchers, and great care must be taken when dealing with individual record level data. The test results go back up to 25 years, and in some cases, the test methodology has changed or been recalibrated several times during that period.
Conclusion Most laboratory tests are requested by GPs, and the results could also be linked and accessed via the primary care dataset, which presents a new set of both opportunities and limitations. We plan to explore these over the next year. The GS resources and linked data are available to academic and commercial researchers through a managed access process (www.generationscotland.org).
Introduction Data science combines skills from computer science, mathematics, statistics and business processes. Concern about the sustainability of current models of healthcare delivery, driven by increasing complexity and costs coupled with a move to personalized medicine, presents a strong argument for a step change in the use of data to guide healthcare delivery. The application of data science, comprehensively utilizing existing health data, will help unravel the relationships between treatments, outcomes, patients and costs. There is currently a shortage of data scientists in the UK and internationally. The Farr Institute is creating programmes to train data scientists; each individual comes with expertise in a specific area that is developed alongside the broader scientific and professional skills essential to health data science. Our initial aim was to create a PhD-led 'community of practice' that would value the sharing of knowledge and team working in order to encourage new collaborations and avenues of research.

Methods
We developed and implemented a cohort-based, team-science-focused programme to develop key scientific and professional skills. PhD students from across all participating universities in The Farr Institute were invited to attend. We delivered student symposia, summer schools and industry engagement days focused on data visualization, mobile data and geographical systems, alongside professional skills including research communication, writing for the public, conducting research with public involvement, and interdisciplinary thinking and working, all delivered through a team science agenda. All events used standardized feedback surveys, with the results used to improve the programme going forward.
Results Since 2013, 54 PhD students from 16 institutions across the UK have attended one or more events. The cohort has diverse backgrounds, but all have a focus on data and are analysing health data directly or assessing how data are used within the healthcare system. The interdisciplinary nature of the cohort means that each person gains an understanding of what others bring to the team science approach.
Discussion Students enjoyed a formal, structured learning environment for sharing research knowledge, combined with informal networking opportunities allowing discussion of common issues with peers. Involvement of senior academics provided vision and encouraged collegiality. Workshops were designed to support group problem-based learning and to highlight the need to change working behaviours and manage multiple perspectives and skill sets to achieve an outcome. Involvement of external bodies, including patient representatives and industry, developed communication skills and increased awareness of other viewpoints. Sessions where the outcome was more open-ended, or which were less group focused, were less satisfactory. More recently, the summer school model has been adapted to grow the network and make international links using the same approach.
Conclusions The Farr Institute PhD programme is successfully developing a community of young researchers, from a variety of disciplines, with a strong team-science focus. The future development of these students as they graduate and move into careers in the field will reflect the impact of the programme. Lessons learned from this cohort team science approach have been used to develop other programmes, including a Future Leaders Programme.

Ltd, Ankara, Turkey
Introduction To address the challenges introduced by the increasing prevalence of chronic diseases in an ageing society, EHRs are being widely adopted, and a variety of health-relevant parameters are captured via wearable devices and smartphone apps. There is also a growing movement, 'Quantified Self', to capture a person's state and daily behaviours to establish a basis for preventive medicine interventions. The amount of healthcare and wellbeing data collected is becoming unmanageable through conventional IT systems. This creates a major opportunity for big data analytics applications in healthcare, where the requirements of high volume, velocity (real-time data from medical sensors) and variety (heterogeneity) of healthcare data sources can be addressed adequately.
Method In this study, we introduce the underlying architecture of a health data analytics platform designed as a highly scalable ingestion stack respecting the principles of the Big Data Lambda architecture. The stack starts with Inbound Adaptors acting as an interface to various health data sources such as EHRs, medical/tracking devices and mobile/Web/IoT applications. Several adaptors have been implemented for standards-based communication: (i) HL7 CCD-based clinical documents, (ii) ISO/IEEE 11073-compliant medical devices and (iii) Bluetooth LE-enabled tracking devices. For device integration, an Android application has been implemented to collect data from medical devices and trackers; it also captures measurements from the sensors on the phone and from Google Fit.
Depending on the nature of the analytics services, data received through the Inbound Adaptors follow either the batch layer or the speed layer of the ingestion stack. If real-time analytics is required, the Spark Streaming-based speed layer processes the data. Otherwise, data are processed by the batch layer through Apache Spark, using Cassandra, for data extraction, mediation, medical terminology mapping, summarization and complex rule processing. The results of the ingestion processing are received by the Outbound Adaptors to be exploited by consumers such as data visualization services or machine learning algorithms.
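To illustrate the speed-layer idea, the sketch below uses Spark Structured Streaming to aggregate incoming device measurements in near real time; the source path, schema and window settings are placeholders, not the platform's actual configuration.

```python
# Sketch: a Structured Streaming job computing rolling per-patient averages
# over incoming measurement records.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("speed-layer-sketch").getOrCreate()

# Hypothetical JSON stream of {patientId, metric, value, ts} records.
stream = (spark.readStream.format("json")
          .schema("patientId STRING, metric STRING, value DOUBLE, ts TIMESTAMP")
          .load("/data/incoming"))

# Rolling per-patient averages over 5-minute windows, with late-data handling.
agg = (stream
       .withWatermark("ts", "10 minutes")
       .groupBy("patientId", "metric", F.window("ts", "5 minutes"))
       .agg(F.avg("value").alias("avg_value")))

query = agg.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```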
Results ISO/IEEE 11073-compliant blood pressure and blood glucose measurement devices, a wristband that provides skin temperature, heart rate and movement measures through Bluetooth LE, and an Android application collecting data from phone sensors and Google Fit have been implemented and integrated into the ingestion stack, and tested with in vivo data collection mechanisms. In the next phases, the project will focus on EHR integration.
Discussion The collected data will be used in a use case in which the outbound results are smart, adaptive and personalized interventions for diabetic patients, so that the interventions resulting from the extensive analytics can lead to behavioural changes that reduce risks and improve quality of life.
Introduction With an ageing population, multimorbidity has become one of the main challenges in recent years for patients and health care systems worldwide. Hence, it is crucial to characterise the problem in order to devise effective strategies to manage multimorbidity and move towards integrated healthcare rather than a single-disease focus. We therefore aimed to identify patterns of multimorbidity and disease clusters in older adults from the UK.

Methods
Using the UK Biobank, we extracted data on 36 chronic conditions in 502,643 participants. First, we assessed the prevalence of multimorbidity overall and by patient characteristics. We then assessed the patterns of co-existence between these conditions using a two-step approach: cluster analysis to identify chronic conditions that cluster together, followed by association rule mining to examine the patterns within these clusters more closely. We estimated support, confidence and lift for each association rule, using lift as the primary measure of significance. The results were presented using visualisations and summary tables. Subgroup analyses by age, gender and ethnicity were also performed.
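The association-rule step can be sketched with mlxtend over a one-hot condition matrix; the condition columns, toy data and thresholds below are illustrative, not the study's parameters.

```python
# Sketch: association rule mining over a participants x conditions matrix,
# reporting support, confidence and lift for each rule.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Rows = participants, columns = chronic conditions (True/False).
df = pd.DataFrame({
    "diabetes":     [1, 1, 0, 1, 0, 1],
    "hypertension": [1, 1, 0, 1, 1, 0],
    "angina":       [0, 1, 0, 1, 0, 0],
    "heart_attack": [0, 1, 0, 1, 0, 0],
}).astype(bool)

frequent = apriori(df, min_support=0.2, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1.0)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```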
Results The overall prevalence of multimorbidity in the study was 19% (n=95,710), and no significant variation was found by age. However, multimorbidity was more common in the black ethnic group than in the others and increased with deprivation. Three clusters were obtained, with up to 30 association rules within each cluster. The first cluster consisted of two diseases that were highly associated (angina and heart attack, with a lift of 13.27). The second cluster included conditions such as osteoporosis, stroke, heart failure, atrial fibrillation, vascular disease and chronic kidney disease, with diabetes at the epicentre of the cluster; heart failure and atrial fibrillation had the strongest lift in this cluster (23.60). The third cluster was a medium-sized, high-prevalence cluster including conditions such as hypertension, asthma, cancer and depression, with lift ranging from 1 to 5. Subgroup analyses showed variations in multimorbidity patterns by gender and ethnicity; however, no variation was found by age group. Generally, depression was at the centre of the disease associations for males, while diabetes and osteoporosis dominated the associations for females. Similarly, diabetes was at the centre of the associations for white participants, while depression and cancer dominated the associations for non-white participants.
Discussion This study used data from a large, national, well-defined population, with high sensitivity to identify multimorbidity based on the comprehensive list of chronic conditions included. We utilised a novel way of identifying patterns of multimorbidity and variations by age, gender and ethnicity, and demonstrated the applicability of data mining techniques to medical data, where their use has generally been very limited.
Conclusion This study found certain conditions to be at the epicentre of disease clusters; focusing on better management and secondary prevention of conditions such as diabetes and hypertension may help prevent other conditions in the clusters.

Elizabeth Ford, Brighton and Sussex Medical School, Brighton
Introduction Electronic patient records (EPRs) can be used to understand the rates of a medical disorder within the general population and provide medical practitioners with early warnings of which patients are at risk of a disease. Unfortunately, EPRs contain uncertain and incomplete data. Traditional epidemiological statistical methods do not translate well to this new environment. Current studies assume that a recorded diagnosis is correct. If a patient has a recorded diagnosis, it is assumed to be true, and likewise, if a patient does not have a diagnosis they are assumed not to have the condition.
As part of the Wellcome Trust-funded ASTRODEM study, we trialled analysis methods borrowed from astrophysics which model the complexity of the data to find the rates of misdiagnosis, the probability of a condition, and the probability that a patient has a condition given that they have a combination of up to two other diagnoses.
Methods Real-world data include misclassifications, so we do not know the "truth". To demonstrate this method, we therefore generated synthetic data. The probabilities of having combinations of three condition statuses (e.g. dementia, diabetes, obesity) and the probabilities of receiving subsequent diagnoses were specified, and 100,000 patients were randomly generated.
We fitted a hierarchical Bayesian diagnostic model containing 14 variables, using Bayesian priors to represent the probabilities of misdiagnosis. Eight variables gave the probability of each combination of conditions (note that this is not the combination of diagnoses). The final six variables are the probabilities of having a diagnosis of a condition given that a patient has, or does not have, that condition. The model fits combinations of parameters to the number of people with each combination of diagnoses, assuming that these combinations can be combined to create a binomial parameter explaining the number of patients in a group. We compared this model to a logistic regression model.
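A stripped-down sketch of the misclassification idea for a single condition is shown below in PyMC: observed diagnoses mix true-positive and false-positive processes. The priors and counts are illustrative assumptions, and the study's full model has 14 parameters over three conditions.

```python
# Sketch: a one-condition Bayesian misclassification model. The observed
# diagnosis count mixes diagnoses among people with and without the condition.
import pymc as pm

n_patients = 100_000
n_diagnosed = 6_000  # hypothetical count with a recorded diagnosis

with pm.Model() as model:
    prevalence = pm.Beta("prevalence", 1, 1)           # P(condition)
    p_dx_given_cond = pm.Beta("p_dx_given_cond", 5, 1)  # P(diagnosis | condition)
    p_dx_given_no = pm.Beta("p_dx_given_no", 1, 20)     # P(diagnosis | no condition)

    p_diagnosed = (prevalence * p_dx_given_cond
                   + (1 - prevalence) * p_dx_given_no)
    pm.Binomial("diagnosed", n=n_patients, p=p_diagnosed, observed=n_diagnosed)

    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)

print(float(idata.posterior["prevalence"].mean()))
```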
Results Using the Bayesian model, we estimated conditional probabilities that were much closer to the real values. For example, for the probability of condition 1 (e.g. dementia) given conditions 2 (e.g. diabetes) and 3 (e.g. obesity), conventional analyses that conflate diagnosis with true condition status predicted a value between 0.26 and 0.28, the Bayesian model predicted 0.34-0.44, and the true value was 0.375. We could also estimate the probability of receiving a diagnosis for condition 1 both when the patient has the condition and when they do not. Conventional analysis cannot estimate these values from our dataset.

Discussion & conclusions
This study provides evidence that where rates of misclassification or misdiagnosis are not taken into account, estimates of the association between two conditions can be wholly inaccurate. This method can recover the true association, as well as giving information about probabilities of diagnosis that are not available using conventional methods.
Three conditions were chosen to prove the principle of this technique, but it can be extended to many more conditions. The method will be expanded further to include time series information and will be applied to real-world dementia data during the ASTRODEM project.
Introduction There are known large socio-demographic inequalities in the risk of being diagnosed with cancer through an emergency presentation. While there have been welcome decreases in the overall proportion of emergency presentations (24% in 2006, 20% in 2013), it is important to also consider whether inequalities by age and deprivation group are narrowing or widening over time.
Methods We analysed population-based 'Routes to Diagnosis' data for patients diagnosed in England between 2006 and 2013 with any of 33 cancer sites (including all common cancers and several rarer ones). Logistic regression was used to examine the association between diagnosis through emergency presentation and age (25-49, 50-59, 60-69, 70-79, 80+ years), deprivation (defined by Index of Multiple Deprivation quintiles), sex, cancer site and year of diagnosis. Interactions between age and year, and between deprivation and year, were used to examine whether inequalities narrowed or widened over time.
The regression model was also used to estimate the proportion of patients in each age group and each deprivation group diagnosed through emergency presentation in each year, adjusting for the different sex, cancer and age/deprivation make-up of the different groups.
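A minimal sketch of this model with statsmodels is given below; the file and column names are hypothetical placeholders for the Routes to Diagnosis extract, with the `*` operator generating the main effects plus the age-by-year and deprivation-by-year interactions.

```python
# Sketch: logistic regression of emergency presentation on age, deprivation,
# sex, cancer site and year, with age-by-year and deprivation-by-year
# interactions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("routes_to_diagnosis.csv")  # hypothetical patient-level file

fit = smf.logit(
    "emergency_presentation ~ C(age_group)*year + C(imd_quintile)*year"
    " + C(sex) + C(cancer_site)",
    data=df,
).fit()
print(fit.summary())

# Adjusted proportions per group and year can then be obtained by applying
# fit.predict() to a standardised covariate grid.
```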
Results In 2006, there were notable inequalities in the risk of diagnosis through emergency presentation by age group (among those aged 50+ years, adjusted proportions ranged from 17.7% for 50-59-year-olds to 33.3% for those aged 80+) and deprivation (least deprived: 20.1%; most deprived: 27.2%). There was strong evidence that these inequalities changed over time (p<0.0001 for both interaction terms); nonetheless, the magnitude of these changes was overall small. Considering associations with age, we saw a narrowing of inequalities among those aged under 80 years, although the difference between the very old (80+ years) and patients aged 70-79 years increased over time. For deprivation, we observed only a small degree of narrowing of inequalities in emergency presentation over time. Predictions from the model indicated that if there were no inequalities in the risk of emergency presentation by deprivation, nationwide the proportion of patients diagnosed through that route during the study period would have decreased from 21.4% to 18.7%.

Discussion
Although rates of diagnosis of cancer through an emergency presentation have decreased in recent years, there has been little reduction in related inequalities, which are even widening for the oldest patients.
Introduction Positive experiences of care are increasingly considered important for cancer patients. Different routes (pathways) to the diagnosis of cancer may influence the experience of subsequent cancer care. If so, optimising diagnostic routes may help improve cancer patient experience among other clinical outcomes. However, formal evidence about the presence, direction and strength of associations between routes and experience is lacking.

Methods
We examined associations between diagnostic routes and experience for patients with colorectal cancer using linked data from the Cancer Patient Experience Survey 2010 and the Routes to Diagnosis dataset. Routes to Diagnosis denote eight different care pathways to the diagnosis of cancer, including screening detection, emergency presentation, 'urgent' specialist referral for suspected cancer to secondary care (otherwise known as 2-week-wait referral) and non-urgent referral.
We selected 10 'report' and 8 'evaluation' questions from the survey instrument, representing all major aspects of the patient journey from diagnosis to post-treatment care. 'Report' questions reflect actual processes of care (e.g. whether patients were provided with access to specialist nursing), whereas 'evaluation' questions reflect the appraisal of how a process was experienced (e.g. whether patients felt that explanations about the treatment were adequate).
We categorised responses as binary (positive/negative) experience outcomes, consistent with public reporting conventions for CPES. For each selected survey question separately, we describe case-mix-adjusted proportions and odds ratios of patients endorsing a negative experience of care. Adjustment was made for age (5 groups), sex, deprivation quintile, ethnicity (white/non-white) and major cancer type.
Results Across the 18 survey items, we observed consistent associations between diagnostic route and care experience, with evidence (p<0.05) that patients diagnosed after an emergency presentation evaluate their experience more critically than those diagnosed via a 2-week-wait referral route for 12/18 questions (adjusted odds ratio for negative experience ranging from 1.19 to 2.96). For the remaining 6/18 questions the associations were not large enough to be statistically significant. In absolute terms, across the 12 questions the proportions of patients indicating a negative experience were between 4% and 23% greater (adjusted for case-mix) compared with patients diagnosed via the 2-week-wait referral route.
In contrast, screen-detected patients reported significantly better experience for 7/18 questions (adjusted odds ratio for negative experience ranging from 0.52 to 0.83); for 10 of the remaining 11 questions, the associations were not large enough to be statistically significant. In absolute terms, screen-detected cases reported between 3% and 9% more positive experience across the 7 questions (adjusted for case-mix) compared with patients diagnosed via the 2-week-wait route. Associations between diagnostic route and experience were similar for both evaluation and report items.
Discussion Diagnostic routes influence cancer patient experience notably. The fact that this influence is seen in both report and evaluation questions suggests that the impact is reflected in actual quality of care rather than simply a perception of care (potentially reflecting prognosis).
Conclusion Expanding the pool of screen-detected cases and decreasing the proportion of emergency presenters could result in improvements not only in clinical outcomes but also in the overall experience of cancer care.

Method
The safe share project is piloting co-designed solutions with key stakeholders to provide a high-volume, encrypted VPN network between research centres. The project has purchased, set up, installed and tested high-performance network routers at the pilot sites and linked them through a new pilot central infrastructure at the Jisc shared data centre in Slough. This infrastructure allows dispersed research groups to directly connect their data safe havens and environments securely, i.e. without exposure to the rest of the institution's network. The interconnectivity can be at different levels of information governance and IT security to reflect the information governance requirements of groups of collaborating research centres. These have been called "service slices" for the project, and those used for the pilot are:

• "Farr Institute" at ISO 27001 with suitable scope, equivalent to NHS Digital Information Governance Toolkit compliance
• Administrative Data Research Network policy at the government's "official" security level
• Public Services Network compliance for local authority involvement
Results The network element of the project was tested between Swansea/Cardiff and Manchester/Leeds until February 2017. The testing indicated that the infrastructure will be effective. An early observation is that the pilot was difficult to incorporate at a couple of the other proposed sites because they did not want to disrupt the accreditations of their existing data safe havens at this point.

Discussion
Based on the testing and the support of those sites not involved in testing, a business case has been approved by Jisc for the network component of safe share to become a national service in Spring 2017. "Safe share connectivity" will be offered as a subscribed service to the UK research community, including government and commercial research partners.
Introduction The increasing use of information technology in healthcare offers new opportunities to enhance the quality and safety of prescribing. The pharmacist-led information technology intervention for medication errors (PINCER) trial demonstrated how pharmacists working collaboratively with general practitioners to act upon electronic medication safety data reduced potentially hazardous prescribing in primary care. Building on this, we worked with relevant stakeholders to develop the Salford MedicAtion Safety dasHboard (SMASH), a novel, interactive audit and feedback intervention designed to identify instances of potentially hazardous prescribing and thus facilitate optimal use of medicines in practice. The dashboard interrogates electronic health records using a set of 13 medication safety indicators and presents the resulting information to healthcare professionals both in aggregated form and as lists of individual patients with potential safety hazards. We explored the ways in which the dashboard was used in general practice by clinical pharmacists, GPs and other GP staff.
Methods Eighteen semi-structured interviews were conducted with pharmacists and a range of practice staff from nine General Practices within Salford. Interviews were audio recorded and transcribed verbatim. Contemporaneous field notes from three non-participant observations undertaken with pharmacists whilst working with the dashboard in practices were included in the dataset. We adopted a template analysis approach that was iterative and concurrent with data collection. Emerging themes were developed into coding frameworks and discussed across the research team.
Results Pharmacists and clinicians did not perceive any negative outcomes from using the dashboard, and highlighted a number of patient safety benefits, including improved medicines management systems and prescriber education. Dashboard use varied across practices: in some, only the pharmacist used the dashboard and reported back to GPs, while in others GPs took the lead or the work was more collaborative. Such role allocation and division of labour depended upon the needs and priorities of the practice and the relationships established between the pharmacist and GP staff. These contextual factors influenced ways of working with the dashboard, and we identified two strategies. Reactive working focused primarily upon interventions for individual patients affected by the indicators, such as changing medications. Proactive working involved prioritising specific indicators, providing feedback and education to individual clinicians, and changing the way the practice prescribed or monitored medicines use.

Discussion
The ways in which electronic patient safety dashboard interventions are utilised in primary care may depend upon users' perceptions of the ease and value of using the tools, organisational dynamics within practices, and individual and local priorities. Consistent with previous literature, we found that there were different ways of working in response to the dashboard. For some users, the dashboard's functions were not only about making changes to patient medication but also about developing systems and providing prescriber education within the practice.
Conclusions A novel interactive electronic medication safety tool can have an impact upon patient safety and lead to safer prescribing. The complex dynamics of primary care may lead to different strategies being employed in the use of the technology.
Introduction Testing potential drug treatments in animal models is a crucial part of preclinical drug discovery; however, subsequent clinical failures of drugs show that animal studies do not always reliably inform clinical research. Many reports have drawn attention to the need for more systematic, rigorous and objective review of preclinical data prior to exposure of a potential drug to human populations. Currently, such reviews are performed manually and involve analysis of large quantities of published articles and internal proprietary reports. Text mining methods could offer substantial aid in this time-consuming task.

Methods
In this study, we use text mining to extract information from the descriptions of over 100,000 drug screening experiments (bioassays) in rats and mice. We retrieve our dataset from ChEMBL, a literature-based bioactivity database focused on preclinical drug discovery. Our novel analysis of these data uses natural language processing techniques to parse the assay descriptions and mine them for information about animal experiments: genetic strains, experimental treatments and phenotypic readouts used in the assays. To this end, we use a text mining approach that leverages existing vocabularies and manually crafted extraction rules. To automatically organize the extracted information, we construct a semantic space of assay descriptions using a neural network language model, Word2Vec, and train several assay classifiers based on the generated semantic vector representations.
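A compact sketch of this Word2Vec-plus-classifier step is given below with gensim and scikit-learn; the corpus, labels and vector dimensions are toy stand-ins for the ChEMBL assay descriptions, not the study's actual data.

```python
# Sketch: build a semantic space of assay descriptions with Word2Vec, then
# train a classifier on averaged word vectors.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

corpus = [
    "tail flick test in wistar rats after oral dosing".split(),
    "plasma glucose in ob ob mice after ip administration".split(),
    "locomotor activity in c57bl 6 mice".split(),
] * 50
labels = np.array([0, 1, 1] * 50)  # e.g. behavioural vs metabolic readout

w2v = Word2Vec(corpus, vector_size=50, window=5, min_count=1, seed=0)

def doc_vector(tokens):
    """Average the word vectors of a description's tokens."""
    return np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)

X = np.vstack([doc_vector(doc) for doc in corpus])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(f"training accuracy: {clf.score(X, labels):.2f}")
```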
Introduction People often cite high cost as a barrier to eating more fruits and vegetables, despite this food group's association with lower levels of obesity and chronic illness. Grocery store scanner data can help public health authorities identify environments where healthy food is unaffordable. In an innovative use of these transaction data, we describe spatial and temporal variation in the price and affordability of fruits and vegetables.
Methods Over two million geo-coded grocery store transaction records, provided weekly by the Nielsen Company, were available for Quebec over six years (2008-2013). A standard basket of 25 types of fruits and 31 types of vegetables was created based on products with the highest volume and dollars sold. A smoothed monthly time series of the price per serving of each basket was constructed across 56 regions in Quebec by taking the monthly mean of prices across the years. Time series were also constructed on a yearly basis. For both the overall and yearly time series, monthly residuals for mean price (the difference between the overall mean and the regional mean) were determined for each region.
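A minimal sketch of the residual computation, assuming transactions arrive in a DataFrame with illustrative column names:

```python
# A sketch of the monthly residual computation; the DataFrame layout and
# column names are assumptions for illustration.
import pandas as pd

tx = pd.DataFrame({
    "region": ["A", "A", "B", "B"],
    "date": pd.to_datetime(["2008-01-05", "2008-02-02",
                            "2008-01-12", "2008-02-09"]),
    "price_per_serving": [0.40, 0.42, 0.48, 0.52],
})
tx["month"] = tx["date"].dt.month

# Monthly mean price per region, averaged across years (smoothed series)
regional = tx.groupby(["region", "month"])["price_per_serving"].mean()

# Overall monthly mean across all regions
overall = tx.groupby("month")["price_per_serving"].mean()

# Monthly residual: regional mean minus the overall mean for that month
residuals = regional - overall.reindex(
    regional.index.get_level_values("month")).values
print(residuals)
```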
Results The overall median price per serving for the standard vegetable and fruit baskets was $0.40 (range $0.29-$0.51) and $0.44 (range $0.34-$0.55), respectively. For the vegetable basket, the most expensive regions' prices varied from $0.21 to $0.26 above the mean depending on the month. This difference from the mean was largest from November to January. For the fruit basket, the corresponding estimates varied from $0.08 to $0.15 above the mean, with the maximum difference observed in November. The yearly time series suggested that the largest spike in price relative to the mean occurred in 2010, and the range was lowest in 2013.

Introduction
Monitoring clinical pathways and outcomes to measure and assure quality in patient care is a process that has numerous stakeholders within a healthcare economy. Multiple statistical methods have been developed to monitor clinical outcomes from different sources of healthcare data.1,2 These methods require knowledge of statistical and data visualisation techniques to develop reports and charts for interpretation. However, the interpretation of such material is often carried out by business users without such knowledge. The objective of this research was to design and develop a prototype application that will enable technically advanced users to declare and import datasets, apply risk-adjustment methods and generate indicators. Business users will be able to interpret the results through data visualisations, thereby assisting the monitoring of the quality of clinical processes with the intention of improving patient outcomes.

Methods
The user requirements of the software prototype were initially established through a series of user interviews (N=8) and a review of relevant software products on the market. By appealing to the concepts described by Wilkinson,3 the design of data visualisations such as funnel plots4 and Variable Life Adjusted Displays (VLADs)5 was abstracted and generalised. Similar concepts were employed to facilitate the declaration of risk-adjustment methods. The established user requirements and the abstraction of chart declaration and risk-adjustment methods were combined and used to design a logical model that formed the basis for the prototype.
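As a sketch of the kind of chart being abstracted, the snippet below draws binomial control limits for a funnel plot; the target rate, limits and provider points are illustrative, not taken from the prototype.

```python
# A sketch of funnel-plot control limits for a proportion indicator under
# binomial variation; all numbers shown are illustrative.
import numpy as np
import matplotlib.pyplot as plt

target = 0.10                        # overall event rate across units
n = np.arange(20, 1000)              # denominators (e.g. cases per provider)
se = np.sqrt(target * (1 - target) / n)

plt.plot(n, target + 1.96 * se, "--", color="grey", label="95% limits")
plt.plot(n, target - 1.96 * se, "--", color="grey")
plt.plot(n, target + 3.09 * se, ":", color="grey", label="99.8% limits")
plt.plot(n, target - 3.09 * se, ":", color="grey")
plt.axhline(target, color="black")

# Illustrative providers: (denominator, observed proportion)
plt.scatter([150, 420, 800], [0.14, 0.09, 0.11])
plt.xlabel("Number of cases")
plt.ylabel("Observed proportion")
plt.legend()
plt.show()
```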
Results A prototype was developed that enables the user to define and import datasets, to declare the risk-adjustment models used to generate indicators from the imported datasets, and to generate the charts used to present these indicators. The prototype has been tested for accuracy and consistency. Using an established evaluation framework,6 a representative user base (N=7) reviewed the software and evaluated it against similar commercially available software packages.

Discussion
While the evaluations showed that the prototype proves the concept of the research and advances the technology in this applied area, they also showed that substantially more development is required to construct an application that can be used across healthcare economies.
Conclusion It is possible to produce analytics software whose intelligence is entirely configured by the analyst and/or pre-configured for wider business use, bridging the gap between the analyst and the business user and facilitating the wider use of advanced analytics in health.

Abstract no. 415: Exploring the gap between mHealth app design practices and knowledge derived from scientific research

Victor Panteleev, Linda Peute, and Gaby Anne Wildenbos, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
Introduction The mobile health (mHealth) market is expanding; there were over 165,000 mHealth apps available on the market in 2015, and the number of iOS apps increased by 106% compared to 2013. Safety, usability and user acceptance of mHealth apps improve if clinical design guidelines are adhered to during their design. Yet, it is unclear whether mHealth app designers use and adhere to design guidelines that have a clinical focus on usability (e.g. the Healthcare Information and Management Systems Society (HIMSS) mHealth guidelines). The objective of this study is therefore to explore the usage of, and willingness to use, clinical design guidelines among mHealth app developers, to assess how they take safety and usability aspects into account during mHealth app design.
Methods In a pilot study, Dutch mHealth app designers (n=4) were asked to complete a semi-structured questionnaire with six open-ended questions. The questionnaire contained questions regarding guideline usage and acceptance of clinical design guidelines.
Results Two app designers were instructed by their employers to use design guidelines. Three respondents were free to use any guidelines. One respondent had to use guidelines provided to them. Guidelines used were: the Apple iOS (n=4) and Android (n=2) Human Interface Guidelines, and the UI Guidelines for Windows Mobile (n=1). No respondents used HIMSS (or other) clinical design guidelines. Three respondents surmised that a need exists for clinical design guidelines for specific patient populations.
Discussion During mHealth app design, designers used standard design guidelines (iOS, Android, Windows Mobile); clinical design guidelines were not used. Possible reasons for non-usage of such guidelines are that they are too abstract to apply, too extensive to use due to time limitations, or due to a lack of knowledge on their existence by designers or employers. Considering that three designers were free to choose any guideline, it is likely that they were unaware of the existence of clinical design guidelines. This could be explained by the tendency of these types of guidelines to be published in sources generally unexplored by designers (e.g. scientific journals).
This pilot study was designed to explore whether there is a gap between the practice of mHealth app design and knowledge derived from scientific literature regarding optimal mHealth app design, and to determine the feasibility of further research. This study suggests such a gap exists; to bridge it, more extensive research on the practical usage of clinical design guidelines will be necessary, especially because, to our knowledge, there are few to no publications on this topic. Usage of the recently developed guidelines for mHealth app designers from the Federal Trade Commission and the European Commission should also be researched.
Conclusion Adherence to clinical design guidelines is important for the safety and usability of mHealth apps. mHealth app designers seem unacquainted with clinical design guidelines, but often show a willingness and a need to use them. To bridge the gap between knowledge on the safety and usability of mHealth app design gained from scientific research and the actual practice of designing mHealth apps, designers should be provided with simple-to-use yet effective clinical design guidelines.
Introduction National Therapeutic Indicators (NTIs) are primary care prescribing measures for General Practitioner (GP) practices across Scotland. They were first published in 2012, covering 10 therapeutic topics, to support medicines management and quality improvement and to enhance work already embedded within Scotland's Health Boards (HBs). NTIs are built using a national administrative database of prescriptions dispensed in community pharmacies, creating comparative feedback at various organisational levels (for example, GP practices within HBs and HBs within Scotland). In the first year, the Scottish Quality Prescribing Initiative (SQPI) provided additional funding to GP practices (approximately £800 for an average-sized practice) to achieve specific targeted prescribing changes in two of the 12 NTIs.
Method Six of the 12 original NTIs were purposively selected for in-depth evaluation based on the perceived clinical:cost benefits of each (low:high n=1, medium:high n=2, high:medium n=1, high:low n=1, medium:medium n=1). Initial analysis used segmented regression analysis of interrupted time series data to examine the hypothesis that NTI and NTI-plus-SQPI introduction and cessation (if applicable) would be associated with a change in prescribing; other known interventions, such as regulatory risk communications, were also modelled. Subsequent analysis used Join Point Analysis (JPA) to identify any changes in prescribing trend in the data (as opposed to pre-specifying the intervention time). Variation in impact by Health Board was explored.
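A minimal sketch of the segmented-regression model on simulated data (an intercept, a baseline trend, a step change at the intervention, and a post-intervention trend change); all values are illustrative:

```python
# A segmented-regression sketch for an interrupted time series; the
# simulated monthly series and intervention month are illustrative.
import numpy as np
import statsmodels.api as sm

months = np.arange(48)                    # months 0..47
intervention = 24                         # e.g. NTI introduced at month 24
post = (months >= intervention).astype(float)
time_after = np.where(post == 1, months - intervention, 0.0)

rng = np.random.default_rng(0)
rate = (10 - 0.02 * months - 1.5 * post - 0.05 * time_after
        + rng.normal(0, 0.3, months.size))

# Design matrix: intercept, baseline trend, step change, trend change
X = sm.add_constant(np.column_stack([months, post, time_after]))
fit = sm.OLS(rate, X).fit()
print(fit.params)   # [level, baseline slope, level change, slope change]
```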
Results Interrupted time series analysis (ITSA) of the quinine NTI (medium clinical and cost benefits) for all Scotland (figure 1) identified significant decreases in prescribing following a national drug safety warning in June 2010. A greater decrease was seen following the introduction of the NTI, but this waned once the NTI finished. The estimated relative decrease in prescribing following NTI introduction was 27.2% (95% CI 18.8% to 35.5%) at 12 months and 22.7% (95% CI 14.0% to 31.5%) at 24 months. The reduction was larger among GP practices that chose quinine as an SQPI and so were financially incentivised to respond (figure 2): 57.2% (95% CI 46.5% to 68.0%) at 12 months and 50.5% (95% CI 42.1% to 58.8%) at 24 months. JPA identified that quinine prescribing trends decreased in all Health Boards following the national drug safety warning and prior to the NTI start, but with considerable variation in the timing and size of this decrease. In 9 of the 14 Health Boards, additional decreases were seen after the NTI introduction. Summary results for the other five NTIs will also be presented.
Discussion At Scotland level, there was evidence that NTI introduction led to a reduction in the targeted prescribing, which was greater in those GP practices incentivised to act on this. JPA shows that there is considerable variation in impact at Board level, depending on whether and when Boards responded to the regulatory risk communication. We conclude that 'national initiatives', such as risk communications and NTIs, are effective but are mediated by local action, and we need to understand better how to influence this.
Introduction There is a gap between the ideal needs of the research community and the ethico-legal restrictions on the use of personal healthcare data. On the one hand, the research community would want unfettered access to data; on the other, it is the legal responsibility of data owners to guarantee the privacy and confidentiality of patients. The best way of achieving and maintaining a balance between trust, security and relative openness is an open question. The common approach is to use a Trusted Third Party, but this suffers from the problem of establishing mutual trust, does not scale, and its operations can be opaque.
The technology underlying cryptocurrencies, the blockchain, tackles exactly the problems that are faced in the world of medical data, namely the decentralisation of trust, the use of encryption technologies and the integrity of series of transactions.1 Newer blockchain implementations also bring with them the ability to execute arbitrary code across such decentralised platforms.
Method A proof-of-concept implementation was developed using distributed ledger technology to demonstrate four key concepts. The first was to allow patients to grant research organisations access to their medical records for research. The second was to allow research organisations to notarise research proposals and push proposals to patients, asking them to grant rights to their data. The third was to allow existing data controllers to act on behalf of patients in granting access, thus simplifying key management. The fourth was to allow auditing of access requests and outcomes via transactions recorded in the distributed ledger, including allowing patients to verify when their data have been accessed and for what purpose.
Results A private instantiation of an Ethereum2-based blockchain was deployed and smart contracts were used to implement personal data-sharing preferences. The smart contracts were written in the Solidity3 language and deployed to the blockchain. A web-based user interface was developed that enabled the participating actors to read from and write to the underlying blockchain.
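As an illustration of how a client application might talk to such a contract, the sketch below uses web3.py; the contract address, ABI and the grantAccess/isPermitted functions are hypothetical placeholders, not the contract actually deployed in this study.

```python
# A sketch of a client interacting with a (hypothetical) consent contract
# on a private Ethereum chain via web3.py.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # private-chain node

abi = [
    {"type": "function", "name": "grantAccess",
     "stateMutability": "nonpayable",
     "inputs": [{"name": "org", "type": "address"}], "outputs": []},
    {"type": "function", "name": "isPermitted", "stateMutability": "view",
     "inputs": [{"name": "org", "type": "address"}],
     "outputs": [{"name": "", "type": "bool"}]},
]
consent = w3.eth.contract(
    address="0x0000000000000000000000000000000000000000",  # placeholder
    abi=abi,
)

patient, org = w3.eth.accounts[0], w3.eth.accounts[1]

# Patient records a data-sharing permission as a ledger transaction
tx = consent.functions.grantAccess(org).transact({"from": patient})
w3.eth.wait_for_transaction_receipt(tx)

# Anyone can audit whether access is currently permitted
print(consent.functions.isPermitted(org).call())
```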
Discussion This solution demonstrates that, using a distributed ledger, we can eliminate the requirement for trusted third parties to enforce and manage data sharing in healthcare. Key management is always complex and remains an issue for widespread deployment, though delegation may help here. The integration of a distributed ledger solution with the "real world" of electronic health record systems, and the role that the blockchain could play in data linkage or federation of systems, remain to be explored. Future work must also investigate the addition of rudimentary 'payment' infrastructure, such as 'proof of hosting' or a community-oriented payment-for-data-use system.
Conclusion Whilst this prototype demonstrates the feasibility of creating, matching and enforcing data-sharing agreements using smart contracts deployed on a blockchain, much further research remains before a production system could be deployed at scale.
Methods We delivered community reporter training around the topic of 'Health Data' in collaboration with community workers and older citizens. Over a 6-week period, participants learned how to create, edit and upload their community reports. We helped them become familiar with the Health Data topic by briefing them during the first session, allowing them to test an activity tracker themselves, and organising interviews with others who tested an activity tracker (i.e. participants in local senior exercise classes). All community reports resulting from the training were uploaded onto the international Community Reporters website (https://communityreporter.net, tagged #datasaveslives). Two members of our team independently viewed each report and assigned labels to characterise its content. Through discussion of the labels, they then identified the most prominent themes arising from the reports. Project updates were posted on a dedicated webpage (www.herc.ac.uk/research_project/community-reporters).
Results All participants created a community report, despite some of them having no or limited experience with technology at the start of the training. From the analysis, the following motivators for using health data emerged: an intrinsic interest in personal health data; using data to inform communication with healthcare professionals; and perceiving devices, such as an activity tracker, as reflecting a positive attitude towards personal health. Trepidation related to technology was also a common theme, with many participants highlighting the need for dedicated support for using health data in the future. Setting targets for physical activity was controversial, triggering lively discussion around issues such as unrealistic expectations, unwanted interference in daily life, and effects on health behaviour. There was a clear enthusiasm to engage with health data research, for which the main motivator seemed to be altruism.

Discussion
We used community reporting as a novel method to explore older people's views on health data. It not only allowed our participants to express their views, but also equipped them with the skills to use technology and create videos to tell their stories. Additionally, we created a strong community network, which will facilitate engagement of older people in future research.
Introduction Linkage of data from bloodstream infection (BSI) surveillance by Public Health England to admissions to neonatal units (NNUs) enables risk-adjusted analyses of rates of BSI. Linkage is deterministic when it relies on exact matches, and probabilistic when probabilities are used to determine links. As the completeness and quality of the patient identifiers used for linkage improve, probabilistic linkage methods may have limited benefit. We aimed to determine whether more links could be identified with the use of probabilistic linkage in the presence of improved identifiers, or whether deterministic linkage alone was sufficient to achieve a high rate of linkage.
Methods Data were extracted from Public Health England's Second Generation Surveillance System (SGSS) on positive isolates of bacteria or fungi from blood and cerebrospinal fluid in babies aged 0-12 months from 2010 to 2015. Data on admissions to NNUs for singleton births were extracted from the National Neonatal Research Database (NNRD) and linked to infection episodes in SGSS. We used deterministic linkage to classify records with the same NHS number in each dataset as links. Probabilistic linkage compared records in NNRD and SGSS, and assigned each pair of records a probability that they belonged to the same baby based on agreement on the identifiers: postcode prefix, postcode suffix, date of birth, sex and hospital. To reduce the number of comparison pairs, blocking on prefix, suffix and date of birth was used, so only pairs that agreed on at least one of these were included. Classification as links required the specimen date to be between the admission date and two days after NNU discharge.
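A minimal Fellegi-Sunter-style sketch of the probabilistic comparison with blocking; the m/u probabilities and records are illustrative, and the study's actual probability model may differ.

```python
# Sum log-likelihood ratios over identifier agreements, comparing only
# blocked pairs; all probabilities and records below are illustrative.
import math

FIELDS = ["postcode_prefix", "postcode_suffix", "dob", "sex", "hospital"]
BLOCKS = ["postcode_prefix", "postcode_suffix", "dob"]
m = {"postcode_prefix": 0.95, "postcode_suffix": 0.92, "dob": 0.98,
     "sex": 0.99, "hospital": 0.90}   # P(agree | same baby)
u = {"postcode_prefix": 0.05, "postcode_suffix": 0.01, "dob": 0.001,
     "sex": 0.50, "hospital": 0.10}   # P(agree | different babies)

def match_weight(a, b):
    """Higher weight = more evidence the records belong to the same baby."""
    w = 0.0
    for f in FIELDS:
        if a[f] == b[f]:
            w += math.log2(m[f] / u[f])
        else:
            w += math.log2((1 - m[f]) / (1 - u[f]))
    return w

def blocked(a, b):
    """Compare only pairs agreeing on at least one blocking variable."""
    return any(a[f] == b[f] for f in BLOCKS)

nnrd = [{"postcode_prefix": "M13", "postcode_suffix": "9PL",
         "dob": "2014-03-01", "sex": "F", "hospital": "H1"}]
sgss = [{"postcode_prefix": "M13", "postcode_suffix": "9PL",
         "dob": "2014-03-01", "sex": "F", "hospital": "H1"}]

for a in nnrd:
    for b in sgss:
        if blocked(a, b):
            print(match_weight(a, b))
```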
Results Completeness of NHS number was 94% in NNRD across all years, but in SGSS it improved from 69%.
Discussion Over a quarter of the links identified were not found by deterministic linkage using NHS number alone, resulting in underestimation of the rate of infection. Probabilistic linkage substantially increased linkage, even as the completeness of identifiers improved.
Introduction Harnessing the widespread patient data collected across the UK National Health Service (NHS) would provide a wealth of information to improve patient care beyond that possible from analysis of national registry data alone. This requires suitable technical and governance infrastructure for efficient data extraction, integration and analysis. The challenge is ensuring high coverage, granularity and quality of data, automation and clinical involvement to create valuable datasets for audits and research. We developed a local infrastructure as part of the NIHR Health Informatics Collaborative (NIHR-HIC) and other programmes.
Methods We extract clinical data from a number of systems comprising the electronic patient record. We collect national registry datasets, patient-level primary care data and air pollution data. We obtain omics data from existing studies. The core infrastructure is based on SQL Server Integration Services. We developed software components for accurate patient matching, interpretation of unstructured information and data cleansing before import into a data warehouse, using master data services, data quality services and natural language processing (NLP). Anonymised data are loaded as studies into an enhanced tranSMART. We use metrics to track data completeness for exemplar studies and engage with clinicians to review data quality.
Results To date, we have collected high-granularity, high-quality data across clinical areas. Data completeness ranges between 60% and 97%. The renal transplantation theme defined 250 attributes and has collected data currently totalling 7,546 transplants. Genotyping, gene expression and other omics data were obtained from the GRAFT, WTCCC3 and KALIBRE studies, which total ~3,000 donor-recipient pairs, and the overlap with HIC patients was assessed. NLP of 4,398 renal biopsy text reports extracted diagnoses of recurrent disease subtypes, which will be evaluated against a subset of manually assessed biopsies.
Other clinical areas include Acute Coronary Syndromes (52,693 GSTFT patients, 150 attributes), Critical Care (8,573 GSTFT patients, 260 attributes), five cancer types and hepatitis. Pseudonymised primary care data for ~360,000 Lambeth patients are hosted at GSTFT. Data have been used to inform patient care planning and health needs assessment, and in research. We linked air pollution and stroke data from south London and investigated associations between particulate matter and risk of stroke subtypes.

Discussion
The developed infrastructure allows data from local and national sources to be linked, cleansed and pseudonymised before being loaded into a datastore. NLP of biopsy reports extracted information on patient outcomes that would not have been feasible to obtain manually. Metrics helped improve data quality. Anonymisation before data sharing and analysis ensured good information governance. Transplantation data were analysed for audits informing clinicians on graft function, survival, rejection rates and cardiovascular outcomes. We are carrying out an exemplar study looking at recurrent disease incidence in transplanted kidneys and the clinical and genetic factors that influence its development.
Conclusion The use of the infrastructure in clinical audits demonstrates the value of re-using NHS patient data to enable service and data quality improvements, enhance our understanding, and perform potentially novel research in the UK, as planned in the exemplar studies, to ultimately benefit patients.
Introduction According to estimates, there are 11.6 million dogs and 10.1 million cats kept as pets in the UK, with 30% and 23% of households owning a dog or cat, respectively. These animals suffer a wide range of important diseases that impact not just their own welfare but also that of their owners. Despite the size of these populations, they have historically developed in the absence of coordinated surveillance, likely reflecting, at least in part, a relative absence of notifiable/reportable diseases in these populations and a corresponding lack of momentum at government level to instigate the national surveillance programmes that exist more typically for farmed species. We will describe a globally unique health informatics network in small companion animals that provides new opportunities for health informatics research.

Methods
The Small Animal Veterinary Surveillance NETwork (SAVSNET) has assembled a strong coalition of collaborators and data providers allowing the collection of real-time electronic health data from veterinary diagnostic laboratories (~80,000 test results/day) and a sentinel network of over 450 veterinary clinics (~5000 EHRs/day) across the UK. From veterinary practice, each electronic health record (EHR) contains the owner's postcode and a range of animal data including age, species, breed, gender, treatments and the clinical free text. At the end of each consultation, a compulsory syndrome code is added by the attending practitioner. In addition, a more detailed questionnaire is randomly applied to approximately 10% of animals. From laboratories, each test result is accompanied by the practitioner postcode as a surrogate of the owner / animal location.
Results In total, EHRs are now available for almost 2,000,000 consultations (approximately 70% from dogs, 26% cats, 1% rabbits and 2% other species). Compared to their canine counterparts, cats are generally older, more likely to be non-pedigree and more likely to be neutered. Key areas of current research include antimicrobial resistance mapping and description of antibacterial prescribing, as well as real-time outbreak monitoring in both animal-only and "one health" settings, notably gastroenteritis. Using text mining approaches, we are unlocking the research value previously hidden in the clinical narrative of each EHR, pointing to a future that is less reliant on practitioner coding. For example, by using often incidental references in clinical narratives to ticks observed on animals, retrieved by text mining, we can now generate temporal and spatial maps of tick activity across the UK. Such data could be used to inform both animal and human health. Through the owner postcode, we are linking EHRs to other data sources and examining the impact of climate and owner-predicted deprivation on disease and treatment.
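As a toy illustration of the tick example, the sketch below counts narratives mentioning ticks per month; the regex and records are illustrative and far simpler than the study's actual text mining.

```python
# A minimal sketch of mining clinical narratives for tick mentions and
# aggregating by month; the pattern and records are placeholders.
import re
from collections import Counter

records = [
    {"narrative": "Tick found on left ear, removed and cleaned.",
     "month": "2016-06"},
    {"narrative": "Annual vaccination, no concerns.", "month": "2016-06"},
    {"narrative": "Owner reports ticks after walk in long grass.",
     "month": "2016-07"},
]

TICK = re.compile(r"\btick(s)?\b", re.IGNORECASE)

counts = Counter(r["month"] for r in records if TICK.search(r["narrative"]))
print(counts)  # e.g. Counter({'2016-06': 1, '2016-07': 1})
```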
High-quality and efficient health care seems impossible nowadays without the support of information technology (IT). To verify that appropriate benefits are forthcoming and unintended side effects of health IT are avoided, systematic evaluation studies are needed to ensure system quality and safety, as part of an evidence-based health informatics approach. To guarantee that health IT evaluation studies are conducted in accordance with appropriate scientific and professional standards, well-trained health informatics specialists are needed. The objective of this contribution is to provide recommendations for the structure, scope and content of health IT evaluation courses. The overall approach consisted of an iterative process, coordinated by the working groups on health IT evaluation of EFMI (European Federation for Medical Informatics), IMIA (International Medical Informatics Association) and AMIA (American Medical Informatics Association). In a consensus-based approach with over 80 experts in health IT evaluation, recommendations for health IT evaluation courses at the master's or postgraduate level were developed. The objectives of an evaluation course are as follows: students should be able to plan their own (smaller) evaluation study; select and apply appropriate evaluation methods; perform a study and report its results; and appraise the quality and results of published studies. The mandatory core topics can be taught in a course of 6 ECTS (European Credit Transfer and Accumulation System) credits, which is equivalent to 4 U.S. credit hours. The recommendations suggest that practical evaluation training is included. The recommendations then describe 15 mandatory topics and 15 optional topics for a health IT evaluation course. Desirable follow-on activities in this continuing educational development programme are: consulting a wider stakeholder group on the recommendations, validating the contents through use and review in academic practice, considering the distillation of a subset to form a module on appreciation of health IT evidence, and evaluation in generic health management programmes. We invite all teachers of health IT evaluation courses to use these recommendations when designing an evaluation course, to share their course descriptions, and to report on their experiences. We also invite feedback on the use of the principles of this module as a means of instilling an evidence-based approach to health informatics application in wider health policy and healthcare delivery contexts. For further information, see https://iig.umit.at/efmi/.
Introduction Infectious disease remains a major threat to global public health, exacerbated by the rapid development of bacterial resistance to antibiotics. The recent review on antimicrobial resistance calls for the development of surveillance systems to ensure health systems, doctors and researchers can make the most of 'big data'. This work set out to pilot the development of a pathogen surveillance system through the linking of bacterial genome data with the electronic health record of the individual affected.
Methods 1,000 Campylobacter isolates from 800 people were collected through the Public Health Wales microbiology laboratories over 12 months and labelled with a sample ID number. The identifiers of the infected person (NHS number, name, DOB, address) and the sample ID number were held in the Public Health laboratory. The isolates were sent to Bath University for genome sequencing. The identifiable data were sent to a trusted third party within the NHS to assign an anonymised linking field alongside the sample ID. This information was then transferred to the Secure Anonymised Information Linkage (SAIL) databank to enable linkage to general practitioner and hospital records. This system allows bacterial genome data linked to an encrypted sample ID to be linked to the patient's medical records. This pilot study selected those patients who had cancer (using GP/hospital data), matched for age, gender and socio-economic status with those who had benign tumours. Their encrypted study IDs were unencrypted by the trusted third party to inform the genome sequencing laboratory which samples to genotype. This formed a nested matched case-control study.
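For intuition, a trusted third party might derive a deterministic anonymised linking field with a keyed hash along these lines; the key, identifier layout and values are illustrative assumptions, not the actual mechanism used here.

```python
# A sketch of deriving an anonymised linking field with HMAC; the secret
# key and identifiers below are illustrative placeholders.
import hashlib
import hmac

SECRET_KEY = b"held-only-by-the-trusted-third-party"

def linking_field(nhs_number: str, dob: str) -> str:
    """Same identifiers always map to the same pseudonym; the pseudonym
    cannot be reversed without the secret key."""
    message = f"{nhs_number}|{dob}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

print(linking_field("1234567890", "2015-06-01"))
```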
Results The pilot study identified some issues that would need resolving for the development of a larger surveillance system. For example, infection is common in newborn infants, but these infants often do not have an NHS number or correct baby name and are not registered with a GP. Thus, work is needed to track these samples through the mother if they are to be included in a surveillance system. In this pilot, a larger than expected number of patients had a diagnosis of cancer. However, without the source of the sample it is not clear whether the majority of samples come from cancer clinics or whether this is evidence of an infective trigger in the development of cancer. Thus, future work should collect the source (GP, hospital, specialist clinic) of the sample. The results of the case-control analysis will be presented at the conference.

Introduction
In shared decision making (SDM), physicians and patients cooperate to refine medical decisions considering both the available clinical evidence and the patient's personal preferences. Patients' preferences may be quantified as utility coefficients (UCs), indicators measuring the quality of life perceived by the subject in relation to the health conditions he/she might experience in response to the considered clinical options. To elicit UCs, we developed UceWeb,1 a web application implementing five elicitation methods: time trade-off, daily time trade-off, standard gamble, willingness to pay, and rating scale. These elicitation methods may suffer from bias due to specific characteristics of either the patient or the considered health condition. For example, the time trade-off (TTO) method may be inappropriate when the considered condition lasts less than one year.2 The literature suggests that these situations may lead to eliciting unreliable UCs, ultimately leading to sub-optimal decisions.
Methods In this work, we propose a rule-based decision support system for targeting the choice of the elicitation method to the considered patient and health state. We formalized twelve rules on the basis of the collected literature evidence. Each rule suggests (or advises against) the use of a specific method when a defined condition is verified. Rule conditions address the patient's characteristics (e.g. he/she is unemployed) or the health state nature (e.g. it is a short-term disease). For example, one rule advises against the TTO if the considered health state is short-term. We assigned a reliability score to each rule according to the relevance of the supporting evidence. We then implemented an engine that triggers the rules by matching the rule conditions with actual patient and health state data, evaluates the triggered rules, and provides a recommendation for each method.
Results We integrated the proposed system into the UceWeb application. To support elicitation procedures, the recommendation for each method is presented as a traffic light, whose colour summarises whether the method is suggested or not for the current elicitation. The recommendation also provides the list of the triggered rules, each complemented with its supporting evidence.
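A minimal sketch of such a rule engine, with a toy reduction of the rule base; the two rules, reliability scores and traffic-light encoding below are illustrative, not the twelve rules actually formalised.

```python
# Toy rule engine: rules are triggered against patient/health-state data
# and summarised as traffic-light recommendations per method.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    method: str                        # e.g. "TTO", "SG", "WTP", "RS"
    advises_for: bool                  # True = suggest, False = advise against
    condition: Callable[[dict], bool]
    reliability: float                 # strength of the supporting evidence
    evidence: str

RULES = [
    Rule("TTO", False, lambda ctx: ctx["state_duration_years"] < 1, 0.9,
         "TTO may be inappropriate for conditions lasting under one year"),
    Rule("WTP", False, lambda ctx: ctx["unemployed"], 0.7,
         "WTP may be biased when the patient has no income"),
]

def recommend(ctx):
    """Trigger matching rules; summarise each method as a traffic light."""
    lights = {}
    for r in RULES:
        if r.condition(ctx):
            colour = "green" if r.advises_for else "red"
            lights[r.method] = (colour, r.evidence, r.reliability)
    return lights

print(recommend({"state_duration_years": 0.5, "unemployed": True}))
```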

Discussion
For testing purposes, we asked 50 healthy volunteers to elicit UCs for specific health states that are supposed to trigger four of the formalised rules. We are currently analysing the resulting UCs in light of the collected literature evidence.
In particular, we are considering whether known unwanted effects on elicited UCs (e.g. saturation of TTO coefficients for short-term diseases) can be observed in our data.
Conclusion To our knowledge, UceWeb is the only elicitation tool providing decision support for targeting the choice of the method to the specific elicitation, facilitating SDM in clinical practice.

Methods
The study was carried out to investigate whether power line communication (PLC) would interfere with 20 medical devices in tests of conductive noise and radiated electromagnetic fields, after confirming that the electric power supply was clean. The subjects were five pieces of ultrasonic diagnostic equipment, three electrocardiographs (ECGs), two patient monitors, two pulse oximeters, two respirators, and one each of the following: defibrillator, manual defibrillator, electroencephalograph (EEG), medical telemeter, infusion pump, and syringe pump.
Results We found malfunctions in only two pieces of older ultrasonic diagnostic equipment, caused by conductive noise. Superimposed noise was found on one EEG; however, it was seen only when the power line of the PLC modem was intentionally placed in contact with an electrode or terminal box. No noise would be superimposed on this type of EEG in normal use. We did not observe malfunctions related to the use of PLC in any of the other tested devices.
Conclusion Our results indicate that hospitals should carefully monitor older devices to prevent malfunction. Older devices may be problematic because many hospitals, with limited finances, continue to use them. It is also important that hospitals improve the electromagnetic environment of the rooms in which devices handling weak biomedical signals are used, whether or not PLC is used.

Lynn Meuleners and Michelle Hobday, Curtin University, Perth
Introduction Demographic changes in the Australian population are leading to an increase in the number of older drivers. Driving is a complex task and requires numerous skills. Some cognitive abilities that are essential for driving, such as memory, visual perception, attention and judgment, may be affected by dementia. In the early stages of dementia, the risks associated with driving may go unnoticed, due to an average three-year lag between symptoms and diagnosis. This study examined the crash risk among older drivers aged 50+ in the three years prior to an index hospital admission with a diagnosis of dementia, compared to a group of older drivers without dementia.
Method A retrospective whole-population cohort study was undertaken using de-identified data from the Western Australian
Methods The PPV is calculated as the number of true positives over the number of positive calls. The positive calls can be found in the database, while the true positives were determined using questionnaires sent to the general practitioners of 880 randomly selected possible asthma patients identified using 8 pre-defined algorithms. The questionnaires were reviewed by two independent experts, one respiratory physician and one general practitioner (GP), to construct a gold standard. The algorithms consist of a combination of one or more of the following: definite or possible asthma Read codes (labels assigned by experts), evidence of reversibility testing and recording of two or more prescriptions of inhaled maintenance asthma therapy, and core asthma symptoms (wheeze, breathlessness, chest tightness and cough).
Results Out of 880 questionnaires distributed, 463 had been returned at the time of abstract submission. Of these, 457 were deemed usable and were reviewed by the two experts. The mean PPV across all of the algorithms was 72% based on the study chest physician's opinion, 71% according to the study team's GP and 71% in the judgement of the patient's own GP. The PPVs of the individual algorithms were calculated separately. Based on this preliminary stage of analysis, it appears that a record of definite asthma codes gives a high PPV (81%-85%). Additional conditions of reversibility testing, repeated inhaled asthma therapy, or a combination of all three requirements do not improve the PPV. The best PPV (86%-88%) was reached by the combination of possible asthma codes with evidence of reversibility testing and more than one prescription of inhaled maintenance asthma therapy. Algorithms based on asthma symptoms, with or without evidence of reversibility testing and inhaled asthma therapy, showed lower PPVs (all less than 60%).
Introduction This work describes the data acquisition phase of a project that aims at collecting various types of cognitive data, acquired from human subjects both with and without cognitive impairments, in order to study relationships among linguistic and extra-linguistic observations. The project's aim is to identify, extract, process, correlate, evaluate and disseminate various linguistic phenotypes and measurements, and thus contribute complementary knowledge to early diagnosis, monitoring progression, or predicting individuals at risk.
Methods Automatic analysis of the acquired data will be used to extract various types of features for training, testing and evaluating automatic machine learning classifiers that could be used to differentiate individuals with mild symptoms of cognitive impairment from healthy, age-matched controls and identify possible indicators for the early detection of mild forms of cognitive impairment. Features will be extracted from audio recordings, the verbatim transcription of the audio signal and from eye-tracking measurements.
Results We do not yet report concrete results, since this is work in progress. Nevertheless, features will be extracted from (i) audio recordings: we use the Cookie Theft picture from the Boston Diagnostic Aphasia Examination, which is often used to elicit speech from people with cognitive impairments, and reading aloud a short text from the International Reading Speed Texts collection; (ii) the manually produced verbatim transcription of the audio: during speech transcription, attention is paid to non-speech acoustic events, including speech dysfluencies, filled pauses, false starts and repetitions, as well as other non-verbal vocalisations such as laughing; and (iii) an eye-tracker: while reading, the eye movements of the participants are recorded, with interest areas defined around each word in the text by taking advantage of the spaces between words. The eye-tracking measurements are used for the calculation of fixations, saccades and backtracks.

Discussion
We believe that combining data from three modalities could be useful, but at this point we do not provide any clinical evidence underlying this assumption, since the analysis and experimentation studies are planned for year 2 of the project (2017). Therefore, at this stage, we only report a snapshot of the current state of the work. We also intend to repeat the experiments two years after the current acquisition of data in order to assess possible changes at each level of analysis.
Introduction This study aims to quantify the misdiagnosis of chronic obstructive pulmonary disease (COPD) in asthma patients in the UK using electronic health record databases. The specific objectives of this study are to calculate the PPV, NPV, sensitivity and specificity of a COPD diagnosis recorded by a general practitioner in patients with a confirmed asthma diagnosis. Asthma is difficult to assess in healthcare database epidemiological studies, as the diagnostic criteria are based on non-specific respiratory symptoms and variable expiratory airflow limitation, which are often not recorded in electronic medical records. Specifically, asthma in older patients can be confused with COPD.
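For reference, the four validation measures named above follow from the standard confusion-matrix counts (true positives TP, false positives FP, true negatives TN, false negatives FN):

```latex
\mathrm{PPV}=\frac{TP}{TP+FP},\qquad
\mathrm{NPV}=\frac{TN}{TN+FN},\qquad
\mathrm{sensitivity}=\frac{TP}{TP+FN},\qquad
\mathrm{specificity}=\frac{TN}{TN+FP}
```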

Methods
The 880 asthma patients were identified at random in the Clinical Practice Research Datalink (CPRD) using 8 different algorithms. Questionnaires (110 per algorithm) were sent to the general practitioners with a request for the asthma diagnosis to be confirmed, supported by any evidence available, including information on reversibility testing, other factors considered in making an asthma diagnosis, the Quality and Outcomes Framework indicators, smoking status, concurrent respiratory diseases and other sources such as consultant and hospital discharge letters, lung function tests and radiography results. A review of this information by a respiratory consultant aims to identify the actual cases of COPD in confirmed asthma patients.
Introduction Treatment for patients with rheumatoid arthritis (RA) is shaped by monitoring changes in disease severity. At present, clinicians have few objective measurements of disease activity between clinic visits, even though a number of patient-reported outcome measures (PROMs) exist. Smartphones provide a possible solution by allowing regular monitoring of disease severity between clinic visits and integration into electronic medical records. Potential benefits include better information for consultations, triaging of outpatient appointments and aiding patient self-management. Such data could also support novel research by providing temporally rich data. The REMORA (REmote MOnitoring of Rheumatoid Arthritis) study is designing, implementing and evaluating a system of remote data collection for people with RA for health and research purposes. The project asks whether electronic collection of patient-reported outcomes (ePROs) between visits can enhance care and provide a source of research data. This paper describes the process of determining the ePROs of importance, and presents the dataset included in the piloted beta app.
Methods Interviews were held with a range of stakeholders (10 RA practitioners, 12 RA researchers, 21 RA patients).
Interviews determined the ePROs for inclusion, recording frequency, and the value of a free-text diary. Initially, interviews were conducted with practitioners and researchers regarding their preferences. Key ePROs identified were tabulated and discussed with the PPI (patient and public involvement) group working alongside the research team, and the table was refined. Subsequently, patients were interviewed regarding their preferences and were also asked for feedback on the tabulated suggestions. Ultimately, components which had widespread consensus across the stakeholder groups were incorporated into the app. Components without consensus, or beyond the scope of the study, were documented with a view to incorporating them in later versions. PPI group members reviewed and commented on the suitability of the final components prior to their incorporation into the beta app.
Results All stakeholder groups wanted to capture information on changes in disease activity and the impact of the disease (physically and emotionally). Practitioners and researchers wanted routine data that had been recorded consistently using existing validated tools, but saw the value of a diary for recording triggers and alleviators of disease activity. Patients mainly suggested recording notable events (such as flares) as they occurred, but could see the benefits of recording data routinely. The final dataset comprised the following:
Daily question set: pain, difficulty with physical activities, fatigue, sleep difficulties, physical and emotional wellbeing, coping (10-point visual analogue scales); morning stiffness (7 categories).
Weekly question set: number of tender and swollen joints (numeric value 0-28); global assessment of wellbeing (10-point visual analogue scale); employment status (yes/no, radio button); description of flare (free-text box).
Monthly question set: Health Assessment Questionnaire (HAQ) impact of disease on daily activities, including function and mobility (fixed-point scales, radio buttons), plus a free-text entry box.
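To make the structure concrete, the final dataset could be serialised by an app roughly as follows; the field names and scale encodings are illustrative, not the actual REMORA schema.

```python
# An illustrative data structure for the reported question sets; all
# identifiers and scale encodings are placeholders.
QUESTION_SETS = {
    "daily": [
        {"item": "pain", "scale": "vas_0_10"},
        {"item": "difficulty_physical_activities", "scale": "vas_0_10"},
        {"item": "fatigue", "scale": "vas_0_10"},
        {"item": "sleep_difficulties", "scale": "vas_0_10"},
        {"item": "physical_emotional_wellbeing", "scale": "vas_0_10"},
        {"item": "coping", "scale": "vas_0_10"},
        {"item": "morning_stiffness", "scale": "categorical_7"},
    ],
    "weekly": [
        {"item": "tender_joints", "scale": "count_0_28"},
        {"item": "swollen_joints", "scale": "count_0_28"},
        {"item": "global_wellbeing", "scale": "vas_0_10"},
        {"item": "employment_status", "scale": "yes_no"},
        {"item": "flare_description", "scale": "free_text"},
    ],
    "monthly": [
        {"item": "haq_daily_activities", "scale": "fixed_point"},
        {"item": "notes", "scale": "free_text"},
    ],
}
```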
Conclusion Consensus on the key components of the smartphone app was achieved. These components have been incorporated into the 'beta app' in readiness for piloting within clinical practice.
Introduction Multimorbid patients, suffering from two or more chronic diseases, often receive multiple disease-specific treatment plans that are likely to contain conflicting recommendations, since medical guidelines typically do not optimally account for complex multimorbid patients. General practitioners (GPs), given their role as care coordinator, are in a good position to identify and reconcile these conflicts.

Method
We conducted a literature study and expert interviews to identify practical challenges of guideline-based multimorbidity management in primary and secondary care and existing solutions. Based upon the literature study and interviews, we developed a workflow model providing decision-support for GPs when treating or coordinating care for multimorbid patients.
Results Challenges of multimorbidity care mentioned in the literature were echoed by the experts. For example, medical guidelines usually do not account for added complexity (cognitive decline, fall risk, malnutrition and decline in social relations) or conflicting patient preferences.
Competing demands and shifting priorities over time require prioritisation of conditions, revision of treatment plans and ensuring adequate self-management. The conventional workflow of GPs is problem-oriented, hampering a holistic approach. Existing tools for reconciliation of treatment conflicts often focus on specific subpopulations and lack applicability to the generic multimorbid patient population. Models for multimorbidity management, such as the Chronic Care Model and Ariadne principles, only provide abstract advice from an organisational perspective and are not directly applicable in clinical practice.
We therefore propose the MultiMorbidity Model (3M), a framework for CDSSs that supports GPs in delivering multimorbidity care for the comprehensive multimorbid population. It is a workflow model of five steps facilitating the identification and reconciliation of various conflicts. It enables a holistic approach and provides opportunities for the application of existing computerised decision support tools and shared decision-making tools. GPs take inventory of all applicable treatment recommendations (I - Select), prioritise these based on the size of the health effect and the number of conditions affected (II - Prioritise), and personalise (III) the prioritisation by balancing burden of treatment, personal preferences and expected therapy adherence. The treatment plan is then simplified (IV) by identifying and reconciling conflicting recommendations. Finally, the treatment plan is formulated (V) in a concrete, specific and actionable way, adapted to the patient's lifestyle and health literacy. The output of the model is an individualised treatment plan fitting the patient's health status, preferences and combination of diseases.
Discussion As a workflow model for multimorbidity management, the 3M provides decision support to GPs, striking a balance between standardisation of care and personalisation of treatments. A preliminary evaluation indicated that use of the 3M results in treatment plans with prioritised and concrete recommendations, making it a useful substitute for the usual workflow of GPs during follow-up visits for multimorbid patients. Complemented with shared decision-making tools and computerised decision support tools, the 3M enables optimisation of multimorbidity care.
Conclusion This is a first step towards CDSSs that facilitate care coordination for multimorbid patients. Future research should focus on further validation of the model, as well as integration with computerised tools, to fit the workflow within the limited time of GP consultations.

Vasa Curcin, King's College London, London
Introduction The care commissioning (assessment and planning) system currently in operation in England is configured to reduce 'need', where need is determined by assessment of the person's level of impairment, degree of risk/safety, informal care/family support and so on. In order to make cost-effective decisions about social care needs, two questions ultimately need to be answered: what are the classes of individuals with common care needs, and what characteristics determine those classes? Atmolytics is a visualisation layer over a data warehouse for social care needs assessment that provides flexible analytic functions to support data analysis. Atmolytics provides the functionality for defining a group of service users by their characteristics as well as by their assessment questions and answers. Recorded service user information is generated live from shared databases based on the group definition; furthermore, the group definition can be reused in the report function. The report function includes 15 types of report that create visual results from a group definition. While the analytics required draws on complex real-world data, it is of prime importance to ensure that decisions are transparent and made with correct assumptions. In order to provide transparency and auditability of the tool's findings, we have designed a data provenance module within Atmolytics to capture the full audit trail of the data transformations, leading to better understanding of the context of data production.

Method
The extended auditing/logging capability was realised by employing PROV-DM together with provenance templates, which specify the structure of the data provenance to be captured. The storage solution is based on a graph database.
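A minimal sketch of recording one analytic step as W3C PROV using the Python prov package; the namespace and identifiers are illustrative placeholders, not Atmolytics' actual provenance model.

```python
# Record a human-initiated analytic step as PROV entities/activities/agents.
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("atm", "http://example.org/atmolytics/")

user = doc.agent("atm:analyst-42")
activity = doc.activity("atm:define-group-001")
group = doc.entity("atm:group-definition-001")

# Link the human action to the artefact it produced
doc.wasAssociatedWith(activity, user)
doc.wasGeneratedBy(group, activity)

print(doc.get_provn())  # serialise for storage in a graph database
```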
Results Initial analysis confirms that capturing provenance in visual analytics should describe not only automated processes but also human actions relevant to the data and models in the system (interactive steps and the like) that are more commonly associated with usability studies. Additionally, the current auditing/logging capability in a typical visual analytics system is insufficient for tracing or representing human actions and for supporting meaningful process mining; more specifically, it lacks connectivity between recorded messages. The graph produced by the prototype now connects the activities of the analytic process within the system. The history of a group definition can be shown as a path in the graph database.

Discussion
Capturing provenance in a visual analytics system such as Atmolytics is not a trivial task. Each of its subsystems relies on a separate data store, which communicates with the others exclusively via a service bus architecture. Furthermore, disconnected auditing/logging functions expose different levels of events. In order to overcome these issues, we are employing provenance templates as higher-level abstractions over provenance graph data, implemented through a dedicated module that communicates with all other parts of the system. Future research plans include provenance data visualisation by clustering, and comparison of RDF databases on provenance use cases.
Conclusion We find the provenance template approach to be a realistic and promising solution for improving the auditing/logging capability of enterprise visual analytics software, and we are currently developing the provenance visualisation tool.

Morten C. Eike, Tony Håndstad and Thomas Grünfeld, Oslo University Hospital, Oslo
The significance of genetic testing for clinical purposes is increasing, and with the introduction of high-throughput sequencing (HTS) techniques and tools, this development is accelerating. However, utilising the output of sequencing, regardless of techniques and tools, requires a complex, multi-stage analysis process. The genetic variants in a patient must be identified and compared to variants previously encountered in other individuals, and to current knowledge about their clinical significance. For variants that have not been previously described, or for which there is limited clinical evidence, several predictive algorithms may be deployed, depending on the particular sequence context. Hence, the advancement in sequencing techniques must be complemented by technologies that can support end users in exploiting and producing information. This paper presents a user-driven innovation project at Oslo University Hospital, Norway, aimed at developing such technologies.
User-driven innovation has been described as a pull away from traditional technology-centred innovation strategies towards strategies that aim at achieving a co-evolution of the technical and the social, in which users play a crucial role. This development has also been characterised as a democratisation of innovation, and it is interesting to note how both European actions and priorities (e.g. in H2020) and various national priorities around Europe (and in other parts of the world) actively seek to stimulate user-driven innovation through financial resources. The project we report on is the result of such an initiative in Norway: the Norwegian clinical genetic Analysis Platform (genAP) project was funded by the Norwegian Research Council as a user-driven innovation project in collaboration between the Department of Medical Genetics (DMG) at Oslo University Hospital and the Department of Informatics at the University of Oslo. It ran from 2011 to 2015 as a multidisciplinary collaboration between experts from multiple domains (medical genetics, molecular biology, bioinformatics, information systems, etc.), including users, with the University's IT department as supplier of a secure environment in which the system could run. The target system's character as a decision support tool, embedding highly specialised knowledge from all these domains, made this composition of project participants and associates pivotal for achieving the overall aims of the project.
Being publicly funded through the Norwegian Research Council and carried out as a collaboration between researchers and users, the genAP project constitutes a particular kind of user-driven innovation. As such, it also illustrates how this configuration
Introduction Scientific journal publications seldom capture scholarly work in such a way that an external researcher could reproduce the reported results. The growing number of platforms for sharing scholarly assets (e.g. Figshare, ReShare, and EPrints for research data) reflects this, presenting science with a sizeable opportunity to reuse the resources of previously funded projects to make new discoveries, especially across disciplines/domains.
Despite the plethora of platforms and standards, barriers remain to the reuse of scholarly assets due to missing information about datasets and their analyses. Poorly curated data lead to unnecessarily duplicated effort in reusing scholarly assets, and data harmonisation and analysis are often repeated manually when data are reused. We discuss a methodology for combining and harmonising research data, supporting the capture and sharing of rich contextual information essential for correct interpretation and data reuse.

Methods Minimum Information Checklists have been applied to develop Variable Reports and Variable Report Models:
the mechanism through which we incorporate data context and achieve semantic harmonisation across different sources of comparable data, e.g. different birth-cohort studies. eLab communities use Variable Report Models to specify reporting requirements for variable summaries, which are used when reporting data.
Research Objects have been applied alongside new distributed computing technologies in the Job Manager Module to support the reproducibility of analyses that might be reported in publications. This includes moving code and input files to a remote resource through HTCondor, providing process statuses, and moving outputs back into the eLab.
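A minimal sketch of shipping an analysis to a remote resource via the HTCondor Python bindings; the executable, input file and attribute values are illustrative, not the Job Manager Module's actual configuration.

```python
# Submit an analysis job whose code and inputs are transferred to the
# remote resource, with outputs moved back on completion.
import htcondor

submit = htcondor.Submit({
    "executable": "run_analysis.sh",       # analysis code shipped with job
    "arguments": "cohort_extract.csv",
    "transfer_input_files": "run_analysis.sh,cohort_extract.csv",
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",  # outputs return on completion
    "output": "analysis.out",
    "error": "analysis.err",
    "log": "analysis.log",
})

schedd = htcondor.Schedd()
result = schedd.submit(submit)             # queue the job
print(result.cluster())                    # cluster id for status tracking
```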
Results The HeRC eLab builds on the Alfresco Enterprise Content Management System, which provides core support for day-to-day project activities such as document management, audit history and collaboration. We developed the Variable Bank Module, which facilitates the import and export of data. Research questions can be asked across multiple datasets created by different communities. The Job Manager Module works with data and computational code through Execution Research Objects, which capture the process of executing analysis code on data. The eLab has been further developed and successfully used by the STELAR (Study Team for Early Life Asthma Research) consortium to harmonise data from five UK birth cohorts, and by the iFAAM (integrated approaches to food allergy and allergen management) EU project to harmonise data from food allergy studies.
Conclusion We have presented Variable Reports and Execution Research Objects as approaches to capturing the context and provenance of data and methods, and their application within the HeRC eLab. Our work distinguishes itself from existing approaches by providing:
• a way to combine and pool datasets from any domain
• the ability to search, combine and export data from multiple studies automatically, with fewer manual processes to query and deliver data extracts.
Data warehouses and similar approaches often extract, transform and load data without capturing the relationships between variables, the essence of any research claim. Variable Report Models advance semantic harmonization, with no further cost for reuse after the initial investment. The use of Minimum Information Checklists with Variable Reports means that software does much of the metadata quality checking.
Introduction Diverse and disparate datasets are increasingly being linked and used in research, both at scale and at higher clinical resolution. In biomedical research, a growing 'open' research culture has emphasised the significance of publicly accessible metadata, the availability of which is critical because researchers use clinical datasets containing personally identifiable information and are not always able to share these data readily. However, the inability to characterise and evaluate datasets due to insufficient metadata limits the extent to which data may be utilised for research. These challenges are compounded by inconsistencies in the way researchers record and share discovered datasets. This study aimed to identify and evaluate methods to enhance biomedical research data discoverability.

Methods
We used a combination of analytical techniques: a systematic literature review to characterise existing data discoverability practices and identify current challenges; an online international stakeholder survey; and feasibility analyses (technical, economic and organisational factors) of methods to enhance biomedical research data discoverability.
results We identified 49 studies and organisations; 13 were randomly selected for review. PDF was the most commonly used format for research protocols, whereas research data were mostly disseminated as SAS, STATA and SPSS files. A total of 253 individuals completed the survey. The most popular aspect of a research study that should be easily searchable was the 'research study question' (15%). Survey results showed that variable standards of data management negatively impacted the handling of metadata. Challenges associated with data publications included limited perceived significance and the need for changes in research culture for data to be "considered and acknowledged as a valuable scholarly output alongside publications". However, formal academic recognition of their significance is limited, and publishing these articles can incur an open access fee. Semantic web technologies, e.g. the Resource Description Framework, use uniform resource identifiers to differentiate between disparate data sources which may then be integrated; however, limited familiarity with these technologies could result in a significant demand for training. Public health portals are online catalogues of metadata records describing studies for which research data may be available for reuse. Researchers are already using online portals; yet integrating use of such a portal into work routines may be challenging, and additional resources are required to develop and sustain the portal.
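As a toy illustration of how URIs disambiguate comparable resources, the sketch below uses the rdflib library; the namespace and dataset identifiers are hypothetical:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCAT, DCTERMS, RDF

# Hypothetical namespace for a metadata catalogue; identifiers are illustrative.
EX = Namespace("http://example.org/metadata/")

g = Graph()
for source in ("cohortA", "cohortB"):
    # Distinct URIs keep the two "blood pressure" datasets from being conflated.
    ds = EX[f"dataset/{source}/blood-pressure"]
    g.add((ds, RDF.type, DCAT.Dataset))
    g.add((ds, DCTERMS.title, Literal("Blood pressure measurements")))
    g.add((ds, DCTERMS.publisher, Literal(source)))

print(g.serialize(format="turtle"))
```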

Discussion
We identified inconsistencies in how research data were documented (e.g. the provision of online data dictionaries) and in the creation and usage of metadata. Most survey respondents were data users; given that awareness of the significance of high-quality metadata is still growing amongst researchers, these results could be attributed to limited awareness of the discoverability issue and to inadequate routine metadata administration.
conclusion Our findings suggest that more emphasis is needed on the importance of metadata (through training and support), on the advantages of data publications, and on increased recognition of these outputs within the academic community. The three methods identified and evaluated can support these recommendations.

Tim Wilkinson, Amanda Ly, Zoe Harding, Christian Schnier, and Cathie Sudlow, University of Edinburgh, Edinburgh
Introduction Neurodegenerative diseases such as dementia and Parkinson's disease (PD) are major causes of mortality and morbidity. Prospective cohort studies can provide important insights into the determinants of these disorders. UK Biobank (UKB) is a large, population-based, prospective cohort study of over 500,000 participants aged 40-69 years when recruited between 2006 and 2010. Participant follow-up is largely via linkage to routinely-collected health datasets such as hospital admissions, death registrations and, increasingly, primary care data. Here, we discuss the approach we have developed to estimate the accuracy of these sources for the identification of dementia and PD outcomes.
Methods We conducted systematic reviews of studies that assessed the accuracy of ascertaining dementia or PD cases from codes in routinely-collected datasets against a clinical expert diagnostic reference standard. We summarised results for positive predictive value (PPV) and sensitivity. Informed by these results, we performed our own validation study of dementia coding using data from UKB participants and have commenced a similar study for PD. Using published and online resources and clinical judgement, we generated a list of ICD-10 and primary care (Read version 2) dementia codes. We identified Edinburgh-based UKB participants with a dementia code in hospital admissions, death or primary care data. We extracted relevant letters and investigation results from the electronic medical record (EMR). A neurologist adjudicated on whether dementia was present based on the extracted notes, providing the reference standard to which the coded data were compared. We calculated the PPV for each data source individually and combined.
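For reference, the two accuracy measures summarised here follow the standard definitions: PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN), where TP, FP and FN are the true positive, false positive and false negative counts against the reference standard. For example, 41 adjudicated true cases among 44 code-positive participants give a PPV of 41/44 = 93%.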
results The systematic reviews revealed a wide variation in methodologies and results across existing studies in the literature. For PD, PPVs ranged from 71-88% in hospital and death datasets, while in a primary care dataset the PPV was 81%, increasing to 90% in patients who had also received more than one prescription for antiparkinsonian drugs. Sensitivities for PD coding in hospital and death datasets ranged from 53-83%. PPV estimates for dementia coding in hospital and death datasets ranged from 4-100%, with PPVs of 83-92% for primary care data. Using specific subtype codes, or selecting only codes in the primary position, resulted in higher PPVs; however, there was a corresponding reduction in case ascertainment. For our UKB validation study of dementia coding, there were 17,000 Edinburgh-based participants, of whom 44 had a dementia code in at least one data source and available EMR data. PPVs for dementia were 41/44 (93%, 95% CI 81-99) overall, 13/15 (87%, 95% CI 60-98) for hospital admissions, 2/2 (100%, 95% CI 16-100) for deaths and 33/34 (97%, 95% CI 85-100) for primary care data.

Discussion
Results to date suggest that, with appropriate choices of codes, the diagnostic accuracy of these datasets is likely to be sufficient for identifying dementia and PD cases in large-scale, prospective epidemiological studies. Primary care datasets are potentially valuable data sources warranting further investigation.
conclusion By systematically reviewing the literature and performing our own validation study, we have developed a method of estimating the accuracy of using routine datasets to identify neurodegenerative cases.

Introduction
The impact of advances in data science often depends on the capacity and capability of analysts working in health care systems, who have to implement the new approaches. In this presentation, I will discuss work undertaken at The Health Foundation to outline the challenges facing the analytical workforce in the health services in the UK.
Methods Qualitative analysis of 70 interviews with analysts, academics, clinicians and managers working for health services and public health in the UK during 2016.
results Though there are examples of good analysis, and variation between organisations, the interviews identified a series of common problems, including:
- Decision makers in health care often cannot access the right type of analytical skills.
- In some cases there are too few analysts; in others they are too busy with mundane data manipulation ("shifting and lifting").
- Where there are analysts, their skills can be limited, and they work in small units with little chance to develop professionally.
- The increasing amounts we rightly spend on information are not being matched by investment in the people to analyse the data we have.
Analysts do not form a homogeneous occupational group but span many different disciplines and skills. The interviews suggested that the critical attributes are the abilities to: (a) understand and structure the problems of managers/clinicians; (b) access evidence and information relevant to a problem; (c) apply appropriate and robust methods to manipulate information and data; and (d) communicate the findings accurately and clearly.
The reasons behind the shortfall in analytical capability encompassed issues covering the supply of analysts, their training and support, as well as the demand for their skills. Discussion In a situation where the problems are multifaceted, the solutions seem to be long-term strategies that encompass: (a) promoting ways that analysts can use networks to share and learn; (b) working at scale to overcome the problems of fragmented communities of analysts and the need to cross an array of different boundaries; (c) supporting professional development and vocational training; (d) supporting tools for analysis; (e) creating environments for innovation; and (f) developing new relationships with experts. There are also important elements of cultivating demand for high-quality analysis and to reinforce the

Introduction Demographic change leads to an increase in the number of people in need of care. Most are older people cared for by relatives at home. These informal caregivers often face this situation unexpectedly and without any prior care know-how. The aim of the project "Mobile Care Backup" is to support informal caregivers in managing their everyday tasks by applying technology. In particular, the aim is the proactive provision of necessary knowledge and information for informal caregivers. Furthermore, the information provided should be adapted to the caregiver's specific situation. The majority of knowledge units will address nursing activities, giving informal caregivers a better understanding of their tasks. To provide this information, automatic detection of nursing activities is necessary; this would also allow an automatic care diary to be realised. This paper describes a concept for the automatic detection of nursing activities in home care.
Methods Typically, specific nursing activities are connected to fixed places inside the cared-for person's home; the position of the caregiver is therefore an important predictor of the activity performed, and indoor positioning is a necessary component. Since position measurement alone is not sufficient for the detection of nursing activities, an abstraction of accelerometer and gyroscope data could be used in addition. Furthermore, many nursing activities, like the administration of pharmaceuticals, are executed at a specific moment or time frame of the day, so dates of execution could predict activities too. Data points from indoor positioning could be validated by logical rules: if the caregiver remains in front of the tub in the bathroom, he is probably bathing his relative. Accelerometer and gyroscope data could be interpreted by a soft computing method, such as an artificial neural network or support vector machine. The date of execution of a nursing activity could be aligned with previously entered daily routines. In a second step, a pattern classification approach will be applied, using the results of these three evaluations as a feature vector and outputting one final nursing activity.
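A minimal sketch of this two-step fusion, with illustrative activity labels and a random forest standing in for the unspecified final pattern classifier (the concept names neural networks and support vector machines as candidates):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

ACTIVITIES = ["bathing", "feeding", "medication"]  # illustrative labels

def fuse_features(position_scores, motion_scores, schedule_scores):
    """Concatenate the three per-activity score vectors into one feature vector."""
    return np.concatenate([position_scores, motion_scores, schedule_scores])

# Toy training data standing in for labelled recordings: rows are fused
# feature vectors, labels are activity indices.
rng = np.random.default_rng(0)
X = rng.random((60, 3 * len(ACTIVITIES)))
y = rng.integers(0, len(ACTIVITIES), 60)

clf = RandomForestClassifier(random_state=0).fit(X, y)

sample = fuse_features(
    [0.9, 0.05, 0.05],  # rule-based indoor-positioning evaluation
    [0.6, 0.3, 0.1],    # accelerometer/gyroscope model output
    [0.2, 0.1, 0.7],    # match against previously entered daily routines
)
print(ACTIVITIES[clf.predict([sample])[0]])
```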
results For indoor positioning, GPS or beacon technology could be used. Accelerometer and gyroscope data could be retrieved from a smartphone or smartwatch, which could also record the date of execution. As different studies have shown, it is possible to abstract accelerometer data to activities of daily living; for this task, neural networks seem better suited than support vector machines and other techniques. 1 Discussion Indoor positioning using beacons should be more precise than GPS, and the smartwatch's accelerometer data are more informative than the smartphone's. By merging these two approaches with the execution dates, we should reach a high classification rate. Camera-based approaches could produce good results as well, but will not be used due to acceptance issues. Information could be displayed on the smartwatch, leaving the caregiver's hands free to perform the activity.
Introduction Screening can reduce deaths from cervical, bowel and breast cancer if the people invited participate; however, screening uptake among Scottish women is 73% for breast and 69% for cervical, but only 61% for bowel screening. Little research has examined why bowel screening fails to achieve the uptake rates of breast and cervical screening. The availability of Glasgow-wide data for the complete population of a socioeconomically diverse region with comparatively low screening uptake provides a unique context for this research. To determine why women who are eligible for all three types of screening choose to do none, some or all of the tests, and to shed new light on barriers unique to bowel screening, we will investigate demographic and medical factors associated with the lower participation in bowel screening relative to breast and cervical screening.
Methods Data on screening invitations and attendances for women aged 20-74 in the NHS Greater Glasgow and Clyde Health Board who were sent an invitation to, or attended, at least one of the three programmes during 2009-2013 were linked to demographic data, hospital discharge records, GP Local Enhanced Service (LES) data and death certification records. The number of attendances for breast and cervical screening and the number of bowel screening tests returned were recorded. It was not possible from the data provided to identify the number of invitations to cervical screening, or the relationship between a cycle of invitations and attendances, for all three programmes. Co-morbidity was assessed using a Charlson Index based on hospital records and GP LES data, and socio-economic status was categorised by Scottish Index of Multiple Deprivation (SIMD) quintile based on home postcode. Logistic regression for each screening programme assessed the association of age, SIMD, Charlson score and other factors with screening participation. Women who were invited to participate in all three programmes were also identified, and similar analyses performed.
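A sketch of the per-programme model under stated assumptions, with hypothetical column names and statsmodels as the fitting library:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical column names; the linked NHS GG&C extract is not reproduced here.
df = pd.read_csv("screening_cohort.csv")  # assumed per-woman analysis file

# Participation (0/1) in one programme modelled on age, deprivation quintile
# (as a categorical factor) and comorbidity; one model is fitted per programme.
model = smf.logit(
    "participated ~ age + C(simd_quintile) + charlson_score",
    data=df,
).fit()
print(model.summary())
```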
results There were 430,591 women invited to take part in at least one of the screening programmes over 2009-2013. 116,212 (72.6%) women attended for breast screening out of 159,993 invited over the period, 250,056 (80.7%) women from 309,899 attended cervical screening and 111,235 (61.7%) women completed bowel screening from 180,408 invited. There were 68,324 women who were invited to participate in all three screening programmes during the study period with 35,595 (52.1%) participating in all 3 programmes.

Discussion
Despite having rich data from the individual screening programmes, allowing unique insight into cancer screening uptake, the recording of patient invitations and attendances differed significantly between the programmes. This limited the analyses, for example in identifying the number of invitations prior to uptake and adherence to screening cycles. Women have lower participation in bowel screening than in breast or cervical screening, although the same demographic factors are associated with participation. Only half of the women eligible for all three screening programmes participate in them all.
conclusion Older women and those living in more affluent areas were more likely to attend for breast, cervical and bowel screening. Women with multi-morbid illness were less likely to participate in all screening programmes.

Bernardo Valdivieso, Universitat Politècnica de València, Valencia
The use of Big Data platforms in health care is on the rise. Big Data technologies allow easier and faster analysis of vast amounts of data, such as patient pathways, which may lead to better decision-making. We present a Big Data methodology for warning of potential complications in patients with diabetes by finding local similarities among their patient pathways. Specifically, we present a Storm-based platform that implements our extended version of the Smith-Waterman algorithm to detect clinical complications in diabetic patients by comparing them to a whole set of Electronic Health Records (EHR). A demo of the system is available at www.lppaalgorithms.com.
The extended version of Smith-Waterman compares patients based on a tuple form of clinical records, known as a Patient Pathway (PP). Data processing was performed to obtain the PP dataset from the EHRs; each PP is composed of a patient's clinical observations ordered in time. We define five types of clinical observation: hospitalisation, outpatient consultations, emergency room visits, and laboratory tests for glucose and creatinine. The episodes of a patient are codified and put together in the PP following the timeline of each episode. Then, using the SW-based algorithm, each pair of PPs is compared. One possible output of the comparison is cardiopathy complications; PP pairs that are not ranked are not shown.
The logic for comparing PPs is developed using the Big Data framework Apache Storm. Different components are defined: a Spout, which gathers tuples from a web queue and passes the data to the logic, and a set of Bolts, each with a unique function inside the topology; Bolts can be replicated extensively. These Bolts work together to join the pathway from the web queue with each one in the database, creating a 2-tuple for each database entry: the query patient and the database patient. After finding local similarities, the 10 best PP alignments are ranked and returned as possible outcomes for the query patient.
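The core alignment-and-ranking logic can be sketched outside Storm as follows; the scoring scheme and event codes are illustrative, and the authors' extensions to Smith-Waterman are not reproduced:

```python
# Plain-Python sketch of local alignment over coded patient pathways (PPs).
MATCH, MISMATCH, GAP = 2, -1, -1  # illustrative scoring values

def smith_waterman(a, b):
    """Return the best local-alignment score between two event-code sequences."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            H[i][j] = max(0, diag, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            best = max(best, H[i][j])
    return best

def top_alignments(query_pp, database_pps, k=10):
    """Score the query against every stored PP and keep the k best, as the Bolts do."""
    scored = [(smith_waterman(query_pp, pp), pid) for pid, pp in database_pps.items()]
    return sorted(scored, reverse=True)[:k]

# H = hospitalisation, O = outpatient, E = emergency, G/C = glucose/creatinine tests
query = ["H", "G", "O", "E", "C"]
database = {"patient42": ["G", "O", "E", "C"], "patient7": ["H", "H", "O"]}
print(top_alignments(query, database))
```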
The LPPA system is based on different technologies. The web interface is programmed primarily in HTML and JavaScript, with WordPress as a design tool. The server is programmed in JavaScript, using node.JS to run the system, and various

Introduction The WHO International Classification of Health Interventions (ICHI) is based on an ontology framework defined in ISO 1828, named Categorical Structure for terminological systems of surgical procedures. We reviewed 574 existing ICHI alpha 2016 codes and their structure, and compared them with EN 1828 and the SNOMED CT (SCT) procedure hierarchy concept model. We conclude that modifications are needed to design a more semantically defined version of the ICHI chapter on Medical and Surgical interventions. We checked whether the three axes of ICHI (Target, Action and Means) are sufficient to express the semantics of Medical and Surgical interventions, and how ISO 1828 and the SNOMED CT concept model for the Procedure hierarchy express these interventions.

Method
We studied 574 ICHI alpha 2016 1 interventions from three chapters: Nervous system, Ear, and Endocrine system. We compared the existing three-axis structure of ICHI with ISO 1828 2 and the SCT concept model. 3 results The SCT concept model attributes using the word "direct" (procedure site direct, direct morphology, direct device and direct substance) are equivalent to the ISO 1828 semantic link "hasObject" with the semantic categories "Anatomical Entity", "Lesion" and "Interventional equipment"; they have no equivalent in ICHI. Furthermore, the attribute procedure site indirect is equivalent to the ISO 1828 semantic link "hasSite" with the semantic category "Anatomical entity". The ICHI Action axis should be duplicated to express both the intent and the deed. The ICHI Target axis should be extended to pathology, such as adhesions or calculus, and to medical devices, such as pacemakers. The ICHI Target axis should also be split into a Direct Target, grouping the semantic categories on which the action is carried out, and an Indirect Target, the site at which the object of the action is located, as in "Implantation of internal device, ventricles of brain". The ICHI Means axis should be extended to medical devices and drugs.
Introduction Alzheimer's disease, the most common neurodegenerative disorder, is expected to cause 1 million new sufferers per year as early as 2050. Similarly, the number of people suffering from Parkinson's disease is expected to rise to between 8.7 and 9.3 million by 2030, making it the second most common neurodegenerative disorder. As such, these diseases have become a major focus of global biomedical research in an effort to develop a detailed understanding of causes and pathology that will lead to novel treatments and improved care. The ApiNATOMY project (http://apinatomy.org/home) aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed for these illnesses. As curation is labour-intensive, automated methods are sought to speed up manual curation. Here we present our method, which aims to learn the highlighting behaviours of human curators.
Methods PDFs are converted into sentence-separated XML files using Partridge. Sentences potentially relevant to the curator are identified through an algorithm that assesses each sentence individually and scores its relevance based on linguistic features (cardinal numbers preceding nouns, characteristic subject-predicate pairs), semantic features (named entities) and spatial features (splitting of papers into regions and section assignment). Linguistic and semantic features were weighted based on the percentage of their occurrences in highlighted sentences and spatial regions. The overall score of a sentence was calculated using a linear function that combined the scores of all features identified in it. The parameters of the linear function were chosen from

Introduction The incidence of preterm birth is increasing, with a high proportion of survivors experiencing motor, cognitive and psychiatric sequelae. Prematurity places newborn infants in an adverse environment, accentuating their individual ability to cope with systemic challenges, and calls for precision in healthcare interventions. Machine learning strategies are used here to investigate the neurobiological consequences of prematurity. Given the established large genetic contribution to quantitative neuroimaging features informative of downstream function, and the assumption that a subset of genetic markers will be found in statistically meaningful association with a subset of image features, computational models must be able to select those informative variables. Multivariate sparse regression models such as the sparse Reduced Rank Regression method (sRRR) obviate the need for multiple-testing correction and significance thresholds, since they involve fitting a predictive model. Methods A weighted adjacency matrix of brain regions for each infant was converted into a single vector of edge weights based on fractional anisotropy (FA), resulting in one matrix of n individuals by q edges, where n = 272 and q = 4,005, adjusted for major covariates (post-menstrual age at scan (PMA), gestational age at birth (GA)) and ancestry. Saliva samples were collected using Oragene DNA OG-250 kits and genotyped on the Illumina HumanOmniExpress-24 v1.1 chip. The genotype matrix was converted into minor allele counts, including only SNPs with MAF ≥5% and a 100% genotyping rate (556,227 SNPs). sRRR model parameters: SNPs at each iteration (n = 500), stability selection with 1,000 subsamples of two-thirds of subjects, convergence criterion = 1 × 10-6, resulting in a ranking of all genome-wide SNPs based on their importance in the model. A null distribution was computed by running sRRR in the same way, additionally permuting the order of subjects within the phenotype matrix between each subsample during stability selection, with 20,000 subsamples. results sRRR detected a stable association between SNPs in the PPARγ gene and the imaging phenotype fully adjusted for GA, PMA and ancestry. SNPs in PPARγ were significantly over-represented among the variables with the uniformly highest ranking in the model, contributing to a broader significant enrichment of lipid-related genes among the top 100 ranked SNPs. conclusion This provides specific insight into how nutrition might be tailored with precision according to each infant's genetic profile to optimize brain development.

Introduction HIT usability issues remain a prominent source of medical errors and other unintended consequences. While research has identified usability issues within single systems or settings, usability across a range of systems remains problematic. Patient care increasingly occurs across multiple providers, settings and HIT systems, and thus usability must be considered not just for one system but across several systems and users. Functions or features in HIT (e.g. data retrieval or display) may not be designed consistently across systems, and this can lead to errors and other unintended consequences.
Methods To examine the variables and interactions of how specific usability issues vary across different clinical systems, we constructed a matrix of 11 usability dimensions and contextual differences. We built the matrix from the literature and from our collective 52 years of surveys, observations, shadowing of clinicians, usability studies, etc. For this poster, we selected four key usability dimensions and discuss how they contribute to the silent error of information retrieval. We also illustrate each of these with screenshots and analyses.
results Finding patients and data reflects inconsistent navigation and search functions; such problems are at best inefficient and, at worst, lethal. Inconsistent data displays (e.g. fonts, colours, metrics and interfaces) vary dramatically across systems; providers become comfortable viewing data in a specific context and may be confused when the display changes. Last, the number of screens and patient charts open at once presents patient-safety dangers and trade-offs: each additional chart or screen increases the probability of entering an order into, or reading data from, the wrong patient chart.

Discussion
We sought to examine a multi-setting, multi-system, multi-user matrix of usability dimensions and contexts. The proposed research will hopefully encourage a more panoptic design of HIT software by incorporating the need to focus on usability across several facilities and many software vendors' products.
conclusion HIT systems and functions will always be emergent, interactive and multifaceted. To make systems usable across many settings and many users, vendors will have to incorporate an equally emergent, multi-context and multifaceted approach to usability.

Sharareh R. Niakan Kalhori, Hajar Hasannejadasl and Farhad Fatehi, Tehran University of Medical Sciences, Tehran
According to the World Health Organization, 33% of years lived with disability (YLD) are attributed to neuropsychiatric disorders, and WHO estimates that globally 350 million people suffer from depression. The effect of this burden on society is overwhelming. Meanwhile, self-management is an important aspect of the care required in the management of long-term disorders and diseases, and mHealth tools such as smartphone applications have been recommended as new means of supporting self-management in depression. In this review we assessed mobile apps focused on depression and available in English. The evaluation was conducted against 7 functionalities (such as inform and instruct). Of 251 potentially relevant apps, 68 met our inclusion criteria; of these, only 7 applications met the minimum eligibility for the self-management assessment. Given the complex challenges faced by patients with depression, there is a need for further app development targeting their needs. In addition, the development of multifunctional apps is required to support the management of depression alongside related mental disorders, such as anxiety and stress, concurrently.
Introduction It is well recognized that physical activity (PA) plays an important role in enhancing and maintaining health-related behaviors in children. 1 This study aims to examine factors associated with high-level PA at age 12 months, compared with infants who are relatively inactive.

Methods
The Environments for Healthy Living: Growing up in Wales cohort study collected questionnaire data and postnatal notes, and linked these with general practice and hospital admission records. In addition, 148 of 800 infants wore a tri-axial GENEActiv accelerometer on the ankle to collect physical activity data 2 over 7 consecutive days. Activity was measured as the sum of vector magnitudes (SVM), and the population was divided at the median into a lower-activity and a higher-activity group. Important predictive factors were identified for a linear regression model to predict PA levels.
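A sketch of this processing under stated assumptions (toy data; the exact GENEActiv pre-processing used in the study is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy recordings standing in for the accelerometer data: 8 infants,
# 1,000 samples per axis, in units of g.
recordings = [
    (rng.normal(0, 0.3, 1000), rng.normal(0, 0.3, 1000), rng.normal(1, 0.3, 1000))
    for _ in range(8)
]

def svm_score(x, y, z):
    """Sum of vector magnitudes with the static 1 g gravity component removed."""
    magnitudes = np.sqrt(x**2 + y**2 + z**2)
    return np.abs(magnitudes - 1.0).sum()

scores = np.array([svm_score(*axes) for axes in recordings])  # one score per infant
median = np.median(scores)
higher_activity = scores > median  # median split into lower/higher activity groups
print(higher_activity)
```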
results The mean SVM score ranged from 0.677 to 4.932 in the lower-activity group and from 4.975 to 10.628 in the higher-activity group. Infants in the active group were more likely to be boys (70.42% boys, 29.58% girls), whereas the inactive group comprised 38.57% boys and 61.43% girls. Active infants had longer gestations, more milk feeds per week, were more likely to be breastfed for longer, were more active at night and drank more juice. There were significant differences between the lower- and higher-activity groups on the following factors, defined by mean difference (MD) and confidence interval (CI): mother's gestation days (MD: -9. conclusion There is a great deal of variability in the level of activity between children. Active children are more likely to be those who were full-term, breastfed, active at night and take juice. There was no significant effect of the size of the baby on activity level; however, preterm birth is associated with a lower activity level. The important factors identified by this study could benefit health decision-making in promoting healthier lifestyles for infants and their mothers.

Introduction Stratifying patients based on predicted future rate of decline would help in the design of clinical trials. Trajectory modeling to detect patterns of decline is a challenge when little information on disease stage is available. The relationship between the rate of progression and disease severity can be used to identify dementia patients deviating from the expected pattern of change. 1 Applying this approach with a cognitive phenotype has not yet been explored, but could be used to identify those most at risk of faster cognitive decline. Methods Given the challenge of identifying a cohort with sufficient length of follow-up to study disease progression effectively, this study turned to the secondary use of electronic health records. Specifically, a retrospective cohort was derived from South London and Maudsley NHS Foundation Trust health records, comprising 3,441 subjects with at least 3 MMSE scores recorded over 5 years. Residuals from the relationship between cognitive decline and disease severity were grouped into tertiles of average, slower and faster progression. Subject characteristics were explored for association with group membership by multinomial regression. Characteristics including demographics, items from the Health of the Nation Outcome Scales (HoNOS) and promising repurposing medications for dementia 2 were available for comparison across groups. conclusions This study demonstrates how health records can be used to suggest potential relationships between patient characteristics and future disease progression.
Introduction The problem of patient adherence is becoming alarming in medical practice worldwide. The formation of patient adherence (PA) depends on many factors; in particular, there are limited data on adherence to specific lifestyle recommendations. Considering physical activity and exercise an essential part of lifestyle for controlling cardiovascular disease (CVD) and preventing its further progression, this review focused on discovering the factors associated with physical-activity-related adherence in patients with CVD that could be used in the development of eHealth interventions. The objectives of the review were: (a) identification of the types of physical-activity-related behaviour and their settings; (b) assembling adherence measurement criteria; and (c) identification and classification of factors affecting adherence.
Methods A comprehensive literature review was conducted based on the scoping studies approach. 1 Where applicable, systematic review methods were used to narrow and increase the quality of the final results. The MEDLINE database and the Cochrane Library were accessed between March and August 2016. Of the original 277 publications yielded, 58 were included for further analysis; manual search queries added 5 relevant papers to the final results.
results PA here is an indicator of the level of physical activity performed in everyday life, as well as during cardiac rehabilitation. Attention was paid to the perspectives from which the term was considered in the selected studies. With regard to adherence, the types and settings of physical-activity-related behaviour were determined and classified. Finally, the measurement instruments used to date for adherence were surveyed and briefly described. Patient adherence to physical-activity-related behaviour reflects a complex interaction of different factors; the factors examined have been classified with regard to the nature of their origin and their association with PA. Statistically significant factors and their influence on adherence to physical-activity-related behaviour are discussed in the review.

Discussion
Intervention settings: the majority of the interventions were heterogeneous and not comparable with regard to participants' characteristics, the types and settings of physical-activity-related behaviour, and intervention settings. Patient adherence: the selected studies use different types and dimensions of adherence, depending on the particular behaviour, measurement methods and instruments. Associated factors: the biggest challenge in understanding the influence of certain factors is that adherence is multifactorial.
conclusion Intervention classification: there is a need for a classification of interventions, the results of which could be used in the design of eHealth interventions. Modifiable factors analysis: the value of dividing factors into modifiable and non-modifiable, from the perspective of eHealth intervention development, requires further investigation. Prediction algorithm: attention should be given to the eHealth interventions that already exist in clinical practice; together with the present results, this may provide a solid foundation for the development of a prediction algorithm for early identification and prognosis of patient (non-)adherence.
Introduction Decision support is widely regarded as one of the most promising means for information technology to improve care. However, to date, its ability to live up to that promise has been inconsistent. The recently published "Two Stream Model" proposes that decision support can be described in terms of the clinical stream (reasoning about what advice to present) and the cognitive stream (reasoning about how to present that advice to the user). It suggests that cognitive/behavioural knowledge should be used to determine what support the user needs and how the system should provide it. The objective of this review is to evaluate whether and how knowledge from three diverse areas of cognitive science (descriptive decision theory, human-computer interaction and behaviour change theory) has been applied, or proposed for application, to the field of computerized clinical decision support.
Methods A search was conducted for each of the three areas of cognitive/behavioural science by a Master's student in Medical Informatics. The searches used Medline (all searches), Google Scholar (for human-computer interaction) and Embase (for behaviour change theories) and followed the general form: ((area of cognitive-behavioural science) and (decision support)). The searches were conducted in January 2016. Articles were included if they described a computerized decision support system, or a proposal for designing such systems, and described the use of a descriptive decision theory, human-computer interaction, or behaviour change theory in the design or evaluation of that system. Papers which used one of these approaches exclusively to analyze the results of a usability evaluation were excluded. Data were extracted on the study year, the cognitive/behavioural theory used, how it was used, and the decision support system to which it was applied. Data extraction was checked by a second reviewer. results A total of 15 studies were included: 5 incorporating descriptive decision theory, 5 using human-computer interaction and 5 using behaviour change theory. The studies using descriptive decision theory mainly used methods from this field (cognitive task analysis and the theory of situation awareness) to collect observations used in system design. One study used Norman's Theory of Action to categorize system-use problems in the evaluation of an existing system. Knowledge from human-computer interaction was used in both design and evaluation: two studies proposed general principles for the design of systems, two described observation methods used during evaluation, and one proposed a tool for evaluating human-factors principles in medication alerts. Behaviour change theory was used exclusively in the design of patient-oriented systems, mainly for smoking cessation (4 studies); four employed the trans-theoretical model and one used PRIME theory.
conclusions Although these results should be considered preliminary, given the limitation that each search was carried out by a single researcher, they suggest that knowledge from cognitive and behavioural science has seen only limited use in the field of clinical decision support systems. These three areas of cognitive science were chosen for their clear relevance to decision support; further research should extend this review to other areas of cognitive/behavioural science.

Introduction
We are presently witnessing a new epoch in the evolution of mobile technologies, defined by the proliferation of network-aware, compute-capable devices and dubbed the 'IoT Age'. A distinct category of these are personal wearable devices, or 'wearables', which have enormous potential applications in the fields of healthcare and medical research. Real-time data streaming capabilities represent an innovative and promising opportunity for mobile health (mHealth) applications based on remote sensing and feedback. The €22m IMI2 Remote Assessment of Disease and Relapse - Central Nervous System (RADAR-CNS, http://radar-cns.org) initiative is a new research programme aimed at developing novel methods and infrastructure for monitoring major depressive disorder, epilepsy and multiple sclerosis using wearable devices and smartphone technology. While a number of commercial mHealth solutions are available to aggregate sensor data, there is no open-source software stack providing end-to-end data collection functionality for research, clinical trials and real-world applications. The RADAR platform aims to fill exactly this gap, providing a generalised, scalable, fault-tolerant, high-throughput, low-latency data collection solution.
Methods By leveraging open-source data streaming technologies, we are building an end-to-end system with generalised aggregation capabilities. The platform focuses on classes of data rather than specific devices; in doing so, it enhances modularity and adaptability as new devices become available. The platform is delivered under an open-source licence in order to create a legacy for downstream RADAR projects and the wider mHealth community. The key components of our software stack are: data ingestion and schematisation (using Apache Avro), database storage and data interface, data analytics, the front-end ecosystem, and privacy and security.
results We have utilised the Confluent Platform as a core component; this is an open-source suite of tools built on top of Apache Kafka. At fixed intervals, the patients' data sources collect data representing the patient's attributes, either passively (e.g. from hardware sensor streams) or actively (e.g. questionnaires and assessment apps). These attributes are then ingested via an HTTPS interface which translates REST calls into native Kafka calls. After a restructuring phase, data (both real-time and historical) are simultaneously analysed and persisted. Two data warehouse layers (cold and hot storage) are deployed to provide low-latency, high-performance data access via controlled interfaces.
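A sketch of what a single ingestion call might look like against the Confluent REST proxy's v2 Avro endpoint; the host, topic name and record schema are hypothetical:

```python
import json
import requests

# Avro schema for a single wearable reading; field names are illustrative.
VALUE_SCHEMA = json.dumps({
    "type": "record",
    "name": "HeartRate",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "time", "type": "double"},
        {"name": "bpm", "type": "float"},
    ],
})

payload = {
    "value_schema": VALUE_SCHEMA,
    "records": [{"value": {"user_id": "p001", "time": 1490000000.0, "bpm": 72.0}}],
}

# The REST proxy translates this HTTPS call into native Kafka produce requests.
resp = requests.post(
    "https://radar.example.org/topics/android_heart_rate",  # assumed endpoint
    headers={"Content-Type": "application/vnd.kafka.avro.v2+json"},
    json=payload,
)
resp.raise_for_status()
```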

Discussion
So far, we have demonstrated integration of the Empatica E4 wearable device and on-board smartphone sensors as examples of passive data sources, and a Cordova questionnaire app builder as an example of an active remote monitoring data source. Low-latency data access tools and REST APIs serve as downstream generalised data access interfaces, examples of which are modular data visualisation tools.
conclusion RADAR-CNS aims to improve patients' quality of life, for example through self-management, and potentially

Introduction The use of SNOMED CT in medical coding is progressing constantly at an international level. 1 However, description logic modelling errors related to the hierarchical structure have previously been described for aspects of anatomy and clinical findings; typical problem types are especially "issues with site and resulting interferences". 2 We present an additional approach to addressing SNOMED CT logic modelling errors that involve issues with finding site. Our entry point for improvement is primarily substantive and focuses on concept definitions provided by the literature.

Methods
To examine exemplary description logic modelling errors, we use the SNOMED CT International Version 20160731. We take the concept |414086009|Embolism (disorder)|, which is erroneously defined as a |19660004|Disorder of soft tissue (disorder)|. After analysis of the associated concept definitions for "embolism" and "soft tissue", the hierarchical connections are adjusted.
results In SNOMED CT, a distinction is made between |414086009|Embolism (disorder)| and |55584005|Embolus (morphologic abnormality)|. Embolism is defined as a condition in which the blood flow in an artery is blocked by a foreign body, while soft tissue includes all kinds of tissue inside the body except bone. To improve the modelling, we take an alternative definition of "embolism" connected to the specification of a certain finding site, especially in the bloodstream. 3 According to this definition, and with the intention of including the "site aspect", it seems preferable to create a new hierarchical connection that defines: |414086009|Embolism (disorder)| is a |43195004|Bloodstream finding| is a |118234003|Finding by site|.
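The proposed re-parenting can be illustrated with a toy is-a fragment; the data structure below is hypothetical and not drawn from a SNOMED CT release:

```python
# Toy is-a fragment (concept id -> parent ids) encoding the proposed chain;
# an illustration only, not an actual SNOMED CT release.
IS_A = {
    414086009: [43195004],   # Embolism (disorder) -> Bloodstream finding
    43195004: [118234003],   # Bloodstream finding -> Finding by site
    118234003: [],
}

def ancestors(concept):
    """Collect all transitive parents by walking the is-a links upwards."""
    seen, stack = set(), list(IS_A.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen

# Under the proposed re-parenting, Embolism classifies as a finding by site.
assert 118234003 in ancestors(414086009)
```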

Discussion
In clinical practice, embolism is considered a finding of the bloodstream rather than a soft tissue disorder. Soft tissue seems unsuitable as a classifier concept because of its broad definition. The finding site is needed to indicate the location of the embolism; its absence is the cause of the incorrect classification. In order to rearrange the hierarchy, the finding site has to be changed to blood vessel to enable the expected description logic modelling result.
conclusion It will take considerable effort to filter all concepts that appear to have an "issue with site" with regard to the definition of the term 'soft tissue'. However, the assignment of new or alternative parent concepts alone will not solve the problem, and it is not ideal to declare all embolism disorders primitive concepts. The best option seems to be to limit content improvement to single use cases and to gather more evidence on whether to keep soft tissue disorder as a classifier or to retire it.

Introduction
The University of Oxford is engaged with some of the world's largest population-based prospective studies. The UK Biobank (UKB) and China Kadoorie Biobank (CKB), each of which recruited over half a million participants, will be based at the University of Oxford's Big Data Institute (BDI). The BDI will be directed at obtaining and characterising large datasets to significantly alter our understanding of the causes and treatment of disease. Both studies have a unique opportunity to share what they have learnt with each other and to identify new opportunities in the future.
Methods UKB and CKB each recruited half a million middle-aged participants between 2004 and 2010. Participants were selected from different regions across the UK and China, have undergone extensive baseline measures, provided blood and urine for future analysis and gave detailed information about themselves. Participants' health outcomes will be followed closely over the next few decades through linkage with established death and disease registries and national health databases. Both studies run periodic repeat assessments on subsets of their cohorts and implement project enhancements, such as genotyping, physical activity assessment, biochemistry, proteomics and new web-based questionnaires, to collect even more data on their participants.
results Collectively, both biobanks have combined health-related data on over a million people, stored over 3.5 million blood samples and have linked participants to 45,000 death records, 100,000 cancer records and 1.5 million hospital admission and health records. UKB is an open access resource with over 3,619 registered researchers, 343 projects underway and 1,256 released datasets. CKB has its own researchers in the UK and China, and it is only just starting to make its data available to the public. CKB now has over 300 registered researchers.
Discussion Whilst the two studies began with separate operating and funding models, we have identified many opportunities to share methodologies such as genetic assays, record linkage, encryption, anonymisation and delivery of research datasets.
We are looking to explore new and novel big data opportunities by working closely together. We have begun sharing our knowledge and experience of integrating new data sources such as physical activity monitors, electronic data capture devices, imaging data, outcome adjudication and the linkage of our participants to air pollution and other meteorological data.
We are also collaborating and attracting the attention of other departments within the university to develop machine learning techniques on data such as the ECG to predict future outcomes for our participants and validate these through our continual follow-up.
conclusion These two studies are powerful resources for investigating the main causes of many common chronic diseases over the next few decades, and the information generated will advance our understanding of disease aetiology in the UK, China and in other countries. This collaboration also presents an invaluable opportunity for sharing of methodologies related to integration, processing, dissemination and storage of health-related data. This has led to high quality research and contributed immensely to understanding of disease and contributing factors.
Diagnosis and treatment monitoring can be greatly improved if numerical values of laboratory markers are available. In several laboratories, various tests, such as the enzyme-linked immunosorbent assay (ELISA), are based on colorimetric methods in which enzyme activity is used as a quantitative label. ELISA is an easily standardized and readily automated, relatively inexpensive, highly sensitive and specific procedure which requires small sample and reagent volumes. However, the accuracy and range of quantification of the ELISA method remain an open problem. This paper presents an improved, web-based version of a software tool for analyte quantification that bases its quantification capability on optical density readings collected both during the colour formation phase and after dispensation of the stop solution.
Quanti-Kin Web was developed in Microsoft Visual Studio 2015, using Microsoft SQL Server 2016 to record data, and is now available at http://www.quantikin.com. The complete test management process can be divided into five sections. A large quantity of historical data related to the calibration curves of two important experimental centres, together with data produced in the same centres by visiting training personnel, was collected; with these data it was possible to evaluate the efficiency of the calculation engine. The previous program, QKDS, achieved good precision in the quantification of known amounts of p24, but the data presented were produced only in an extremely well controlled environment. Statistical analysis of the data collected by highly trained users shows that Quanti-Kin Web produced results similar to those previously presented. By contrast, during widespread worldwide routine use, quantification performance was significantly lower than that obtained by well trained personnel. This aspect can greatly influence quantification results and the calibration curves. Thanks to the improved quantification algorithm of Quanti-Kin Web, in which a strong check on the wells has been developed, this problem can be overcome without affecting the quality of the experiment. In this way, the maximum error has been reduced from 960.49 in the old version to 55.63 in the new one, the standard deviation from 86.53% to 6.7%, the variance from 74.87 to 44.87, and the mean error from 15.19 to 0.24. These figures were calculated over experiments performed in many laboratories all over the world during the last three years.
The web deployment of the present tool makes it very simple to use, as it requires no installation and ensures fast execution. The data exchanged over the web relate solely to the amount of analyte present in the wells and not to the identity of the patient, so restrictions arising from the privacy laws of many countries do not apply.
Clinical data sharing is a fundamental tool for improving clinical research and patient care and for reducing health costs. The health ministries of many developed countries are planning national health information exchange (HIE) systems, defining the functionalities needed to support sharing the knowledge contained in them. In realising distributed system architectures able to satisfy this requirement, the management of semantics is a critical and essential aspect. For this reason, research is now underway to set up an infrastructure able to aggregate information coming from health information systems; it will be trialled in support of a regional HIE in the Veneto Region. In this paper, the first steps of this research and the current state of implementation are presented.
The first period focused on the management of semantics in laboratory reports. As indicated by the Italian Health Ministry, laboratory reports must be structured using the HL7 Clinical Document Architecture Release 2 (CDA R2) standard and the LOINC vocabulary; LOINC was therefore used as the reference code system. To manage the semantics of the information involved in the contextual workflow, a terminology service was designed and implemented, adopting the Common Terminology Service Release 2 (CTS2) standard, a product of the Healthcare Service Specification Project 3 . In this phase, the authors selected 6 CTS2 terminology resources (codeSystems, codeSystemVersions, EntityDescriptions, Map, MapVersion and MapEntry) and, for all of these, decided to start with the implementation of the read, query, maintenance and temporal functionalities. SOAP (Simple Object Access Protocol) was chosen as the implementation profile, and Microsoft Windows Azure was adopted as the cloud platform to host both the database and the web services.
The proposed solution comprises the regional HIE, the 22 Laboratory Information Systems (LISs) of the local departments of the Veneto region, the terminology service, called the Health Terminology Service (HTS), and an application to manage the content of the terminology database. The core of the architecture is the HTS, which provides access to the terminology database through interfaces compliant with the CTS2 standard. At present, the HTS comprises a Microsoft SQL Azure database (the terminology database) and eighteen Windows Communication Foundation (WCF) services, which implement the CTS2 interface, hosted on Microsoft Azure. The first client application connected to the HTS was the web application used to maintain the content of the HTS terminology database. It is continuously evolving to satisfy both the needs of medical staff and the requirements that the Veneto region is defining in order to create the regional HIE and to manage the semantics of its content.
This paper presents the current state of implementation of the infrastructure proposed for managing semantics in laboratory reports at the regional level. In the coming months, the technical specifications will be defined for the integration of the HTS with 4 of the 22 LISs and with the regional HIE. After a validation period in which the solution will be tested, an analysis will be performed to evaluate its impact.

Introduction
We present the development of a mobile software application for safety reporting within the field of angioplasty. The application aims to support physicians in capturing and retaining data on safety events. A combination of interaction design and user experience techniques was used to inspire usability 1 and create a useful, intuitive interface. The consequence of not considering the user experience could be user frustration, with users looking for alternative solutions to data capture; if forced upon users, an application could increase the likelihood of mistakes and reduce effectiveness. 2 Method To collect data and define system requirements, a literature review and a field study were conducted, which yielded both quantitative and qualitative data. The data were analysed to understand the data flow and clinical processes, with the purpose of enabling a user to keep in touch with the whole hospital information system. To utilise the users' skills and experience within their domain, it was important to include them in a participatory design process. To get feedback on the concept, medical staff were given the screens together with an explanation of the concept based on several levels of functionality.
results The proposed user interface enables entry of data specific to adverse events involving knee and hip implants. Besides the patient data, the system allows entry of the event classification (serious or non-serious) and treatment, as well as connection to the database maintained within the Helse Bergen hospital system. Reports can be initiated, and retrieved if there are previous adverse event instances. Expert evaluation of the first design solution, performed using a low-fidelity prototype, showed that the design was relevant and straightforward, and done in such a way that official reporting could commence. A question was also raised as to whether the system could be adjusted for general reporting.

Discussion
The design was met with enthusiasm by the healthcare professionals. However, it became clear that reservations exist about reporting adverse events in general; the main reason seems to be a heavy work burden, and there were also concerns about being viewed negatively by other medical staff. Attitudes towards reporting were not entirely negative: for example, the biomedical engineering lab that evaluates explanted medical devices would appreciate such bedside reporting, and the interviewed physicians accepted this point of view and did not entirely rule out their participation. Therefore, more work needs to be done to address attitudes towards reporting and the lack of motivation for it.
conclusion The development is directed towards a high-fidelity prototype and further web-based system development that will enable more detailed reports. These will fit into the hospital information system and provide the basis for other functionalities such as e-learning and other general reporting.

Management of complex chronic patients is of great difficulty, in a context that requires personalization of actions based on the complexity of the patient's condition over time. It needs to complement the recommendations defined in clinical guidelines with recommendations based on treatments performed on a representative set of patients, identifying conflicts between the recommendations of different guidelines designed for handling isolated chronic diseases. It also requires extension to specific protocols in areas not described with sufficient detail in clinical guidelines in terms of safety and quality. The PITeS-TIiSS project aims to overcome this problem. Its main goal is to design and deploy a Clinical Decision Support System that helps to improve personalized decisions based on evidence and to reduce variability in clinical practice in an integrated care domain. Integrated into the workflow of the healthcare professional, it will provide two types of recommendations, reflecting the duality between best practice defined by consensus of domain experts and the analysis of results obtained from patients with similar characteristics. From a review of the integrated care process of the pluripathological patient 1 and the existing clinical practice guidelines on the management of acute and chronic heart failure 2 and chronic obstructive pulmonary disease, 3 decision rules will be defined that allow clinical knowledge to be applied automatically, personalized to the patient's conditions. It will also take into account cross-cutting tools such as the Stopp/Start 4 and Less-Chron 5 criteria, as well as a prognostic scale called the Profund index 6 . The process will be dynamic in order to improve its adaptation to changes in the reference knowledge and to feedback on its use, introducing the concept of a Learning Health System. In this study, the tool will access the information provided by the health information infrastructure of the Andalusian Public Healthcare Service. The integration of information will be carried out in a fast, consistent and reusable way. Final results will be reported in December 2018.

Introduction Registry Randomized Controlled Trials (RRCTs) provide clinical researchers with the ability and resources to ask important clinical questions and design studies without the inherent biases often introduced through trials that recruit by other means. RRCTs have three key characteristics: 1. They randomly assign patients within a clinical quality registry, combining the features of a prospective randomized trial with a large-scale clinical registry. 2. They are more pragmatic and enable fast enrolment, control of non-enrolled patients, and the possibility of very long-term follow-up.
3. The clinical registry can be used to identify patients for enrolment, perform randomization, collect baseline variables, and detect end points.

Method Qualitative work for a pilot and optimisation study which builds upon recent developments in Canadian research capacity: • The ability to extract, transform and link EMR data to administrative data for ascertainment of long-term outcomes in the

Introduction Because the collection of mental health information through interviews is expensive and time consuming, interest in using population-based administrative health data to conduct research on depression has increased. However, there is legitimate concern that misclassification of disease diagnosis in the underlying data might bias the results. Our objective was to determine the validity of ICD-9 and ICD-10 administrative health data case definitions for depression using a review of family physician (FP) charts as the reference standard.
Methods Five trained chart reviewers reviewed 3362 randomly selected charts from years 2001 and 2004 at 64 FP clinics in Alberta and British Columbia, Canada. Depression was defined as presence of either: 1) documentation of major depressive episode, or 2) documentation of specific antidepressant medication prescription plus recorded depressed mood. Bipolar depression and alternate indications for antidepressants were excluded. The charts were linked to administrative data (hospital discharge abstracts and physician claims data) by unique personal health number. Validity indices were estimated for six administrative data definitions of depression using three years of administrative data.
results Depression prevalence by chart review was 15.9%-19.2% depending on year, region, and province. An ICD administrative data definition of '2 depression claims within a one-year window OR 1 discharge abstract data (DAD) diagnosis' had

Introduction The closure of a coastal emergency department (ED) and the opening of a new inland site in metropolitan Perth, Western Australia, was expected to improve overall access to ED care. The objective of this study was to examine the impact of the ED relocation on different types of ED presentations.

Methods
To address this aim, ED presentations were first divided into urgent/non-urgent medical and urgent/non-urgent trauma (injuries and poisoning) based on triage categorisation and ICD-10 coding. The ED relocation occurred in February 2015. Each SA3 region was modelled separately, comparing February to October 2014 with the same period in 2015, after adjusting for population. Estimates of the burden of ED utilisation attributable to 'distance to ED' were obtained using separate Poisson regression models for adults and children. Confidence intervals were estimated using a stratified bootstrap approach at the 95% significance level.
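As a rough illustration of the modelling step, the following sketch fits a Poisson regression of presentation counts with population as an exposure offset; all column names and values are placeholders, not study data.

```python
# A minimal sketch of a Poisson model of ED presentations with a
# population offset, as described above. Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 40                                             # placeholder area-period records
df = pd.DataFrame({
    "period_2015": rng.integers(0, 2, n),          # pre/post relocation indicator
    "distance_km": rng.uniform(1, 15, n),          # distance to nearest ED
    "population":  rng.integers(5000, 20000, n),
})
df["presentations"] = rng.poisson(df["population"] * 0.01)

model = smf.glm(
    "presentations ~ period_2015 + distance_km",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["population"]),               # adjust for population size
).fit()
print(model.summary())
```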
results 13% of the entire population had their travel distance to the nearest ED decreased by at least 1 km (5 km on average), while 5% of the population had their distance increased by at least 1 km (5 km on average). The total number of ED presentations increased by 7.1% within the region, against population growth of 3.6%. Areas near the new ED saw a significant increase in urgent and non-urgent medical presentations in adults. Change in distance contributed 30% to 70% of the increase in urgent medical presentations. The increase in non-urgent medical presentations attributed to change in distance varied between 20% and 40%. Conversely, significant decreases in both urgent and non-urgent medical presentations were observed in areas near the closing ED, with more than 200 fewer presentations in each category. For urgent medical presentations, these decreases were entirely attributable to the increased distance to ED. The increase in urgent and non-urgent trauma presentations attributed to decreased distance ranged from 9% to 15% and 15% to 30%, respectively.
Methods 50 foot ulcer cases were included. Data were obtained from the Danish wound database, Pleje.net, and were collected from January 2006 to October 2016 by Danish community nurses and wound specialists. 19 different wound features were tested. The features covered characteristics related to the three wound stages: the inflammatory phase, the proliferation phase and the maturation phase. We developed a pattern prediction model to forecast the individualized development of diabetic foot ulcers into one of two classes: (i) healing, and (ii) no healing. A positive prediction (test result) corresponds to the prediction (i) healing. In other words, the sensitivity is the number of cases where healing is predicted, among cases where healing actually occurs, divided by the number of cases where healing occurs. Since the data include both nominal and ordinal data types, binary logistic regression was chosen as the classifier. We used the five features with the highest separability between classes and 2-fold cross-validation.
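A minimal sketch of this classification step, assuming scikit-learn and placeholder data in place of the Pleje.net features, could look as follows.

```python
# Binary logistic regression on the five selected wound features with
# 2-fold cross-validation, scored by ROC AUC. Feature names follow the
# abstract; the data here are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 50
X = rng.random((n, 5))      # callus, wound size, gender, epithelializing, hypergranulation
y = rng.integers(0, 2, n)   # 1 = healing, 0 = no healing

clf = LogisticRegression()
auc = cross_val_score(clf, X, y, cv=2, scoring="roc_auc")
print("cross-validated AUC:", auc.mean())
```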
results The mean age of the participants was 70 (±14) years, 22% were women, and the overall healing rate among the cases was 72%. The model for predicting healing of diabetic foot ulcers included the following features in the binary logistic regression: 'callus', 'wound size', 'gender', 'epithelializing' and 'hypergranulation'. The ROC performance of the classifier is seen in Figure 1. The data yielded a cross-validated area under the curve (AUC) of 0.66. The classification threshold can be set arbitrarily: one threshold leads to a sensitivity of 83% and a corresponding specificity of 57%, which corresponds to predicting 8 out of 14 non-healing wounds and 30 out of 36 healing wounds. Choosing another threshold, a sensitivity of 63% and a specificity of 84% can be obtained, which corresponds to predicting 12 out of 14 non-healing wounds and 23 out of 36 healing wounds.

Discussion
The literature has identified several relevant features for predicting wound healing. Most of these features, however, are not available to community nurses. We identified five features from the wound anamnesis which are readily available to community nurses. Two of these features are used in other prediction models. The remaining three, 'epithelializing', 'hypergranulation' and 'callus', have, to our knowledge, not been used in other models.
conclusion In conclusion, features from community nurses' wound anamneses are relevant when predicting wound outcome.

Francois-Andre Allaert, Chaire d'évaluation des allégations de santé, ESC et Cenbiotech, DIJON
Introduction For several decades, clinical and observational studies of wounds and wound healing were conducted on paper case report forms (CRFs) or electronic CRFs corresponding to inclusion and follow-up medical visits, and between these visits on questionnaires that were rarely filled in daily by the patients and/or the nurses and associated with photos of the wound.

Methods
The use of information and communication technology, and especially smartphone applications, can overhaul this traditional framework of study by introducing closer follow-up of the wound healing process, systematized image capture with few constraints, and even more active involvement of those most concerned - the patients themselves. The main difficulty to solve is not the development of a case record form on a smartphone able to manage data, pictures and analogue scales, but guaranteeing medical secrecy and the protection of personal medical data in accordance with the European directive on data protection.
results We developed a system of cryptography allowing the involvement of medical practitioners, caregivers and the patients themselves in the data records, using asymmetric algorithms which have been validated in the framework of a national wound observatory by the French health and data protection authorities. Briefly summarised, the physician downloads the Nurstrial application and the system proposes that he/she enter the data for the initial visit, follow-up visits, and undesirable events for the different patients with wounds. For each new patient, the system randomly generates a unique cryptographic identifier: a 128-bit cryptogram generated by UUID-V4, combining hash algorithms and pseudo-random generators, with no risk of collision or duplication. Patients have access to the application via a special-purpose procedure for sequencing the information sent in the course of the various transmissions with the information sent by the physician, without ever having access to any nominal or identifying information, via the same cryptographic identifier. When information is not sent to the database, a message is sent to the doctor or to the patients without their identity being known.
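As a rough sketch of the identifier generation described above (the exact Nurstrial® scheme is not published in the abstract; combining a version-4 UUID with a hash function here is an illustrative assumption):

```python
# Generating a 128-bit random patient identifier. Python's uuid4 draws
# from a cryptographically seeded pseudo-random generator; the extra
# hashing step stands in for the hash algorithms mentioned in the
# abstract and is an assumption, not the published scheme.
import hashlib
import uuid

def new_patient_identifier() -> str:
    raw = uuid.uuid4()                          # 128-bit random identifier (UUID v4)
    return hashlib.sha256(raw.bytes).hexdigest()

print(new_patient_identifier())
```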

Discussion
The main limitation of the Nurstrial® system is of course the need to have an iPhone or Android-type smartphone, but these systems now make up more than 85% of recent smartphones in France. The Nurstrial® application runs on smartphones with Android 4.1 and higher and on Apple smartphones with iOS 7 and higher. Here again, few people use old telephones, and the phone companies and smartphone manufacturers are the first to urge people to update regularly to new versions.
conclusion Nurstrial® is a real step forward in terms of quality for the conduct of clinical and observational studies in the area of wound healing. It enables information to be recorded in real time in the form of text and images that are collected and potentially pooled by all of the actors in the healthcare chain: physicians, nurses, and patients too.

results Results were stratified by age (<75 and >75 yrs) because of a significant age interaction, p<0.001. For <75 yrs, significant UGIB risks (p<0.05) included: DOAC and PPI use, liver disease, high alcohol consumption (≥28 units/week), SMI and depression, stroke, SSRIs, male gender, asthma, smoking, interacting antibiotics and age. For >75 yrs, PPI use, depression, stroke, male gender and age increased the risk of UGIB. We found no association of steroids with asthma, and including steroids in our models did not alter bleeding risk estimates. For <75 yrs, significant ICB risks (p<0.05) included: AP, AC and SSRI use, dementia, male gender, African ethnicity, heart failure, hypertension, ≥CKD 3, smoking and age. For >75 yrs, significant risks were AC and AP use, dementia, SSRIs, male gender, heart failure, African ethnicity and ≥CKD 3, with protective effects for alcohol intake <28 u/wk.

Discussion
In addition to recognised risk factors, we identified that UGIB risk increased with several comorbidities, including SMI, depression and asthma, and with alcohol intake ≥28 u/week. For ICB risk, the main comorbidities included dementia, heart failure, hypertension and ≥CKD 3, with protective effects for alcohol intake <28 u/week.

Introduction Molecular interplay plays a central role in basic and disease biology. Patterns of interplay are thought to differ between biological contexts, such as cell type, tissue type, or disease state. Many high-throughput studies now span multiple such contexts and the data may therefore be heterogeneous with respect to patterns of interplay. This motivates a need for statistical approaches that can cope with molecular data that are heterogeneous in a multivariate sense.

Methods
In this work, we exploit recent advances in high-dimensional statistics 1-2 to put forward tools for analysing heterogeneous molecular data. We model the data using Gaussian graphical models, 3 and develop two useful techniques based on estimation of partial correlations using the graphical lasso: 4 a two-sample test that captures differences in molecular interplay or networks, and a mixture model clustering approach that simultaneously learns cluster assignments and multivariate network models that are cluster-specific.
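A minimal sketch of the estimation step, using scikit-learn's graphical lasso on simulated data and converting the precision matrix into partial correlations (the two-sample test and mixture model themselves are not reproduced here):

```python
# Fit a sparse Gaussian graphical model with the graphical lasso and
# derive partial correlations from the estimated precision matrix.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))   # 200 samples, 10 proteins (placeholder)

gl = GraphicalLassoCV().fit(X)
P = gl.precision_

# partial correlation: rho_ij = -P_ij / sqrt(P_ii * P_jj)
d = np.sqrt(np.diag(P))
partial_corr = -P / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)
print(partial_corr.round(2))
```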
results We demonstrate the characteristics of our methods using an in-depth simulation study, and proceed to apply them to proteomic data from The Cancer Genome Atlas (TCGA) pan-cancer study, 5 consisting of protein expression measurements for 181 cancer signalling proteins in 3,500 patients spanning 11 different cancer types. We first test for pairwise network differences between cancer types. Subsequently, we use the mixture model to identify clusters of patients that present similar protein signalling networks and we visualize the networks.

Discussion
Our analysis of the TCGA data provides formal statistical evidence that protein networks differ significantly by cancer type. Furthermore, we show how multivariate models can be used to refine cancer subtypes and learn associated networks.
conclusion Our results demonstrate the challenges involved in truly multivariate analysis of heterogeneous molecular data and the substantive gains that high-dimensional methods can offer in this setting.

Managing curricula requires a clear view of the programs' exit qualifications, the courses' learning outcomes, the position and relation of courses to the program, and links between the courses in terms of topics covered. Without tooling, grasping a program's contents and keeping it consistent, considering its evolution, is challenging. This abstract discusses ongoing work on the development of an ICT tool to visualize and analyze curricula to facilitate their consistent management.
Methods A literature study was performed to identify similar tools and to develop an information model that provides the foundation for the tool. A prototype was developed and filled with information on the two Amsterdam programs. The prototype was used to gain insight into user experiences and formulate future improvements.
results All information and visualization presented in the tool is generated from structured data in the underlying curricula database. The tool provides an overview of the courses, credit size, learning tracks, learning goals and references to IMIA topics. This way of handling information enables different views on a curriculum that can also be used for analytics. For example, a learning track that has different activities in different courses can be shown comprehensively and chronologically, indicating where different activities take place and which learning objectives are served. This enables educators to verify and maintain the track. As another example, analytics can be performed on the recorded references to IMIA topics: a heatmap is created to reveal focus areas in a program. As a final example, the tool can visualize the relation between a program's exit qualifications and individual course objectives, including whether required skill levels are met.
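The IMIA-topic heatmap analytics could, for instance, be computed along the following lines, assuming the curricula database can be flattened into (course, topic, hours) rows; all names and numbers below are invented for illustration.

```python
# Pivot flattened curriculum records into a course x IMIA-topic matrix;
# high values reveal focus areas in the program.
import pandas as pd

rows = pd.DataFrame({
    "course": ["Clinical Data", "Clinical Data", "eHealth Systems"],
    "imia_topic": ["Biostatistics", "EHR", "EHR"],
    "hours": [12, 6, 20],
})

heat = rows.pivot_table(index="course", columns="imia_topic",
                        values="hours", aggfunc="sum", fill_value=0)
print(heat)
```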

Discussion
The usage of the tool stimulated discussion amongst educators. For example, different overlapping topics were identified and a more consistent program could be realized. Terminology also proved to be ambiguously interpreted amongst educators in the program; the tool enabled the identification of these differences in understanding and fostered discussion using real examples. Additionally, the tool provided analytics to uncover deeper characteristics of a program. A future version of the tool will allow educators to change and add information themselves, making curricula design a shared responsibility. The tool will stimulate collaboration between educators, who can be supported in developing courses in a structured way using pre-defined protocols and auditing processes. Secondary education information, for example assessment plans, will also be (partially) generated and completed by users with less overhead compared to a non-automated process.
conclusion The curricula visualization and analysis tool provides a way to consistently monitor and maintain an education program. It stimulates shared responsibility and discussion, and provides an opportunity to lessen the administrative burden.

Huib Ten Napel, INSERM U1142 LIMICS, Paris
Since 2006, the WHO-FIC network has been developing an International Classification for Health Interventions (ICHI) based on an ontology framework defined in EN ISO 1828, named Categorical Structure, with 3 axes and 7 characters. It was planned that ICHI should encompass the granularity of ICD-9-CM Volume 3. After several tests with the ICHI alpha 2 (May 2016) version, we analysed bottom-up 574 ICHI alpha 2 codes by modelling them in XML, and show that the existing coding structure does not allow the semantic representation necessary to ensure interoperability with other existing coding systems for medical and surgical interventions. We have thus developed a more refined version of ICHI using an XML model: under a root element that holds the set of intervention entries in the XML schema, we attach 3 attributes (code, interventionType, and title) to each entry. Within each entry there are 7 further elements: title, comment, linkedClassification, content, composition, inclusion, and exclusion. One element is composed of the three axes (Target, Action and Means), and an axis can hold more than one object type. Another element allows the distinction of procedures that contain multiple interventions. Our XML model of ICHI will be able to overcome the granularity problems of the previous model.
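A sketch of the refined model, built with Python's ElementTree, might look as follows; the root and entry element names are hypothetical, since the abstract does not preserve them, while the seven child elements and three axes follow the text.

```python
# Build an XML structure along the lines of the refined ICHI model.
# Element names "interventions"/"intervention"/"axis" are hypothetical.
import xml.etree.ElementTree as ET

root = ET.Element("interventions")                       # hypothetical root name
iv = ET.SubElement(root, "intervention",                 # hypothetical entry name
                   code="XXX.YY.ZZ", interventionType="surgical",
                   title="Example intervention")
for child in ("title", "comment", "linkedClassification",
              "content", "composition", "inclusion", "exclusion"):
    ET.SubElement(iv, child)

content = iv.find("content")
for axis in ("Target", "Action", "Means"):               # the three axes
    ET.SubElement(content, "axis", name=axis)

print(ET.tostring(root, encoding="unicode"))
```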

Introduction
The electronic health record (EHR) is the major repository of clinical data in the NHS. It is a huge potential resource but remains severely under-utilised to the point that very little of the UK's research output is based on routinely collected clinical data. The reasons are complex but ultimately reflect the fact that these data are rarely entered into the hospital EHR in a form that allows for their organized storage and digital download.

Methods
We have developed a SNOMED-based electronic power-form comprising a user-friendly interface for real-time entry of clinical data into the EHR during cardiac outpatient consultation. Our aim was to capture outpatient clinical data in a form that allows for automatic development of summary patient reports and for batch download of de-identified data for audit and research.
results During the first 4 months after installation of the power-form, consultant utilisation averaged 60% for the 327 new patients seen during that period. Presenting symptoms, examination findings, investigations, diagnosis, initial treatment and disposal (>120 fields) were entered in real time during consultation and a structured summary report was developed. This was made available for electronic transfer directly into the patient's EMIS file in the primary care record, permitting same-day delivery of the report and obviating the need for a dictated clinic letter. Batched download of the digital data was successful, with sample analytic findings as follows:
• Patient ethnicity: S Asian 44%, white 34%, black 15%
• Presenting symptom: chest pain 41%, dyspnoea 11%, palpitations 10%, dizzy attacks/syncope 8%, hypertension 7%
• Diagnosis: non-cardiac chest pain 24%, angina 11%, coronary disease 7%
• Disposal: discharged to GP 74%, follow-up appointment 11%, cath lab waiting list 6%, referral to specialist clinic 9%
A total of 58 GPs and 37 patients have been surveyed on the utility of the report. Satisfaction has been reported with same-day delivery of the summary report, its layout and the adequacy of the information provided for patients' understanding of their condition and GPs' clinical needs. Using a 5-point Likert scale (1=much less useful - 5=much more useful), both GPs (average Likert score 4.32) and patients (average Likert score 4.62) find the outpatient report more useful than the conventional dictated clinic letter.
Introduction Administrative Health Databases (AHD) have been widely used in Italy, some dating back two decades or more. Epidemiological observations from AHD data can be useful to stakeholders to support health policies and services. However, AHD scope and availability for epidemiological studies in Italy are not well known or documented. A research project from the SISMEC Working Group on Observational Studies was funded by the Italian Ministry of Health and the Puglia Region to perform a census of electronic AHD in Italy ('Electronic health databases as a source of reliable information for effective health policy', Project RF-2010-2315604). The project aimed at evaluating methodological issues related to the use of AHD for epidemiology, and focused on the public regional health administrations.
Methods A census was completed in 2016 after sending questionnaires to the regional administrations' contact persons for AHD, which receive mandatory data from hospitals and local health units. In 2 out of 21 regions, information on AHD was gathered directly from institutional web sites. Several features were collected for each AHD, including type, time span, population coverage, missing data and quality, IT system, unique linkage key and anonymization. A web site was created to make this information publicly available (http://www.sismec.info/arches).
results The survey found 349 AHD, pertaining to 29 types, from 21 regional health administrations. The results documented for the first time their detailed features, specifically those concerning linkage keys and privacy protection. The number of AHD per region varied between 6 and 39; the most represented types were home and residential care data; over 65% of AHD report protection of anonymity. Linkage keys were available in 67% of AHD, and were based on local regional procedures. The survey confirmed that AHD in Italy are fragmented at the regional level. The different regional jurisdictions of local government manage the regional data on independent IT systems, a consequence of IT being implemented after the 2001 constitutional laws devolving health legislation to regional governments, resulting in a fragmented national context. Although many of these AHD are then merged by the Ministry of Health, the opportunities for nation-wide observational studies on secondary administrative health data collected in AHD are unclear, and any independent study proposal would run into several barriers due to privacy regulations, a confusing approval process, and the heterogeneity of AHD. At present, any data-linkage procedure across regions runs into the barrier of different pseudo-anonymous identification codes being used in different regions.
conclusions Several problems affect the feasibility of nation-wide observational studies on secondary data from the wealth of AHD in Italy. This is especially true for epidemiology researchers interested in research rather than in organisational analyses. However, independent research can provide the Italian health system with new, fresh insights that could expand the borders of routine health system monitoring. Problems of privacy protection, heterogeneity and fragmentation could be addressed at a national level, taking advantage of experience from other countries. Presently in Italy, while patients flow freely across health services, information about their care does not.

Albert King, Scottish Government Education Analytical Services, Edinburgh
Introduction National Records of Scotland (NRS) provide the Trusted Third Party (TTP) Indexing Service on behalf of the Scottish Informatics and Linkage Collaboration, encompassing data linkage projects supported by Farr Institute Scotland, the Administrative Data Research Centre - Scotland, the Urban Big Data Centre and the Scottish Government. The role of NRS is to match the personal identifiers submitted by data controllers to the national research spine and generate study- and dataset-specific index numbers. These indexes are used to link pseudo-anonymised records accessed by approved researchers in a safe haven. To avoid both retaining linked research datasets and repeatedly sharing personal identifying information from datasets required in multiple projects, NRS are developing a series of "read-through" index keys, which can be re-used and facilitate safer and more efficient data linkage.
Methods NRS are in the process of agreeing a series of memorandums of understanding (MoUs) with various data controllers of administrative datasets in order to establish safer and more efficient linkages of datasets. Under each MoU, the data controller asks NRS to process their dataset by linking it to the national research spine and creating anonymised "read-through" index keys, which the data controller will hold at the person level on their own dataset. NRS will maintain a look-up of the "read-through" keys against the spine. This means that for approved research studies involving their already-indexed data, the data controller just needs to send the read-through keys (without any other personal identifying information) for the people comprising the study cohort to the indexing team, which generates study-specific keys in the usual manner. It also allows a data provider to receive the "read-through" keys and study-specific index numbers from the TTP when the research cohort originates from a dataset held by a different data controller.
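Conceptually, the read-through mechanism can be sketched as follows; the keyed-hash construction for study-specific keys is an illustrative assumption, not the NRS implementation.

```python
# The TTP keeps a look-up from each data controller's read-through key
# to the spine, and derives study-specific keys without ever seeing the
# personal identifiers again.
import hashlib
import hmac
import secrets

spine_lookup = {}   # read_through_key -> spine person id (held by the TTP)

def issue_read_through(spine_id: str) -> str:
    """Create a read-through key for one person; stored by the data controller."""
    key = secrets.token_hex(16)
    spine_lookup[key] = spine_id
    return key

def study_specific_key(read_through: str, study_secret: bytes) -> str:
    """Derive a study-specific index number from a read-through key."""
    spine_id = spine_lookup[read_through]
    return hmac.new(study_secret, spine_id.encode(), hashlib.sha256).hexdigest()

rt = issue_read_through("spine-000123")
print(study_specific_key(rt, study_secret=b"per-study secret"))
```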
results So far, MoUs have been agreed with NHS Scotland for national health data, for primary care data from consenting general practices, Scottish Government Education Analytical Services for the school pupil census and Communities Analysis for housing data, the University of Edinburgh for the Scottish Mental Surveys, and NRS for census and vital events data.

Discussion
We anticipate that the creation of read-through indexes will deliver the following benefits, without the need to increase the personal data held by the TTP Indexing Service or data controllers, and in a way which preserves data controllers' direct control over the use of the data they hold:
• Reduced privacy risks, as individual identifiers required for indexing would be shared once rather than on a project-by-project basis
• Increased use of administrative data for research, with benefits to public policy and academic research outputs that inform practice in health, education, and other fields
• Reduced burden on data controllers, as identifiers would need to be extracted only once
• Improved efficiency of linkage, as the indexing team would need to carry out indexing only once
conclusion Utilising read-through keys considerably cuts down the amount of personal data which regularly has to be transferred to NRS and then matched to the spine using probabilistic methods on a project-by-project basis.

Yao-Ting Chang, Chung-Chieh Hsu, and Feipei Lai, Department of Computer Science and Information Engineering, National Taiwan University, Taipei
Introduction To provide safe, high-quality health care and telehealthcare for elderly people through information and communication technology, we proposed a knowledge-based telehealthcare smartphone application (APP) with an artificial intelligence mechanism for intelligent disease management. The aim of this APP was to enhance the early recovery of elderly patients who received surgery.
Methods This study investigated a smartphone application developed to serve the functions of drainage follow-up, nutritional monitoring, symptom management, activity management, and wound care. To provide real-time remote care, we also designed a platform for the patients and medical staff to permit reviews of the records.
results We have implemented the smartphone application in both Android and iOS versions. The APP can automatically provide a summary of the patient's health status based on the measurement records. According to previous preliminary results, twenty patients at the National Taiwan University Hospital received perioperative care via this APP as the telehealth group. During the study period, we retrospectively collected an additional 20 demographically matched cases as a control group. The telehealth group had a lower body weight loss percentage relative to the control group during a 6-month follow-up period (4.8 ± 1.2% vs. 8.7 ± 2.4%, p < 0.01).
Discussion Although the telehealth group had a lower body weight loss percentage, they had more outpatient clinic visits than those in the control group (9.8 ± 0.9 vs. 5.6 ± 0.8, p < 0.01). In the future, we will conduct a further study of the clinical effectiveness and cost-effectiveness in elderly patients who receive surgery.
conclusion This study supported the feasibility of a smartphone application for the perioperative care of patients, promoting lower body weight loss and the collection of comprehensive surgical records. With the advanced functions of this APP, we expect to acquire further clinical evidence to encourage and support the implementation of telehealth as part of surgical care, especially in elderly patients.
Introduction Asthma is among the most common chronic conditions in childhood. We aimed to develop and validate robust statistical models to predict asthma at 8 years of age using three Machine Learning methods.
Methods The data come from 3 UK cohorts in the STELAR consortium. We studied 1,145 children from Ashford and Aberdeen and externally validated the predictive models using data on 348 children from Manchester. Information on characteristics of the children, family-related factors and asthma-like symptoms was collected at recruitment and at 1 and 2 or 3 years of age. We defined asthma at age 8 by the presence of at least two of the following: (1) current wheeze; (2) asthma treatment; (3) a doctor's diagnosis of asthma. The prevalence was 65 (12%), 87 (11%) and 49 (14%) in Ashford, Aberdeen and Manchester, respectively. We developed predictive models using penalized regression methods (LASSO and Elastic Net, EN) and an empirical Bayes regularization method. These models simultaneously perform coefficient estimation and variable selection. The amount of shrinkage of the regression coefficients towards zero is controlled by hyperparameters that were chosen based on 10-fold cross-validation. We used a Normal-Gamma hierarchical prior distribution for the empirical Bayes binomial model in order to account for highly correlated variables. We externally validated these models and assessed their predictive performance by discrimination and calibration measures.
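A minimal sketch of the two penalized models, using scikit-learn with 10-fold cross-validated hyperparameters and placeholder data (the empirical Bayes Normal-Gamma model has no direct scikit-learn equivalent and is omitted):

```python
# LASSO- and Elastic Net-regularised logistic regression with 10-fold
# cross-validated regularisation strength. Data are random placeholders
# standing in for the 61 candidate predictors.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
X = rng.standard_normal((1145, 61))
y = rng.integers(0, 2, 1145)

lasso = LogisticRegressionCV(cv=10, penalty="l1", solver="saga",
                             max_iter=5000).fit(X, y)
enet = LogisticRegressionCV(cv=10, penalty="elasticnet", solver="saga",
                            l1_ratios=[0.5], max_iter=5000).fit(X, y)

print("LASSO selected:", np.sum(lasso.coef_ != 0), "predictors")
print("EN selected:   ", np.sum(enet.coef_ != 0), "predictors")
```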
results The LASSO, EN and empirical Bayes regression models selected 20, 23 and 19 predictors, respectively, from the initial 61. History of parental allergies, a doctor's diagnosis of eczema, the absence of a dog in the house, and antibiotic use at the age of 2 years were found to be important predictors of asthma at 8 years in all predictive models. Other selected predictors included paternal smoking, wheezing symptoms, hospital admissions and birth order. Overall, the predictive models showed good accuracy (0.67, 0.64 and 0.69 for LASSO, EN and empirical Bayes, respectively). Sensitivity, specificity and negative predictive value were high (0.84, 0.65 and 0.98 for LASSO; 0.90, 0.59 and 0.96 for EN; and 0.82, 0.67 and 0.97 for empirical Bayes, respectively), whilst positive predictive values (0.28, 0.27 and 0.29 for the three methods, respectively) were generally low. All 3 methods reported an area under the ROC curve of 80%, showing good predictive performance and favourable discriminative ability to distinguish subjects with and without the disease.
Discussion After validation, our predictive models demonstrated good discrimination ability for asthma. Overall, the empirical Bayes method selects the most parsimonious model and provides better accuracy and predictive ability, at the expense of a lower sensitivity, compared to the other two methods. LASSO and EN provide very similar results, with LASSO achieving slightly higher accuracy.
conclusion This multicentre study of asthma-like symptoms in children, combined with novel statistical methods, demonstrates promising results in predicting asthma. The predictive performance in terms of positive predictive value may be further improved with the use of additional predictors and a more targeted population.

Soo-Yeun Lee, Inha University Hospital, Incheon
Introduction Most electronic health record (EHR) systems containing electronic nursing records (ENRs) are not based on standards that facilitate semantic interoperability. We hypothesized that reorganizing nursing data into a standard format would allow the sharing and comparison of nursing data across settings. We tested the eMeasure process of the National Quality Forum using nursing data obtained in specific ENR environments, and validated the results based on manually abstracted existing reports. Inpatient fall prevention was selected as a nursing-sensitive quality measure.

Methods
This study was conducted in several steps: (1) establishing a project team, (2) developing a data dictionary by reviewing eight international and national practice guidelines, (3) identifying evidence-based data elements and an indicator map, (4) mapping the local terms to concepts in reference terminologies, and (5) representing the indicators and validating the process by comparing them with those obtained by manual abstraction. We used the current definitions of quality indicators for inpatient falls and standard nursing terminologies (the 2015 releases of the Logical Observation Identifiers Names and Codes [LOINC] and the International Classification for Nursing Practice [ICNP®]). The nursing data of 7,829 and 8,199 patients from 2 Korean hospitals with different ENRs were used to represent the indicators and validate the process.
results The identified data dictionary contained 45 data elements that were categorized into 53 concepts. These concepts were mapped onto LOINC and ICNP with coverages of 75.5% and 54.7%, respectively. The indicator map derived from a review of 10 practice guidelines identified 11 process indicators (e.g., the percentage of patients assessed for fall risk within 24 hours of hospital admission, and the percentage of patient days at risk of falling) as well as two outcome indicators (fall incidence and the percentage of falls with injury). These outcome and process indicators could be successfully represented using data from the two ENR systems, but the process indicators were not available for the manual abstractions.
Discussion In this study we were able to quantitatively represent a quality indicator matrix in a form that was comparable with that used in other hospitals. The process indicators were not measurable through manual abstraction. For the hospital that did not have an explicit policy and governance for data structures, this post-implementation solution showed several limitations in the mapping and reorganization of data, which were labor-intensive and troublesome. This was typically observed when we determined whether the data elements in the data dictionary could be extracted from the two ENR systems. We found significant differences depending on whether a data element was captured in a structured or a semi-structured format. Regarding the concept mapping with standard terminologies, there were significant gaps. An unexpected finding was the detection of fall events, not reported to the internal reporting system, through analysis of narrative nursing notes from both hospitals.
conclusion Reorganizing nursing data from specific ENR environments into a standard format allowed inpatient falls to be represented quantitatively. This implies that nursing-sensitive outcome measures can be shared and compared through the utilization of clinical nursing data from multiple ENRs.
Abstract no. 571 Demonstrating the feasibility of using electronic health records in genome-wide association studies: a case study in the UK Biobank

Informatics Research, Institute of Health Informatics, University College London, London
Introduction Genome-wide Association Studies (GWAS) use cases and controls from investigator-led studies, with cases often defined using manually curated medical record data. Along with the decreasing cost of genotyping, there is increasing demand for larger sample sizes to detect smaller effects. Large-scale bio-banking efforts have established cohorts with >100K participants. To define cases in these cohorts, it is no longer feasible to use manual approaches, and self-reported data are limited by their lack of accuracy and phenotypic resolution. To overcome these challenges, structured health data in the form of electronic health records (EHR) are increasingly being made available. In this study, we sought to explore the performance of an EHR-derived phenotype of myocardial infarction (MI) for a GWAS in a national biobank, with a view to comparing our findings with published studies using conventional case ascertainment.

Methods
The UK Biobank is a cohort of 500K middle-aged participants recruited from England, Scotland and Wales. Genotyping was performed using two Affymetrix Axiom arrays. We applied a previously validated MI phenotyping algorithm (https://www.ucl.ac.uk/health-informatics/caliber) using secondary care diagnostic codes from hospitalisation (Hospital Episode Statistics) and mortality (Office for National Statistics) records for UK Biobank participants, in a sub-cohort of 112,142 participants who had been genotyped as of June 2015, to define participants with prevalent or incident MI (cases) and those without MI (controls). We used logistic regression to test the association between 10 million imputed genetic variants (expected allelic dosages) and MI, controlling for the effects of sex, batch, array and centre as well as principal components 1-15. In order to test the validity of our results, we extracted all known genome-wide signals for MI and systematically compared them with our results.
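Per variant, the association test reduces to a logistic regression of case status on expected allelic dosage plus covariates, as sketched below with placeholder data and a reduced covariate set.

```python
# One per-variant association test: logistic regression of MI case
# status on expected allelic dosage, repeated across variants. The
# study adjusted for sex, batch, array, centre and principal
# components 1-15; here a few placeholder covariates stand in.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
dosage = rng.uniform(0, 2, n)            # expected allele dosage (0-2)
covars = rng.standard_normal((n, 3))     # e.g. sex + 2 principal components
y = rng.integers(0, 2, n)                # 1 = MI case, 0 = control

X = sm.add_constant(np.column_stack([dosage, covars]))
fit = sm.Logit(y, X).fit(disp=0)
print("dosage OR:", np.exp(fit.params[1]), "P:", fit.pvalues[1])
```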
results Within the sample studied, we identified 3,408 MI cases (mean age 62, male 78%) and 108,734 controls (mean age 57, male 47%) derived from EHR. Baseline characteristics of the MI cases were similar to those reported in published MI GWAS studies (62% smokers, 60% using statins). QQ-plots showed little inflation of the test statistic (lambda EHR = 1.02). After adjustment for covariates, we identified 69 variants in two chromosomal regions showing genome-wide significance (P < 5x10-8). The most robust association was for rs944797 on chromosome 9 (risk allele C, OR = 1.16, P = 1.4x10-11), an estimate comparable to that identified previously.

Discussion
Using EHR to define cases of MI, we were able to replicate several previously reported genome-wide associations that had used conventional case ascertainment. EHR-derived MI cases also had characteristics consistent with those expected in traditional cohort studies. We did not identify all known associations, but this is likely due to statistical power. Whether an EHR-derived approach for case ascertainment out-performs a self-reported approach remains to be tested.
conclusion EHR-derived phenotypes offer a viable alternative to manual phenotyping at a lower cost and at higher clinical resolution, and can accelerate advances in precision medicine through large-scale GWAS.

Methods
We obtained practice-level information covering over 99 percent of the registered general practice population, attributed to Lower Super Output Areas (LSOAs) in England. Negative binomial models were fitted to investigate the relationship between spatially estimated recorded quality of care and suicides. In order to measure quality of care, we aggregated all indicators from the two mental health domains of the QOF, i.e. depression and serious mental illness (SMI), into a composite score. Analyses were adjusted for deprivation, social fragmentation, prevalence of depression and serious mental illness, as well as census variables.
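As an illustration of the model specification, the following sketch fits a negative binomial regression of suicide counts on a composite QOF score with population as exposure; variable names and data are placeholders.

```python
# Negative binomial regression of area-level suicide counts on a
# composite QOF mental-health score, with a deprivation placeholder
# and population as an exposure offset. Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "suicides": rng.poisson(2, 300),
    "qof_score": rng.uniform(60, 100, 300),
    "deprivation": rng.standard_normal(300),
    "population": rng.integers(1000, 9000, 300),
})

model = smf.glm(
    "suicides ~ qof_score + deprivation",
    data=df,
    family=sm.families.NegativeBinomial(),
    offset=np.log(df["population"]),
).fit()
print(model.summary())
```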
results Overall, no significant relationship was found between practice performance on the mental health indicators of the QOF and suicides in the practice locality.

conclusions For those practices that participate in the scheme, higher reported achievement of mental health specific activities incentivised in the QOF was not associated with significant changes in suicides. These findings have implications for the effects of other similar programmes on suicide prevention.

Introduction Patient monitoring systems capable of accurate recording in the real world, during the activities of everyday living, can provide rich objective accounts of patient well-being that have broad application in clinical decision support. Combining physiological, environmental and actigraphy sensing together with a quantified subjective patient report and activity log provides new opportunities and new challenges in big data analysis, data mining and visual analytics. Method An iterative prototyping approach together with clinical collaboration informed the design and development of a novel 24-hr sensing system with broad application relevant to sleep assessment. The system design, sensor selection and visual analytic strategies were informed by a literature review and pilot studies with i) clinical staff and ii) healthy participants.
The sensing system comprised i) a daytime wearable sensing unit (on-body accelerometry for Metabolic Equivalent of Task (MET), pulse, skin temperature and resistivity) and ii) two night-time sensing units (an on-body unit as per daytime but with wrist accelerometry, and a bedside unit for ambient light, temperature and sound level). Continuous recordings were used to generate averages, minima and maxima over 1-minute, 15-minute, 1-hour and 4-hour intervals. For data mining and visual analytics, these records were combined with quantified accounts of subjective user reports and activity logs. Ten subjects (including three clinicians) tested the system for up to three consecutive days and nights and provided assessments of use and comfortability. Five clinicians were interviewed regarding system applications, barriers to use, data use and visual analytics.
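The interval aggregation step can be sketched with pandas resampling, here applied to a placeholder signal standing in for, e.g., skin temperature.

```python
# Resample a continuous sensor recording to mean/min/max at the four
# window sizes named above. The signal is a random placeholder.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2017-04-24", periods=24 * 60 * 60, freq="s")  # one day at 1 Hz
signal = pd.Series(rng.normal(33.0, 0.5, len(idx)), index=idx)

for window in ("1min", "15min", "1h", "4h"):
    agg = signal.resample(window).agg(["mean", "min", "max"])
    print(window, agg.head(1).to_dict("records")[0])
```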
results Data acquisition was successful across a wide range of MET levels. System comfortability was good, but with some discomfort and skin irritation arising from prolonged use of a carotid pulse sensor (selected for its robust performance compared with wristband alternatives). Electrooculography sensing for REM sleep detection was attempted but was uncomfortable and its performance was unsatisfactory. Usability of the system benefitted from prolonged battery operation. Few data losses resulted from user administration of sensors; more resulted from a lack of prototype ruggedisation. Attempts at intuitive multivariate data visualizations, including heat maps, motion charts and clustered views, had limited success. However, the system and approach were assessed as very good for real-life application and decision support.
Discussion 24-hr outpatient sensing has wide clinical application in rehabilitation, in the management of chronic conditions and in pre- and post-surgical assessment. However, better detection of both low-level activity and sleep is required than is currently available in commercial activity monitoring devices.
conclusion Multi-modal outpatient monitoring can perform robustly and with acceptable comfortability across a spectrum of activity types and levels; however, system robustness and ease of use are paramount to reliability, and users' self-application of sensors requires careful attention.
Introduction Interrogation of routine electronic health record (EHR) databases often involves repetitive programming tasks, such as manually constructing and modifying complex database queries, requiring significant time from an experienced data analyst. The objective was to develop a tool to automate the selection and characterisation of cohorts from primary care databases to be used by data analysts and researchers.

Methods
We identified a set of common elementary approaches to query clinical variables from the primary care database of the Secure Anonymised Information Linkage databank. We then designed an easy-to-use web-based user interface to allow using combinations of these approaches as 'building blocks' for querying more complex variables. We created an R programme to automatically generate and execute the corresponding Structured Query Language (SQL) queries.
results The developed prototype allows researchers to query clinical information from primary care databases based on the following elementary variable types: (1) count of events of interest (e.g. asthma prescriptions) or of their distinct dates; (2) the code or date of the earliest or latest event of interest (e.g. type of the earliest smoking cessation prescription); (3) the code or date of the event of maximum or minimum value (e.g. maximum BMI recording ever); and (4) count of events of interest having complex temporal constraints with other events (e.g. count of asthma doctor visits with oral steroid prescriptions within one week). Researchers may choose fixed, dynamic, or individualised query intervals. Algorithms are saved on a web server as versioned and sharable objects. The prototype integrates with a Read Codes dictionary and a sharable codeset repository, allowing researchers to keep a record of codes used for reporting transparency.
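The authors' tool generates SQL from R; purely as an illustration of the idea, the sketch below (in Python, with hypothetical table and column names) turns one elementary building block, a count of events within an interval, into a query.

```python
# Generate SQL for the "count of events of interest" building block.
# Table and column names are hypothetical, not the SAIL schema.
def count_events_sql(codeset: list[str], start: str, end: str) -> str:
    codes = ", ".join(f"'{c}'" for c in codeset)
    return (
        "SELECT patient_id, COUNT(*) AS n_events "
        "FROM primary_care_events "                  # hypothetical table
        f"WHERE read_code IN ({codes}) "
        f"AND event_date BETWEEN '{start}' AND '{end}' "
        "GROUP BY patient_id"
    )

# e.g. count events matching an illustrative codeset during 2015
print(count_events_sql(["663.", "663N."], "2015-01-01", "2015-12-31"))
```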

Discussion
The developed prototype provides a scalable, versatile solution for the implementation of complex cohort selection and characterisation algorithms using primary care databases. The automatic generation of SQL queries reduces human errors and should enable rapid and scalable implementation of these algorithms, which has the potential to improve research efficiency and reproducibility. In addition, the graphical user interface allows researchers with no programming skills to interrogate the data. The tool is under active development to improve the functionality and usability, and we look forward to testing it in other databases and assessing its suitability in different research contexts. We plan to make this tool available under an open source licence.

Wattamon Srisakuldee, George and Fay Yee Centre for Healthcare Innovation, Winnipeg
Introduction New healthcare treatments such as prescription drugs and surgical procedures are often tested in randomized clinical trials (RCTs) or evaluated in cohort studies. RCTs, in particular, can give an accurate picture of the benefits and harms of new treatments in the short term. But many treatments continue to be used for decades, meaning that RCTs or cross-sectional cohort studies do not provide a full picture of their long-term effects. In the past it was common for data to be archived when a study was finished. There is a worldwide movement to make data available for reuse in order to check the accuracy of original findings, look for new benefits and harms, and measure long-term benefits and harms. The latter can be done by linking the original participant information with data from large administrative databases. Canadian provinces provide universal health care that generates extensive records, which can be linked to the information collected in existing RCTs and cohort studies. The objective of this study is to: (a) describe the process to develop a repository in the province of Manitoba, Canada containing descriptive information (i.e., meta-data) about trials and cohort studies conducted in the last ten years, and (b) identify barriers and enablers of data reuse studies.
Methods Study participants are principal investigators who have conducted an RCT or cohort study that meets the following criteria: (a) the study captures information about one or more of the following health domains: health status, factors that influence health status, health care, public health, and health-related interventions; (b) the study collects data on Manitoba residents; and (c) the health data must come from studies completed between January 1, 2007 and December 31, 2016. Principal investigators were identified via contacts with research offices at all provincial universities and clinical research departments at hospitals, health regions, and related provincial and regional organizations. The collected meta-data include characteristics of the patients/cohort participants (i.e., age group, sex, disease characteristics), characteristics of the study measures, the data custodian/trustee, and the willingness of the principal investigator to initiate data sharing agreements and/or participate in data linkage projects. Meta-data are collected using an on-line tool developed with REDCap software.
results To date, we have identified more than 80 principal investigators who have been contacted to provide meta-data. Data collection is in process. The collected data will be used to establish a publicly-accessible online meta-data repository.
Discussion This study will help to identify the characteristics of study data that could be reused in new investigations, as well as potential methodological, logistical and ethical challenges associated with data reuse. The study results will be used to develop focus groups with members of research ethics boards and research review committees to identify issues associated with investigator requests to reactivate trials and extend cohort studies via record linkage. The study is currently being replicated at two sites in the province of Ontario, Canada to assess the feasibility of implementing it on a national basis.
conclusion The results from this study will be used to propose best practices for data reuse focusing on data linkage. Collectively, this study will help to impact the reuse of health data in Canada to improve patient care.

Soo Hong Lee, Yonsei University, Seoul
Introduction In order to enhance the safety and efficiency of surgery, a systematic information service to support the various stakeholders is necessary. To this end, a smart system is being developed that comprehends the whole procedure of an operation and provides intelligent services, such as proactive data mining and the generation of warnings at appropriate times.

Methods
Workflow is a formalized model of a certain operational process. Through the model, the system understands the whole process in advance, reckons the current progress according to contextual information, and provides the necessary service in a timely manner. IDEF0 was used for the representation of the functions and relations in the model. Information, data, knowledge, context and instances pertinent to surgical operations have been analyzed and formalized. Through function deployment, the necessary functionalities for the intended software system have been defined.
results The resulting software, named SWORM (Surgical WORkflow Manager), consists of five modules: a DB management module, an adaptation module, a surgery planning module, a surgery recording module, and a visualization module. A mobile application supporting relevant personnel on the move has also been constructed in a hybrid app development environment. This app is written with web technologies and currently runs on Android. At the server side, SWORM uses HTTP and MySQL. Based on SWORM, which is a kind of middleware, pre-operative, intra-operative, and post-operative services are to be implemented. SWORM can be integrated with legacy systems, including EMR or HIS, making it a platform for applications.

Discussion
The system has been applied to the preparation stage of maxillofacial surgery for trial use, and evaluated by its developers and users. Feedback from a surgeon highlighted the usefulness of setting up and continuously refining a personalized pre- and intra-operative process. The anaesthesiologist anticipated merits from the perspective of safety, as operations would be more systematically monitored.
conclusion SWORM is continuously being improved in response to feedback from surgical participants, and service modules based on the SWORM platform are being added. The objective of workflow management is to maximize the usability of human and material resources and to prevent medical accidents during the entire course of surgery, from pre- to post-operation. The ultimate goal of the research is the implementation of intelligent agents to support the various medical staff.
Introduction Accident and Emergency department (A&E) performance against the UK's 4-hour waiting time target is a key metric used to assess hospitals. The flow of patients through the hospital as a whole has been identified as a factor affecting A&E performance. A hospital can be considered as a network of wards that are connected when patients are transferred between them. This network science approach enables a global perspective of the complex dynamic flow of patients through a hospital.
Methods Data on A&E attendances, waiting times and patient transfers between 01/12/2014 and 01/07/2016 were extracted from the electronic medical record at King's College Hospital NHS Foundation Trust. Only transfers for admitted non-elective patients at both hospital sites (King's College Hospital Denmark Hill, DH, and Princess Royal University Hospital, PRUH) were included. Patient transfers were modelled as a weighted directed graph in which nodes represent wards and edges represent transfers of patients between wards. Edge weights represent the proportion of all transfers in any given day that each edge accounted for. Discharge from the hospital is represented by an edge to a virtual exit node.
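The network construction can be sketched with networkx as follows; the transfer records are placeholders, and daily edge weights are computed as each edge's share of that day's transfers, as described above.

```python
# Wards as nodes, transfers as weighted directed edges, and a virtual
# exit node for discharges. Transfer records are placeholders.
import networkx as nx

transfers = [            # (from_ward, to_ward) per patient movement in one day
    ("A&E", "AMU"), ("A&E", "AMU"), ("AMU", "Cardiology"),
    ("Cardiology", "EXIT"), ("AMU", "EXIT"),
]

G = nx.DiGraph()
total = len(transfers)
for src, dst in transfers:
    if G.has_edge(src, dst):
        G[src][dst]["weight"] += 1 / total   # share of the day's transfers
    else:
        G.add_edge(src, dst, weight=1 / total)

for src, dst, w in G.edges(data="weight"):
    print(f"{src} -> {dst}: {w:.2f}")
```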
results Overall, the PRUH network consists of 72 nodes (wards) and 921 edges, and the DH network contains 78 nodes and 1531 edges. We identified a "core" set of these edges that are present in the network every month. This core set is a small proportion of all edges (21% for PRUH, 14% for DH), but accounts for the majority of all transfers (91% for PRUH, 85% for DH) and is likely to be critical to the flow of patients through the hospital network. If network-level changes affect A&E performance, the properties of the transfer network on any given day should predict performance the following day. Unsupervised clustering (PCA) of daily transfer networks separated the highest-performing 10% of days from the lowest-performing 10% in both sites. There is also a clear separation of the transfer networks on weekends vs. weekdays for both sites. For DH, the edges that contribute most to the separation of the best and worst days form clear pathways from admission to discharge with consistently higher or lower flow, whereas in PRUH the differences in flow tend to affect more individual wards.

Discussion
Since the best and worst performing days can be separated using the properties of the ward network on the preceding day, the network-level changes associated with poor A&E performance could be a driver of performance. This is consistent with the previous suggestion that the occupancy of wards downstream of A&E is a key determinant of A&E performance rather than A&E attendance rates. Analysis of paths through the network shows that a small subset of edges accounts for the majority of patient flow. These pathways could be key targets for efforts to improve efficiency, particularly in times of crisis.
conclusion Patient transfers within a hospital can be naturally described as a network.

Abstract no. 603 Development and validation of various phenotyping algorithms for diabetes mellitus using data from electronic health records

Santiago Esteban, Manuel Rodriguez Tablado, Francisco Peper, Yamila Mahumud, Ricardo Ricci, Sergio Terrasa, and Karin Kopitowski, Servicio de Medicina Familiar y Comunitaria, Hospital Italiano de Buenos Aires, Buenos Aires
Introduction Recently, the progression towards precision medicine has driven the development of large databases that make it possible to assess the impact of risk factors or treatments in specific subpopulations. This is usually a problem for classical cohorts, given the difficulty of enrolling and following up a large enough number of patients. The situation is even more difficult in developing countries, given the usual lack of funds for local research. Electronic health records (EHR) have been proposed as a solution to both of these cost problems. Nevertheless, this comes at a price: the quality of data in EHRs is usually less than optimal, particularly regarding misclassification errors. Phenotyping algorithms combine different variables extracted from the EHR to classify patients according to their particular phenotype. Our objective is to compare the performance of different classification strategies (using standardized problems only, rules-based algorithms, statistical learning algorithms (six learners) and stacked generalization (five versions)) for categorizing patients according to their diabetic status (diabetic, non-diabetic or inconclusive; diabetes of any type) using information extracted from EHRs.
Methods Patient information was extracted from the EHR of the Hospital Italiano in Buenos Aires, Argentina. In order to have a training and a validation dataset, two samples of patients from different years (2005 and 2015; total n = 2,463) were extracted. The only inclusion criterion was age (≥40 and <80 years old by 1/1/2005 and by 1/1/2015 for each sample, respectively). The sampling was carried out using simple randomization. The training set (2005) comprised 1,663 patients; the validation set (2015) represented ~33% of the total sample (n = 800). Four researchers manually reviewed all records and classified patients according to their diabetic status (diabetic: diabetes registered as a health problem or fulfilling the ADA criteria; non-diabetic: not fulfilling the ADA criteria and having at least one fasting glucose below 126 mg/dL; inconclusive: no data regarding diabetic status or only one abnormal value). The best performing algorithms within each strategy were tested on the validation set.
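As an illustration of the stacked generalization strategy, the sketch below shows the general pattern with scikit-learn: base learners are combined by a meta-learner trained on their out-of-fold predictions. The learners, features and settings here are placeholders, not the study's actual configuration:

```python
# Sketch of stacked generalization for a 3-class phenotyping task.
# Features and labels are random placeholders standing in for EHR data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.random((1663, 20))    # training set, e.g. EHR-derived features
y = rng.integers(0, 3, 1663)  # 0=non-diabetic, 1=diabetic, 2=inconclusive

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("nn", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                             random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions feed the meta-learner
)
stack.fit(X, y)
```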
results The standardized-codes algorithm achieved a Kappa coefficient of 0.59 (95% CI 0.49, 0.59) in the validation set. The Boolean logic algorithm reached 0.83 (95% CI 0.78, 0.89). A slightly higher value was achieved by the feedforward neural network (0.90, 95% CI 0.85, 0.94). The best performer was the stacked generalization meta-learner, which reached a Kappa coefficient of 0.95 (95% CI 0.91, 0.98).
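One common way to obtain Kappa estimates with confidence intervals of this kind is a simple bootstrap, sketched below with invented labels (the study's exact interval procedure is not described here):

```python
# Sketch: agreement between algorithm output and manual review, reported as
# Cohen's kappa with a bootstrap 95% CI. Labels below are invented.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(2)
truth = rng.integers(0, 3, 800)                # manual classification
pred = np.where(rng.random(800) < 0.9, truth,  # ~90% agreement, say
                rng.integers(0, 3, 800))

kappa = cohen_kappa_score(truth, pred)
boots = [cohen_kappa_score(truth[idx], pred[idx])
         for idx in (rng.integers(0, 800, 800) for _ in range(1000))]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"kappa = {kappa:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```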
conclusion We evaluated the performance of four different strategies for the development of diabetes phenotyping algorithms using data extracted from an EHR in Argentina. The stacked generalization strategy showed the best classification metrics in the validation set. Implementing these algorithms makes it possible to exploit the data of thousands of patients accurately and at lower cost than the traditional way of collecting data for research. Thus, millions of patients from developing countries could benefit from local and specific data, leading to treatments that take into account all their characteristics (genetic, environmental, habits, etc.), as is the objective of precision medicine.

Sergio Terrasa and Karin Kopitowski, Servicio de Medicina Familiar y Comunitaria, Hospital Italiano de Buenos Aires, Buenos Aires
Introduction Electronic medical records (EMR) are becoming increasingly common. They show considerable promise as a source of data for observational epidemiological studies, and their use for this purpose has increased significantly in recent years. Although EMRs clearly improve the quantity and availability of data, the problem of data quality remains. This is especially important when attempting to determine whether an event has actually occurred. We sought to assess the sensitivity, specificity and agreement level of a codes-based algorithm for the detection of clinically relevant cardiovascular (CaVD) and cerebrovascular (CeVD) disease cases, using data from EMRs.
Methods Three family physicians from the research group selected clinically relevant CaVD and CeVD terms from the International Classification of Primary Care, Second Edition (ICPC-2), ICD-10 (2015 version) and SNOMED CT (2015 edition). Clinically significant signs, symptoms, diagnoses and procedures associated with CaVD and CeVD were included. The algorithm yielded a positive result if the patient had at least one of the selected terms in their medical records, as long as it was not recorded as an error; if no terms were found, the patient was classified as negative. This algorithm was applied to a randomly selected sample of the active patients within the hospital's HMO as of 1/1/2005 who were 40 to 79 years old, had at least one year of membership in the HMO and had at least one clinical encounter. Patients were thus classified into four groups: (1) negative patients, (2) patients with CaVD but without CeVD, (3) patients with CeVD but without CaVD and (4) patients with both diseases. To facilitate the validation process, a stratified sample was taken so that each of the groups represented approximately 25% of the sample.
Manual chart review was used as the gold standard for assessing the algorithm's performance. One-third of the patients were assigned randomly to each reviewer (Cohen's kappa 0.91). Both coded and un-coded (free-text) sections of the EMR were reviewed, from the first clinical note present in the patient's chart to the last one registered prior to 1/1/2005. results The performance of the algorithm was compared against manual chart review. It yielded high sensitivity (0.99, 95% CI 0.938-0.9971) and acceptable specificity (0.86, 95% CI 0.818-0.895) for detecting cases of CaVD and CeVD combined. A qualitative analysis showed that most of the false negatives were due to terms not included in the algorithm (20.4% of the total errors). False positives corresponded mostly to diagnoses that were later dismissed (43.8%) and to incidental findings with no clinical significance (13.27%).
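As a worked illustration of these metrics (with invented 2x2-table counts chosen only to reproduce the reported point estimates, not the study's actual counts):

```python
# Sketch: sensitivity and specificity of the codes-based algorithm against
# manual chart review. Counts are invented for illustration.
def sens_spec(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 table counts."""
    return tp / (tp + fn), tn / (tn + fp)

# algorithm positive/negative vs chart-review positive/negative
sens, spec = sens_spec(tp=297, fn=3, tn=301, fp=49)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```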
conclusions We developed a simple algorithm, using only standardized and non-standardized coded terms within an EMR, that can properly detect clinically relevant events and symptoms of CaVD and CeVD. We believe that combining it with an analysis of the free text using an NLP approach would yield even better results.

Methods
The RDA allows the generic creation of forms and corresponding documents belonging to a specific patient based on the Entity-Attribute-Value (EAV) design.2 We implemented a local development environment with open source software. The data model still holds the central EAV components but was substantially simplified to focus on only those aspects that are most relevant in the teaching context. results Only 6 of the more than 150 tables of the original RDA data model were reused. User registration, security features, project administration, etc. were not taken into consideration. Forms, patients and documents can be processed through SQL (PostgreSQL and MySQL), a simple REST interface and the web interface we developed, made available by means of the PHP framework Laravel. The database was preloaded with sample data; no patient data from the RDA are used. We are evaluating a SMART on FHIR interface using this local development environment.
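A minimal sketch of an EAV core of this kind, using SQLite in place of PostgreSQL/MySQL for illustration; the table and column names are hypothetical simplifications, not the RDA's actual schema:

```python
# Sketch of an Entity-Attribute-Value (EAV) core for forms and documents,
# illustrating why new forms need no schema change. Names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE form      (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attribute (id INTEGER PRIMARY KEY,
                        form_id INTEGER REFERENCES form(id), name TEXT);
CREATE TABLE patient   (id INTEGER PRIMARY KEY, pseudonym TEXT);
CREATE TABLE document  (id INTEGER PRIMARY KEY,
                        form_id INTEGER REFERENCES form(id),
                        patient_id INTEGER REFERENCES patient(id));
-- one row per filled-in field: the Entity-Attribute-Value triple
CREATE TABLE eav_value (document_id INTEGER REFERENCES document(id),
                        attribute_id INTEGER REFERENCES attribute(id),
                        value TEXT);
""")
# Adding a new form is pure data, not a schema migration:
con.execute("INSERT INTO form (name) VALUES ('Blood pressure')")
con.execute("INSERT INTO attribute (form_id, name) VALUES (1, 'systolic_mmHg')")
```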
Discussion Being able to directly access the data model in a locally deployed environment drastically lowers the barrier for testing RDA-related features developed by students. Given the high quality standards required in the RDA, new features resulting from student work are always re-implemented by RDA staff; hence the disadvantage of not reusing code directly in the RDA is negligible.
conclusion We reduced a complex data model for clinical research data to the key aspects needed for teaching and provide a locally deployable environment. The entry barrier for students to develop prototypes or test new features based on the data model of the clinical research platform at the Medical University of Vienna is lowered and the transfer of innovative concepts developed by our students to the platform is facilitated.

Introduction Mitochondrial research strongly relies on quality control for monitoring and comparing mitochondrial data between individuals and populations. This abstract presents the methodological outline of a novel project aimed at developing a concept for a minimum dataset in order to describe and exchange mitochondrial data between research groups.

Methods
The development is based on the MIABIS (Minimum Information About BIobank data Sharing)1 standard, which was developed for sharing metadata describing biobanks and biomaterial collections. We now intend to evaluate which basic data are needed to describe mitochondrial meta-information and whether MIABIS can be applied for this purpose.
results The project consists of five steps, modifying and extending the 4-step approach for formalizing free-text information published by Neururer et al.2 to fit our research goal. In Step 1, relevant mitochondrial free-text information (e.g. Bioblast,3 MitoPedia4) is analysed using a so-called definition analysis approach:2 relevant information is marked and definitions are extracted. These definitions serve as input for Step 2, which consists of a typological analysis.2 To identify the relevant data fields of the minimal mitochondrial dataset, the definitions are coded and iteratively abstracted using qualitative content analysis methods. These data fields are mapped to the concepts offered by MIABIS in Step 3; this step shows to what extent the MIABIS standard can be extended for this project. The initial concept for a novel mitochondrial data model is developed in Step 4. To validate the model, an expert-based evaluation is the main objective of Step 5. This may trigger an iterative process leading to refinement and re-evaluation of the mitochondrial data model.

Discussion
The minimal mitochondrial dataset will serve as a basis for harmonizing mitochondrial information and will increase the interoperability of mitochondrial research data. With the development of such a minimal dataset, we contribute to a structured representation of mitochondrial data, which enhances comparability and facilitates data exchange. The strongly structured analysis approach will allow reproducible results to be generated. Nevertheless, it remains to be shown to what extent the MIABIS standard can be applied to creating a mitochondrial data model.
conclusion The project combines approaches from interdisciplinary fields of research (e.g. computer science, social sciences) and contributes to the current state of the art in mitochondrial knowledge management, enhancing mitochondrial research networking.

Introduction Recent figures from the Office for National Statistics suggest that dementia and Alzheimer's disease were the leading cause of death in 2015. Early diagnosis of neurodegenerative diseases significantly improves long-term health outcomes, so one key challenge is to improve disease detection at the earliest possible stage. A second challenge is to closely monitor disease progression and fluctuations to enable the most effective therapeutic interventions. Wearable devices, apps and software offer a solution to these problems by unobtrusively collecting individualised data on physical, cognitive and functional wellbeing over time. What is not clearly understood, however, is whether older adults (with or without a diagnosis of some form of dementia) are ready, willing and/or able to use these technologies. To address this issue, we report on qualitative data collected from three different projects that ultimately aimed to evaluate the feasibility of using technology for disease detection and monitoring.

Methods
The CYGNUS project aimed to use mobile devices and wearable technology to collect outcome measures for people referred to memory assessment services. A technology questionnaire was used to determine readiness for invitation into the mobile device monitoring study (n=160). The SAMS project aimed to detect subtle changes in patterns of daily computer use as a proxy indicator of early cognitive decline. A debrief questionnaire was used to understand the acceptability of, and preferences for, the monitoring software (n=33).
The SKIP project aimed to monitor fluctuations in Parkinson's disease symptoms through laptop based tasks and passively through smartphone sensor data. Two focus groups were conducted to discuss the use of technology for disease monitoring (n=6).
results Initial findings from the CYGNUS technology questionnaire suggest large variability in participants' ownership of, and access to, smartphones, tablets and/or computers, and in how often they used these devices. In terms of monitoring for disease detection, the questionnaire from the SAMS study indicated that, in general, participants (all of whom were regular computer users) did not find the software obtrusive, and many "forgot" that they were being monitored. In terms of monitoring disease progression, participants from both the SAMS and SKIP projects agreed that they would like to receive feedback on their symptom fluctuations, but there were mixed views on who should control the data and how often feedback should be received.

Introduction
Most interest in the variability of drug prescribing behaviour has focused on cost saving. It has been estimated that £200 million could be saved if unwarranted variations in prescribing activity were reduced and drugs were prescribed to a common standard. Such variation indicates the need to focus on the efficiency and appropriateness of clinical practice and to examine the possibility that large variation is related to inappropriate prescribing. Our work examines the change in variability of primary care drug prescribing rates in Northern Ireland's Western Health and Social Care Trust and investigates its relationship with laboratory test ordering rates.

Method
The GP prescribing data (Apr 2013 to Mar 2016) for 55 general practices within the Northern Ireland Western Health and Social Care Trust were obtained from the Business Service's prescribing and dispensing information systems. The total number of test requests was collected from the laboratory databases of the Altnagelvin Area Hospital, Tyrone County Hospital and the South West Acute Hospital. Both the number of drug prescriptions and the number of laboratory tests requested in each practice were normalized by the number of registered patients. The variability of drug prescribing data was determined by calculating the coefficient of variation (CV), and the degree of correlation between laboratory test ordering rates and drug prescribing rates was assessed by calculating Spearman's correlation coefficient (R).
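A minimal sketch of these two statistics, with invented per-practice rates standing in for the 55 practices:

```python
# Sketch: coefficient of variation of per-practice prescribing rates, and
# Spearman correlation with test-ordering rates. Values are invented.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
prescribing = rng.gamma(shape=20, scale=1.0, size=55)  # prescriptions/patient
tests = rng.gamma(shape=15, scale=1.0, size=55)        # test requests/patient

cv = prescribing.std(ddof=1) / prescribing.mean()      # coefficient of variation
rho, p = spearmanr(prescribing, tests)
print(f"CV = {cv:.2f}, Spearman R = {rho:.3f} (p = {p:.2f})")
```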
results We observed pronounced differences in drug prescribing rates among general practices. The high inter-practice variability in drug prescribing behaviour was shown to be driven by several GP practices with abnormally high rates. No correlation between the total standardized number of prescriptions and the most commonly requested laboratory test (electrolyte profile) was observed (R = 0.107, 0.245 and 0.220 in 2013-14, 2014-15 and 2015-16, respectively). In addition, the strength of association between the most common medications used to treat under- and over-active thyroid (i.e. carbimazole, propylthiouracil, levothyroxine and liothyronine) and the standardized laboratory test requests for thyroid profiles (FT4 and TSH) was found to be very weak (R = 0.028, 0.017 and 0.037 in 2013-14, 2014-15 and 2015-16, respectively).

Discussion
There is clearly variability in prescribing rates between general practices, suggesting that prescription costs could be lowered if the variation were reduced. However, since higher variability does not necessarily indicate lower-quality practice, further inspection is required to determine whether the patient populations of specific GP practices differ and have different needs. The lack of correlation between prescription rates and laboratory test requesting rates shows that practices that request laboratory tests at higher or lower rates than average are not necessarily the ones with higher or lower prescription rates. This implies that other factors may influence GPs' tendency to prescribe, or that some laboratory tests are simply ordered inappropriately.
conclusions Our investigation of variability in drug prescribing rates between general practices provides valuable information on practice variation and helps prioritise future research to improve the quality of prescribing. We suggest that optimisation of prescribing could be enhanced by appropriate clinical interventions.
Abstract no. 657 Investigating the accuracy of parent and self-reported hospital admissions: a validation study using linked hospital episode statistics data

Leigh Johnson, Rosie Cornish, Andy Boyd, and John Macleod, University of Bristol, Bristol
Introduction The Avon Longitudinal Study of Parents and Children (ALSPAC) is a large prospective study of around 15,000 children born in and around the city of Bristol in the early 1990s. Participants have been followed up intensively since birth through questionnaires, clinics and linkage to routine datasets. In 2013 ALSPAC extracted information on a pilot group of consenting participants from the Hospital Episode Statistics (HES) database. The aim of this study was to validate parent-reported and self-reported data on hospital admissions against HES-recorded hospital admissions.
Methods Subjects included in this study were 3,195 individuals enrolled in ALSPAC who had consented to linkage to their health data before the end of 2012. In nine questionnaires completed when the children were aged between 6 months and 13 years old, parents were asked if their child had been admitted to hospital in the time since the issue of the previous questionnaire (the periods covered varied, ranging from 6 months to 4 years). Additionally, when the participants were 18 years old they were asked whether they had been involved in a road traffic accident in the past year and, if so, whether they had been to accident and emergency (A&E) or stayed overnight in hospital. We compared this information to data recorded in the HES database for the corresponding time period and calculated sensitivities, specificities and predictive values of the questionnaire-reported admissions and A&E attendances, using HES records as the reference standard.
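A sketch of the comparison logic, treating HES as the reference standard; the dates, helper function and four-way labelling are illustrative assumptions, not ALSPAC's actual pipeline:

```python
# Sketch: classifying one questionnaire response against HES records for the
# period that questionnaire covered. Data structures are illustrative.
from datetime import date

def classify(reported: bool, hes_admissions: list[date],
             period_start: date, period_end: date) -> str:
    in_period = any(period_start <= d <= period_end for d in hes_admissions)
    if reported and in_period:
        return "true positive"
    if reported and not in_period:
        return "false positive"   # or an admission misdated by the parent
    if not reported and in_period:
        return "false negative"
    return "true negative"

print(classify(True, [date(1999, 3, 2)],
               date(1998, 9, 1), date(1999, 8, 31)))
```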
results Up to 10% of individuals had been admitted to hospital during the time periods covered by the questionnaires. Among those whose parent reported a hospital admission, at least 60% had one or more corresponding admissions in the HES data. Where a hospital admission was not indicated on the questionnaire, an admission was found in the HES data for between 1.4% and 3.6% of the participants. Initial analysis suggests that some of the parent-reported admissions may have actually occurred prior to the period referred to in the questionnaire. Further analysis is planned to investigate other possible explanations for the observed discrepancies. Results for accident and emergency attendances and admissions for road traffic accidents reported by the young people will also be presented.

Discussion & conclusions
We found that the specificities and negative predictive values of parent-reported hospital admissions were high at all ages; the sensitivities and positive predictive values were lower. There are several possible explanations. A proportion of respondents may have interpreted the questions about admission to hospital as including visits to A&E and/or outpatient appointments. The HES database only includes A&E data from April 2007 (when the ALSPAC children were aged 15-16) and outpatient data from April 2003 (when they were 11-12), so it was not possible to examine whether this explained the low sensitivities. Further, some hospital admissions would have been to non-NHS providers, which are not recorded in HES. Conclusions will be drawn when the additional analyses outlined above have been carried out.
Abstract no. 663 Genealogical information from co-insurance networks in pseudonymized administrative claims data in Austria

Florian Endel, Vienna University of Technology, Vienna
Introduction Routinely collected administrative claims data from the Austrian health and social insurance system are available for research in the GAP-DRG database, which is operated by Vienna University of Technology on behalf of the Main Association of Austrian Social Security Institutions. GAP-DRG holds pseudonymized information on reimbursement of prescriptions and on inpatient and ambulatory outpatient contacts of almost all 8 million inhabitants. Genealogical information and family relationships are not directly available in the database; in this project, they are indirectly deduced, analyzed and integrated into GAP-DRG. This project is part of the K-Project dexhelpp in COMET, funded by BMVIT and BMWGJ and administered by FFG.
Methods Co-insurance of relatives such as spouses, children and close family members is encoded in the reimbursement information of GAP-DRG. These relationships between two persons are used to extract networks representing individuals who are associated with each other by co-insurance. Persons are classified as children, parents, in a relationship or single, based on thorough data analysis and rules derived from qualitative descriptions of family structures in Austria. Additional data are included, such as the direction of edges (representing the dependence of one partner on the other) and edge weights holding information such as the difference in age. Visualization and common methods from graph theory are used to extract more details about data quality and the social structure of the insured population, and to expose the limitations of the data and of the applied approach.
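A simplified sketch of how family roles might be derived from such a directed co-insurance graph; the 16-year age-difference threshold and the role labels are illustrative assumptions, not the study's actual rule set:

```python
# Sketch: deriving rough family roles from a directed co-insurance graph,
# where an edge points from a dependent to the person insuring them.
import networkx as nx

G = nx.DiGraph()
# (dependent, insured_through, age difference in years)
G.add_edge("p2", "p1", age_diff=30)  # p2 insured via p1, 30 years younger
G.add_edge("p3", "p1", age_diff=28)
G.add_edge("p4", "p5", age_diff=2)

def role(person):
    if G.out_degree(person):  # depends on someone
        diff = next(iter(G[person].values()))["age_diff"]
        return "child" if diff >= 16 else "partner"
    if G.in_degree(person):   # others depend on them
        return "parent/insurance carrier"
    return "single"

for p in G.nodes:
    print(p, role(p))
```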
results Depending on quality requirements, there are around 2,000,000 persons in the final co-insurance dataset. In addition to the estimation of genealogical information, new insights into the database, and especially into its data quality, were acquired (e.g. persons older than 120 years could be identified as miscoded children due to their dependence on their parents). Networks of related persons allow in-depth analysis and informative visualizations. New quality issues were identified, and missing information on e.g. socio-economic status could be imputed or corrected. Furthermore, the estimated personal information enables novel research questions. Owing to its stepwise procedure, the implemented approach can be directly adapted to new data or particular projects.
Discussion Although solid and promising results have been obtained, additional analyses are needed and concrete limitations have to be discussed. The quality and interpretation of co-insurance networks might vary over time, region and data source (e.g. social insurance institution). Because relationships are derived from co-insurance, couples not depending on each other directly, or indirectly through a common child, cannot be detected; as a result, the identification of parents is of higher quality. External validation and verification of the methodology and its application still have to be addressed.
conclusion Genealogical information and networks of co-insurance can be estimated using administrative data. The presented method is straightforward and flexible, but it also pointed out limitations of the data collection and its quality. Previous knowledge about GAP-DRG and its general quality and trustworthiness could be verified. In summary, the newly acquired information on relationships and the extracted co-insurance networks are interesting in their own right and are expected to form the basis of novel data analyses and research.
Introduction Health organizations in socialized medicine contexts have a unique constraint: they do not have access to information beyond the boundaries of their own organization. Yet it is increasingly evident that lowering costs in health care, coordinating care and personalizing care will require the pooling of data from multiple sources.1,2 Other areas of human endeavour have achieved this: from sharing common lands3 to sharing data for transport,4 there are good examples of ways we can share information assets in common.
Methods A literature search of peer-reviewed articles on the governance of shared health information was conducted using Google Scholar and PubMed. A search of the grey literature on governance of shared assets was also conducted in Google. Approximately 100 relevant articles were identified.

Table 1. Principles of Shared Health Information Governance
All stakeholders have a voice in setting goals
All stakeholders are equal partners to improve the health care system
All stakeholders share in the risk
All stakeholders participate in a trusting, collaborative manner
All processes are transparent
All decisions are informed by consensual agreement/input
Participants acknowledge the need for trade-offs and 'give and take'

The researchers identified over 25 principles relevant to the governance of shared health information, all falling into one of four themes: Policy, People, Process or Technology. Through refinement, seven principles were distilled for governing how health information should be shared between multiple health sector entities to achieve system-level goals (Table 1).

Discussion
To our knowledge, this is the first articulation of principles for using health information as "a resource for action and decision-making", as opposed to "a resource for documentation and record-keeping", which characterizes the data and IT governance literature. These principles are specific to sharing health information across multiple organizations as well as directly with patients. The principles need to be validated by stakeholders who stand to gain or lose from implementing them in actual practice.

Introduction Visualization is an integral component of data science. Visualizations of health data can be an effective means of deriving insights for patients, clinicians and researchers. Information visualization is an important aspect of the Precision in Symptom Self-Management (PriSSM) Center, and our efforts range from infographics representing small data, such as blood pressure, physical activity levels and self-reported anxiety levels, designed for consumers/patients (Figures A and B), to electronic dashboards for clinicians and network diagrams of big data for research discovery of patterns and potential intervention targets (Figures C and D).
Methods Specific methods vary depending upon the purpose of the visualization and the intended audience. However, a typical design process begins with prototyping by PriSSM Center Visualization Design Studio experts, followed by iterative participatory design with members of the end-user population. For visualizations designed for consumers/patients, formal comprehension testing is also undertaken. A variety of software products are used to create the visualizations, including Tableau, Adobe Illustrator, NodeXL, NVivo, ORA, and custom software that our team built in a previous research project: the EnTICE3 (Electronic Tailored Infographics for Community Engagement, Education, and Empowerment) system.

Discussion
Visualizations are integral to data science and critical to advanced data analytics. Our team applies a structured approach and a variety of methods to create visualizations, resulting in designs that allow consumers/patients, clinicians and researchers to derive insights from data.
conclusions We continue to evolve our visualization design research as an approach for promoting consumer/patient, clinician, and researcher insights.