A standardised graphic method for describing data privacy frameworks in primary care research using a flexible zone model

https://doi.org/10.1016/j.ijmedinf.2014.08.009

Highlights

  • Easy model to display policies and rules for data privacy.

  • Novel concept of privacy zones for research data flow.

  • A risk gradient runs from high risk to low risk for patient identification.

  • The zone model is presented for important research scenarios.

  • Different types of research are considered in each of the three zones.

Abstract

Purpose

To develop a model describing core concepts and principles of data flow, data privacy and confidentiality, in a simple and flexible way, using concise process descriptions and a diagrammatic notation applied to research workflow processes. The model should help to generate robust data privacy frameworks for research done with patient data.

Methods

Based on an exploration of EU legal requirements for data protection and privacy, data access policies, and existing privacy frameworks of research projects, basic concepts and common processes were extracted, described and incorporated into a model with a formal graphical representation and a standardised notation. The Unified Modelling Language (UML) notation was enriched with workflow symbols and custom symbols to enable the representation of extended data flow requirements, data privacy and data security requirements, and privacy enhancing techniques (PET), and to allow privacy threat analysis for research scenarios.

Results

Our model is built upon the concept of three privacy zones (Care Zone, Non-care Zone and Research Zone) containing databases, data transformation operators, such as data linkers and privacy filters. Using these model components, a risk gradient for moving data from a zone of high risk for patient identification to a zone of low risk can be described. The model was applied to the analysis of data flows in several general clinical research use cases and two research scenarios from the TRANSFoRm project (e.g., finding patients for clinical research and linkage of databases). The model was validated by representing research done with the NIVEL Primary Care Database in the Netherlands.
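The risk gradient described above can be illustrated with a minimal sketch. It assumes that the three zones can be ordered by identification risk (Care Zone highest, Research Zone lowest) and that any flow towards a lower-risk zone must pass through a privacy filter. Both the numeric ordering and the rule are illustrative assumptions, and the names `Zone`, `Flow` and `violates_gradient` are hypothetical; they are not part of the published notation.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class Zone(IntEnum):
    """Privacy zones from the abstract; the risk ranking is an assumption."""
    CARE = 3       # identifiable patient data at the point of care
    NON_CARE = 2   # intermediate zone (assumed medium risk)
    RESEARCH = 1   # data prepared for researchers (assumed lowest risk)


@dataclass(frozen=True)
class Flow:
    """A data flow between two zones with the operators applied on the way."""
    source: Zone
    target: Zone
    operators: Tuple[str, ...] = ()


def violates_gradient(flow: Flow) -> bool:
    """Flag flows that move to a lower-risk zone without a privacy filter.

    This rule is a hypothetical example of a check, not the paper's rule set.
    """
    moves_down = flow.target < flow.source
    return moves_down and "privacy_filter" not in flow.operators


unfiltered = Flow(Zone.CARE, Zone.RESEARCH)
filtered = Flow(Zone.CARE, Zone.NON_CARE, ("privacy_filter",))
print(violates_gradient(unfiltered), violates_gradient(filtered))
```

Encoding the zones as an ordered enumeration keeps the gradient explicit, so a check of this kind reduces to a comparison between the source and target zone.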

Conclusions

The model allows analysis of data privacy and confidentiality issues for research with patient data in a structured way and provides a framework to specify a privacy compliant data flow, to communicate privacy requirements and to identify weak points for an adequate implementation of data privacy.

Introduction

Clinical research has led to a growing demand for data from health records and for subsequent data sharing. Research into the health status of populations, the aetiology of diseases and the effectiveness of medical treatments increasingly requires access to large patient databases and registers. Because health research deals with human data, which is subject to special protection, research can take place only within an appropriate regulatory and data privacy framework. In some European countries, primary care databases have existed for some years on a national (e.g., NIVEL Primary Care Database, NIVEL-PCD) or regional scale (e.g., databases in the Maastricht area). These patient databases often act not only as storage sites for data, but also offer research services [1]. Consequently, the need has arisen to use these services in research projects to answer complex research questions. In addition, different service providers are increasingly responsible for the storage, processing and integration of patient data, leading to the problem of sensitive data being stored on systems that are not under the control of the entity which submitted the data [2].

Privacy legislation has been slow to react to the increasing role of research service provision and the use of personal health records in research. Indeed, the suitability of conventional privacy requirements has been questioned on this basis [3], [4]. Any simplistic “global” solution, such as banning access to all data without explicit consent, can hamper research by excluding “sensitive topics” or bias results by omitting hard-to-reach groups such as the poor. More complex solutions for privacy protection involving safe havens or third party linkages of data may be required.

A standardised notation for presenting the privacy needs associated with different research questions could make complex data privacy requirements and dependencies easier to understand. Our aim was to create an easy-to-use graphical method for describing data privacy frameworks and to apply it to relevant research data flow scenarios.

Background

TRANSFoRm (Translational Research and Patient Safety in Europe) [5] is a project partially funded by the European Commission that is developing the digital infrastructure for a “Learning Healthcare System” in Europe, consisting of data collection, data mining and decision support, that aims to improve both patient safety and the conduct and volume of clinical research [6]. Although IT systems in primary care settings (e.g., general practices) are a large source of electronic clinical data at patient

Objectives

To support the development of privacy compliant data flow for research, a generic graphic model should enable the easy representation of core concepts and principles of data privacy and confidentiality in the use of health data for research. This graphic model should distinguish the various phases of the research data flow from the primary sources until data reach the researcher for analysis. The secondary objective was that applying the model to common research data flow scenarios should

Creation of a graphic model

Software engineering uses different kinds of models, such as data models, information models, and process and component models, as a basis for software development. A model may represent an artefact describing a system with the help of suitable diagrams [29]. We used the model approach to better understand the privacy and data protection requirements of the different research use cases in the TRANSFoRm project and to find a way to depict the context dependency of privacy graphically.

Formal description of the model

Our model employs the basic definitions of schemes for data types based on the EU Data Protection Directive (Table 1). To account for these definitions, a set of basic elements was created (Fig. 1): zones for data sources (zone, subzone), operators for transforming data (data linker, privacy filter) and actors/roles (General Practitioner (GP), researcher). The zone plays a central role in our model; it ensures that context sensitivity of privacy protection is always considered. A single rule
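The two transformation operators named above, the privacy filter and the data linker, can be sketched as plain functions. This is a minimal illustration under stated assumptions: the field names (`patient_id`, `name`, `address`), the use of a keyed hash (HMAC-SHA256) as pseudonym, and the function names `privacy_filter` and `data_linker` are all hypothetical choices made for the example; the source does not prescribe a specific technique.

```python
import hashlib
import hmac

# Hypothetical direct identifiers that must not leave the source zone.
DIRECT_IDENTIFIERS = {"name", "address"}


def privacy_filter(record, secret):
    """Drop direct identifiers and replace patient_id with a keyed pseudonym."""
    pseudonym = hmac.new(
        secret, record["patient_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    kept = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    return {"pseudonym": pseudonym, **kept}


def data_linker(left, right):
    """Join two lists of filtered records on their shared pseudonym."""
    by_pseudonym = {rec["pseudonym"]: rec for rec in right}
    return [
        {**rec, **by_pseudonym[rec["pseudonym"]]}
        for rec in left
        if rec["pseudonym"] in by_pseudonym
    ]


key = b"example-secret"  # assumed to be held by a trusted party, not by the researcher
gp_data = [privacy_filter({"patient_id": "p1", "name": "A", "diagnosis": "E11"}, key)]
registry = [privacy_filter({"patient_id": "p1", "address": "X", "hba1c": 7.2}, key)]
linked = data_linker(gp_data, registry)
```

Because both sources are filtered with the same key, records of the same patient receive the same pseudonym and can be linked without the linker ever seeing the original identifier.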

Discussion

Under EU law, personal data can only be collected under strict legal conditions and for a legitimate purpose. From the researcher's point of view, it seems that the technical development and sophistication of anonymisation and de-anonymisation techniques are outrunning the legal and policy developments in data privacy protection. It helps to take a step back and review how research with primary care data is done, and analyse the corresponding privacy requirements by using a standardised graphic

Conclusion

The model allows analysis of data privacy and confidentiality issues for research in a structured way, using standardised graphic-based notational representations of data sources, data flow and privacy functions within a flexible zone model. It does not suggest a privacy protection framework on the technical level suitable for all research projects, but provides a framework with its components and a privacy compliant data flow. Applying our model, weak points in the definition of a privacy

Authors’ contribution

Wolfgang Kuchinke was the main author; all other authors contributed to the paper, with Christian Ohmann and Evert-Ben van Veen in graphical modelling, Adel Taweel and Brendan C Delaney in the zone creation and validation, Theodoros N. Arvanitis in the validation and Robert Verheij contributing the NIVEL example.

Conflict of interest

No conflicting interests exist.

Summary points

What was already known before this study:

  • Privacy is recognised as an important challenge for health care and health research.

  • Many different privacy protection frameworks exist, but they are project specific and cover only a single type of research with patient data.

  • Several Privacy Enhancing Techniques (PET) have been developed and new anonymisation methods are emerging (e.g., k-anonymity, data obfuscation, synthetic microdata, l-site-diversity).

  • Privacy

Acknowledgements

TRANSFoRm is partially funded by the European Commission – DG INFSO (FP7 247787). DG INFSO is now DG Connect.

The research was partly supported by the UK National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy's and St. Thomas’ NHS Foundation Trust and King's College London. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

The initial concept of the privacy zones was developed in a joint

References (69)

  • B.C. Delaney et al., Envisioning a learning health care system: the electronic primary care research network: a case study, Ann. Fam. Med. (2012)
  • EU Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals...
  • Health Insurance Portability and Accountability Act of 1996, Public Law 104-191, Report 104-726, 104th Congress...
  • Department of Health and Human Services, Office of the Secretary: 45 CFR Parts 160 and 164. Standards for Privacy of...
  • The OECD Privacy Framework, OECD Paris, France (2013). Online available...
  • APEC Privacy Framework, APEC Secretariat, Singapore (2005), ISBN...
  • International Standards on the Protection of Personal Data and Privacy, The Madrid Resolution, Spanish Data Protection...
  • U.S.-EU Safe Harbor Framework Documents, US Federal Register, July 24, 2000. Online available:...
  • M. Verschuuren et al., Working group on Confidentiality and Data Protection of the Network of competent Authorities of the Health Information and Knowledge strand of the EU Public Health Programme 2003-08, Eur. J. Public Health (2008)
  • Academy of Medical Sciences (AMS). A new pathway for the regulation and governance of health research. January 2011,...
  • Office for Civil Rights, Department of Health and Human Services. HIPAA Privacy Rule. Title 45 of the Code of Federal...
  • D. McGraw, Paving the regulatory road to the “learning health care system”, Stanford Law Rev. Online (2012)
  • Office of the National Coordinator for Health information Technology, US Department of Health and Human Services:...
  • C. Runnegar, International privacy frameworks: an overview
  • D. Kalra et al., Security and confidentiality approach for the Clinical E-Science Framework (CLEF), Methods Inf. Med. (2005)
  • N. Forgó (Ed.), The ACGT ethical and legal requirements. ACGT deliverable 10.2. 13.03.2007, available at:...
  • E.-B. van Veen, Patient data for health research. October 2011. Available at:...
  • R.A. Verheij, C.E. Van Dijk, I. Stirbu-Wagner, S.A. Dorsman, S. Visscher, H. Abrahamse, R. Davids, J. Braspenning, T....
  • General Practice Research Database (GPRD), Has been renamed as: Clinical Practice Research Datalink (CPRD), available...
  • T. Kuehne, What is a model? Language engineering for model-driven software development 2005; 04101
  • Unified Modelling Language (UML), Object Management Group, available at: www.uml.org (accessed...
  • R.M. Friedenberg, Patient–doctor relationships, Radiology (2003, February)
  • Institute of Electrical and Electronics Engineers: IEEE standard 1471. IEEE Recommended Practice for Architectural...