%0 Journal Article
%A Christopher Pearce
%A Adam McLeod
%A Jon Patrick
%A Jason Ferrigi
%A Michael Bainbridge
%A Natalie Rinehart
%A Anna Fragkoudi
%T Coding and classifying GP data: the POLAR project
%D 2019
%R 10.1136/bmjhci-2019-100009
%J BMJ Health & Care Informatics
%P e100009
%V 26
%N 1
%X Background: Data, particularly ‘big’ data, are increasingly being used for research in health. Using data from electronic medical records optimally requires coded data, but not all systems produce coded data. Objective: To design a suitable, accurate method for converting large volumes of narrative diagnoses from Australian general practice records into SNOMED CT-AU codes. Such codification will make them clinically useful for aggregation for population health and research purposes. Method: The developed method consisted of using natural language processing to automatically code the texts, followed by a manual process to correct codes and subsequent natural language processing re-computation. These steps were repeated for four iterations until 95% of the records were coded. The coded data were then aggregated into classes considered to be useful for population health analytics. Results: Coding the data effectively covered 95% of the corpus. Problems with the use of SNOMED CT-AU were identified and protocols for creating consistent coding were created. These protocols can be used to guide further development of SNOMED CT-AU (SCT). The coded values will be immensely useful for the development of population health analytics for Australia, and the lessons learnt are applicable elsewhere.
%U https://informatics.bmj.com/content/bmjhci/26/1/e100009.full.pdf