Rapidly scalable and low-cost public health surveillance reporting system for COVID-19
Abstract
Objective Data-driven innovations are essential in strengthening disease control. We developed a low-cost, open-source system for robust epidemiological intelligence in response to the COVID-19 crisis, prioritising scalability, reproducibility and dynamic reporting.
Methods A five-tiered workflow of data acquisition; processing; databasing, sharing, version control; visualisation; and monitoring was used. COVID-19 data were initially collated from press releases and then transitioned to official sources.
Results Key COVID-19 indicators were tabulated and visualised, deployed using open-source hosting in October 2022. The system demonstrated high performance, handling extensive data volumes, with a 92.5% user conversion rate, evidencing its value and adaptability.
Conclusion This cost-effective, scalable solution aids health specialists and authorities in tracking disease burden, particularly in low-resource settings. Such innovations are critical in health crises like COVID-19 and adaptable to diverse health scenarios.
Introduction
Between January 2020 and June 2022, COVID-19 transmission led to more than 535 million cases and 6 million deaths across almost every country in the world.1 Data and digital solutions are essential in strengthening pandemic management and preparedness.2 3 Dynamic reporting systems have been used as a data solution for decision-making within various fields.4 5 As part of surveillance systems, they can be critical for disease control, resource allocation and health planning.6 7 However, developing such a system within lower-income and middle-income countries can be challenging due to the scarcity of human and economic resources.8 Malaysia has experienced high levels of COVID-19 transmission.9 Despite this, there has been a paucity of data collation and translation into epidemiological intelligence for public health interventions within the public space. We aimed to develop an analytical platform that had: (1) a data collation system, (2) an automated analytical and reporting platform, and (3) a system to monitor and evaluate the solution. The solution had to be low-cost and rapidly deployable without programming expertise.
Methods
Solution architecture
A five-tiered workflow was developed using open-source tools that allowed work process automation. The tiers were: (1) data acquisition; (2) data processing; (3) databasing, sharing and version control; (4) data visualisation; and (5) dashboard deployment and monitoring (figure 1). Data acquisition was performed with automated scripts using the ‘rvest’ and ‘magick’ packages for web scraping and image handling, respectively. The collected data were then processed using the ‘tidyverse’ and ‘rmarkdown’ packages to facilitate data manipulation and dynamic report generation. The clean, structured data were managed using GitHub for databasing, sharing and version control, ensuring data integrity and accessibility for collaborative efforts. Data visualisation transitioned from the flexdashboard framework to the Shiny package for more interactive web applications. The dashboard was initially hosted on GitHub Pages and subsequently migrated to shinyapps.io for improved performance. An in-line frame plugin was used to embed the dynamic reports within the WordPress website of the Department of Social and Preventive Medicine, Universiti Malaya (https://spm.um.edu.my/knowledge-centre/covid19-epid-live/). Google Analytics plugins monitored the dashboard, tracking usage and user interactions. This streamlined, automated workflow was adapted from various published methods, ultimately creating a comprehensive tool for epidemiological intelligence.10 11 All code for the solution is available from https://github.com/spm-um/c19-epi4msia.
Figure 1 Open-source architecture used for data collation, analytics and dashboarding in the COVID-19 Epidemiology for Malaysia Project. iframe, in-line frame.
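To make the data acquisition tier more concrete, the following is a minimal sketch of how a tabular press release could be turned into structured records. It is written in Python with the standard library only, as a stand-in for the R ‘rvest’ scripts described above, and it is not the project's code. The HTML fragment, the column layout and the names `TableParser` and `parse_counts` are all hypothetical; the real press releases were laid out differently and changed over time.

```python
from html.parser import HTMLParser

# Hypothetical press-release fragment with made-up numbers (assumption).
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>New cases</th><th>Deaths</th></tr>
  <tr><td>Selangor</td><td>1,204</td><td>3</td></tr>
  <tr><td>Sabah</td><td>310</td><td>1</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_counts(html):
    """Return one dict per state, with thousands separators removed."""
    parser = TableParser()
    parser.feed(html)
    _header, *body = parser.rows  # first row is the header
    return [
        {
            "state": state,
            "new_cases": int(cases.replace(",", "")),
            "deaths": int(deaths.replace(",", "")),
        }
        for state, cases, deaths in body
    ]


records = parse_counts(SAMPLE_HTML)
```

In a daily job, the fragment would be replaced by the downloaded page, and the resulting records would be written to a versioned file, in line with tiers 2 and 3 of the workflow.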
Data source and epidemiological indicators
Reporting of COVID-19 data in Malaysia has been carried out by the Ministry of Health (MOH), Malaysia, since January 2020 via daily press releases (https://kpkesihatan.com/), infographics (https://COVID-19.moh.gov.my/) and an official instant messaging channel (https://t.me/cprckkm). A data extraction workflow was developed using open-source tools to scrape case, death, testing, variant and vaccination data beginning in September 2020. Cluster data were extracted manually each day as the MOH published these in a non-machine-readable format. Data on healthcare capacity for certain states were sporadically provided via press releases and extracted when released. For states with no healthcare capacity data reported in press releases, these data were extracted from previous national surveys and reports.12–14 Data on midyear populations were extracted from the population projections provided by the Department of Statistics, Malaysia.15 Additionally, mobility data provided by Google (https://www.google.com/COVID-19/mobility/) were extracted directly from the Google data repository. Reporting on variants by the MOH stopped in February 2022; data on variants of concern were subsequently extracted from the GISAID Initiative (https://www.gisaid.org). Scripts were executed automatically each day to extract COVID-19 data from the above sources.
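As an illustration of how daily counts and midyear population denominators of the kind described above can be combined into an indicator, the sketch below computes a trailing 7-day average and an incidence rate per 100 000 population. This is a Python illustration under stated assumptions, not the project's R code: the indicators actually reported are listed in online supplemental appendix 1, and the function names, the 7-day window and all numbers here are made up for the example.

```python
def moving_average(values, window=7):
    """Trailing mean over `window` days; None until a full window exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def incidence_per_100k(cases, population):
    """Cases per 100 000 population."""
    return cases / population * 100_000


# Made-up daily case counts and a made-up midyear population (assumptions).
daily_cases = [100, 120, 90, 110, 130, 150, 140, 160]
population = 32_000_000

smoothed = moving_average(daily_cases)
latest_rate = incidence_per_100k(smoothed[-1], population)
```

Smoothing before dividing by the population dampens day-of-week reporting effects; whether a trailing or a centred window is more appropriate depends on how the indicator is meant to be read.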
Since 21 July 2021, the MOH, Malaysia has consolidated these data sources into a single GitHub data repository (https://github.com/MoH-Malaysia/covid19-public), and as such, the crawler was retired on 31 August 2021. Data input for the dashboard was subsequently shifted to the new MOH data repository. Datasets, including retired datasets and code used, are available on GitHub (https://github.com/spm-um/c19-epi4msia-data). A random 10% of data points were validated each week to ensure accuracy. Data sources and indicators considered important for policy response were determined from a review of reports, dashboards, advisories and expert epidemiologist opinions with frequent updates based on the changing science1 16–18 (online supplemental appendix 1).
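The weekly validation step can be sketched as follows: draw a random 10% of the extracted data points and compare each against an independently checked reference value. This is a minimal Python illustration, not the authors' procedure in detail. How the sample was actually drawn and checked is not described in the text, so the helper names, the rounding rule and the simulated records are assumptions.

```python
import random


def draw_validation_sample(record_ids, fraction=0.10, seed=None):
    """Pick a random share of record IDs (rounded, at least one) to check."""
    ids = list(record_ids)
    if not ids:
        return []
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    k = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, k))


def find_mismatches(sample_ids, extracted, reference):
    """Return sampled IDs whose extracted value differs from the reference."""
    return [rid for rid in sample_ids if extracted[rid] != reference[rid]]


# Simulated data: 30 daily values keyed by date (made-up numbers).
extracted = {f"2021-03-{day:02d}": 1000 + day for day in range(1, 31)}
reference = dict(extracted)
reference["2021-03-15"] = 9999  # simulated transcription error

sample = draw_validation_sample(extracted.keys(), seed=42)
mismatches = find_mismatches(sample, extracted, reference)
```

A sample of this size will not catch every error in a given week; it gives a running estimate of the error rate and flags when the extraction scripts need attention.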
Monitoring and evaluation
Google Analytics plugins were used to monitor the dashboard’s utilisation in real time. Indicators monitored include cumulative views, new views, active views, sessions per user, user location, device and platform used in engagement, engagement event type and traffic comparison with other website elements. Additionally, user experience forms were made available to all users to identify potential issues and gaps for improvement as rapidly as possible.
Results
The architecture, developed in October 2020, initially provided semiautomated updates on COVID-19 data and analytics. With Malaysia’s transition into endemicity in April 2022, the dashboard underwent enhancements, deploying extended functionalities in June 2022. This fully automated version collated and published analytics on transmission, disease burden, testing, healthcare capacity, variants and human mobility, using a variety of visual aids (online supplemental appendix 2). Using a web-based plugin, usage statistics were monitored, yielding weekly and biannual performance reports. Remarkably, 92.5% of new users converted into active users, with changes in user engagement paralleling shifts in measures for uptake and transmission dynamics (online supplemental appendix 3). The dashboard drew a median of 0.37% of all traffic to the department website, peaking at 1.9% in November 2020 (online supplemental appendix 4).
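For readers who want to reproduce this kind of usage summary from exported analytics counts, the sketch below shows one plausible way to compute a conversion rate and the dashboard's share of site traffic. It is a Python illustration only. The definition of conversion used here (active users divided by new users) is an assumption, since the exact Google Analytics definition behind the reported figure is not given in the text, and all numbers are made up rather than taken from the results.

```python
from statistics import median


def conversion_rate(new_users, active_users):
    """Percentage of new users who became active (assumed definition)."""
    if new_users == 0:
        return 0.0
    return 100 * active_users / new_users


def traffic_share(dashboard_views, site_views):
    """Dashboard views as a percentage of all site views, period by period."""
    return [100 * d / s for d, s in zip(dashboard_views, site_views)]


# Made-up monthly view counts (assumption, not the study's data).
dashboard_views = [150, 50, 25]
site_views = [10_000, 10_000, 10_000]

shares = traffic_share(dashboard_views, site_views)
median_share = median(shares)
peak_share = max(shares)
example_conversion = conversion_rate(new_users=80, active_users=60)
```

Reporting the median alongside the peak, as the results do, keeps a single high-traffic month from dominating the summary.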
Discussion
The escalating COVID-19 transmission within Malaysia necessitated the development of a local, low-cost, rapidly scalable and reproducible solution for data collation, analysis and dashboarding. This initiative, a collaboration of public health experts, yielded a straightforward, agile architecture that can be deployed with limited programming expertise. This open-source model demonstrated robust performance, handling large volumes of data, converting 92.5% of new users into active ones and reflecting stakeholder feedback in its ongoing enhancements. Compared with other open-source solutions in different countries, development by non-expert programmers was the key differentiator, offering potential for use in other low-resource settings. Smaller disease control programmes can use this workflow to develop their own dynamic reporting systems.19 20
Several limitations were observed, including data acquisition challenges, relatively low engagement, limited monitoring of data volumes and collection efficiency, and constraints in user experience. Lessons learnt from this project emphasised the importance of stakeholder involvement in the development process, an effective engagement strategy prior to deployment and the necessity of flexibility in the face of changing data formats.
Future iterations should incorporate these insights, and we propose four potential solutions to mitigate current limitations. First, in response to the volatile data sources, improving the system’s adaptability to various data formats is essential. Second, to boost engagement, an inclusive and robust communication strategy must be crafted. Third, detailed data volumes and exact efficiency metrics should be monitored for a more comprehensive understanding of the system’s impact. Lastly, the user interface can be refined to enhance user experience and stability, without compromising speed and simplicity. Several of these issues were partially addressed in the recent dashboard update, which took advantage of the MOH’s recent use of the GitHub platform for data dissemination and extended functionalities to include more depth in the exploration of indicators.
The value of agile, open-source solutions like ours becomes apparent in combating diseases like COVID-19. The tool’s adaptability makes it beneficial for tracking various health concerns, underscoring the importance of continued development and enhancement.
Conclusion
The development of a low-cost, open-source, rapidly scalable and reproducible workflow as described here can be useful in many situations, especially in low-resource settings. Innovative and low-cost solutions can be critical in crises such as COVID-19 and other health- and healthcare-related scenarios, especially in automating collation, processing and reporting of data. There remains a need for more collaboration in strengthening workflows such as this to allow more rapid adoption of dynamic reporting systems in the health sector.