Introduction
A datathon is a time-constrained information-based competition involving data science applied to one or more challenges.1–7 Datathons and hackathons differ in focus: datathons prioritise data analysis and modelling, whereas hackathons concentrate on building prototypes. Furthermore, hackathons can encompass a broad range of topics, from software development to hardware design, whereas datathons are more narrowly focused on data analysis. In-person datathons offer the unique opportunity to learn alongside a community of fellow students and researchers and to interact directly with clinicians and medical professionals. This is in contrast to Kaggle-like competitions, which are often self-learning experiences.
Context of the event
A joint event organised by the Technion, Rambam Healthcare Campus and the MIT Critical Data group in March 2022 provided a unique opportunity to understand the challenges faced by leading researchers and clinicians working in medical data science. The Technion is a leading science and technology research institute, and Rambam is the largest hospital in northern Israel. The event served as the inauguration of a new joint Technion-Rambam initiative in medical AI (TERA), which aims to serve as an academic centre for medical AI committed to advanced medical and clinical research with significant and actionable benefits to patient care.3 The opening event, entitled ‘Technion-Rambam Hack: Machine Learning in Healthcare’, was attended by about 250 people. The first two days consisted of a collaborative information-based competition that focused on solving real-world clinical problems through interdisciplinary teams with access to real data.1–7 The datathon was followed by a one-day conference with lectures delivered by researchers from the Technion, Rambam, MIT, the Israeli Ministry of Health (MOH), Clalit Health Services, GE Healthcare and Roche.
The datathon days
Planning of the datathon and the conference began approximately six months before the event. After an initial brainstorming session within the scientific committee, which included Technion principal scientists, Rambam clinicians and MIT scientists, a fundraising campaign was launched, a list of potential speakers for the conference day was drawn up and invitations were extended. Promotion of the event began in November 2021 on social media platforms (Twitter, LinkedIn and Facebook). Students interested in the datathon were asked to apply and to complete a survey about their skills, interests, level of education (BSc, MSc, PhD, alumni) and specialty (engineering or bio/med). We accepted approximately 70% of the applicants, and the participation rate exceeded 95%. To ensure that registrants were committed to attending the datathon, we required a registration fee of $25. In parallel, we contacted clinicians from Rambam and asked them to propose projects, each consisting of a medical question and a relevant dataset with which to research it. Four challenges were selected, proposed by clinicians who had collected large datasets in recent years and who posed challenging scientific questions that could be tackled by machine learning (ML). The projects were (1) Prediction of newborn birth weight from maternal parameters and the birth weights of previous siblings,8 (2) ML-based predictive model for bloodstream infections during hematopoietic stem cell transplantation,9 (3) Prediction of recurrent hospitalisation in heart failure patients10 and (4) Risk factor and severity prediction in hospitalised COVID-19 patients.11 12 Project leaders were required to obtain approval for their datasets through the hospital's standard Institutional Review Board (IRB) process.
Two competing teams of 5–7 participants were assigned to each project. This approach was adopted for two reasons: first, to increase the likelihood that at least one of the teams would obtain interesting results; and second, because creating a dataset, which involves extraction, curation and anonymization, is resource intensive. The projects were designed to be of comparable difficulty in terms of the structured (tabular) medical data provided, and we intentionally limited the number of variables so as not to overwhelm teams with an excessive amount of data. Participants came from diverse fields: about one-third were biologists or medical professionals, and two-thirds were engineers, computer scientists, statisticians or mathematicians. Ethical agreement was requested from all participants during the registration process, and each participant signed a consent form and a non-disclosure agreement. Each team was assigned a clinical mentor from Rambam and a data science mentor from either the Technion or industry. Participants were selected based on their interests and competencies (studies and skills), with the goal of forming teams that mixed data analysis capacity and domain knowledge for each challenge. Each team had a separate virtual machine, with personal, secured access for each team member. During the 2 days of the event, the teams worked in several rooms at the Technion Faculty of Biomedical Engineering. Each team presented its work at the end of the second day. An external jury, comprising a principal investigator from the Technion, clinicians, members of Rambam's epidemiology and IT departments, and industrial partners, then selected the three best teams for the competition final, which took place on the conference day.
The conference day
The guest talks at the conference aimed to introduce clinical data science to a wide audience and to provide a perspective on its future impact on medicine. A total of 12 lectures were delivered, divided into three thematic sessions: (1) current trends in machine learning in healthcare, (2) data stakeholders and (3) deployment of machine learning in medical practice. The full list of lectures and speakers is available on the event website (https://technion-hack.github.io/).