Solita Health researched: Omaolo online symptom checker helps to predict national healthcare admissions related to COVID-19

In our recently published study [1], I and my Solitan colleagues Kari Antila and Vilma Jägerroos examined the possibility of predicting the burden of healthcare using machine learning methods. We used data on symptoms and past healthcare utilization collected in Finland. Our results show that COVID-19-related healthcare admissions can be predicted one week ahead with an average accuracy of 76% during the first wave of the pandemic. Similar symptom checkers could be used in other societies and for future epidemics, and they could provide an opportunity to collect data on symptom development very rapidly - and at a relatively low cost.

The rapid spread of the SARS-CoV-2 virus in March 2020 presented challenges for nationwide assessment of the progression of the COVID-19 pandemic. In Finland, Solita helped to add a COVID-19 symptom checker to a pre-existing national, CE-marked medical symptom checker service ©Omaolo. The Omaolo COVID-19 symptom checker achieved considerable popularity immediately after its release, and the city of Helsinki, for example, has estimated annual savings of 2.5 million euros from its use. Although there have been studies about how well symptom checkers perform as clinical tools, their data’s potential on predicting epidemic progression, to our knowledge, has not yet been studied.

For this purpose, Solita developed a machine learning pipeline in the Finnish Institute for Health and Welfare’s (THL) computing environment for automated model training and comparison. The models created by Solita were retrained every week using time-series nested cross-validation, allowing them to adapt to the changes in the correlation of the symptom checker answers and the healthcare burden. The pipeline makes it easy to try new models and compare the results to previous experiments. 

We decided to compare linear regression, a simple and traditional method, to XGBoost regression, a modern option with many hyperparameters that can be learned from the data. The best linear regression model and the best XGBoost model (shown in the figure) achieved mean absolute percentage error of 24% and 32%, respectively. Both models get more accurate over time, as they have more data to learn from when the pandemic progresses.

COVID-19–related admissions predicted by linear regression and XGBoost regression models, together with the true admission count during the first wave of the pandemic in 2020.

Our results show that a symptom checker is a useful tool for making short-term predictions on the health care burden due to the COVID-19 pandemic. Symptom checkers provide a cost-effective way to monitor the spread of a future epidemic nationwide and the data can be used for planning the personnel resource allocation in the coming weeks. The data collected with symptom checkers can be used to explore and verify the most significant factors (age groups of the users, severity of the symptoms) predicting the progression of the pandemic as well.

You can find more details in the publication [1]. The research was done in collaboration with University of Helsinki, Finnish Institute for Health and Welfare, Digifinland Oy, and IT Centre for Science, and we thank everyone involved.

If you have similar register data and would like to perform a similar analysis, get in touch with me or Solita Health and we can work on it together!

Joel Röntynen, Data Scientist,


[1] ​​Limingoja L, Antila K, Jormanainen V, Röntynen J, Jägerroos V, Soininen L, Nordlund H, Vepsäläinen K, Kaikkonen R, Lallukka T. Impact of a Conformité Européenne (CE) Certification–Marked Medical Software Sensor on COVID-19 Pandemic Progression Prediction: Register-Based Study Using Machine Learning Methods. JMIR Form Res 2022;6(3):e35181, doi: 10.2196/35181, PMID: 35179497