Optimizing and Accelerating COVID-19 Predictive Modeling

The COVID-19 pandemic has wreaked havoc across the globe. As of July 30, 2021, nearly 197.5 million people had been infected with the coronavirus, and 4.2 million had died. The pandemic has put significant pressure on healthcare systems worldwide, and the need for effective diagnostic, prognostic, and therapeutic procedures has never been more urgent. Despite significant investment and research into understanding and managing the disease, efficient predictive models for patient stratification and disease management are still lacking.

Because the transmission rate of COVID-19 is extremely high, healthcare facilities continually face the challenge of managing patient surges while ensuring the safety of staff, family members, and patients with other illnesses.

Machine learning (ML) and artificial intelligence (AI) techniques have been deployed to compute the risk of infection and to perform survival analysis and classification. So far, however, the results from these models have been neither very accurate nor consistent. This does not mean that AI and ML cannot do better. AI is and will remain a useful tool for computing risk factors, classification, drug analysis, and modeling response to disease, but applying it correctly is critical to getting the most benefit. This is especially true for a new disease like COVID-19, where clinicians need to understand the disease and its impact on infected patients in order to improve outcomes. Analytical models that predict the probability of survival and show how symptoms and other disease characteristics affect that probability can therefore provide valuable information to scientists, researchers, and healthcare providers.

Scientists and clinicians have already begun to address this need for more data and information, and thousands of articles have been published on the topic since the beginning of the pandemic. The potential of machine learning and artificial intelligence to provide useful insights into the pandemic through multivariable prediction models cannot be denied. However, this is not an easy task: incorporating machine learning into biomedical classification analysis requires statistical and coding knowledge, expertise in the use of algorithms, selecting the right features, designing accurate performance estimation protocols, and minimizing the risk of methodological error. Moreover, these techniques and models require time and effort, which clinicians can least afford amid a pandemic.

One way to address these challenges is Automated Machine Learning (AutoML). AutoML automates algorithm selection, hyper-parameter tuning, performance estimation, and result visualization and interpretation. It can deliver reliable predictive and diagnostic models that can be easily interpreted by a non-expert, and it can help increase the productivity of expert analysts.
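As a rough illustration of the first three steps (algorithm selection, hyper-parameter tuning, and performance estimation), the sketch below tries a handful of candidate configurations on synthetic data and keeps the one with the best cross-validated accuracy. Everything in it is an assumption made for illustration: the data is randomly generated, the names `make_data`, `cv_accuracy`, and `auto_select` are hypothetical, and the two toy classifiers stand in for the much larger search space a real AutoML system would explore. It uses only the Python standard library so that it runs as-is.

```python
import random
from statistics import mean

def make_data(n=60, seed=0):
    """Synthetic two-class data (illustrative only): two features, classes shifted apart."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        rows.append(([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)], label))
    return rows

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train, x, k):
    """k-nearest-neighbour majority vote (k is odd, so there are no ties)."""
    nearest = sorted(train, key=lambda r: dist2(r[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def centroid_predict(train, x):
    """Nearest-centroid classifier: assign x to the class with the closest mean."""
    centroids = {}
    for c in (0, 1):
        points = [f for f, label in train if label == c]
        centroids[c] = [mean(col) for col in zip(*points)]
    return min(centroids, key=lambda c: dist2(centroids[c], x))

def cv_accuracy(data, predict, folds=5):
    """Performance estimation: mean accuracy over interleaved k-fold cross-validation."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [r for i, r in enumerate(data) if i % folds != f]
        hits = sum(predict(train, x) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)

def auto_select(data):
    """Algorithm selection and tuning: score every candidate, return the best."""
    candidates = {"centroid": centroid_predict}
    for k in (1, 3, 5, 7):  # hyper-parameter grid for k-NN
        candidates[f"knn(k={k})"] = lambda train, x, k=k: knn_predict(train, x, k)
    results = {name: cv_accuracy(data, fn) for name, fn in candidates.items()}
    best = max(results, key=results.get)
    return best, results

best, results = auto_select(make_data())
print("selected:", best)
for name, score in results.items():
    print(f"  {name}: {score:.3f}")
```

Note that the winning score here is the maximum over several candidates evaluated on the same folds, so it tends to be optimistic. Reporting an honest estimate for the selected model needs an extra correction step.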

In a recent study, researchers used AutoML to analyze three publicly available COVID-19 datasets containing serum proteomic, metabolomic, and transcriptomic measurements. They also performed pathway analysis of the selected features.

The proteomic and metabolomic analysis produced ten equivalent signatures of two features each for discriminating severe from non-severe COVID-19 patients. The transcriptomic analysis produced two equivalent signatures of eight features for distinguishing COVID-19 patients from those with a different acute respiratory illness. A second transcriptomic dataset produced two equivalent signatures of nine features for distinguishing COVID-19 patients from virus-free individuals. Several new features also emerged that may be implicated in clinical pathways, including the viral mRNA translation pathway, interferon-gamma signaling, and the innate immune system.

Findings from this study highlight several advantages of AutoML. These include:

  • Significant increase in productivity. 
  • Faster execution time and analysis. 
  • Democratization of analysis to life scientists and the ability to rely on non-expert analysts.
  • Immediate access to results without relying on analysts for modelling and interpretation. 
  • Guaranteed correctness and performance estimates based on best practices.
  • Guaranteed optimization in terms of predictive performance against human expert models. 
  • Ability to replicate and reproduce results. 

Overall, findings from this analysis show that with the application of AutoML, multiple biosignatures could be built in a fast and automated manner. The use of AutoML resulted in high predictive performance that was successfully validated. 

The information derived from AutoML models can help improve clinical decision-making for each individual patient. Accurate prediction of survival probability, knowledge of the features that affect survival, and the availability of accurate and reliable data can also go a long way toward improving the clinician's response to the pandemic.

The Just Add Data Bio (JADBio) platform is an AutoML technology that can be readily applied to low-sample, high-dimensional biomedical data and can produce accurate predictive models with their corresponding biosignatures. JADBio has been validated on 360 multi-omics datasets and has demonstrated correct estimates of performance for the predictive models on the training data population. JADBio removes the need to hold out a separate validation set for the final model, so no samples are lost to estimation. The tool has already produced novel scientific results in several important areas, including suicide prevention, Alzheimer's disease, cancer, nanomaterial chemical properties, protein localization, and others.
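The text above does not say how JADBio obtains its performance estimates, so the sketch below does not attempt to reproduce the platform. It swaps in nested cross-validation, one well-known way to estimate the performance of a tuned model without setting aside a separate validation set: an inner loop selects the hyper-parameter, an outer loop scores the whole "select, then fit" procedure on folds the selection never saw, and the final configuration is then chosen using every sample. The synthetic data, the toy k-NN classifier, and the names `make_data`, `select_k`, and `nested_cv` are all assumptions made for illustration.

```python
import random
from statistics import mean

def make_data(n=40, n_features=50, seed=1):
    """Synthetic low-sample, high-dimensional data; only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = i % 2
        x = [rng.gauss(1.5 * y if j < 2 else 0.0, 1.0) for j in range(n_features)]
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """k-nearest-neighbour majority vote (k is odd, so there are no ties)."""
    nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
    return 1 if 2 * sum(y for _, y in nearest) > k else 0

def split(data, folds):
    """Yield (train, test) pairs for interleaved k-fold cross-validation."""
    for f in range(folds):
        yield [r for i, r in enumerate(data) if i % folds != f], data[f::folds]

def cv_accuracy(data, k, folds):
    """Mean accuracy of k-NN with a fixed k over the given number of folds."""
    scores = []
    for train, test in split(data, folds):
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)

def select_k(train, ks=(1, 3, 5), folds=4):
    """Inner loop: pick the k with the best cross-validated accuracy."""
    return max(ks, key=lambda k: cv_accuracy(train, k, folds))

def nested_cv(data, outer_folds=5):
    """Outer loop: score the whole 'select then fit' procedure on unseen folds."""
    scores = []
    for train, test in split(data, outer_folds):
        k = select_k(train)  # tuning sees only the outer training part
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)

data = make_data()
estimate = nested_cv(data)   # performance estimate for the selection procedure
final_k = select_k(data)     # final configuration is chosen using every sample
print(f"estimated accuracy: {estimate:.3f}, final k: {final_k}")
```

The design point is that every sample contributes both to training the final model and to the estimate, which matters most when samples are scarce relative to the number of features.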

It would be in the interest of clinicians and researchers to further develop cost-effective clinical assays to better understand and treat this disease. There is also a need for well-built datasets from which information can be extracted efficiently and sound conclusions drawn.

AutoML is a new reality in biomedicine. It can change the way healthcare data is analyzed and can open data analysis to non-experts, greatly increasing productivity. It can also improve replicability, improve data interpretation, and shield against common methodological pitfalls. Most of all, AutoML requires minimal human effort: it allows information to be extracted from complicated datasets without making the process laborious and time-consuming. The improved analytical models offered by AutoML can enable personalized clinical decisions and improved disease management.