This SRM-AP team used Machine Learning to predict that Andhra's COVID caseload will dwindle to 100 by July. Here's how

A team from SRM University - AP has toiled over three months to draw up predictions and has sent a report to the Special Chief Secretary of Chief Minister of AP. This is how they went about it

From the top | (Pic: SRM - AP)

Updated on:

19 May 2021, 7:00 am

What if you were told that the number of infected people in Andhra Pradesh could gradually come down to 15,000 by May 30, 1,000 by June 14, 500 by June 23 and 100 by July 15? A ray of hope in what has been a very dark time, isn't it? These are the results of a study carried out by the good folks of SRM University – AP, including Assistant Professor Dr Soumyajyoti Biswas from the Department of Physics and four students who are pursuing their BTech in Computer Science and Engineering, namely Anvesh Reddy, Hanesh Koganti, Sai Krishna and Suhas Reddy.

Over three months of hard work, starting from the first week of February, has been poured into this project. "I had given this project keeping in mind its relevance and the machine learning approach, which could be interesting. Then these students contacted me and we started working on it," says the professor, who has been a postdoctoral fellow at Friedrich-Alexander-University and the Max Planck Institute for Dynamics and Self-Organization (both in Germany) and at the Institute of Mathematical Sciences, Chennai. The professor takes us through their process, why they used a specific mathematical model and how these predictions are going to help us. Excerpts from the interview:

Why was it that only the SIR model was adopted to predict this particular scenario? Were any other models considered? Also, what is the role of Machine Learning?
The (S)usceptible-(I)nfected-(R)emoved model is the simplest one that can represent this epidemic spreading. There are other variants of the model, for example, SEIR, which considers an additional state (exposed), representing the number of people who were exposed to the virus but are yet to show symptoms. But we had to keep in mind that the actual data available for the pandemic is limited. For example, even if we include the exposed state in our model, there is no corresponding data for it.
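
For readers who want to see what the SIR model looks like in practice, here is a minimal sketch in Python. It is not the team's code. It integrates the textbook SIR equations (dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I) with a simple forward-Euler step. The names `beta` (infection rate) and `gamma` (removal rate), the helper functions and all numerical values are assumptions chosen for illustration and are not fitted to Andhra Pradesh data.

```python
def simulate_sir(beta, gamma, population, initial_infected, days, dt=0.1):
    """Integrate the SIR equations with a forward-Euler scheme.

    Returns one (S, I, R) tuple per whole day, starting at day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history


def first_day_below(history, threshold):
    """First day at or after the peak on which active infections fall
    below `threshold`, or None if that never happens in the simulation."""
    infected = [i for _, i, _ in history]
    peak_day = infected.index(max(infected))
    for day in range(peak_day, len(infected)):
        if infected[day] < threshold:
            return day
    return None


if __name__ == "__main__":
    # Illustrative parameter values only (hypothetical, not from the study).
    history = simulate_sir(beta=0.3, gamma=0.1, population=1_000_000,
                           initial_infected=100, days=300)
    print("peak active infections:", round(max(i for _, i, _ in history)))
    print("first day below 100 after the peak:", first_day_below(history, 100))
```

The "end-time" used here (the first day after the peak with fewer than 100 active infections) is one possible reading of the target the professor describes; the article does not give the exact definition the team used.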

Machine learning, generally speaking, is a tool that systematically explores various statistical features of data (for example, the sizes of fluctuations and so on) in order to make predictions on what is called a target variable. In our case, the target variable was the end-time of the second wave of this pandemic. We used the SIR model to train the machine learning algorithm, which basically means that the algorithm gets the input on how the various features of the data are related to the target variable (end-time). After that, when the actual data is fed to the algorithm, it looks at those features in the actual data and gives the prediction for the target variable. The prediction becomes better with larger sets of data.
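
The idea of training on model-generated curves and then predicting an end-time can be sketched as follows. This is only an illustration under loud assumptions: the article does not say which algorithm or which statistical features the team used, so the sketch swaps in a plain k-nearest-neighbour regressor and two made-up features (average log-growth in each half of an observation window). All function names, parameter ranges and thresholds are hypothetical.

```python
import math
import random


def infected_curve(beta, gamma, population=1_000_000, initial_infected=100,
                   days=400, dt=0.1):
    """Daily active infections from a forward-Euler SIR integration."""
    s, i = float(population - initial_infected), float(initial_infected)
    curve = [i]
    for _ in range(days):
        for _ in range(round(1 / dt)):
            new_inf = beta * s * i / population * dt
            s -= new_inf
            i += new_inf - gamma * i * dt
        curve.append(i)
    return curve


def end_time(curve, threshold=100.0):
    """First day at or after the peak with fewer than `threshold` active cases."""
    peak = curve.index(max(curve))
    for day in range(peak, len(curve)):
        if curve[day] < threshold:
            return day
    return None


def features(curve, window=40):
    """Two hypothetical features: mean daily log-growth in each half of the window."""
    half = window // 2
    early = math.log(curve[half] / curve[0]) / half
    late = math.log(curve[window] / curve[half]) / (window - half)
    return (early, late)


def make_training_set(n, rng):
    """Simulate n SIR curves and label each with its end-time (the target)."""
    rows = []
    while len(rows) < n:
        beta, gamma = rng.uniform(0.2, 0.5), rng.uniform(0.07, 0.15)
        curve = infected_curve(beta, gamma)
        target = end_time(curve)
        if target is not None:  # skip curves that do not end within the run
            rows.append((features(curve), target))
    return rows


def knn_predict(rows, query, k=3):
    """Average the targets of the k training rows closest to `query`."""
    nearest = sorted(rows, key=lambda row: math.dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


if __name__ == "__main__":
    rng = random.Random(0)
    training_rows = make_training_set(200, rng)
    # Stand-in for "actual data": a curve the regressor has not seen.
    observed = infected_curve(beta=0.3, gamma=0.1)
    print("predicted end day:", knn_predict(training_rows, features(observed)))
    print("simulated end day:", end_time(observed))
```

In this sketch the simulated curve plays the role of the real case counts. With real data, only the first `window` days would be available, and the quality of the prediction would depend on how well the simulated training curves resemble the actual epidemic.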


How much time did you and the students take to actually work out the predictions? Any challenges that you faced while you were working towards the result?
We have been working on this particular project for over three months. Given its relevance, this has been my primary focus of research during this period. I have worked on other aspects of epidemic spreading before (such as "Parallel Minority Game and its application in movement optimisation during an epidemic", available on ScienceDirect). Of course, given the situation, we all had to work from home. But since the work is computational, it was not a major issue.

The team | (Pic: SRM - AP)

Since the numbers the states are sharing might not be accurate, was there a discussion on where you will actually source the data from?
It is quite possible that the numbers shared by the states might not be accurate. However, we could not find any other source that is this detailed. We are currently working on extracting the above-mentioned statistical features from the data of other countries as well.

Can you tell us about arriving at the result that the caseload will reduce to 100 by July 15? Which variables can lead to the predictions being slightly off the mark?
We applied a machine learning algorithm to predict the end-time. Of course, the prediction accuracy will depend on several variables. The major points here are: we did not take into account possible partial or complete lockdowns, virus mutations that could change the infection rate, a possible third wave, or vaccinations and their effects.

Apart from assuring us that things will be all right, what else do predictions offer to society, scientists and the state? We are aware you have submitted the data to the Special Chief Secretary of CM of AP. How would it help them specifically?
We think that an estimate of the end-time for the pandemic is crucial for multiple reasons. From an economic standpoint, it could help estimate the impact on medium and small businesses. In the education sector, it could help plan for the upcoming academic sessions, and it could also help plan for health infrastructure support.

For scientists, machine learning prediction methods are a comparatively new tool, which has been successful in many cases. It is crucial to know how it performs in this case and how it can be improved later on.
