Dear everyone,

I apologize in advance if this has been discussed elsewhere and I simply could not find it, or if this is not the appropriate forum.

Our lab is generating large amounts of single-cell RNA-seq (scRNA-seq) data, and we would like to develop a strategy that takes only control data and models what would happen if we computationally generated mutants. The idea is that, since normal organ development is encoded and regulated by a multitude of mechanisms, the response to malfunctioning mechanisms should be encoded in a similar way: an organism and its cells still follow some program when a regulatory mechanism fails, even if the result is a disease state.

Given the incomplete data one gets from scRNA-seq (only highly expressed genes are detected), the positive and negative feedback loops in signaling pathways, and non-cell-autonomous interactions, we do not expect to design a perfect model. That being said, we feel there should be some baby steps in the right direction. Unfortunately, being biologists, we have little data science background, so we don't really know where to start.

We've read up a bit on methods ranging from linear regression to neural networks. My impression is that one cannot 'predict' a disease outcome with classification approaches when we don't have the data to assign those classes. Since disease states tend to lie on a continuous scale, linear regression may be more appropriate. Optimal transport seems interesting if we think of cells as taking an alternative route to a destination that is not their primary choice but is at least the next best one. Random forests may help to classify one cell type becoming another, but this would be limited to the cell types captured in the control data set. Then again, we could be completely wrong about any of this.
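To make the question more concrete, here is a rough sketch of the kind of baby step we have in mind for the linear regression idea. It is only an illustration under strong assumptions: the expression matrix is synthetic (not our data), the gene-gene relationships are assumed to be purely linear, and the function names `fit_linear_model` and `simulate_knockout` are made up for this post. Each gene is regressed on the gene to be knocked out using control cells only, and the fitted slopes are then used to shift every cell to the state where that gene is zero.

```python
import numpy as np

def fit_linear_model(X, ko_gene):
    """Least-squares fit of every other gene on the knocked-out gene.

    X is a (cells x genes) matrix of control expression values.
    Returns the target gene indices and a (2 x n_targets) array
    holding the intercepts (row 0) and slopes (row 1).
    """
    n_cells, n_genes = X.shape
    targets = [g for g in range(n_genes) if g != ko_gene]
    design = np.column_stack([np.ones(n_cells), X[:, ko_gene]])
    coef, *_ = np.linalg.lstsq(design, X[:, targets], rcond=None)
    return targets, coef

def simulate_knockout(X, ko_gene):
    """Shift each cell to the predicted state with ko_gene set to zero.

    Per-cell residuals are kept, so cell-to-cell variation that the
    linear model does not explain is preserved.
    """
    targets, coef = fit_linear_model(X, ko_gene)
    slopes = coef[1][None, :]                  # shape (1, n_targets)
    delta = -X[:, [ko_gene]] * slopes          # change caused by removing the gene
    X_ko = X.copy()
    X_ko[:, targets] = np.clip(X[:, targets] + delta, 0.0, None)  # no negative expression
    X_ko[:, ko_gene] = 0.0
    return X_ko

# Synthetic stand-in for control data (an assumption, for illustration only):
# gene 0 is a regulator, gene 1 depends on it, gene 2 is unrelated.
rng = np.random.default_rng(0)
n_cells = 300
regulator = rng.gamma(shape=2.0, scale=1.0, size=n_cells)
target = 1.0 + 2.0 * regulator + rng.normal(0.0, 0.1, size=n_cells)
unrelated = rng.gamma(shape=2.0, scale=1.0, size=n_cells)
X = np.column_stack([regulator, target, unrelated])

X_ko = simulate_knockout(X, ko_gene=0)
print("control means: ", X.mean(axis=0))
print("knockout means:", X_ko.mean(axis=0))
```

We realize this ignores feedback loops, non-linear effects, and non-cell-autonomous interactions, and that it can only extrapolate along relationships already visible in the control cells, which is exactly why we are asking whether something more appropriate exists.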

We would therefore like to ask whether someone could suggest which approaches would be appropriate for our problem and where we could find more information on developing such models.

Thank you very much!

