Wan He
11th floor
58 St Katharine's Way
London E1W 1LP, UK
Advances in next-generation sequencing technology are producing large amounts of biological data. In this dissertation, I focus on integrating AI and network science to analyze genomics and transcriptomics data. Specifically, I answer the following questions: (1) How can we compare a large collection of long genome sequences? (2) How can we use hypergraphs to improve clustering of single-cell RNA sequencing data? (3) Can representation learning on single-cell RNA sequencing data create co-expression networks with a higher signal-to-noise ratio? For the first question, I find that misclassifications from AI models provide insights for comparative genome analysis. In particular, misclassification likelihoods reveal (spatial) associations between genome ensembles. For the second question, I identify inflated signals in co-expression networks caused by data sparsity, and I show that using hypergraphs and co-expression networks together with a memory mechanism outperforms established methods, especially for weakly modular data. The third question is ongoing work; however, preliminary results show that embeddings from representation learning methods produce networks with less noise, which in turn leads to more distinct communities. I also investigate a related research question: How can we use hypergraphs to solve constrained optimization problems? I find that for resource allocation problems, optimizing for the algebraic connectivity of the hypergraph leads to robust and resilient solutions.