Representation Learning of Human Disease Mechanisms for a Foundation Model in Rare and Common Diseases
Abstract
The limited data available for rare diseases makes it challenging to characterize which biological processes are relevant to a given rare disease. Hence, there is a need to leverage knowledge of disease pathogenesis and treatment from the wider disease landscape to understand rare disease mechanisms. Furthermore, it is well understood that rare disease discoveries can inform our knowledge of common diseases. In this paper, we introduce Dis2Vec (Disease to Vector), a new representation learning method for characterizing diseases with a focus on learning the underlying biological mechanisms, which is a step toward developing a foundation model for disease-association learning. Dis2Vec is trained on human genetic evidence and observed symptoms, and then evaluated through cross-modal transfer-learning scenarios based on a proposed drug-association learning benchmark with drug targets (positive controls) and Orphanet Rare Disease Ontology (negative controls). Finally, we argue that clustering diseases in the Dis2Vec space, which captures biological mechanisms rather than drug-repurposing information, could increase the efficiency of translational research in rare and common diseases, and ultimately improve treatment strategies for patients.