Dissertations, Theses, and Capstone Projects
Date of Degree
6-2021
Document Type
Dissertation
Degree Name
Ph.D.
Program
Computer Science
Advisor
Lei Xie
Committee Members
Liang Zhao
Matluba Khodjaeva
Jia Xu
Subject Categories
Artificial Intelligence and Robotics | Data Science
Keywords
Deep Learning, Machine Learning, Transfer Learning, Representation Learning, Multi-Omics
Abstract
Machine learning has made significant contributions to bioinformatics and computational biology. In particular, supervised learning approaches have been widely used to solve problems such as biomarker identification and drug response prediction. However, because comprehensively labeled and clean data are scarce, constructing predictive models in supervised settings is not always desirable or possible, especially with data-hungry, red-hot learning paradigms such as deep learning. Hence, there is an urgent need for new approaches that leverage more readily available unlabeled data to drive successful machine learning applications in this area.
In my dissertation, I focused on exploring and designing deep learning-based unsupervised representation learning methods. The common scheme of these methods is that they construct a low-dimensional space from unlabeled raw datasets and then leverage the learned low-dimensional embedding, explicitly or implicitly, for diverse downstream supervised tasks. Although progress has been made in recent years, most deep learning applications in biomedical studies are still in their infancy. It remains challenging to fully extract biologically meaningful information from a biomedical dataset, such as multi-omics data, to support predictive modeling for practical tasks of interest. To improve the biological relevance of learned representations, we need innovative approaches that better integrate multi-omics data and exploit their specific characteristics and natural "annotations".
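The two-stage scheme described above (learn a low-dimensional embedding from plentiful unlabeled data, then reuse it for a supervised task with few labels) can be illustrated with a minimal sketch. This is not CLEIT or CODE-AE. As a simplifying assumption, it replaces the deep autoencoder with PCA, which is the optimal linear autoencoder under squared reconstruction error, and uses ridge regression as the downstream model. The data, the response, and every function name below are hypothetical.

```python
import numpy as np

def fit_encoder(X_unlabeled, k):
    """Stage 1 (unsupervised): learn a k-dimensional linear encoder.

    Assumption: PCA via SVD stands in for a deep autoencoder here.
    """
    mean = X_unlabeled.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X_unlabeled - mean, full_matrices=False)
    return mean, vt[:k]  # components has shape (k, n_features)

def encode(X, mean, components):
    """Project samples into the learned low-dimensional space."""
    return (X - mean) @ components.T

def fit_ridge(Z, y, alpha=1.0):
    """Stage 2 (supervised): closed-form ridge regression on the embedding."""
    Z1 = np.hstack([Z, np.ones((Z.shape[0], 1))])  # append intercept column
    reg = alpha * np.eye(Z1.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the intercept
    return np.linalg.solve(Z1.T @ Z1 + reg, Z1.T @ y)

def predict(Z, w):
    Z1 = np.hstack([Z, np.ones((Z.shape[0], 1))])
    return Z1 @ w

rng = np.random.default_rng(0)
# Hypothetical data: 1000 unlabeled and 50 labeled samples with 200 features,
# all generated from 5 shared latent factors plus small noise.
loadings = rng.normal(size=(5, 200))
latent_u = rng.normal(size=(1000, 5))
latent_l = rng.normal(size=(50, 5))
X_unlabeled = latent_u @ loadings + 0.1 * rng.normal(size=(1000, 200))
X_labeled = latent_l @ loadings + 0.1 * rng.normal(size=(50, 200))
y = latent_l[:, 0] - 2.0 * latent_l[:, 1]  # hypothetical response

mean, comps = fit_encoder(X_unlabeled, k=5)   # uses no labels at all
Z_labeled = encode(X_labeled, mean, comps)    # 50 x 5 instead of 50 x 200
w = fit_ridge(Z_labeled, y)
pred = predict(Z_labeled, w)
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"training R^2 on the embedding: {r2:.3f}")
```

The point of the design is that the encoder is fit only on the unlabeled matrix, so the small labeled set only has to estimate a handful of regression weights in the 5-dimensional space rather than 200.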
To this end, we proposed two approaches: the Cross LEvel Information Transmission (CLEIT) network and the Coherent Cell-line Tissue Deconfounding Autoencoder (CODE-AE). Specifically, CLEIT leverages the hierarchical relationships among omics data at different levels to drive biologically meaningful representation learning, and CODE-AE learns biologically meaningful representations by explicitly removing confounding factors such as data source origin. Benchmark results show that these two methods improve knowledge transfer between multi-omics data (CLEIT) and between in vitro and in vivo samples (CODE-AE), respectively, and significantly boost performance on the drug response prediction task. Thus, they are potentially powerful tools for precision medicine and drug discovery.
Recommended Citation
He, Di, "Learn Biologically Meaningful Representation with Transfer Learning" (2021). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/4276