Date of Degree

6-2021

Document Type

Dissertation

Degree Name

Ph.D.

Program

Computer Science

Advisor

Lei Xie

Committee Members

Liang Zhao

Matluba Khodjaeva

Jia Xu

Subject Categories

Artificial Intelligence and Robotics | Data Science

Keywords

Deep Learning, Machine Learning, Transfer Learning, Representation Learning, Multi-Omics

Abstract

Machine learning has made significant contributions to bioinformatics and computational biology. In particular, supervised learning approaches have been widely used to solve problems such as biomarker identification and drug response prediction. However, because comprehensively labeled, clean data are scarce, constructing predictive models in supervised settings is not always desirable or possible, especially with data-hungry, red-hot learning paradigms such as deep learning. Hence, there is an urgent need to develop new approaches that can leverage more readily available unlabeled data to drive successful machine learning applications in this area.

In my dissertation, I focused on exploring and designing deep learning-based unsupervised representation learning methods. A common scheme across these methods is that they construct a low-dimensional space from unlabeled raw datasets and then leverage the learned low-dimensional embedding, explicitly or implicitly, for diverse downstream supervised tasks. Although progress has been made in recent years, most deep learning applications in biomedical studies are still in their infancy. It remains challenging to fully extract biologically meaningful information from a biomedical dataset, such as multi-omics data, to support predictive modeling for practical tasks of interest. To improve the biological relevance of learned representations, innovative approaches are needed that better integrate multi-omics data and exploit their specific characteristics and natural "annotations".

Hence, we proposed two approaches: the Cross LEvel Information Transmission (CLEIT) network and the Coherent Cell-line Tissue Deconfounding Autoencoder (CODE-AE). Specifically, CLEIT leverages the hierarchical relationships among omics data at different levels to drive biologically meaningful representation learning, while CODE-AE learns biologically meaningful representations by explicitly removing confounding factors such as data source origin. As the benchmark results showed, these two methods improve knowledge transfer between multi-omics data types and between in vitro and in vivo samples, respectively, and significantly boost performance on the drug response prediction task. Thus, they are potentially powerful tools for precision medicine and drug discovery.

Recommended Citation

He, Di, "Learn Biologically Meaningful Representation with Transfer Learning" (2021). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/4276

Included in

Artificial Intelligence and Robotics Commons, Data Science Commons

CUNY Academic Works

Dissertations, Theses, and Capstone Projects

Learn Biologically Meaningful Representation with Transfer Learning
