Dissertations, Theses, and Capstone Projects

Date of Degree

9-2025

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy

Program

Computer Science

Advisor

Lei Xie

Committee Members

Jonathan Gryak

Liang Zhao

Zhongming Zhao

Subject Categories

Biomedical Informatics | Biotechnology | Computational Biology | Molecular Genetics | Systems Biology

Keywords

Computational biology, Deep learning, Molecular biology, Precision medicine, Single cell, Transfer learning

Abstract

This dissertation presents a series of machine learning frameworks for modeling genotype–environment–phenotype relationships through integrative predictive modeling of multi-omics data. The work addresses three major axes of biological complexity: modeling the transmission of biological information across levels, from genes to proteins to phenotypes; predicting molecular features across scales, from cells to tissues to organisms; and translating phenotypes across species, from model systems to humans. Each proposed method also tackles key machine learning (ML) challenges in the biomedical domain, including data scarcity, domain shift, out-of-distribution (OOD) generalization, and hierarchical modeling. Specifically, this dissertation introduces five novel deep learning algorithms: MultiDCP, which predicts drug-induced transcriptomic and viability responses by modeling perturbations in gene expression from untreated transcriptomic profiles; TransPro, which extends this modeling from transcriptomic inputs to drug-induced proteomic responses; CODE-AE, a context-aware autoencoder that enables domain adaptation from model systems to patient-derived samples; SpatialPro, a multimodal model that aligns single-cell RNA-seq and spatial proteomics to map transcriptomic signals into tissue context; and MMAPLE, a meta-semi-supervised framework that improves prediction of understudied molecular interactions under OOD and low-label regimes. Together, MultiDCP and TransPro enable omics-driven compound screening by capturing chemical-induced perturbations across molecular layers. Collectively, these methods offer a unified and extensible foundation for data-efficient, biologically grounded modeling, supporting applications in drug discovery, personalized medicine, and computational biology.

This work is embargoed and will be available for download on Friday, February 20, 2026
