Dissertations, Theses, and Capstone Projects

Advisor

Lei Xie

Committee Members

Weigang Qiu

Eugenia G. Giannopoulou

Shaneen Singh

Subhash Sinha

Keywords

drug repurposing, gene expression, data mining


The conventional drug discovery process, which follows the "one disease, one target, one drug" paradigm, is expensive and time-consuming, and it has a high failure rate for multi-genic complex diseases. An alternative approach is to repurpose an existing drug that is already used to treat other medical conditions. Drug repurposing is considered a promising strategy because it accelerates drug discovery and lowers overall cost and risk.

Drug-perturbed gene expression profiles are powerful phenotypic readouts of biological systems and have been widely used in drug repurposing studies. However, existing drug-perturbed gene expression datasets are extremely noisy, and profiling has been performed only for selected cell lines and compounds, limiting their application to drug repurposing and compound screening. This thesis addresses these challenges: I have developed several novel computational methods and demonstrated their power in discovering novel therapeutics for multiple diseases.
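One common way drug-perturbed profiles feed into repurposing is signature reversal: a drug whose expression signature runs opposite to a disease signature is treated as a candidate. The sketch below illustrates that idea only; it assumes both signatures are per-gene z-scores over the same genes, and it uses negative Pearson correlation as a simple stand-in for connectivity-style scores. The function names `reversal_score` and `rank_drugs` and all numbers are hypothetical, not taken from this thesis.

```python
import numpy as np

def reversal_score(disease_sig: np.ndarray, drug_sig: np.ndarray) -> float:
    """Negative Pearson correlation between a disease and a drug signature.

    Both inputs are per-gene z-scores over the same genes in the same order.
    A higher score means the drug shifts expression opposite to the disease.
    """
    r = np.corrcoef(disease_sig, drug_sig)[0, 1]
    return -float(r)

def rank_drugs(disease_sig: np.ndarray, drug_sigs: dict) -> list:
    """Return drug names ordered from strongest to weakest reversal."""
    scores = {name: reversal_score(disease_sig, sig) for name, sig in drug_sigs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy signatures over five genes (invented numbers, for illustration only).
disease = np.array([2.0, 1.5, -1.0, -2.0, 0.5])
drug_sigs = {
    "drug_A": -disease,                               # exact reversal of the disease signature
    "drug_B": 0.8 * disease,                          # mimics the disease
    "drug_C": np.array([0.1, -0.3, 0.2, 0.0, 0.4]),   # weak, mixed effect
}
print(rank_drugs(disease, drug_sigs))  # ['drug_A', 'drug_C', 'drug_B']
```

Correlation ignores the overall magnitude of a drug's effect, which is why `drug_B` (a scaled copy of the disease signature) scores exactly -1; rank-based or enrichment-style scores are common alternatives.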

First, we designed a Bayesian signature detection pipeline that processes raw data from L1000 assays into robust z-scores. The pipeline produced drug signatures for in silico drug screening and repurposing with excellent accuracy and robustness. Based on these drug signatures, we developed a phenotypic screening pipeline and used it to repurpose Ibudilast and MK-2206 for Alzheimer's disease.
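For readers unfamiliar with robust z-scores, the sketch below shows the conventional median/MAD version for a genes-by-samples matrix. It is not the Bayesian pipeline described above; it only illustrates why a robust scale matters for noisy assay data. The function name `robust_zscore` and the `min_mad` floor are illustrative assumptions.

```python
import numpy as np

def robust_zscore(expr: np.ndarray, min_mad: float = 0.1) -> np.ndarray:
    """Per-gene robust z-scores for an expression matrix of shape (genes, samples).

    Each gene is centred on its median across samples and scaled by
    1.4826 * MAD, which equals the standard deviation for normally distributed
    data. `min_mad` is an illustrative floor that prevents division by ~0.
    """
    median = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - median), axis=1, keepdims=True)
    scale = 1.4826 * np.maximum(mad, min_mad)
    return (expr - median) / scale

# One gene measured in five wells; the last well is an outlier.
expr = np.array([[1.0, 2.0, 3.0, 4.0, 100.0]])
z = robust_zscore(expr)
print(np.round(z, 2))
```

Here the outlier receives a z-score of about 65.4, while the other wells stay within about ±1.35. A classical mean/standard-deviation z-score would let the outlier inflate the scale and shrink its own score to about 2.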

Second, we employed machine learning models to predict the gene expression changes induced by new chemicals in new cell types. The predicted gene expression profiles were used for drug repurposing for COVID-19, pancreatic cancer, and Alzheimer's disease without requiring prior experimental data for those compounds and cell types. This method greatly expands the domain of in silico drug screening and phenotype-based drug repurposing.
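The core setup, predicting a whole expression profile from a (chemical, cell type) pair and testing on a chemical the model never saw, can be sketched with a deliberately simple baseline. The example below is a minimal sketch under stated assumptions: it uses closed-form multi-output ridge regression rather than the models used in this thesis, synthetic data rather than L1000 profiles, and a hypothetical `featurize` encoding that concatenates drug descriptors, cell-line descriptors, and a bias term.

```python
import numpy as np

def featurize(drug_vec: np.ndarray, cell_vec: np.ndarray) -> np.ndarray:
    """Hypothetical input encoding: drug descriptors + cell-line descriptors + bias."""
    return np.concatenate([drug_vec, cell_vec, [1.0]])

def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form multi-output ridge regression: W = (X^T X + alpha*I)^-1 X^T Y."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)

# Synthetic stand-in data (not L1000): 20 drugs x 5 cell lines, 50 genes.
rng = np.random.default_rng(0)
n_drugs, n_cells, d_drug, d_cell, n_genes = 20, 5, 8, 4, 50
drugs = rng.normal(size=(n_drugs, d_drug))
cells = rng.normal(size=(n_cells, d_cell))
true_W = rng.normal(size=(d_drug + d_cell + 1, n_genes))

pairs = [(i, j) for i in range(n_drugs) for j in range(n_cells)]
X = np.array([featurize(drugs[i], cells[j]) for i, j in pairs])
Y = X @ true_W + 0.1 * rng.normal(size=(len(pairs), n_genes))

# Hold out every profile of drug 0, so it is a "new chemical" at prediction time.
train_idx = [k for k, (i, _) in enumerate(pairs) if i != 0]
test_idx = [k for k, (i, _) in enumerate(pairs) if i == 0]
W = fit_ridge(X[train_idx], Y[train_idx], alpha=1.0)
pred = X[test_idx] @ W

corr = np.corrcoef(pred.ravel(), Y[test_idx].ravel())[0, 1]
print(f"correlation on unseen drug: {corr:.3f}")
```

Because the synthetic data are generated by an additive linear model, this baseline recovers them well. Real drug responses depend on drug-by-cell interactions that an additive model cannot represent, which is why nonlinear models are the usual choice; the held-out-chemical split is the part of the design that carries over.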

Third, I applied a knowledge graph model to drug repurposing. The model integrates information from diverse biomedical databases covering genes, drugs, phenotypes, and patients. The knowledge graph embeddings provide representations of biological entities and knowledge, helping to uncover relationships between drugs and diseases.
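To make "knowledge graph embeddings" concrete, the sketch below uses TransE, one well-known embedding model, as an illustrative example; the abstract does not state which embedding model the thesis uses. TransE scores a triple (head, relation, tail) by how closely `head + relation` lands on `tail`, so candidate drugs for a disease can be ranked by the plausibility of (drug, treats, disease). The entity names, the `treats` relation, and the hand-set 2-D vectors are hypothetical; in practice the embeddings are learned by minimizing a margin loss like the one shown.

```python
import numpy as np

def transe_score(head: np.ndarray, relation: np.ndarray, tail: np.ndarray) -> float:
    """TransE plausibility of (head, relation, tail): -||h + r - t||_2 (higher is better)."""
    return -float(np.linalg.norm(head + relation - tail))

def margin_loss(pos_score: float, neg_score: float, margin: float = 1.0) -> float:
    """Hinge loss used to train TransE: a true triple should outscore a corrupted one by `margin`."""
    return max(0.0, margin - pos_score + neg_score)

def rank_drugs_for_disease(drug_emb: dict, treats: np.ndarray, disease: np.ndarray) -> list:
    """Rank candidate drugs by the plausibility of (drug, treats, disease)."""
    scores = {name: transe_score(vec, treats, disease) for name, vec in drug_emb.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hand-set 2-D embeddings (illustrative only; real ones are learned from the graph).
treats = np.array([1.0, 0.0])
disease_x = np.array([2.0, 1.0])
drug_emb = {
    "drug_A": np.array([1.0, 1.0]),  # drug_A + treats lands exactly on disease_x
    "drug_B": np.array([0.0, 3.0]),  # far from disease_x after translation
    "drug_C": np.array([1.2, 0.8]),  # close, but not exact
}
print(rank_drugs_for_disease(drug_emb, treats, disease_x))  # ['drug_A', 'drug_C', 'drug_B']
```

The margin loss is zero once a true triple outscores a corrupted one by at least `margin`, so training only moves embeddings for triples the model still confuses.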