Date of Degree


Document Type


Degree Name





Lei Xie

Committee Members

Weigang Qiu

Shaneen Singh

Emilio Gallicchio

Ping Zhang

Subject Categories

Artificial Intelligence and Robotics | Biochemistry | Bioinformatics | Biostatistics | Computational Biology | Databases and Information Systems | Data Science | Systems Biology


Machine Learning, Artificial Intelligence, Drug Repurposing, Drug Repositioning, Computational Biology, Matrix Factorization


The cost of bringing a drug to market is astounding and the failure rate is intimidating. Drug discovery has been of limited success under the conventional reductionist model of one-drug-one-gene-one-disease paradigm, where a single disease-associated gene is identified and a molecular binder to the specific target is subsequently designed. Under the simplistic paradigm of drug discovery, a drug molecule is assumed to interact only with the intended on-target. However, small molecular drugs often interact with multiple targets, and those off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected until a very late stage of the drug discovery, where the discovery of drug-induced side effects and potential drug resistance can decrease the value of the drug and even completely invalidate the use of the drug. Thus, a new paradigm in drug discovery is needed.

Structural systems pharmacology is a new paradigm in drug discovery that the drug activities are studied by data-driven large-scale models with considerations of the structures and drugs. Structural systems pharmacology will model, on a genome scale, the energetic and dynamic modifications of protein targets by drug molecules as well as the subsequent collective effects of drug-target interactions on the phenotypic drug responses. To date, however, few experimental and computational methods can determine genome-wide protein-ligand interaction networks and the clinical outcomes mediated by them. As a result, the majority of proteins have not been charted for their small molecular ligands; we have a limited understanding of drug actions. To address the challenge, this dissertation seeks to develop and experimentally validate innovative computational methods to infer genome-wide protein-ligand interactions and multi-scale drug-phenotype associations, including drug-induced side effects. The hypothesis is that the integration of data-driven bioinformatics tools with structure-and-mechanism-based molecular modeling methods will lead to an optimal tool for accurately predicting drug actions and drug associated phenotypic responses, such as side effects.

This dissertation starts by reviewing the current status of computational drug discovery for complex diseases in Chapter 1. In Chapter 2, we present REMAP, a one-class collaborative filtering method to predict off-target interactions from protein-ligand interaction network. In our later work, REMAP was integrated with structural genomics and statistical machine learning methods to design a dual-indication polypharmacological anticancer therapy. In Chapter 3, we extend REMAP, the core method in Chapter 2, into a multi-ranked collaborative filtering algorithm, WINTF, and present relevant mathematical justifications. Chapter 4 is an application of WINTF to repurpose an FDA-approved drug diazoxide as a potential treatment for triple negative breast cancer, a deadly subtype of breast cancer. In Chapter 5, we present a multilayer extension of REMAP, applied to predict drug-induced side effects and the associated biological pathways. In Chapter 6, we close this dissertation by presenting a deep learning application to learn biochemical features from protein sequence representation using a natural language processing method.