Publications and Research
Document Type
Working Paper
Publication Date
12-1-2021
Abstract
Advances in biomedicine are largely fueled by exploring uncharted territories of human biology. Machine learning can both enable and accelerate discovery, but faces a fundamental hurdle when applied to unseen data with distributions that differ from previously observed ones—a common dilemma in scientific inquiry. We have developed a new deep learning framework, called Portal Learning , to explore dark chemical and biological space. Three key, novel components of our approach include: (i) end-to-end, step-wise transfer learning, in recognition of biology’s sequence-structure-function paradigm, (ii) out-of-cluster meta-learning, and (iii) stress model selection. Portal Learning provides a practical solution to the out-of-distribution (OOD) problem in statistical machine learning. Here, we have implemented Portal Learning to predict chemical- protein interactions on a genome-wide scale. Systematic studies demonstrate that Portal Learning can effectively assign ligands to unexplored gene families (unknown functions), versus existing state-of-the-art methods. Compared with AlphaFold2-based protein-ligand docking, Portal Learning significantly improved the performance by 79% in PR-AUC and 27% in ROC-AUC, respectively. The superior performance of Portal Learning allowed us to target previously “undruggable” proteins and design novel polypharmacological agents for disrupting interactions between SARS-CoV-2 and human proteins. Portal Learning is general-purpose and can be further applied to other areas of scientific inquiry.
Comments
This work was originally published in Research Square and can be accessed at https://doi.org/10.21203/rs.3.rs-1109318/v1.
This work is licensed under a Creative Commons Attribution 4.0 International License.