Date of Degree
Bioinformatics | Computer Sciences | Theory and Algorithms
RNA Interaction, Multiple RNA Interaction
The interaction of two RNA molecules involves a complex interplay between folding and binding that warranted recent developments in RNA-RNA interaction algorithms. However, biological mechanisms in which more than two RNAs take part in an interaction also exist.
A typical algorithmic approach to such problems is to find the minimum energy structure. Often the computationally optimal solution does not represent the biologically correct structure of the interaction. In addition, different biological structures may be observed, depending on several factors. Furthermore, scoring techniques often miss critical details about dependencies within different parts of the structure, which typically leads to lower scores (i.e., higher energies). This necessitates development of algorithms to determine not only the optimal solution but also suboptimal solutions, while accounting for dependencies.
We formulate multiple RNA interaction as a combinatorial optimization problem. This problem is NP-hard, so we focus on three aspects in our thesis:
1) Design and analyze approximation algorithms for solving the optimization version,
2) Develop enumeration and sampling algorithms to generate and cluster suboptimal solutions, and
3) Extend existing scoring formulations to account for dependencies within the structure.
The inclusion of dependencies in scoring formulations increases the accuracy of sampling algorithms.
For the first task, we develop and implement a dynamic programming based polynomial time approximation scheme. To generate suboptimal solutions, we consider two different approaches. The first is an enumeration algorithm that generates all possible structures within a given fraction of the optimal energy. The other uses Gibbs sampling and Markov Chain Monte Carlo methods to draw samples of solutions from the Boltzmann distribution. Since both approaches generate many structures that may be similar, we develop a distance function to cluster them into unique shapes. For our third goal, we explore several formulations to capture dependencies, and adopt those that are computationally tractable.
We implement the optimization and approximation algorithms to predict the optimal solution, as well as the sampling algorithm to predict suboptimal solutions, and validate their results experimentally.
Ahmed, Syed Ali, "A Combinatorial Framework for Multiple RNA Interaction Prediction" (2017). CUNY Academic Works.
This work is embargoed and will be available for download on Friday, March 30, 2018
Graduate Center users:
To read this work, log in to your GC ILL account and place a thesis request.
See the GC’s lending policies to learn more.