Dissertations, Theses, and Capstone Projects

Date of Degree

9-2017

Document Type

Dissertation

Degree Name

Ph.D.

Program

Computer Science

Advisor

Saad Mneimneh

Committee Members

Amotz Bar-Noy

Lei Xie

Iman Hajirasouliha

Subject Categories

Bioinformatics | Computer Sciences | Theory and Algorithms

Keywords

RNA Interaction, Multiple RNA Interaction

Abstract

The interaction of two RNA molecules involves a complex interplay between folding and binding that warranted recent developments in RNA-RNA interaction algorithms. However, biological mechanisms in which more than two RNAs take part in an interaction also exist.

A typical algorithmic approach to such problems is to find the minimum energy structure. Often the computationally optimal solution does not represent the biologically correct structure of the interaction. In addition, different biological structures may be observed, depending on several factors. Furthermore, scoring techniques often miss critical details about dependencies within different parts of the structure, which typically leads to lower scores (i.e., higher energies). This necessitates development of algorithms to determine not only the optimal solution but also suboptimal solutions, while accounting for dependencies.

We formulate multiple RNA interaction as a combinatorial optimization problem. This problem is NP-hard, so we focus on three aspects in our thesis:

1) Design and analyze approximation algorithms for solving the optimization version,

2) Develop enumeration and sampling algorithms to generate and cluster suboptimal solutions, and

3) Extend existing scoring formulations to account for dependencies within the structure.

The inclusion of dependencies in scoring formulations increases the accuracy of sampling algorithms.

For the first task, we develop and implement a dynamic programming based polynomial time approximation scheme. To generate suboptimal solutions, we consider two different approaches. The first is an enumeration algorithm that generates all possible structures within a given fraction of the optimal energy. The other uses Gibbs sampling and Markov Chain Monte Carlo methods to draw samples of solutions from the Boltzmann distribution. Since both approaches generate many structures that may be similar, we develop a distance function to cluster them into unique shapes. For our third goal, we explore several formulations to capture dependencies, and adopt those that are computationally tractable.

We implement the optimization and approximation algorithms to predict the optimal solution, as well as the sampling algorithm to predict suboptimal solutions, and validate their results experimentally.

Share

COinS