Dissertations, Theses, and Capstone Projects

Date of Degree

9-2017

Document Type

Dissertation

Degree Name

Ph.D.

Program

Biology

Advisor

Michael J. Hickerson

Committee Members

Ana C. Carnaval

Stéphane Boissinot

Andrew D. Kern

Brenna M. Henn

Subject Categories

Computational Biology | Evolution | Genomics

Keywords

Comparative Phylogeography, Population Genomics, Hierarchical Co-Demographic Modeling, Approximate Bayesian Computation, Random Forest Machine Ensemble Learning Method

Abstract

Comparing demographic histories across assemblages of populations, species, and sister pairs has been a focus in phylogeography since its inception. Initial approaches utilized organelle genetic data and involved qualitative comparisons of genetic patterns for evaluating hypotheses of shared evolutionary responses to past environmental changes. This endeavor has progressed with coalescent model-based statistical techniques and advances in next-generation sequencing, yet there remains a need for methods that can analyze aggregated genomic-scale data from non-model organisms within a unified framework that considers individual taxon uncertainty and variance. To this end, the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum to exploit SNP data collected from multiple independent populations, and the aggregate joint site frequency spectrum (ajSFS), an extension of the aSFS for population-pairs, are introduced and explored here for the purpose of assemblage-level demographic inference. Also introduced and described here is the R package Multi-DICE, a wrapper program that exploits existing simulation software for straightforward and flexible execution of hierarchical co-demographic model-based inference given either the aSFS or single-locus sequence data. These methodological developments were validated through a succession of in silico experiments that tested a range of sampling configurations, alternative inferential frameworks, and various prior specifications. Additionally, empirical demonstrations were conducted on published RAD-seq data of five threespine stickleback populations as well as eight local replicates of a lamprey species-pair. Synchronous demographic trajectories were detected in both of these analyses. Moreover, similar techniques were utilized to investigate LINE selection among population-level whole-genome vertebrate datasets.
In brief, a null demographic background was inferred utilizing SNP data, which was then exploited to simulate a putative null distribution of summary statistics that was compared to LINE data for detecting selection. Subsequently, the null demographic model was leveraged to evaluate the presence, direction, and strength of selection. There was a robust signal for purifying selection along with a pattern of LINE size affecting selection strength in two species. As large-scale SNP data become routine, the aSFS, Multi-DICE, ajSFS, and the protocol employed here for detecting selection will collectively expand the potential for powerful comparative phylogeographic and population genomic inference.
