PgmNr P2121: Population genetics models with selection for phylogenetic inference.

Authors:
Jeremy Beaulieu; Cedric Landerer; Russell Zaretzki; Michael Gilchrist; Brian O'Meara


Institutes:
University of Tennessee, Knoxville, TN.


Abstract:

Models used for phylogenetic inference typically ignore mutation, drift, and selection when analyzing protein-coding sequences, and thus try to fit the observed pattern with parameters that are not entirely based on population genetics. In other words, they neither incorporate gene expression levels nor allow different proteins to differ in their sensitivity to protein structure or in the strength of selection acting on them. We develop a population-genetics-based model that uses biological parameters, such as the energy cost of protein production, the physical properties of amino acids, and levels of gene expression, to create more realistic models of DNA substitution for protein-coding genes. Our new model, which we refer to as SELAC (SELection on Amino acids and/or Codons), infers the strength of selection on codon usage and amino acid sequence, the sensitivity of protein function to amino acid properties such as size and polarity, and mutation rates, all while inferring phylogenetic relationships among taxa. We use a rigorous simulation approach to show that our model can detect meaningful differences in model parameters across genes under a variety of scenarios, including those that naturally violate the assumptions made by SELAC. We will also present preliminary phylogenetic analyses of several empirical genomic data sets for clades that represent different parts of the tree of life, all of which show a dramatic improvement in model fit compared to traditional models. Our model also provides better estimates of the underlying branch lengths in the phylogeny and can better predict empirical sequences, an indication of its overall adequacy.