PgmNr P2152: Improved accuracy of phylogenetic analyses by partitioning schemes that incorporate structural information.

Authors:
Akanksha Pandey; Edward Braun


Institutes:
University of Florida, Gainesville, FL.


Abstract:

Phylogenetic analyses of ancient evolutionary relationships, such as the earliest divergences among metazoans, have often used protein sequences. Maximum likelihood (ML) analyses of aligned protein sequences typically use “empirical models” of evolution, in which the parameters describing the instantaneous rates of change among amino acids are estimated from large-scale “training sets” of proteins. The problem is that such analyses essentially assume that all positions in the alignment exhibit similar patterns of evolution (with the exception of the overall rate, which is typically modeled by assuming that rates at different sites are drawn from a gamma distribution). Effectively, one assumes that all sites evolve like the average site in the average protein. However, it is clear that different sites in proteins exhibit substantial heterogeneity in their patterns of evolution, and this variation probably reflects structural and functional constraints. A number of approaches have been proposed to incorporate this heterogeneity into phylogenetic analyses, and it seems reasonable to postulate that incorporating structural information might provide a means to move toward a more realistic model of protein evolution. Here, we examine two straightforward and computationally efficient partitioning approaches that divide proteins into subsets of sites using structural information (i.e., secondary structure and relative solvent accessibility). We then compared the performance of these partitioned models on an alignment of 242 orthologous proteins for 19 metazoan taxa (104,840 sites with 16.9% missing data). This dataset has been somewhat equivocal regarding the topology it supports, but all of our analyses using the new structural partitioning schemes, with parameters estimated separately for each structural class, place ctenophores sister to all other metazoans.
This is the topology found in a number of other analyses using more extensive taxon sampling, so it is encouraging for the phylogenetic accuracy of this approach. Our approach is straightforward to implement in existing software and, based on information-theoretic criteria, outperforms available empirical models. The best-fitting partitioning scheme included both secondary structure and relative solvent accessibility, and it used partition boundaries generated by programs in the SCRATCH package on a weighted consensus sequence for each protein. Parameter estimates differ in many respects among structural classes, suggesting that this approach provides a better overall fit to the evolutionary process.
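The partitioning idea described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that each alignment column has already been assigned a secondary-structure label and a solvent-accessibility label, and it uses hypothetical label alphabets (H/E/C for structure, e/b for exposed/buried). The function names, the "LG" model string, and the RAxML-style output layout are likewise illustrative assumptions; the exact partition-file syntax should be checked against the documentation of the program actually used. The `aic` helper shows one common information-theoretic criterion, AIC = 2k - 2 ln L; the abstract does not state which criteria were applied.

```python
from collections import defaultdict


def partition_sites(sec_struct, exposure):
    """Group 1-based alignment columns by combined structural class.

    sec_struct and exposure are equal-length strings holding one label per
    column (hypothetical alphabets, e.g. H/E/C and e/b). Returns a dict that
    maps each class name to a list of inclusive (start, end) column ranges.
    """
    if len(sec_struct) != len(exposure):
        raise ValueError("label strings must have equal length")
    ranges = defaultdict(list)
    for pos, labels in enumerate(zip(sec_struct, exposure), start=1):
        name = "".join(labels)
        runs = ranges[name]
        if runs and runs[-1][1] == pos - 1:
            runs[-1][1] = pos          # extend the current contiguous run
        else:
            runs.append([pos, pos])    # start a new run for this class
    return {name: [tuple(r) for r in runs] for name, runs in ranges.items()}


def format_partitions(partitions, model="LG"):
    """Render partitions as 'MODEL, name = a-b, c-d' lines (assumed layout)."""
    lines = []
    for name in sorted(partitions):
        spans = ", ".join(f"{a}-{b}" for a, b in partitions[name])
        lines.append(f"{model}, {name} = {spans}")
    return "\n".join(lines)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


if __name__ == "__main__":
    # Toy labels for a seven-column alignment (made-up values).
    parts = partition_sites("HHHEECC", "bbeebee")
    print(format_partitions(parts))
```

Under a scheme like this, each structural class receives its own set of model parameters, so comparing a partitioned model against a single empirical model trades additional free parameters for improved likelihood, which is exactly the trade-off that criteria such as AIC are designed to score.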