PgmNr M5102: Phylogenetically based Gene Ontology (GO) Annotations using the Phylogenetic Annotation and INference Tool (PAINT).

Authors:
K. R. Christie 1 ; M. Feuermann 2 ; P. Gaudet 2 ; S. E. Lewis 3 ; D. Li 4 ; H. Mi 5 ; M. C. Munoz-Torres 3 ; P. D. Thomas 5 ; J. A. Blake 1 ; The Gene Ontology Consortium


Institutes:
1) The Jackson Laboratory, Bar Harbor, ME; 2) Swiss Institute of Bioinformatics, Geneva, Switzerland; 3) Lawrence Berkeley National Laboratory, Berkeley, CA; 4) Phoenix Bioinformatics, Redwood City, CA; 5) University of Southern California, Los Angeles, CA.


Abstract:

A major goal of the Gene Ontology (GO) project is to describe the functions of genes from all kingdoms of life in a consistent way. For a limited set of model organisms, the functions of at least some genes can be annotated from direct experimental data. For many organisms, however, little or no experimental data exist, and even in model organisms many genes have not been directly characterized. In these situations, other methods of capturing functional data are necessary. One method used by the GO community overlays experimental GO annotations on phylogenetic trees of related sequences (from PANTHER), so that evolutionarily based functional annotations can be inferred. The PAINT (Phylogenetic Annotation and INference Tool) curation tool has been developed to support this effort. In PAINT, the curator views the distribution across species of GO terms supported by direct experimental evidence, infers the likely evolutionary history of each function, and thereby decides which functions can be propagated to which sequences within the tree. This phylogenetic method helps provide more complete annotations of genomes for use in term enrichment and genomic analyses, and the annotations are sometimes more detailed than those obtained from domain analysis, e.g. via InterPro domains. For example, the small subunit (SSU) processome, which is involved in biogenesis of the small ribosomal subunit, is well characterized in S. cerevisiae but not in the laboratory mouse. For the mouse Wdr3 gene, which is homologous to S. cerevisiae DIP2 (aka UTP12), curation using phylogenetic analysis provided detailed annotations. In contrast, the InterPro domains for this gene did not provide useful annotations, and transfer by sequence similarity comparison yielded a somewhat misleading annotation to an overly specific RNA binding term. Thus, this phylogenetic annotation method can provide detailed GO annotations for genes based on the experimental characterization of their homologs in other organisms.
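
The propagation step described above can be illustrated with a minimal sketch. It is not PAINT's implementation or data model: the `Node` class, the `propagate` function, the gene names, and the term label `TERM_A` are all hypothetical, and the choice of which ancestral node to annotate is assumed to have been made already by a curator.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a toy gene-family tree (hypothetical structure)."""
    name: str
    children: list[Node] = field(default_factory=list)
    # Terms supported by direct experimental evidence for this gene.
    experimental: set[str] = field(default_factory=set)
    # Terms inferred from an annotated ancestor.
    inferred: set[str] = field(default_factory=set)


def leaves(node: Node) -> list[Node]:
    """Return the leaf nodes (extant genes) below a node, left to right."""
    if not node.children:
        return [node]
    found: list[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def propagate(ancestor: Node, term: str) -> list[str]:
    """Mark `term` as inferred on every descendant leaf of `ancestor`
    that lacks experimental evidence for it; return those leaf names."""
    received = []
    for leaf in leaves(ancestor):
        if term not in leaf.experimental:
            leaf.inferred.add(term)
            received.append(leaf.name)
    return received


# Toy family: one experimentally characterized gene, two uncharacterized ones.
yeast = Node("yeast_gene", experimental={"TERM_A"})
mouse = Node("mouse_gene")
fly = Node("fly_gene")
root = Node("ancestor", children=[yeast, Node("animal_clade", children=[mouse, fly])])

# Assumed curator decision: the function was present in the common ancestor.
received = propagate(root, "TERM_A")
print(received)
```

In this toy case the two uncharacterized genes receive the inferred term, while the experimentally annotated gene keeps only its direct annotation. A real curation decision would also weigh evidence that a function was lost or modified in particular lineages, which this sketch does not model.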

This work is funded by HG 002273 to the Gene Ontology Consortium.