PgmNr Z6086: Leveraging comparative genomics for zebrafish annotation.

Authors:
Jane Loveland; Sarah Donaldson; Deepa Manthravadi; Jen Harrow


Institutes:
Wellcome Trust Sanger Institute, Cambridge, GB.


Abstract:

A high-quality reference genome and gene set are essential. In the Human and Vertebrate Analysis and Annotation (HAVANA) team we use our Otter/Zmap annotation tools to manually annotate the zebrafish genome, GRCz10, and collaborate closely with ZFIN to provide an accurate, dynamic and distinct resource for the zebrafish community. We annotate multiple biotypes: as well as protein-coding genes, we annotate pseudogenes and long non-coding RNAs (lncRNAs). Within these main biotypes we also have over 50 controlled vocabulary terms to further categorise and describe our genes. Our annotation tools also allow us to annotate multiple species at the same time and unlock the power of comparative genomics, which can tell us a great deal about how genes and genomes evolve.

Gene clusters are a particular target for manual annotation as they are difficult to annotate with automated methods, such as those used by Ensembl. One such cluster is the olfactory receptors, the largest multigene family in vertebrates, which consists of protein-coding genes and pseudogenes whose numbers and biotypes vary hugely between organisms. We are manually annotating these genes and have found them to be greatly expanded in the mouse genome (~1500 genes/pseudogenes) relative to human (~900 genes/pseudogenes), with far fewer in the zebrafish genome (~135 genes, of which only 3 are pseudogenes).

LncRNAs show an absence of sequence homology between different organisms. Despite this, many exhibit positional synteny, such as the lncRNA gene sox2ot, which suggests functional conservation. The sox2ot gene has been manually annotated and shown to be highly conserved between human, mouse and zebrafish; it is thought to be important during embryonic development and is deregulated in cancer.

Our experience of annotating across several species is proving to be a powerful tool for gene discovery in model organisms.

All of the manual annotation is publicly available from the Vertebrate Genome Annotation database (VEGA), and is merged with the Ensembl gene set quarterly. Annotation is a continuous process, so between database updates all new annotation is made available for all of our whole genome species in the update track in VEGA: http://vega.sanger.ac.uk/info/data/frequent_update.htm

Our annotation software is freely available via our website: http://www.sanger.ac.uk/resources/software/otterlace/.



ZFIN Genetics Index
1. sox2ot