PgmNr D1524: Improving Genome Annotation across the Drosophila Clade.

Authors:
T. D. Murphy; the Eukaryotic Genome Annotation Team


Institutes:
NCBI/NLM/NIH, Bethesda, MD.


Keyword: computational algorithms

Abstract:

Comparative genomics research in the Drosophila clade has advanced considerably in the last decade with the availability of dozens of whole-genome assemblies for various Drosophila species. However, many of these genomes either have old annotations that pre-date the use of RNA-seq evidence or were annotated by different pipelines, which may complicate their use in cross-species analyses. The NCBI Eukaryotic Genome Annotation Pipeline (www.ncbi.nlm.nih.gov/genome/annotation_euk/) has been used to annotate over 300 organisms, ranging from insects to plants and mammals. The pipeline generates alignment evidence, including RNA-seq and cross-species protein alignments, and uses it to predict gene models with Gnomon, an alignment- and HMM-based gene prediction program developed at NCBI. The pipeline also includes robust tracking logic that preserves gene, transcript, and protein identifiers across annotation updates, even when the underlying assembly is updated.

We have used this pipeline to re-annotate many species in the Drosophila clade. In collaboration with FlyBase, these annotations have been used to update eight of the reference Drosophila annotations. This poster will describe NCBI's pipeline, summarize the new annotation sets, and outline NCBI's plans for annotating other Drosophila assemblies currently available in GenBank. Some of this material will also be covered in an NCBI workshop during the Saturday morning session.