PgmNr M5055: The future of reference assembly updates.

Authors:
V. A. Schneider; on behalf of the Genome Reference Consortium and NCBI Annotation Team


Institutes:
NIH/NCBI, Bethesda, MD.


Abstract:

An organism’s reference genome assembly provides a standard coordinate system for reporting and a substrate for annotation, and serves as the basis for analyses ranging from the study of individual genes to population genomics. As a result, the quality and content of a reference assembly are critical to research success. The Genome Reference Consortium (GRC) is responsible for updates to the mouse and zebrafish reference genome assemblies, including closing gaps, correcting path and sequencing errors and adding sequence to represent diversity. The chromosomes of the clone-based reference genome assemblies for these species each represent a single strain, C57BL/6J and Tü, respectively. However, due to inter-strain variation, some genomic regions are insufficiently represented by a single strain. Historically, the GRC has provided sequence representations for additional mouse strains at divergent regions with alternate loci scaffolds that are also composed of finished, clone-based sequence, but costs associated with mapping and sequencing genomic clones have limited this effort. As sequencing costs continue to fall and read lengths grow longer, ongoing and planned efforts by various investigators to sequence and assemble high-quality genomes from different mouse and zebrafish strains provide the GRC with new opportunities to represent diversity and correct errors. We will present examples of recent curations to the mouse and zebrafish reference assemblies, as well as plans for curation in the context of multiple high-quality assemblies. NCBI provides bioinformatics and database support for the GRC and annotates these reference assemblies, including the alternate loci, as part of its genome annotation pipeline. Annotation features include genes, RefSeq transcripts, genomic clone placements, repeats and genomic sequences not included in the assembly.
Additionally, the GRC provides annotations that describe assembly quality and curation efforts. These annotations can be viewed in the NCBI Map Viewer, which permits the simultaneous display of genomic maps with different coordinate systems, as well as in the Genome Data Viewer, a browser that supports the upload of user data so that it can be viewed alongside NCBI annotations. We will demonstrate how to use these browser resources to evaluate GRC-curated reference assemblies. The GRC welcomes public feedback on the mouse and zebrafish assemblies and displays information about regions under review on its website (http://genomereference.org).