PgmNr M265: Multiple mouse reference genomes define subspecies-specific haplotypes and novel coding sequences.

Authors:
T. M. Keane; Mouse Genomes Project consortium


Institutes
Wellcome Trust Sanger Institute, Cambridge, GB.


Abstract:

The Mouse Genomes Project is nearing completion of the first draft genome assemblies and strain-specific gene annotation for 16 laboratory and wild-derived mouse strains. The sequence accuracy of these draft genomes compares favourably with the first release of the mouse reference genome (MGSCv3), with higher base-pair accuracy and comparable structural accuracy. For genome annotation, we have used a hybrid approach that combines evidence from the C57BL/6J GENCODE annotation with strain-specific transcript evidence (RNA-Seq and PacBio cDNA) to identify and refine strain-specific gene structures and alleles. The strain gene sets have provided many updates to the C57BL/6J genes, including genes that were previously missing or mis-annotated. We observe the largest number of novel gene structures in the wild-derived strains. We can now determine the underlying sequence and coding alleles in the subspecies-specific haplotype regions of the genome. We have identified over a hundred of these loci to date and find them enriched for genes related to immunity, olfaction and sensory function. We highlight examples of well-characterised QTLs located in these regions, which will enable more precise targeting and functional studies of these alleles. Finally, we show how these new reference genomes can be used to improve the accuracy of gene expression analysis of RNA-Seq data from genetically heterogeneous mice.


The draft genome sequences and annotation can be viewed through our development UCSC Genome Browser (http://hgwdev-mus-strain.sdsc.edu/cgi-bin/hgGateway) and the gEVAL assembly browser (http://mice-geval.sanger.ac.uk/index.html), with full support in Ensembl and the UCSC Genome Browser coming in 2016.