PgmNr M261: Discovery, assembly, and annotation of subspecies-specific haplotypes in classical and wild-derived mouse strains.

Authors:
J. Li; T. Keane


Institutes:
Wellcome Trust Sanger Institute, Cambridge, GB.


Abstract:

The vast majority of modern mouse strains are derived from three Mus musculus subspecies: M. m. domesticus, M. m. musculus, and M. m. castaneus. Currently, the only mouse strain with a fully assembled and annotated genome, C57BL/6J, is primarily M. m. domesticus in origin. As a result, apart from a small number of well-studied loci, the sequence and coding alleles in regions of the genome containing subspecies-specific haplotypes are largely unknown. In this project, we used the whole-genome assemblies produced by the Mouse Genomes Project to create the first genome-wide catalog of subspecies-specific haplotypes and alleles across 16 laboratory mouse strains, including wild-derived strains representative of M. m. musculus, M. m. castaneus, and Mus spretus.

We discovered more than one hundred subspecies-specific haplotypes by using dense clusters of heterozygous SNPs in the reference genome as a marker and examining the corresponding assembled sequence in each strain. We find many more of these highly polymorphic loci in the wild-derived laboratory strains and observe significant enrichment for genes involved in immunity, programmed cell death, kin recognition, neuron development, and sensory functions. We successfully reassembled and annotated four immune-related loci, including the Ifi47 (IRG), schlafen, and Nlrp1 loci, in four wild-derived mouse strains. The results show a startling amount of structural variation compared to other loci in the genome, reflecting the remnants of balancing selection and repeated host-pathogen co-evolution. We have identified new allelic forms of these genes, gene shuffling, large translocations, and incorporation of open reading frames (ORFs) from other parts of the genome. These include several cases of gene disruption caused by transposable elements, as well as rearrangements between promoter regions and ORFs of gene family members that significantly alter gene expression levels. The genome structure of these regions will provide a basis for understanding the different phenotypic responses to infection observed in genetic reference panels such as the Collaborative Cross and the Diversity Outbred Cross.
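
The idea of using dense clusters of heterozygous SNPs as a marker for candidate loci can be illustrated with a small sliding-window scan. The sketch below is only an illustration, not the pipeline used in this work: the function name `dense_snp_clusters`, the window size, and the minimum SNP count are all hypothetical choices, and the input is assumed to be a plain list of SNP positions on one chromosome.

```python
from typing import List, Tuple


def dense_snp_clusters(
    positions: List[int],
    window: int = 10_000,   # hypothetical window size in bp (not from the abstract)
    min_snps: int = 5,      # hypothetical density threshold (not from the abstract)
) -> List[Tuple[int, int]]:
    """Return merged (start, end) intervals in which at least `min_snps`
    SNP positions fall within a span of fewer than `window` bp."""
    pos = sorted(positions)
    clusters: List[Tuple[int, int]] = []
    left = 0
    for right in range(len(pos)):
        # Advance the left edge until the span is shorter than `window`.
        while pos[right] - pos[left] >= window:
            left += 1
        if right - left + 1 >= min_snps:
            start, end = pos[left], pos[right]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous dense interval: extend it.
                clusters[-1] = (clusters[-1][0], end)
            else:
                clusters.append((start, end))
    return clusters


if __name__ == "__main__":
    # Toy positions: two dense groups separated by an isolated SNP.
    toy = [100, 150, 200, 250, 300, 50_000,
           90_000, 90_100, 90_200, 90_300, 90_400, 90_500]
    print(dense_snp_clusters(toy, window=1_000, min_snps=5))
    # prints [(100, 300), (90000, 90500)]
```

Each reported interval would then be a candidate region whose assembled sequence could be inspected in the individual strains. In practice the thresholds would need to be tuned to the sequencing depth and SNP-calling method.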