PgmNr M5047: Full-length transcript sequencing of wild-derived mouse strains identifies strain-specific novel gene structures.

Authors:
Monica I. Abrudan 1 ; Anne Czechanski 2 ; Laura Reinholdt 2 ; Marcela Sjoberg 1 ; Thomas M. Keane 1


Institutes:
1) Wellcome Trust Sanger Institute, Cambridge, Cambridgeshire, GB; 2) The Jackson Laboratory, Bar Harbor, ME.


Abstract:

The mouse gene annotation is currently based on the C57BL/6J strain, whereas there is much less evidence for gene structures in genetically distant strains such as CAST/EiJ, PWK/PhJ, and SPRET/EiJ. These distant strains are important because they include founders of recombinant panels such as the Collaborative Cross (CC) and the Diversity Outbred Cross (DO). Short-read RNA-Seq has previously been generated and used to explore the splicing landscape of these strains. However, full-length transcript reconstruction from short reads is difficult, and there has not yet been a systematic effort to generate full-length cDNA sequence for the wild-derived strains. In this experiment, we generated matched long-read PacBio cDNA and Illumina RNA-Seq for two tissues (liver and spleen) and four strains (C57BL/6J, PWK/PhJ, CAST/EiJ, and SPRET/EiJ). The vast majority of the PacBio reads contain full-length transcripts. We find that spleen has a significantly richer transcriptional landscape than liver: the PacBio cDNA reads capture on average 23% more unique protein-coding genes in spleen than in liver, across all strains, even though the total numbers of reads in the two tissues are similar. Using a conservative approach, we found 2760, 2812, 3352, and 3460 novel splice junctions in the liver PacBio cDNA of C57BL/6J, CAST/EiJ, PWK/PhJ, and SPRET/EiJ, respectively. Intersecting the splice junctions extracted from the PacBio cDNA reads, we find 91 splice junctions specific to the wild-derived strains, that is, absent from both GENCODE and the C57BL/6J cDNA reads. From the Illumina RNA-Seq reads, we find on average 1189 new splice junctions per sample in liver and 1328 in spleen, across all strains. Using the PacBio cDNA reads, we identify a potential new gene belonging to the PHD finger family, specific to SPRET/EiJ and located on Chromosome 14. This novel transcript was also confirmed by the AUGUSTUS gene prediction pipeline.
Similarly, we identified a potential new isoform of a protein phosphatase, Ppp2r3d, also specific to SPRET/EiJ and also predicted by AUGUSTUS.
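
The junction intersection described in the abstract can be illustrated with plain set operations. The sketch below is not the authors' pipeline. It assumes that each junction has already been placed on a common coordinate system and is represented as a (chromosome, donor, acceptor, strand) tuple, and that "specific to wild-derived strains" means shared by all three wild-derived strains. The function name, the data layout, and the toy coordinates are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed representation: (chromosome, donor position, acceptor position, strand).
Junction = Tuple[str, int, int, str]


def wild_specific_junctions(
    per_strain: Dict[str, Set[Junction]],
    annotated: Set[Junction],
    reference_strain: str = "C57BL/6J",
) -> Set[Junction]:
    """Junctions shared by every non-reference strain, and absent from both
    the annotation set and the reference strain's own reads."""
    wild_sets = [j for strain, j in per_strain.items() if strain != reference_strain]
    if not wild_sets:
        return set()
    # Assumption: "intersecting" means present in all wild-derived strains.
    shared = set.intersection(*wild_sets)
    return shared - annotated - per_strain.get(reference_strain, set())


# Toy data with made-up coordinates, for illustration only.
toy_junctions = {
    "C57BL/6J": {("chr1", 100, 200, "+")},
    "CAST/EiJ": {("chr1", 100, 200, "+"), ("chr14", 500, 900, "-"), ("chr2", 10, 50, "+")},
    "PWK/PhJ": {("chr1", 100, 200, "+"), ("chr14", 500, 900, "-")},
    "SPRET/EiJ": {("chr1", 100, 200, "+"), ("chr14", 500, 900, "-"), ("chr3", 5, 9, "+")},
}
toy_annotation = {("chr1", 100, 200, "+")}

toy_result = wild_specific_junctions(toy_junctions, toy_annotation)
```

If the intended meaning is "present in at least one wild-derived strain", replacing `set.intersection` with `set.union` gives that looser definition.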