PgmNr C14: De novo sequencing of the Paramecium tetraurelia macronuclear (MAC) genome using Pacific Biosciences single-molecule long reads to improve genome assembly and annotation.

Authors:
R. Woycicki; C. Hoehener; K. Schneeberger; E. Swart; S. Bhullar; M. Nowacki


Institutes:
University of Bern, Bern, Bern, CH.


Abstract:

Earlier de novo sequencing of the Paramecium tetraurelia MAC genome put its size at 72 Mbp. Since the advent of NGS methods, the Paramecium MAC genome has been routinely sequenced for mapping purposes in studies of its massive whole-genome rearrangement system. Our recent genome size estimate, made from Illumina 2x125 PE data (coverage > 100x) with the PreQC module of the String Graph Assembler software, indicates that the actual MAC genome may be about 20 percent larger. Unlike first- and second-generation sequencing methods, Pacific Biosciences (PacBio) sequencing gives more even coverage along the genome: because the template is not amplified, there is no bias towards sequences that are preferentially cloned or amplified before sequencing. For these reasons, and because of the very long PacBio reads, we chose this system to sequence and assemble the Paramecium tetraurelia MAC genome de novo, with the goal of improving both the genome assembly and the gene annotation.
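The idea behind a read-based genome size estimate can be sketched with a simple k-mer count: the number of k-mers observed in the reads, divided by the depth at which a typical genomic k-mer is seen, approximates the genome length. The sketch below is a minimal illustration under that assumption; it is not the PreQC implementation, and the function name `estimate_genome_size` and its parameters are hypothetical.

```python
from collections import Counter


def estimate_genome_size(reads, k):
    """Rough genome size estimate: (total k-mers) / (peak k-mer depth).

    Illustrative only. Assumes error k-mers appear once and can be
    dropped, and that the most frequent multiplicity above 1 is the
    depth of single-copy genomic k-mers.
    """
    # Count every k-mer occurrence across all reads.
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1

    # Histogram: multiplicity -> number of distinct k-mers seen that often.
    histogram = Counter(kmer_counts.values())

    # Ignore singletons, which in real data are mostly sequencing errors.
    solid = {depth: n for depth, n in histogram.items() if depth > 1}
    if not solid:
        raise ValueError("no k-mer seen more than once; coverage too low")

    # Peak depth = multiplicity shared by the most distinct k-mers.
    peak_depth = max(solid, key=lambda depth: (solid[depth], depth))

    # Total k-mer occurrences divided by depth approximates genome length.
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth
```

Holding all k-mers in memory, as this sketch does, is only practical for toy inputs; it is meant to show the arithmetic, not to be run on a real Illumina data set.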

Using the newest PacBio P6-C4 chemistry and 13 SMRT cells, sequencing yielded more than 250x coverage of the MAC genome of Paramecium tetraurelia strain 51, mating type 7. Read error correction and assembly were carried out first with the PBcR pipeline from Celera Assembler ver. 8.3rc2 and later with Canu (v1.0/v1.1).
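Mean coverage is simply the total number of sequenced bases divided by the genome size, so a coverage figure depends on which size is assumed. The short sketch below shows that relationship using the 72 Mbp value and a size 20 percent larger; the total read yield is a hypothetical input chosen for illustration, not a reported result, and the function names are assumptions.

```python
def mean_coverage(total_bases, genome_size):
    """Mean depth of coverage = total sequenced bases / genome size."""
    return total_bases / genome_size


def bases_for_coverage(coverage, genome_size):
    """Total bases needed to reach a target mean coverage."""
    return coverage * genome_size


# Genome sizes discussed in the abstract.
previous_size = 72e6                 # 72 Mbp, earlier estimate
larger_size = previous_size * 1.20   # about 20 percent larger

# Hypothetical yield: the bases that would give 250x on a 72 Mbp genome.
illustrative_yield = bases_for_coverage(250, previous_size)

# The same yield spread over the larger genome gives lower mean coverage.
coverage_on_larger = mean_coverage(illustrative_yield, larger_size)
```

Under these illustrative inputs, a yield that gives 250x on 72 Mbp corresponds to a little over 200x on the larger size.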

More shotgun Illumina and Sanger genomic reads, cDNA reads, and MAC-matching 25 nt small RNA reads map to our preliminary de novo assembly than to the reference strain 51 assembly. Based on cDNA mapping, we estimate that our current assembly may contain a few hundred additional genes.

We will present the current state of the genome assembly and annotation, which aim to reconstruct a more complete genome.