PgmNr D1521: Highly contiguous de novo genome assembly of a non-model metazoan using PacBio long reads.

Authors:
Patrick F. Reilly; Julie Z. Peng; Peter Andolfatto


Institutes:
Princeton University, PRINCETON, NJ.


Keyword: next-generation sequencing

Abstract:

Background: With the advent of population and comparative genomics research in non-model organisms, a highly contiguous and accurate reference genome has become paramount to ensuring the validity of conclusions drawn from such genome-wide analyses. In the last two years, the reference assemblies of two model eukaryotes (Saccharomyces cerevisiae and Drosophila melanogaster) have been significantly revised using PacBio long-read sequencing technology. Here we extend these methods to a non-model drosophilid, D. yakuba, to assess the practicality of PacBio long-read assemblies for non-model metazoans, with the particular aims of rectifying previously characterized misassemblies and producing a whole-chromosome-arm assembly of an insect genome within a reasonable budget.

Methods: We obtained ~100x coverage in long reads from 20 SMRT cells, performed de novo assembly of the PacBio long reads using Celera Assembler and FALCON, meta-assembled the resulting assemblies, and re-incorporated read information discarded by the assemblers. Using an existing linkage map, we then scaffolded the contigs into full chromosome arms.
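
As an illustration of what a fold-coverage figure such as ~100x means, the sketch below divides the total number of sequenced bases by the genome size. The function name and all numbers are hypothetical toy values chosen for illustration; they are not the read data or genome size from this study.

```python
def fold_coverage(read_lengths, genome_size):
    """Return fold coverage: total sequenced bases divided by genome size (bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Hypothetical toy values, NOT data from this study.
toy_read_lengths = [12_000, 8_000, 20_000]  # read lengths in bp
toy_genome_size = 20_000                    # genome size in bp

toy_coverage = fold_coverage(toy_read_lengths, toy_genome_size)
print(f"{toy_coverage:.1f}x")
```

In practice the read lengths would be taken from the sequencing output and the genome size from an independent estimate.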

Results: We generated meta-assembled contigs with an N50 of 12.2 Mb, three of which spanned at least 90% of a chromosome arm. Minimal scaffolding was necessary to generate chromosome arms (on the order of tens of contig joins across the genome). The contigs also corroborated existing evidence of two major misassemblies (5 Mb and 3 Mb) in the current D. yakuba reference genome and corrected them.
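
For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of the standard definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name and the contig lengths are hypothetical toy values, not the contigs from this assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (bp)."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running total to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy contig lengths in bp, NOT data from this study.
toy_contigs = [8_000_000, 5_000_000, 3_000_000, 2_000_000, 2_000_000]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

A larger N50 indicates that more of the assembly lies in long contigs, which is why it serves as a common summary of contiguity.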

Significance: We have shown that PacBio-only assembly can generate a highly accurate whole-chromosome-arm genome sequence for non-model metazoans, a prerequisite for reliable genomic analyses.