PgmNr D140: Beyond the tip of the iceberg: New Drosophila reference genomes reveal novel structural variants.

Authors:
M. Chakraborty; Anthony Long; J. J. Emerson


Institutes:
University of California, Irvine, CA.


Keyword: genome evolution

Abstract:

Identifying functionally important sequence variants in a genome is a key step in studying phenotypic evolution and uncovering disease-causing mutations. Variation that copies, deletes, or rearranges chromosome segments has been shown to be both ubiquitous and phenotypically important. However, our ability to study such structural variation is hobbled by incomplete and fragmented genomes assembled from short reads. Recent advances in long-read sequencing technology make studying copy number variation (CNV) and transposable elements (TEs) far easier, revealing a vast reservoir of genetic variation that was previously invisible. Here we report a new assembly and structural variant (SV) analysis of a Drosophila melanogaster strain, called A4, using PacBio long reads. The A4 genome is accurate and complete, and it exceeds the sequence contiguity of even the D. melanogaster reference genome assembly. Using this high-quality genome, we discovered previously unknown gene sequences, like Mitf on chromosome 4, in the heterochromatic regions. Additionally, comparison of the A4 genome with the reference genome ISO1 revealed a large number of CNVs, 40% of which were invisible to previous methods relying on paired-end (PE) Illumina data. Nearly 20% of the A4 genome is made up of TEs, with hundreds of new insertions distributed non-uniformly across the genome. To characterize the population genetics and phenotypic consequences of structural variation, we chose to sequence the 15 Drosophila Synthetic Population Resource (DSPR) founder strains (www.flyrils.org). We have generated a list of genetic variants, including both SNPs and SVs, from the platinum-grade DSPR assemblies. These data provide a comprehensive platform for understanding the evolution and phenotypic consequences of structural variation.