PgmNr Y506: The 1002 yeast genomes project.

Authors:
J. Schacherer (1); J. Peter (1); M. De Chiara (2); D. Pflieger (1); JX. Yue (2); A. Bergstrom (2); A. Sigwalt (1); A. Llored (2); K. Freel (1); S. Engelen (3); A. Lemainque (3); P. Wincker (3); A. Friedrich (1); G. Liti (2)


Institutes:
1) University of Strasbourg, Strasbourg, FR; 2) Institute of Research on Cancer and Ageing, Nice, FR; 3) Institut de Génomique - Genoscope, Evry, FR.


Keyword: Evolution/Comparative Genomics

Abstract:

Genome-wide investigation of the patterns of polymorphism in a large sample of individuals is the first step in assessing the relationship between genotype and phenotype within a species. To date, yeast population genomics has focused on only a limited number of isolates. In this context, we initiated a project with the goal of describing whole-genome sequence variation in more than 1,000 natural Saccharomyces cerevisiae genomes, avoiding monosporic strains (http://1002genomes.u-strasbg.fr/). Genomes were sequenced using an Illumina HiSeq 100-bp paired-end strategy that yielded 200-fold coverage on average. Strains were selected to capture as much diversity as possible, both in geographic origin (including Australia, Europe, Russia, Vietnam and South Africa) and in ecological source (such as dairy products, trees, insects, flowers, fruit and wine). In addition, almost 1,000 isolates were phenotyped under different conditions affecting various physiological and cellular responses, including different carbon sources, membrane and protein stability, signal transduction, sterol biosynthesis, transcription, translation, as well as osmotic and oxidative stress. In total, we analyzed 34,740 measurements for 36 traits.
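As a quick consistency check on the phenotyping figures, the short Python sketch below divides the total number of measurements by the number of traits. It assumes, which the abstract does not state, that every phenotyped isolate was measured exactly once for each trait; the variable names are illustrative only.

```python
# Figures quoted in the abstract.
total_measurements = 34_740
n_traits = 36

# Assumption (not stated in the abstract): the phenotype matrix is complete,
# i.e. each phenotyped isolate contributes one measurement per trait.
isolates_per_trait, remainder = divmod(total_measurements, n_traits)

print(f"isolates per trait: {isolates_per_trait}, remainder: {remainder}")
```

Under that assumption the division is exact and gives 965 isolates, in line with the "almost 1,000" quoted above.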

Owing to the broad diversity of the isolates selected, this population genomic dataset provides an accurate picture of genomic variation in the species (i.e. ploidy, aneuploidy, copy number and gene content variation). We found that these genomic variations correlate with the ecological origin of the isolates and also have a direct and general impact on fitness. Concerning single nucleotide polymorphisms, a total of 58,912,916 high-quality SNPs were detected across the 1,011 genomes, distributed over 1,625,809 polymorphic positions. The frequency spectrum of the observed polymorphisms is highly skewed towards an excess of low-frequency alleles. The level of heterozygosity also depends on ecological origin, and loss of heterozygosity is considerable in S. cerevisiae, with an average of 21 regions covering ~5 Mb per heterozygous genome.
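To illustrate what a frequency spectrum skewed towards low-frequency alleles looks like, the following Python sketch computes a folded (minor-allele) site frequency spectrum from a toy genotype matrix. It is a minimal illustration and not the pipeline used in the project: the data are invented, samples are treated as haploid with biallelic 0/1 calls, and the function name `folded_sfs` is hypothetical.

```python
from collections import Counter

def folded_sfs(genotypes):
    """Return a Counter mapping minor-allele count -> number of sites.

    genotypes: list of sites, each a list of 0/1 allele calls,
    one call per (haploid) sample. Monomorphic sites are skipped.
    """
    spectrum = Counter()
    for site in genotypes:
        alt_count = sum(site)
        # Fold the spectrum: count the rarer of the two alleles.
        minor_count = min(alt_count, len(site) - alt_count)
        if minor_count > 0:
            spectrum[minor_count] += 1
    return spectrum

# Toy data (invented for illustration): 7 sites x 6 samples.
toy_genotypes = [
    [1, 0, 0, 0, 0, 0],  # singleton
    [0, 0, 0, 1, 0, 0],  # singleton
    [0, 1, 0, 0, 0, 0],  # singleton
    [1, 1, 0, 0, 0, 0],  # doubleton
    [1, 1, 1, 0, 0, 0],  # minor allele carried by 3 samples
    [0, 0, 0, 0, 0, 0],  # monomorphic, ignored
    [1, 1, 1, 1, 1, 0],  # alternate allele is the major one: folds to 1
]

sfs = folded_sfs(toy_genotypes)
for minor_count in sorted(sfs):
    print(minor_count, sfs[minor_count])
```

In this toy example most polymorphic sites fall in the lowest class (minor-allele count of 1), which is the qualitative shape of an excess of low-frequency alleles.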

Overall, our study provides a comprehensive view of the multiple genome evolution patterns across subpopulations within the S. cerevisiae species. Furthermore, the extensive phenotyping, combined with the high SNP density, allowed us to perform genome-wide association studies. This dataset led to the identification of a large set of functional polymorphisms that underlie phenotypic variation.