PgmNr P2030: Genome wide association in presence of high density marker panels and genotyped causal variants.

Authors:
S. Toghiani 1 ; L. Y. Chang 1 ; S. Aggrey 2,3 ; R. Rekaya 1,3


Institutes:
1) Department of Animal and Dairy Science, The University of Georgia, Athens, GA, USA; 2) Department of Poultry Science, The University of Georgia, Athens, GA, USA; 3) Institution of Bioinformatics, The University of Georgia, Athens, GA, USA.


Abstract:

Genome-wide association studies (GWAS) rely on estimating the association between phenotypic variation and a large number of SNPs, often within the framework of linear regression (LR) or mixed linear (ML) models. Although the two models are statistically equivalent to some degree, they differ in how they associate phenotypic variation with genetic polymorphisms. Linear regression models associate the phenotype directly with SNP genotypes, allowing a direct dissection of the trait, especially if causal variants are included on the panel. However, these models suffer from the high dimensionality of the parameter space and from collinearity between SNPs. ML models remove some of the issues associated with LR methods and allow straightforward accommodation of confounding due to population structure and family relationships. ML models rely on estimating the realized genetic similarity matrix (GM) between individuals using all available SNP genotypes. It is well known that the GM changes very little as the number of markers in the panel increases once a certain density threshold is reached, even when causal variant genotypes are available. As the density of genotyping panels continues to increase, so does the probability that causal variants are genotyped. To investigate the robustness of ML models in the era of next generation sequencing, a simulation study was carried out for a trait with a heritability of 0.4. One chromosome with 75,000 SNPs and 35 causal variants explaining 5% of total genetic variance was simulated, mimicking a 2 million SNP marker panel and 1000 causal variants at the genome level. Simulated data were analyzed using LR and ML models, either including or excluding causal variants. With ML models, prediction accuracy, defined as the correlation between true and estimated genetic values, was essentially the same regardless of panel density and of whether causal variants were excluded (0.293) or included (0.294). Genetic variance was underestimated (0.013 vs. 0.02). Prediction accuracy increased by around 3% when an LR model was used and causal variants were included, and genetic variance was more accurately estimated (0.017 vs. 0.02). The LR model used all available SNPs, and its performance is expected to improve if model averaging (BayesB) or SNP prioritization approaches are applied. As density increases and more causal variants are included in marker panels, ML models will need some improvements, mainly in the calculation of the GM.
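
The insensitivity of the GM to a small number of added causal variants can be illustrated with a minimal sketch. The abstract does not state how the GM was computed, so the formula below (a VanRaden-style G = ZZ' / (2 * sum p(1-p))) is an assumption, as are the function names, the scaled-down panel size, and the use of independent SNPs without linkage disequilibrium. It is an illustration under those assumptions, not the authors' simulation.

```python
import numpy as np


def genomic_relationship(genotypes):
    """Assumed VanRaden-style GM from an (individuals x SNPs) matrix coded 0/1/2."""
    p = genotypes.mean(axis=0) / 2.0          # observed allele frequency per SNP
    z = genotypes - 2.0 * p                   # center each SNP column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def simulate_genotypes(n_ind, n_snp, rng):
    """Hypothetical helper: independent SNPs in Hardy-Weinberg equilibrium (no LD)."""
    freqs = rng.uniform(0.05, 0.95, size=n_snp)
    return rng.binomial(2, freqs, size=(n_ind, n_snp)).astype(float)


rng = np.random.default_rng(1)

# Scaled-down panel for speed; the abstract used 75,000 SNPs and 35 causal variants.
n_ind, n_markers, n_causal = 200, 5000, 35
markers = simulate_genotypes(n_ind, n_markers, rng)
causal = simulate_genotypes(n_ind, n_causal, rng)

gm_without = genomic_relationship(markers)
gm_with = genomic_relationship(np.hstack([markers, causal]))

# Compare off-diagonal relationships between the two versions of the GM.
off_diag = ~np.eye(n_ind, dtype=bool)
similarity = np.corrcoef(gm_without[off_diag], gm_with[off_diag])[0, 1]
max_change = np.max(np.abs(gm_with - gm_without))

print(f"correlation of off-diagonal elements: {similarity:.4f}")
print(f"largest absolute change in any element: {max_change:.4f}")
```

Because every SNP contributes to the GM with comparable weight in this formulation, a few dozen causal variants among thousands of markers barely alter it, which is consistent with the nearly identical ML prediction accuracies reported above.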