PgmNr P340: Using haplotype-based models for genomic predictions in crossbred animals and multiple breeds.

Authors:
J. E. Decker 1,2 ; M. L. Wilson 1 ; R. D. Schnabel 1,2 ; R. Weaber 3 ; J. F. Taylor 1


Institutes:
1) Division of Animal Sciences, University of Missouri, Columbia, MO; 2) Informatics Institute, University of Missouri, Columbia, MO; 3) Department of Animal Sciences and Industry, Kansas State University, Manhattan, KS.


Abstract:

One of the major shortcomings of genomic prediction is the low prediction accuracy across populations. We explore the use of haplotypes, rather than SNP genotypes, as effects in genomic predictions trained to be accurate in multiple populations. We analyzed 651 Angus and 1,095 Hereford purebreds along with 695 Charolais-, 283 Limousin-, 301 Maine-Anjou-, and 516 Simmental-sired samples (with predominantly Angus dams) with phenotypes and BovineSNP50 genotypes from the Carcass Merit Project (CMP). We used 3,993 Angus, 101 Charolais, 1,225 Hereford, 2,366 Limousin, 11 Maine-Anjou, and 1,913 Simmental purebred animals in addition to the CMP animals to phase the genotype data and impute missing genotypes using BEAGLE v3. Using GEMMA, we fit traits in a Bayesian Sparse Linear Mixed Model (BSLMM). Core SNPs with the largest effects were identified from the SNP-based BSLMM analysis. Using the four SNPs flanking each identified core SNP, we constructed non-overlapping five-SNP blocks that were used to form haplotypes. Using GEMMA, we fit this new haplotype matrix as random effects in a BSLMM prediction model. We used two methods to validate these predictions. First, we used a three-fold cross-validation within breeds where clusters were chosen using genomic relationships to maximize relatedness within a cluster and minimize relatedness between clusters. Second, we omitted one breed and trained on the remaining five to assess the model’s ability to predict genetic merit in breeds not represented in the training data. Using 500 to 1,000 QTL regions (core SNPs) maximized the correlation between phenotype and predicted breeding value. Results indicate that feature selection may be more important than the use of haplotypes. When we trained a genomic prediction model with 38,686 SNPs in all breeds, the correlations from three-fold cross-validation averaged 0.17 for Hereford. When we trained a genomic prediction model using 22,427 haplotypes representing 1,000 QTLs, the average cross-validation correlation for Hereford was 0.41. When we trained using the 5,000 SNPs corresponding to the 1,000 QTL regions, the average cross-validation correlation for Hereford was 0.52. When we trained in five breeds omitting Hereford and then validated in Hereford, we observed a correlation of 0.33 for the haplotype model and a correlation of 0.54 for the 5,000-SNP model. Using only the top 5,000 haplotype effects, we achieved correlations of 0.65 for both the three-fold cross-validation and the validation when Hereford was excluded from the training set. We have recently genotyped 1,240 additional CMP animals from five breeds (Brangus, Gelbvieh, Red Angus, Salers, and Shorthorn) to be used for additional validation. We continue to evaluate models to identify the optimal use of haplotypes in genomic prediction.
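
The construction of non-overlapping five-SNP blocks and the resulting haplotype matrix can be illustrated with a minimal Python sketch. It is not the authors' pipeline, and several details are assumptions: the four flanking SNPs are taken as two on each side of the core SNP, core SNPs are visited in the order supplied (for example, largest effect first) and a block is skipped if it overlaps one already kept, and each distinct haplotype is coded as a column counting the copies (0, 1, or 2) carried by each animal. The function names `select_blocks` and `haplotype_matrix` are hypothetical.

```python
def select_blocks(core_snps, n_snps, flank=2):
    """Return sorted (start, end) windows of 2*flank+1 SNPs around core SNPs.

    Assumption: cores are processed in the order given; a window is dropped
    if it runs off the chromosome or overlaps a window that was already kept.
    """
    used = [False] * n_snps
    blocks = []
    for core in core_snps:
        start, end = core - flank, core + flank + 1
        if start < 0 or end > n_snps or any(used[start:end]):
            continue
        for i in range(start, end):
            used[i] = True
        blocks.append((start, end))
    return sorted(blocks)


def haplotype_matrix(phased, blocks):
    """Build a haplotype count matrix from phased genotypes.

    phased[i] is a pair (hap_a, hap_b) of 0/1 allele sequences for animal i.
    Returns (rows, labels): rows[i][j] is the number of copies of haplotype j
    carried by animal i, and labels[j] is (block_start, allele_string).
    """
    rows = [[] for _ in phased]
    labels = []
    for start, end in blocks:
        # Extract both parental haplotypes of every animal within this block.
        pairs = [(tuple(a[start:end]), tuple(b[start:end])) for a, b in phased]
        # One column per distinct haplotype observed in the block.
        observed = sorted({hap for pair in pairs for hap in pair})
        for hap in observed:
            labels.append((start, "".join(str(x) for x in hap)))
        for row, pair in zip(rows, pairs):
            row.extend(pair.count(hap) for hap in observed)
    return rows, labels
```

Under this coding, each animal's counts sum to two within every block, and the number of columns per block equals the number of distinct haplotypes observed there, which is why 1,000 blocks can yield many more haplotype effects than blocks.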