PgmNr P2093: A Bayesian approach for the imputation of genotypes on observed markers in complex pedigrees.

Authors:
D. Leroux; S. Jasson


Institutes:
MIAT, Université de Toulouse, INRA, Castanet-Tolosan, FR.


Abstract:

We present a method that addresses the challenges posed by modern genotype data sets in quantitative genetics: it allows quick imputation of parental origin probabilities (POP) at each marker in complex populations, given any number of allelic observations.

QTL analysis and genetic cartography software traditionally rely on parental origin (PO) data for a set of observed markers. These data were easy to encode in simple pedigrees such as backcrosses or bi-parental series of selfings. The most common encoding, introduced by Mapmaker, uses the letters A, B, H, C, and D to denote PO in populations with two ancestral lines, and results from a manual inference: by restricting the data to markers that are homozygous in both ancestors yet distinct between them, one can immediately derive the PO for any individual in the population. Unfortunately, this method does not generalize to the multi-parental population designs that are now the trend. With complex population structures such as multi-parent advanced generation inter-cross (MAGIC) designs, the allelic observations are often not informative enough to derive PO in this way, mainly because polymorphism is limited relative to the number of ancestral lines.
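The two-ancestor inference described above can be sketched as follows. This is a minimal illustration, not Mapmaker's actual code: the function name `encode_po`, the tuple representation of genotypes, and the use of "-" for an undetermined call are all assumptions made for the example, and the C and D codes are left out for brevity.

```python
def encode_po(parent_a, parent_b, genotype):
    """Return 'A', 'B', 'H', or '-' for one individual at one marker.

    parent_a, parent_b, genotype: pairs of allele symbols, e.g. ("G", "G").
    Hypothetical helper for illustration only.
    """
    # The marker is usable only if both ancestors are homozygous...
    if len(set(parent_a)) != 1 or len(set(parent_b)) != 1:
        return "-"
    a, b = parent_a[0], parent_b[0]
    # ...and carry distinct alleles.
    if a == b:
        return "-"
    observed = sorted(genotype)
    if observed == [a, a]:
        return "A"  # both alleles come from ancestor A
    if observed == [b, b]:
        return "B"  # both alleles come from ancestor B
    if observed == sorted([a, b]):
        return "H"  # one allele from each ancestor
    return "-"      # unexpected allele: no call


print(encode_po(("G", "G"), ("T", "T"), ("T", "G")))
print(encode_po(("G", "G"), ("G", "G"), ("G", "G")))
```

The second call shows the limitation discussed above: when both ancestors carry the same allele, the observation says nothing about PO and the marker has to be discarded.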

We present a technique to infer the POP for all individuals in a pedigree of any size and structure, given observations on any subset of individuals. We represent the pedigree as a Bayesian network with discrete variables. Each variable corresponds to an individual in the pedigree, and its domain is the Cartesian product of the possible PO and the observed alleles (typically SNP data). This representation makes full use of all the information available for each individual, including the information from its relatives, to compute the POP. We also provide an algorithm that computes the junction tree corresponding to the pedigree in amortized linear time, which leads to exact inference of the PO/allele probabilities at each marker by means of simple belief propagation. This algorithm takes non-trivial reentrant individuals into account to compute the proper joint probabilities where required. Finally, we show how these results can be used in QTL analysis, by extracting the POP for the individuals that have also been phenotyped, and in genetic cartography, by extracting the selection variable of each meiosis in the pedigree to create data sets that the cartography software can treat like backcrosses. An implementation of this method is in progress and will soon be made available to the research community.
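To make the idea of computing POP from relatives concrete, here is a small sketch on a hypothetical three-founder pedigree. It is not the junction-tree algorithm of the abstract: it enumerates every joint PO assignment by brute force, which is exponential in pedigree size and only workable for toy cases. The founder lines, their alleles, the pedigree, and all function names are assumptions made for the example. Founders are assumed to be inbred lines, so an individual's genotype follows deterministically from its PO.

```python
from collections import defaultdict
from itertools import product

# Hypothetical inbred founder lines and their allele at a single SNP.
FOUNDER_ALLELE = {"A": "G", "B": "T", "C": "G"}

# Hypothetical pedigree: individual -> (mother, father). Y is a selfing of X.
PEDIGREE = {"F1": ("A", "B"), "X": ("F1", "C"), "Y": ("X", "X")}
ORDER = ["F1", "X", "Y"]  # parents listed before their offspring


def gamete_dist(parent, state):
    """Probability of each founder origin being transmitted by `parent`."""
    if parent in FOUNDER_ALLELE:
        return {parent: 1.0}
    dist = defaultdict(float)
    for origin in state[parent]:  # (maternal origin, paternal origin)
        dist[origin] += 0.5       # Mendelian segregation
    return dist


def joint_states(i=0, state=None, prob=1.0):
    """Yield (assignment, probability) for every joint PO assignment."""
    state = {} if state is None else state
    if i == len(ORDER):
        yield dict(state), prob
        return
    ind = ORDER[i]
    mother, father = PEDIGREE[ind]
    pairs = product(gamete_dist(mother, state).items(),
                    gamete_dist(father, state).items())
    for (m, pm), (f, pf) in pairs:
        state[ind] = (m, f)
        yield from joint_states(i + 1, state, prob * pm * pf)
    state.pop(ind, None)


def genotype(po):
    """Unordered genotype implied by a PO pair."""
    return tuple(sorted(FOUNDER_ALLELE[o] for o in po))


def posterior(target, observations):
    """P(PO of `target` | unordered genotypes observed on some individuals)."""
    weights = defaultdict(float)
    for state, prob in joint_states():
        if all(genotype(state[ind]) == tuple(sorted(obs))
               for ind, obs in observations.items()):
            weights[state[target]] += prob
    total = sum(weights.values())
    if total == 0:
        raise ValueError("observations are incompatible with the pedigree")
    return {po: w / total for po, w in weights.items()}


print(posterior("X", {"Y": ("G", "G")}))
print(posterior("X", {"Y": ("T", "T")}))
```

Here X is never observed, yet an observation on its offspring Y shifts its POP: with Y observed as G/G, the origin pair (A, C) gets probability 0.8 and (B, C) gets 0.2, while Y observed as T/T forces (B, C). Founders A and C share the allele G, so X's own genotype would not always tell them apart, which is the limited-polymorphism situation the abstract describes.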