PgmNr P369: Estimating ages of singletons and other rare alleles.

Authors:
A. Platt; J. Hey


Institutes
Temple University, Philadelphia, PA.


Abstract:

The ages of the different variants segregating in the human population are a topic of considerable current interest. In a typical population genomic sample, however, fully half of the identified variants are present in only a single copy. Existing methods that estimate ages from allele frequencies can only assign all of these variants to a single, youngest age class, and methods based on the decay of linkage disequilibrium or on haplotypes shared around rare alleles are inapplicable, since a singleton has no second copy to compare against. This leaves the largest class of variants almost completely uncharacterized.

There is, however, real information in a population genomic sample that allows us to estimate the ages of individual singleton alleles. Under an infinite-sites model, where each allele has a unique origin, the mutation that created a singleton allele must post-date the most recent common ancestor (MRCA) shared between the individual carrying it and any other individual in the sample. We propose an estimate of the time since this common ancestor as a function of the maximum length of haplotype shared between the singleton carrier and any other individual in the sample. Conditional on the age of this ancestor, the age of the allele is uniformly distributed over the open interval (0, age of ancestor), so its expected age is half that of the common ancestor.
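The abstract does not give the form of its shared-haplotype estimator, so the following is only a minimal sketch under an assumed textbook model, not the authors' method. It assumes the two lineages joining the singleton carrier to its closest relative span 2t meioses, so the shared tract ends at an Exponential(rate 2t) distance (in Morgans) on each side and the total length L is Gamma(2, rate 2t). Maximizing that likelihood gives t̂ = 1/L. The function names are hypothetical. The sketch also treats the longest tract as if it came from a single pair, ignoring the effect of taking a maximum over many pairs. The last two functions encode the abstract's conditional uniform age.

```python
import random


def tmrca_from_shared_length(max_shared_morgans: float) -> float:
    """Toy point estimate (in generations) of the TMRCA between the singleton
    carrier and the sample member sharing the longest haplotype with it.

    Assumed model (illustrative, not the authors' estimator): the two lineages
    joining the pair span 2t meioses, so the shared tract on each side of the
    singleton ends at an Exponential(rate 2t) distance (in Morgans), and the
    total tract length L is Gamma(shape 2, rate 2t). The log-likelihood
    2*log(t) - 2*t*L (up to a constant) is maximized at t_hat = 1 / L.
    """
    if max_shared_morgans <= 0:
        raise ValueError("shared haplotype length must be positive")
    return 1.0 / max_shared_morgans


def expected_allele_age(tmrca: float) -> float:
    """Mean of Uniform(0, tmrca): half the age of the common ancestor."""
    return tmrca / 2.0


def sample_allele_age(tmrca: float, rng: random.Random) -> float:
    """Draw one allele age from Uniform(0, tmrca), conditional on the TMRCA."""
    return rng.uniform(0.0, tmrca)


if __name__ == "__main__":
    t_hat = tmrca_from_shared_length(0.02)  # longest shared tract: 2 cM
    print(t_hat, expected_allele_age(t_hat))
    rng = random.Random(1)
    print([round(sample_allele_age(t_hat, rng), 1) for _ in range(3)])
```

Under these assumptions, a longest shared tract of 2 cM gives t̂ = 50 generations and an expected allele age of 25 generations.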

This estimator applies not just to singletons but to any rare allele. For alleles present in multiple copies, we use the singleton model to build a composite likelihood estimate of the age of the most recent common ancestor shared between all of the individuals carrying the allele and any of the other individuals in the sample. Because it combines the estimates from multiple observations, this estimate will be more precise than in the singleton case. Furthermore, combining it with previous methods for estimating the age of the most recent common ancestor of the individuals who share the allele refines the estimate of the age of the allele itself to a uniform density over the interval (age of ancestor among carriers, age of ancestor between carriers and others).
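To show how the composite likelihood pools carriers, here is a sketch that reuses the same assumed Gamma(2, rate 2t) tract model as the singleton case. It is not the authors' implementation, and all function names are hypothetical. Each carrier contributes its longest tract shared with any non-carrier. Multiplying these per-carrier likelihoods while ignoring their correlation is what makes it a composite likelihood. With k carriers, the maximizer k / ΣL pools k tracts. The carriers' own TMRCA (`t_within`) is assumed to come from some existing method and is simply passed in.

```python
import math
from typing import Sequence, Tuple


def composite_log_lik(t: float, shared_lengths: Sequence[float]) -> float:
    """Composite log-likelihood for the TMRCA t between carriers and non-carriers.

    Each entry is one carrier's longest tract (in Morgans) shared with any
    non-carrier, treated as an independent Gamma(2, rate 2t) draw under the
    same toy model as the singleton sketch. Summing per-carrier terms ignores
    the correlation created by the carriers' shared ancestry.
    """
    return sum(math.log(4.0 * t * t * x) - 2.0 * t * x for x in shared_lengths)


def composite_tmrca(shared_lengths: Sequence[float]) -> float:
    """Maximizer of composite_log_lik: 2k/t - 2*sum(x) = 0 gives k / sum(x)."""
    if not shared_lengths or any(x <= 0 for x in shared_lengths):
        raise ValueError("need at least one positive shared length")
    return len(shared_lengths) / sum(shared_lengths)


def allele_age_interval(t_within: float, t_between: float) -> Tuple[float, float, float]:
    """Bounds and mean of the uniform allele-age density over (t_within, t_between).

    t_within: TMRCA of the carriers (supplied by the caller from another method).
    t_between: TMRCA of carriers and non-carriers (e.g. composite_tmrca output).
    """
    if not 0.0 <= t_within < t_between:
        raise ValueError("expected 0 <= t_within < t_between")
    return t_within, t_between, (t_within + t_between) / 2.0


if __name__ == "__main__":
    lengths = [0.02, 0.01, 0.03]  # three carriers' longest tracts (Morgans)
    t_hat = composite_tmrca(lengths)
    print(t_hat, allele_age_interval(10.0, t_hat))
```

With a single carrier, `composite_tmrca` reduces to the singleton estimate 1/L. This matches the abstract's framing of the multi-copy case as an extension of the singleton model.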