PgmNr P334: Using network theory to infer and analyze population structure from genetic data.

Authors:
G. Greenbaum 1 ; A. R. Templeton 2,3 ; S. Bar-David 1


Institutes:
1) Ben-Gurion Univ., Midreshet Ben-Gurion, Israel; 2) Washington Univ., St. Louis, MO; 3) Haifa Univ., Haifa, Israel.


Abstract:

Clustering individuals into subpopulations based on genetic data has become commonplace in population genetic studies. Inference of population structure is most often done with model-based approaches, for example as implemented in the program STRUCTURE, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches often entail unrealistic prior assumptions, such as that the subpopulations are at Hardy-Weinberg equilibrium. We present a novel distance-based approach for inferring population structure from genetic data, in which population structure is defined using network theory terminology and methodologies. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of the network into dense subgraphs, is equated with population structure, i.e. a partition into subpopulations. Furthermore, by applying a threshold to remove weak connections from the network, we can explore the hierarchical structure of the population. The statistical significance of each hierarchical level of structure can be estimated using permutation tests and evaluation of the partition’s modularity, a network measure of the quality of community partitions. To further characterize population structure, we formulate the Strength of Association (SA), the strength with which each individual is associated with its assigned community. We develop the Strength of Association Distribution (SAD) analysis, in which the SA distributions are interpreted as patterns of isolation and gene flow between the subpopulations. We use both simulated data and real data from 11 human groups, extracted from the HapMap project, to demonstrate the applicability of our method.
With the human data, the method detected three statistically significant hierarchical levels, corresponding to African/non-African, African/Indo-European/East-Asian, and fine-scale divisions of the population. SAD analysis showed differences in gene flow patterns between subgroups. For example, the African-American and Masai groups both showed lower association with the African subpopulation, but evidence of more recent gene flow was observed in the African-American SAD. The approach presented here provides a novel, computationally efficient, model-free method for inference of population structure that does not entail a priori assumptions. The method is implemented in the software NetStruct (available at https://giligreenbaum.wordpress.com/software).
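
As a rough illustration of two steps described above (removing weak connections with a threshold, then scoring a candidate partition by its modularity), the following is a minimal sketch in plain Python. It is not the NetStruct implementation: the function names, the toy similarity matrix, and the choice of the standard weighted Newman modularity are assumptions made here for illustration only, and the candidate partition is supplied by hand rather than found by a community-detection algorithm.

```python
from itertools import combinations


def threshold_network(similarity, threshold):
    """Build a weighted edge dict, keeping only pairs whose
    similarity is strictly above the threshold (hypothetical helper)."""
    n = len(similarity)
    edges = {}
    for i, j in combinations(range(n), 2):
        w = similarity[i][j]
        if w > threshold:
            edges[(i, j)] = w
    return edges


def modularity(edges, communities):
    """Weighted Newman modularity of a partition:
    Q = sum over communities c of [ W_in(c) / W - (S(c) / (2 W))^2 ],
    where W is the total edge weight, W_in(c) the weight of edges inside c,
    and S(c) the summed strength (weighted degree) of the nodes in c."""
    total = sum(edges.values())
    if total == 0:
        return 0.0  # no edges left, so no structure to score
    strength = [0.0] * len(communities)
    for (i, j), w in edges.items():
        strength[i] += w
        strength[j] += w
    q = 0.0
    for c in set(communities):
        w_in = sum(w for (i, j), w in edges.items()
                   if communities[i] == c and communities[j] == c)
        s_c = sum(s for node, s in enumerate(strength)
                  if communities[node] == c)
        q += w_in / total - (s_c / (2 * total)) ** 2
    return q


# Toy data (invented): six individuals in two groups of three, with
# similarity 0.9 within a group and 0.2 between groups.
groups = [0, 0, 0, 1, 1, 1]
similarity = [[0.9 if groups[i] == groups[j] else 0.2 for j in range(6)]
              for i in range(6)]

q_all = modularity(threshold_network(similarity, 0.0), groups)
q_thr = modularity(threshold_network(similarity, 0.5), groups)
q_bad = modularity(threshold_network(similarity, 0.5), [0, 1, 0, 1, 0, 1])
print(q_all, q_thr, q_bad)
```

In this toy case, raising the threshold removes the weak between-group edges and increases the modularity of the true two-group partition, while a partition that cuts across the groups scores lower. In practice, the partition at each threshold would come from a community-detection algorithm, and its modularity would be compared with a permutation-based null distribution, as described in the abstract.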