PgmNr P2087: Integrated Genetic Analysis Platform (IGAP) for Web-based Interactive Association Analysis and Visualization of Large Scale Genotype/Phenotype Data.

Authors:
G. Jun


Institutes:
UTHealth School of Public Health, Houston, TX.


Abstract:

Modern genetic association studies require analysis of millions of variants, often with hundreds or thousands of phenotypic variables. Association analysis typically involves many manual steps, such as parsing variant annotations, unifying missing-data codes, matching sample IDs, identifying population or family structure, stratifying input data by population or other structural batches, and parsing phenotypic labels for traits of interest and covariates. Genotype and phenotype data are usually provided in human-readable formats, but these rarely follow format specifications strict enough for automated pipelines. As a result, a significant amount of analysts’ time is spent on rectifying input files for both genotypes and phenotypes, and on visualizing and parsing various quality metrics. These tasks typically require a fair amount of computer programming, and the resulting (often in-house) programs are not widely reused because every project has a slightly different data structure. It is important to automate these steps, because computing time is cheap and will get cheaper in the future, while an analyst’s time is not. We propose a new web-based interactive pipeline for genetic association analysis, the Integrated Genetic Analysis Platform (IGAP), which provides an automated, reusable framework for large-scale genetic association analysis. IGAP is a web-based front-end environment that encapsulates a collection of external tools and pipelines, including PLINK, Merlin, and EPACTS. It accepts common genotype formats (VCF, PLINK, and Merlin) and provides tools for converting between them through a web browser interface. It also parses metadata from the INFO field of VCF files to generate functional groups with minimal user intervention. A user can visualize population structure with PCA or MDS, computed by back-end tools and data converters, and the resulting plot is loaded dynamically in the web browser.
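As an illustration of the INFO-field parsing step described above, the following is a minimal Python sketch, not IGAP's actual implementation. It assumes each variant carries a gene annotation under a hypothetical INFO key named `GENE`; real annotation pipelines use different keys and richer encodings. The function names and the variant ID format are also illustrative choices.

```python
from collections import defaultdict


def parse_info(info):
    """Parse a VCF INFO string such as 'AC=2;GENE=G1;DB' into a dict.

    Keys without '=' are flags and are stored with the value True.
    """
    fields = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def group_variants(vcf_lines, key="GENE"):
    """Group variant IDs by the value of one INFO key.

    `key` is a hypothetical annotation tag; the ID format
    'chrom:pos_ref/alt' is an arbitrary choice for this sketch.
    """
    groups = defaultdict(list)
    for line in vcf_lines:
        # Skip meta-information lines, the header line, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:  # INFO is the 8th fixed VCF column
            continue
        chrom, pos, _id, ref, alt = cols[:5]
        info = parse_info(cols[7])
        value = info.get(key)
        if value is not None and value is not True:
            groups[value].append(f"{chrom}:{pos}_{ref}/{alt}")
    return dict(groups)
```

Given the lines of an annotated VCF file, `group_variants` returns a mapping from each annotation value to the variants that carry it, which could then be written out in whatever group-file format a downstream gene-based test expects.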
Phenotype data are stored in a MySQL database together with metadata from the genotypes, and IGAP provides an interactive interface for common tasks such as matching sample IDs and defining subsets of samples for a given project. It invokes association analyses and visualizes the results in the web browser. IGAP provides a flexible and easy-to-use interactive environment for various types of genetic analyses.
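The sample ID matching and missing-data unification tasks mentioned in this abstract can be sketched in a few lines of Python. This is a hedged illustration only, not IGAP's code. The set of missing-data codes, the `sample_id` column name, and the function names are all assumptions; in particular, treating `-9` as missing follows a common convention but would be wrong for a trait where -9 is a legitimate value.

```python
# Assumed set of strings that denote a missing value; adjust per project.
MISSING_CODES = {"", "NA", "N/A", ".", "-9", "NaN", "nan"}


def normalize_missing(value):
    """Map any recognized missing-data code to None; strip other values."""
    v = value.strip()
    return None if v in MISSING_CODES else v


def match_samples(genotype_ids, phenotype_rows, id_column="sample_id"):
    """Compare genotype sample IDs with phenotype records.

    Returns three lists: IDs present in both sources (in genotype order),
    IDs found only in the genotype data, and IDs found only in the
    phenotype data. Surrounding whitespace in IDs is ignored.
    """
    pheno_by_id = {row[id_column].strip(): row for row in phenotype_rows}
    geno = [s.strip() for s in genotype_ids]
    geno_set = set(geno)
    matched = [s for s in geno if s in pheno_by_id]
    only_geno = [s for s in geno if s not in pheno_by_id]
    only_pheno = [s for s in pheno_by_id if s not in geno_set]
    return matched, only_geno, only_pheno
```

Reporting the two unmatched lists, rather than silently dropping samples, lets an analyst spot systematic ID mismatches (for example, a prefix added by one data provider) before any association test is run.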