PgmNr D158: Genome-wide spatial-temporal gene expression pattern prediction in Drosophila melanogaster embryonic development.

Authors:
J. Zhou 1 ; I. Schor 3 ; V. Yao 1 ; O. Troyanskaya 1,2 ; E. Furlong 3


Institutes:
1) Princeton University, Princeton, NJ; 2) Simons Foundation, New York, NY; 3) European Molecular Biology Laboratory, Heidelberg, Germany.


Keyword: computational algorithms

Abstract:

Spatial-temporal gene expression patterns are fundamental to understanding the embryonic developmental program and tissue-specific gene functions. High-throughput in situ experiments have provided abundant measurements for Drosophila embryogenesis, but a high proportion of protein-coding genes (>5000) have not been measured, and the primary output of these experiments is qualitative. A genome-wide data-integration approach can complement the current spatial-temporal in situ data by training machine learning models that integrate all public expression and chromatin profiling data to provide genome-wide, quantitative predictions for all genes. To systematically predict spatial-temporal expression patterns in 282 tissue-stages of Drosophila melanogaster embryonic development, we developed structured in silico nano-dissection, a computational approach that predicts tissue- and developmental-stage-specific gene expression using cell lineage information and gene co-regulation patterns from a diverse compendium of 6,378 genome-wide expression and chromatin profiling samples. Our method employs a two-stage approach: the first stage trains a predictor for each tissue-stage category, and the second stage integrates all predictions with a global, cell-lineage-based probabilistic graphical model. We systematically evaluated performance on holdout genes and validated new predictions against the literature and with new experiments. For genes without previously known spatial-temporal patterns or literature evidence for tissue specificity, we verified the predictions by new in situ hybridization. Among the 13 in situ experiments with detectable expression signal, all five genes predicted for brain primordium and all eight genes predicted for embryonic muscle systems were verified. The co-expression patterns of these genes in other tissues, such as gut and plasmatocytes, as well as temporal patterns across developmental stages, were also correctly predicted.
Furthermore, we show that our spatial-temporal expression predictions can be applied to detect tissue-specificity signals in twi, mef2, and trh mutant embryos, demonstrating their potential for analyzing tissue-specificity signals from non-tissue-dissected experiments. Our resource, together with exploratory tools, is available at http://find.princeton.edu.
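
The two-stage idea described in the abstract can be illustrated with a minimal toy sketch. This is not the authors' implementation: the centroid-based scorer, the agreement-rewarding `coupling` potential, the four-node lineage, and all numbers are assumptions chosen for illustration. Stage 1 scores a gene independently for each tissue-stage category; stage 2 places those scores on a small lineage graph and computes posterior marginals by exhaustive enumeration, which is only feasible for a toy graph of this size.

```python
import math
from itertools import product


def centroid(rows):
    """Column-wise mean of a list of equal-length profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def stage1_score(profile, positives, negatives):
    """Toy per-category predictor (an assumption, not the published model):
    logistic of the similarity to the positive centroid minus the
    similarity to the negative centroid."""
    diff = dot(profile, centroid(positives)) - dot(profile, centroid(negatives))
    return 1.0 / (1.0 + math.exp(-diff))


def stage2_marginals(scores, edges, coupling=2.0):
    """Toy lineage integration over binary 'expressed' variables.

    scores   -- dict: category -> stage-1 probability (unary potential)
    edges    -- list of (parent, child) lineage links
    coupling -- hypothetical factor (>1) rewarding agreement along an edge
    Returns posterior marginals by brute-force enumeration of all states.
    """
    names = sorted(scores)
    total = 0.0
    marginal = {n: 0.0 for n in names}
    for state in product([0, 1], repeat=len(names)):
        assign = dict(zip(names, state))
        weight = 1.0
        for n in names:
            weight *= scores[n] if assign[n] else 1.0 - scores[n]
        for parent, child in edges:
            if assign[parent] == assign[child]:
                weight *= coupling
        total += weight
        for n in names:
            if assign[n]:
                marginal[n] += weight
    return {n: marginal[n] / total for n in names}


# Hypothetical data: one query gene measured in three samples.
query = [1.0, 0.2, 0.9]
muscle_pos = [[0.9, 0.1, 1.0], [1.0, 0.0, 0.8]]  # known muscle genes
muscle_neg = [[0.1, 0.9, 0.0], [0.0, 1.0, 0.2]]  # known non-muscle genes

scores = {
    "mesoderm": 0.9,          # hypothetical stage-1 outputs
    "somatic_muscle": stage1_score(query, muscle_pos, muscle_neg),
    "visceral_muscle": 0.4,
    "brain": 0.1,             # not linked to the mesoderm lineage here
}
edges = [("mesoderm", "somatic_muscle"), ("mesoderm", "visceral_muscle")]

refined = stage2_marginals(scores, edges)
for name in sorted(refined):
    print(f"{name}: stage 1 = {scores[name]:.3f}, stage 2 = {refined[name]:.3f}")
```

In this sketch, a weak stage-1 score for a category whose lineage neighbors score strongly is pulled upward, while a category with no lineage links keeps its stage-1 score. A genome-scale model over 282 tissue-stages would need proper inference (for example, message passing) rather than enumeration.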