PgmNr W4126: Textpresso: mining full text for efficiently obtaining information from the biological literature.

Authors:
Paul Sternberg 1; Hans-Michael Müller 1; Yuling Li 1; Kimberly Van Auken 1; Christian Grove 1; Karen Yook 1; Ranjana Kishore 1; Seth Carbon 2; Chris Mungall 2; Suzanna Lewis 2


Institutes
1) Division of Biology and Biological Engineering, California Inst of Technology, Pasadena, CA; 2) Genomics Division, Lawrence Berkeley National Lab, Berkeley, CA, USA.


Keyword: Other (database)

Abstract:

We all face a continual increase in the number of papers whose content we need to know. Curators at biological databases such as WormBase also need to efficiently locate and extract information from this expanding corpus. To reduce the tedium and cost of scouring the literature for specific experimental results, we developed the Textpresso text-mining system, available at www.textpresso.org. We have completely rewritten the system so that it scales to millions of papers (e.g., all of PubMed Central) and helps biocurators at the model organism databases and the Gene Ontology Consortium (GOC) annotate genes, cells, molecules and so forth. Textpresso indexes individual sentences with keywords and with categories of terms, which allows highly specific searches. For example, a category search in C. elegans Textpresso can find every sentence in any C. elegans paper that mentions both a worm gene and a human disease, or a drug, a cell type and a phenotype together. Textpresso also indexes the Drosophila, mouse, zebrafish and yeast literature. We are now integrating the new system, TextpressoCentral, into the GOC’s Common Annotation Framework. This framework was designed to work with all biological literature and supports annotation with the Noctua curation tool, which was developed for GO’s more expressive LEGO curation paradigm that relates gene products to specific functions in context. We will explain how individual researchers and biocurators can use Textpresso, and how it can make your work easier by letting you find particular information quickly. For example, by automating the process of skimming thousands of papers for a specific detail, Textpresso lets you spend more time actually reading papers.
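The abstract does not describe Textpresso's internals. To illustrate the idea of sentence-level category search, here is a minimal sketch, assuming small hypothetical category lexicons (`CATEGORIES`), a naive regex sentence splitter, and illustrative function names (`build_index`, `category_search`). None of these are Textpresso's actual API or data. Each sentence is tagged with every category whose terms it contains, and a multi-category query is answered by intersecting the per-category sets of sentences.

```python
import re
from collections import defaultdict

# Hypothetical category lexicons (illustrative only; real category
# term lists would be far larger and curated).
CATEGORIES = {
    "worm_gene": {"unc-22", "daf-16", "lin-12"},
    "human_disease": {"alzheimer's disease", "diabetes", "cancer"},
    "drug": {"levamisole", "aldicarb"},
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.
    (Would wrongly split abbreviations such as 'C. elegans'.)"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def contains_term(lowered_sentence, term):
    """True if term occurs as a whole token sequence, not inside a longer word."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, lowered_sentence) is not None

def build_index(papers):
    """Return (sentences, index): sentences maps (paper_id, sentence_no) to text;
    index maps each category to the set of sentence keys that mention one of its terms."""
    sentences = {}
    index = defaultdict(set)
    for paper_id, text in papers.items():
        for n, sentence in enumerate(split_sentences(text)):
            key = (paper_id, n)
            sentences[key] = sentence
            lowered = sentence.lower()
            for category, terms in CATEGORIES.items():
                if any(contains_term(lowered, t) for t in terms):
                    index[category].add(key)
    return sentences, index

def category_search(sentences, index, *categories):
    """Return sentences matching ALL requested categories (set intersection)."""
    if not categories:
        return []
    hits = set.intersection(*(index.get(c, set()) for c in categories))
    return [sentences[k] for k in sorted(hits)]

# Usage with two made-up "papers".
papers = {
    "paper1": "Loss of daf-16 shortens lifespan. Mutations in unc-22 cause twitching.",
    "paper2": "A daf-16 pathway may be relevant to diabetes. Worms were grown on plates.",
}
sentences, index = build_index(papers)
print(category_search(sentences, index, "worm_gene", "human_disease"))
```

Indexing at the sentence level, rather than per paper, is what makes a conjunctive query meaningful here. A gene and a disease that co-occur in one sentence are far more likely to be related than two terms that merely appear somewhere in the same paper.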