Text Mining Summary Statements for Evidence of Innovation

Wednesday, September 24, 2014 — Poster Session IV

10:00 a.m.–12:00 p.m.

FAES Academic Center




  • D.E. Russ
  • C.A. Johnson
  • L. Roberts


Questions about the NIH research portfolio can be difficult to address when the answers are not stored as structured fields in a database. Text analysis can support inferences about such questions, but robust methods for dynamically testing hypotheses against NIH research data are lacking. Reviewers’ narrative critiques of innovation are contained in summary statements. We present a study that uses text mining to identify applications that peer reviewers assessed as innovative. To develop a training set, we asked NIH scientific review officers to select text from summary statements that indicated innovation (or a lack of innovation) on a 5-point scale. From the annotated text, we built a lexicon of words and phrases that describe innovation and developed a classifier that selects innovative documents. We found no meaningful differences between new and established investigators in reviewers’ sentiments about innovation, but new investigators whose applications reviewers described as innovative were significantly more likely to submit subsequent applications that reviewers also described as innovative. Thus, new investigators identified as innovative are more likely to be innovative in the future.
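The lexicon-and-classifier approach described above can be sketched minimally as follows. This is an illustrative sketch only: the phrases and weights in the lexicon are hypothetical examples, not the lexicon actually built from the annotated summary statements, and the real classifier was trained on reviewer-annotated text rather than hand-assigned weights.

```python
import re

# Hypothetical weighted lexicon: positive weights suggest reviewer
# sentiment of innovation, negative weights suggest a lack of it.
# These entries are invented for illustration.
LEXICON = {
    "highly innovative": 2.0,
    "novel approach": 1.5,
    "innovative": 1.0,
    "incremental": -1.0,
    "not innovative": -2.0,
}

def score_innovation(text):
    """Sum lexicon weights over all phrase occurrences (case-insensitive).

    Note that shorter phrases also match inside longer ones (e.g.
    "innovative" matches within "highly innovative"); a production
    system would handle such overlaps more carefully.
    """
    text = text.lower()
    score = 0.0
    for phrase, weight in LEXICON.items():
        # Count non-overlapping occurrences of this phrase.
        score += weight * len(re.findall(re.escape(phrase), text))
    return score

def classify(text, threshold=0.0):
    """Label a summary-statement excerpt by its lexicon score."""
    return "innovative" if score_innovation(text) > threshold else "not innovative"
```

A usage example: `classify("The proposed method is highly innovative.")` returns `"innovative"`, while a critique dominated by negative lexicon phrases falls below the threshold.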