
A biologically rich approach to identifying pharmacogenomic relations in text

Monday, October 24, 2011 — Poster Session I

Noon – 2:00 p.m.

Natcher Conference Center



* FARE Award Winner


  • B Rance
  • E Doughty
  • D Demner-Fushman
  • MG Kann
  • O Bodenreider


Objectives: Pharmacogenomics attempts to assess the influence of genetic variation on drug response. The aim of our study is to define the notion of biologically rich pharmacogenomic relation extraction and to evaluate our approach against reference relations and against alternative approaches.

Methods: From a corpus of MEDLINE articles relevant to genetic variation, we identify co-occurrences between drug mentions, extracted using MetaMap (a medical concept recognizer) and RxNorm (a standardized nomenclature for drugs), and genetic variants, extracted by EMU. Our results are evaluated against reference relations curated manually in PharmGKB and against the results of an NLP-rich approach.

Results: One crucial aspect of our strategy is the use of biological knowledge to identify specific genetic variants in text, not simply gene mentions. On the 104 reference articles from PharmGKB, the recall of our biologically rich approach is 33%, similar to that of the NLP-rich approach (35%). Applied to the MEDLINE dataset, the NLP-rich approach yielded 19,978 articles, while our approach identified 4,833 articles. The overlap between the two approaches is limited (224 articles).

Conclusion: We show that the biologically rich and NLP-rich approaches are complementary. This high-throughput approach could be used to assist biocurators in identifying pharmacogenomic relations of interest in the literature.
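The co-occurrence and evaluation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that drug and variant mentions have already been extracted per article (by tools such as MetaMap, RxNorm, and EMU, whose interfaces are not shown), and it assumes that recall is counted over (article, drug, variant) triples, which the abstract does not specify. The function names, the toy identifiers (`pmid1`, `drugA`, `var1`), and the choice of overlap measures are all hypothetical; only the three article counts passed to `overlap_stats` come from the abstract.

```python
from itertools import product


def cooccurrences(doc_drugs, doc_variants):
    """Pair every drug with every variant mentioned in the same article.

    Both arguments map an article ID to a set of mentions. Articles that
    lack either a drug or a variant mention contribute no relations.
    """
    relations = set()
    for pmid in doc_drugs.keys() & doc_variants.keys():
        for drug, variant in product(doc_drugs[pmid], doc_variants[pmid]):
            relations.add((pmid, drug, variant))
    return relations


def recall(predicted, reference):
    """Fraction of reference relations that were also predicted."""
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(predicted & reference) / len(reference)


def overlap_stats(n_a, n_b, n_both):
    """Summarize the overlap between two result sets from their sizes."""
    union = n_a + n_b - n_both
    return {
        "share_of_a": n_both / n_a,  # fraction of A also found by B
        "share_of_b": n_both / n_b,  # fraction of B also found by A
        "jaccard": n_both / union,   # intersection over union
    }


# Toy inputs, invented purely for illustration (assumption, not real data).
doc_drugs = {"pmid1": {"drugA"}, "pmid2": {"drugB", "drugC"}, "pmid3": {"drugA"}}
doc_variants = {"pmid1": {"var1", "var2"}, "pmid2": {"var3"}, "pmid4": {"var1"}}
reference = {
    ("pmid1", "drugA", "var1"),
    ("pmid2", "drugB", "var3"),
    ("pmid3", "drugA", "var9"),
}

predicted = cooccurrences(doc_drugs, doc_variants)
print(sorted(predicted))
print(recall(predicted, reference))

# Article counts reported in the abstract: 4,833 (biologically rich),
# 19,978 (NLP-rich), 224 (found by both).
print(overlap_stats(4833, 19978, 224))
```

Plain co-occurrence of this kind favors simplicity over precision, since any drug and variant named in the same article are paired regardless of whether the text asserts a relation between them.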
