
Surveying the NIH Biomedical Informatics and Computational Biology (BICB) Portfolio through Supervised Machine Learning

Wednesday, September 24, 2014 — Poster Session IV

10:00 a.m. – 12:00 p.m.

FAES Academic Center




  • P.M. Lyster
  • K.A. Smith
  • W.W. Lau
  • C.A. Johnson


We describe an approach for classifying NIH research funding in “Biomedical Informatics and Computational Biology” (BICB) and, with it, a method for obtaining an inventory of grants and contracts, including dollar amounts. We first developed a parsimonious set of categories that describe the types of BICB research projects funded by NIH: applications and modeling, informatics, image and signal analysis, high throughput tools, software and productivity, biostatistics, and high end computing. One of us (PML) then acted as a ‘rater’, identifying from the broad NIH portfolio of research grants a gold-standard set of projects assessed to be a best fit to the BICB classes; this is called the ‘training set’. We then developed a support vector machine based classifier, referred to here as SAE-SVM, to retrieve a complete inventory of grants and contracts based on the training set. The SAE-SVM is a flexible and extensible framework for retrieving general research inventories. We show that the knowledge gained from this model-based survey of the BICB portfolio allows for a greater understanding of NIH Institute and Center (IC) investments in the BICB categories.
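The abstract does not describe how SAE-SVM works internally, so the following is only a minimal sketch of the general workflow it outlines, not the authors' implementation: a hand-rated training set, a linear support vector machine trained on it, and a dollar inventory summed per predicted category. Everything concrete here is an assumption made for illustration: the project texts, project IDs, and dollar amounts are invented; only three of the seven BICB categories are used; features are plain word counts; and the classifier is a simple one-vs-rest linear SVM fitted by subgradient descent on the hinge loss.

```python
from collections import Counter

# Hypothetical hand-rated 'training set': (project text, BICB category).
TRAINING_SET = [
    ("bayesian survival regression clinical trial", "biostatistics"),
    ("statistical power cohort trial design", "biostatistics"),
    ("mri image segmentation registration", "image and signal analysis"),
    ("eeg signal denoising filtering", "image and signal analysis"),
    ("ontology database curation terminology", "informatics"),
    ("electronic record database integration", "informatics"),
]

# Hypothetical portfolio to survey: (project id, project text, dollars awarded).
PORTFOLIO = [
    ("P1", "bayesian survival regression for a clinical trial in oncology", 100000),
    ("P2", "automated mri image segmentation and registration pipeline", 250000),
    ("P3", "ontology driven database curation with controlled terminology", 150000),
    ("P4", "statistical power and cohort trial design for rare disease", 50000),
]


def featurize(text):
    """Bag-of-words features: lowercase token -> count."""
    return Counter(text.lower().split())


def score(weights, bias, features):
    """Linear decision value w.x + b; unseen words contribute nothing."""
    return sum(weights.get(t, 0.0) * c for t, c in features.items()) + bias


def train_binary_svm(examples, epochs=200, lam=0.01, lr=0.1):
    """Subgradient descent on the L2-regularised hinge loss.

    examples: list of (features, label) with label +1 or -1.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            margin = label * score(weights, bias, features)
            # Regularisation step: shrink every weight slightly.
            for term in weights:
                weights[term] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin push the weights.
            if margin < 1.0:
                for term, count in features.items():
                    weights[term] = weights.get(term, 0.0) + lr * label * count
                bias += lr * label
    return weights, bias


def train_one_vs_rest(training_set):
    """One binary SVM per category, each trained against all other categories."""
    featurized = [(featurize(text), cat) for text, cat in training_set]
    categories = sorted({cat for _, cat in featurized})
    return {
        cat: train_binary_svm([(f, 1 if c == cat else -1) for f, c in featurized])
        for cat in categories
    }


def classify(models, text):
    """Assign the category whose SVM gives the highest decision value."""
    features = featurize(text)
    return max(models, key=lambda cat: score(*models[cat], features))


def inventory(models, portfolio):
    """Total dollars per predicted category across the portfolio."""
    totals = {}
    for _project_id, text, dollars in portfolio:
        category = classify(models, text)
        totals[category] = totals.get(category, 0) + dollars
    return totals


models = train_one_vs_rest(TRAINING_SET)
totals = inventory(models, PORTFOLIO)
for category, dollars in sorted(totals.items()):
    print(f"{category}: ${dollars:,}")
```

In a realistic setting the word counts would likely be replaced by weighted text features from project titles and abstracts, and the classifier's output would be checked against held-out rated projects before any dollar totals were reported.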
