The Use of Latent Dirichlet Allocation to Measure Disease Similarity

Wednesday, September 16, 2015 — Poster Session I

3:30 p.m. – 5:00 p.m.
FAES Terrace


  • JM Frick
  • R Guha
  • NT Southall
  • DT Nguyen


Measuring similarity between diseases is valuable for drug repositioning and for identifying shared pathways. Ontologies exist to capture such disease relationships, but some of the richest relational data is found in free text. Here we present an application of Latent Dirichlet Allocation (LDA) to capture similarity between disease-related documents. We use it to map uncategorized disease-related text to an existing ontology and to suggest novel treatment hypotheses. To accomplish this, we learn bag-of-words topics from a corpus of OMIM records and assign each record a topic distribution. Comparing topic distributions between documents provides a metric for disease similarity. We demonstrate that this is an effective method for identifying where unmapped documents belong in an ontology, and we use it to suggest a means of identifying novel disease associations.
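
The comparison step can be illustrated with a minimal sketch. It assumes each document has already been assigned a topic distribution by a fitted Latent Dirichlet Allocation model. The abstract does not say which comparison measure was used, so the Jensen-Shannon divergence here is an assumption chosen for illustration, and the term names, the three-topic vectors, and the helper functions are all hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two topic distributions.

    Returns a value in [0, 1]: 0 for identical distributions, 1 for
    distributions with no overlapping topics. This measure is an assumed
    choice; the source does not name the one it used.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    # Midpoint distribution between p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability add 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def similarity(p, q):
    """Turn the divergence into a similarity score in [0, 1]."""
    return 1.0 - js_divergence(p, q)


def nearest_mapped(query, mapped):
    """Return the key of the mapped document most similar to the query."""
    return max(mapped, key=lambda name: similarity(query, mapped[name]))


# Hypothetical topic distributions for records already placed in an ontology.
mapped = {
    "term_A": [0.70, 0.20, 0.10],
    "term_B": [0.10, 0.15, 0.75],
}

# Hypothetical topic distribution for an unmapped document.
unmapped_doc = [0.65, 0.25, 0.10]

best = nearest_mapped(unmapped_doc, mapped)
print(best, round(similarity(unmapped_doc, mapped[best]), 3))
```

Under these assumptions, the unmapped document is assigned to the ontology term whose record has the closest topic distribution. Other measures over probability vectors, such as Hellinger distance or cosine similarity, could be substituted in `similarity` without changing the rest of the sketch.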

Category: Computational Biology