The Use of Latent Dirichlet Allocation to Measure Disease Similarity

Wednesday, September 16, 2015 – Poster Session I

3:30 – 5:00 p.m.

FAES Terrace

NCATS

COMPBIO-9

Authors

JM Frick
R Guha
NT Southall
DT Nguyen

Abstract

Measuring similarity between diseases is valuable for drug repositioning and for identifying shared pathways. Ontologies exist to capture such disease relationships, but some of the richest relational data exists in free text. Here we present an application of Latent Dirichlet Allocation as a way to capture similarity between disease-related documents. We use this to map uncategorized disease-related text to an existing ontology and to suggest novel treatment hypotheses. To accomplish this, we learn bag-of-words topics from a corpus of OMIM records and assign each record a topic distribution. Comparing topic distributions between documents provides a metric for disease similarity. We demonstrate that this is an effective method for identifying where unmapped documents belong in an ontology, and we use it to suggest a means of identifying novel disease associations.
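
The comparison step described above can be illustrated with a minimal Python sketch. The abstract does not say which measure was used to compare topic distributions, so the Jensen-Shannon divergence here is an assumption, chosen because it is a common symmetric, bounded measure for probability distributions. The record names and the three-topic distributions are hypothetical placeholders, not values from OMIM or from the poster, and the function names are introduced only for illustration.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete
    distributions; 0 for identical inputs, 1 for disjoint support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    # Mixture distribution halfway between p and q.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_similarity(p, q):
    """Similarity in [0, 1]; an assumed choice, not the poster's metric."""
    return 1.0 - jensen_shannon_divergence(p, q)


def rank_by_similarity(query, records):
    """Return record names ordered from most to least similar to `query`."""
    return sorted(
        records,
        key=lambda name: topic_similarity(query, records[name]),
        reverse=True,
    )


# Hypothetical topic distributions for three already-mapped records.
records = {
    "record_A": [0.70, 0.20, 0.10],
    "record_B": [0.10, 0.10, 0.80],
    "record_C": [0.30, 0.60, 0.10],
}

# Hypothetical topic distribution inferred for an unmapped document.
unmapped = [0.65, 0.25, 0.10]

ranked = rank_by_similarity(unmapped, records)
```

Under these assumptions, the top-ranked record suggests where the unmapped document might sit in the ontology. In practice the topic distributions would come from a fitted LDA model rather than being written by hand.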

Scientific Focus Area: Computational Biology

This page was last updated on Friday, March 26, 2021

NIH Research Festival
