
A multi-factor approach to identifying missed synonymy in the UMLS

Tuesday, October 09, 2012 — Poster Session I

1:00 p.m. – 3:00 p.m.

Natcher Conference Center, Building 45




  • B Rance
  • O Bodenreider


Motivation: The Unified Medical Language System (UMLS) integrates 160 biomedical vocabularies. Despite the quality assurance procedures built into the UMLS development process, missed synonymy, i.e., the existence of several distinct concepts for the same meaning, has been reported. Misspellings and, more generally, uncontrolled lexical variation in biomedical terminologies can lead to missed synonymy and have detrimental consequences for data retrieval, reasoning, and literature mining.

Materials and methods: To detect missed synonymy, we collect lexical, semantic, and structural information from the UMLS, as well as contextual information from the biomedical literature and internet sources. We mine this information with a variety of automated techniques, which are combined into a system for predicting missed synonymy.

Results: Our model has a precision of 0.71 and a recall of 0.74. Applied to a set of candidates, it identified 515 potential missed synonyms.

Conclusion: Our approach has sufficient precision to provide effective assistance to the UMLS editors. Some of the errors detected have already been reported to, and corrected by, the developers of the source terminologies. Although limited to lexically close candidates in this study, our approach can be used in a more general context.
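To make the notion of "lexically close candidates" concrete, the following minimal Python sketch flags pairs of terms attached to different concepts whose normalized strings are nearly identical. It is an illustration only, not the system described in the abstract: the normalization steps, the use of `difflib.SequenceMatcher`, the 0.9 threshold, the function names, and the concept identifiers and terms are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical concept identifiers and terms, invented for illustration.
TERMS = {
    "X001": "Hemorrhage, gastrointestinal",
    "X002": "Gastrointestinal haemorrhage",
    "X003": "Myocardial infarction",
}

def normalize(term):
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", term.lower())
    return " ".join(sorted(cleaned.split()))

def find_candidates(terms, threshold=0.9):
    """Return (id_a, id_b, score) for concept pairs with lexically close terms."""
    candidates = []
    for (id_a, term_a), (id_b, term_b) in combinations(terms.items(), 2):
        score = SequenceMatcher(None, normalize(term_a), normalize(term_b)).ratio()
        if score >= threshold:
            candidates.append((id_a, id_b, score))
    return candidates

if __name__ == "__main__":
    for id_a, id_b, score in find_candidates(TERMS):
        print(f"{id_a} ~ {id_b} (similarity {score:.2f})")
```

A purely lexical filter like this would only produce candidates. The abstract describes combining lexical evidence with semantic, structural, and contextual information before predicting that two concepts are missed synonyms.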
