NLM-Scrubber Project

Thursday, September 15, 2016 — Poster Session II

12:00 p.m. – 1:30 p.m.
FAES Terrace


  • MM Kayaalp
  • M Chen
  • PJ Sagan
  • AC Browne
  • CJ McDonald


Electronic health information has the potential to transform the way we provide clinical care and conduct clinical research. The need to protect patient privacy has been the most important barrier to realizing this potential. The Health Insurance Portability and Accountability Act (HIPAA) requires that clinical documents be stripped of personally identifying information before their secondary use for clinical research; however, manually de-identifying clinical text is an arduous task. Although no automatic de-identifier is perfect, such tools can quickly produce de-identified text, which the data providers can then easily review and verify for de-identification accuracy.

NLM-Scrubber is our approach to clinical text de-identification, protecting patient privacy while providing clinical scientists the data they need. In the NLM-Scrubber project, we develop (a) benchmarks for clinical report de-identification, (b) measurement tools to evaluate de-identification performance against these benchmarks, (c) algorithmic methods to distinguish personal identifiers from clinical information, and (d) a software application called NLM-Scrubber that de-identifies clinical narrative reports. NLM-Scrubber is a freely available automatic clinical text de-identification tool, fully supported by its developers. Although we continuously add sophisticated functionality to NLM-Scrubber, we strive to keep the user interface as simple as possible so that novice users can operate it easily. The user only needs to fill out a short form stating mainly where the text files are located on the computer. Currently, NLM-Scrubber accepts only ASCII text reports with proper capitalization. NLM-Scrubber is available for both Windows and Linux platforms. It can be downloaded from
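To illustrate what measuring de-identification performance against a benchmark can look like, here is a minimal sketch in Python. It assumes a benchmark that marks which tokens of a report are personal identifiers (PHI) and scores a de-identifier with token-level precision, recall, and F1. The abstract does not say which metrics the NLM-Scrubber measurement tools actually use; the functions `score` and `redact`, the `[PHI]` placeholder, and the sample sentence are hypothetical and for illustration only.

```python
from typing import Set, Tuple


def score(gold: Set[int], predicted: Set[int]) -> Tuple[float, float, float]:
    """Hypothetical token-level precision, recall and F1 for PHI detection.

    gold:      indices of tokens annotated as PHI in the benchmark
    predicted: indices of tokens the de-identifier flagged as PHI
    """
    tp = len(gold & predicted)   # PHI tokens correctly flagged
    fp = len(predicted - gold)   # clinical tokens wrongly flagged (over-scrubbing)
    fn = len(gold - predicted)   # PHI tokens missed (potential privacy leaks)
    # Treat an empty denominator as perfect performance on that measure.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def redact(tokens: list, flagged: Set[int]) -> str:
    """Replace every flagged token with a hypothetical [PHI] placeholder."""
    return " ".join("[PHI]" if i in flagged else tok
                    for i, tok in enumerate(tokens))


if __name__ == "__main__":
    tokens = "Seen by Dr Smith on 03/12/2015 for chest pain".split()
    gold = {3, 5}          # "Smith" and the date are identifiers
    predicted = {2, 3, 5}  # the tool also flagged "Dr"
    print(redact(tokens, predicted))
    print(score(gold, predicted))
```

In this example every identifier is caught (recall 1.0), but over-scrubbing "Dr" lowers precision to about 0.67, giving an F1 of about 0.8. For de-identification, a missed identifier (lower recall) is a privacy risk, while a false alarm (lower precision) only removes some clinical content, so the two measures are usually reported separately rather than only as F1.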

Category: Research Support Services