NIH Research Festival
Medical researchers are increasingly using routine healthcare data from Electronic Health Records to perform observational studies that can complement randomized clinical trials. To use unstructured data in textual reports, de-identification methods may be needed to remove identifiers and other sensitive information. We report on a service available through the Biomedical Translational Research Information System (BTRIS) at NIH that removes personally identifiable information (PII) from text documents. The following types of identifiers can be removed: date, ID (such as medical record number), last name, first name, organization, city, phone number, street, country, state, zip code, post office box, and e-mail address. In the resulting text, the PII may be redacted, redacted and tagged with the PII type (e.g., [NAME]), or randomized (real names replaced with other names from within the dataset). The service also allows users to specify custom word dictionaries that further improve the de-identification. The underlying software supporting the service is “Parat Text” from Privacy Analytics (Ottawa, Canada). An example use case is sharing case report form data with portals (such as NIDA DataShare or NIAID TrialShare). Here the service can help ensure that PII is not exposed in the shared dataset, using the same de-identification tool that BTRIS itself uses. The tool has some limitations, such as an inability to distinguish patient from provider context when redacting last names and an inability to consistently recognize foreign names.
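The three output modes described above (redacted, tagged with PII type, and randomized) can be illustrated with a minimal Python sketch. This is not the Parat Text or BTRIS interface: the function names, the `[REDACTED]` placeholder, the regular expressions, and the use of a plain name list as the "custom dictionary" are all assumptions made for illustration. In particular, the sketch only randomizes names and falls back to type tags for other identifiers.

```python
import random
import re

# Hypothetical, deliberately simple patterns; a production tool uses far richer detection.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}
MODES = ("redact", "tag", "randomize")


def find_spans(text, name_dictionary=()):
    """Return non-overlapping (start, end, pii_type) spans, sorted by position."""
    spans = []
    for pii_type, pattern in PATTERNS.items():
        spans.extend((m.start(), m.end(), pii_type) for m in pattern.finditer(text))
    # The custom dictionary (an assumption here: a list of names) adds NAME spans.
    for name in name_dictionary:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            spans.append((m.start(), m.end(), "NAME"))
    spans.sort()
    kept, last_end = [], 0
    for start, end, pii_type in spans:
        if start >= last_end:  # drop any span overlapping one already kept
            kept.append((start, end, pii_type))
            last_end = end
    return kept


def deidentify(text, mode="redact", name_dictionary=(), rng=None):
    """Replace detected PII according to the chosen output mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    rng = rng or random.Random()
    pieces, cursor = [], 0
    for start, end, pii_type in find_spans(text, name_dictionary):
        pieces.append(text[cursor:start])
        original = text[start:end]
        if mode == "redact":
            pieces.append("[REDACTED]")
        elif mode == "randomize" and pii_type == "NAME":
            # Swap in a different name from the same dictionary, if one exists.
            others = [n for n in name_dictionary if n != original]
            pieces.append(rng.choice(others) if others else "[NAME]")
        else:
            # "tag" mode, and non-name identifiers in "randomize" mode.
            pieces.append(f"[{pii_type}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "Seen by Smith on 3/14/2020; call 301-555-0100 or jdoe@example.org."
names = ["Smith", "Jones"]
print(deidentify(note, "redact", names))
print(deidentify(note, "tag", names))     # Seen by [NAME] on [DATE]; call [PHONE] or [EMAIL].
print(deidentify(note, "randomize", names))
```

Note that a dictionary lookup like this replaces every occurrence of "Smith" regardless of whether it refers to a patient or a provider, which mirrors the context limitation mentioned in the abstract.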
Scientific Focus Area: Research Support Services
This page was last updated on Friday, March 26, 2021