
Analysis of public data dictionaries and study reports from human clinical trials: A case study in HIV/AIDS research

Thursday, September 13, 2018 — Poster Session III

12:00 p.m. – 1:30 p.m.
FAES Terrace


  • V Huser
  • KW Fung


Sharing de-identified individual participant data from completed clinical trials or observational studies can facilitate further research. Our study focuses on the data dictionaries of published HIV studies, with the goal of assessing their use of common data elements (CDEs). An additional goal is to estimate the overlap with data collected during routine health care delivery. The project repository includes detailed results. We used two sets of HIV studies. First, we reviewed data dictionaries included in study records (9 studies, repository file S_A_CTG.csv). Second, we found 86 HIV studies (file S_B_CTG.csv) that answered ‘Yes’ to individual participant data sharing and attempted to obtain their data dictionaries. To include studies with less detailed records, we created a PubMed search strategy for articles reporting HIV study results, which could inform us of additional data elements. We analyzed the data dictionaries with regard to both format and content. We found great variability in format, including PDF files, tables, and CSV files. We did not find any study using a standard format such as Define-XML (the standard used by the FDA) or the REDCap format. We analyzed the content of the data dictionaries using a subset of data elements from the All of Us study protocol that were relevant to the HIV domain, including patient identifier, sex, visit date, onset of HIV infection, and CD4 count. Based on our observations, we drafted a set of recommendations for sharing data dictionaries for future human study datasets.
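
The content check described above can be illustrated with a minimal sketch. It is not the authors' method: the synonym lists, the column names `variable` and `label`, and the example dictionary are all hypothetical, and the simple substring matching shown here would need manual review in practice.

```python
import csv
import io

# Target elements are the five named in the abstract. The synonym lists are
# illustrative assumptions, not the mapping used in the study.
TARGET_ELEMENTS = {
    "patient identifier": ["patient id", "participant id", "subject id", "patid"],
    "sex": ["sex", "gender"],
    "visit date": ["visit date", "date of visit"],
    "onset of HIV infection": ["hiv onset", "hiv diagnosis date", "date of hiv diagnosis"],
    "CD4 count": ["cd4"],
}


def normalize(text):
    """Lowercase, turn underscores into spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())


def element_coverage(dictionary_csv, label_columns=("variable", "label")):
    """Return {element: True/False} for a data dictionary given as CSV text.

    An element counts as covered if any of its synonyms appears as a
    substring of the normalized text of any row (assumed column names).
    """
    reader = csv.DictReader(io.StringIO(dictionary_csv))
    rows = [
        normalize(" ".join(row.get(col) or "" for col in label_columns))
        for row in reader
    ]
    return {
        element: any(syn in row for row in rows for syn in synonyms)
        for element, synonyms in TARGET_ELEMENTS.items()
    }


# Hypothetical data dictionary, made up for illustration only.
EXAMPLE = """variable,label
PATID,Participant ID
SEX,Sex at birth
VISITDT,Date of visit
CD4,CD4 cell count (cells/mm3)
"""

if __name__ == "__main__":
    for element, covered in element_coverage(EXAMPLE).items():
        print(f"{element}: {'present' if covered else 'missing'}")
```

A check like this only works for dictionaries already in a tabular, machine-readable form; PDF dictionaries would first need manual or semi-automated extraction.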

Category: Epidemiology