NIH Research Festival
Approximately 80% of the more than 7,000 rare diseases have a genetic association, but most lack approved treatments. One challenge for researchers, patients, and other community partners is accessing and coordinating data from disparate sources, where the data are siloed and neither organized nor harmonized, which makes them difficult to integrate in a useful manner. To address this need, the Therapeutic Development Branch in the intramural Division of Preclinical Innovation at NCATS conceptualized RARe-SOURCE™ and, in collaboration with the Advanced Biomedical and Computational Science, developed and launched this user-centric bioinformatic resource platform for rare disease information. The main objective is to facilitate data mining through a searchable interface that integrates bioinformatics databases and enables users to navigate rare disease information quickly and efficiently.
Determining genotype-phenotype connections is essential in both preclinical and clinical research environments because it clarifies the relationship between genetic variations and disease susceptibility. Advancements in BioNLP and text mining, powered by machine learning and transformer models such as BERT, have significantly improved information extraction from biomedical texts, driving forward named entity recognition, relation extraction, and document classification. RARe-SOURCE™ aims to make mining of the rare disease literature scalable, disease-agnostic, and accessible through its Literature AI feature, which employs NLP to mine titles and abstracts for disease and gene mentions and integrates other resources for synonyms and aliases. Planned enhancements include integrating tmVar and employing generative AI to extract variants and clinical context accurately, which could transform variant pathogenicity prediction for the rare disease community.
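As a rough illustration of mining titles and abstracts for disease and gene mentions with synonym and alias resolution, the sketch below uses simple dictionary matching. It is not the Literature AI implementation, which the abstract describes only at a high level. The tables `GENE_ALIASES` and `DISEASE_SYNONYMS` and the functions `build_pattern` and `find_mentions` are hypothetical names, and the small alias lists stand in for the external synonym resources a real system would draw on.

```python
import re

# Hypothetical alias tables: canonical name -> surface forms seen in text.
GENE_ALIASES = {
    "CFTR": ["CFTR", "ABCC7"],
    "HTT": ["HTT", "huntingtin"],
}
DISEASE_SYNONYMS = {
    "cystic fibrosis": ["cystic fibrosis", "mucoviscidosis"],
    "Huntington disease": ["Huntington disease", "Huntington's disease"],
}


def build_pattern(table):
    """Compile one regex over all surface forms and map each to its canonical name."""
    lookup = {}
    for canonical, names in table.items():
        for name in names:
            lookup[name.lower()] = canonical
    # Try longer forms first so a short alias never pre-empts a longer one.
    forms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(form) for form in forms) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def find_mentions(text, table):
    """Return the sorted canonical names whose surface forms occur in text."""
    pattern, lookup = build_pattern(table)
    return sorted({lookup[m.group(1).lower()] for m in pattern.finditer(text)})


if __name__ == "__main__":
    abstract = "Mucoviscidosis is caused by pathogenic variants in ABCC7."
    print(find_mentions(abstract, DISEASE_SYNONYMS))
    print(find_mentions(abstract, GENE_ALIASES))
```

Case-insensitive dictionary matching is easy to audit but over-matches short gene symbols that double as ordinary words, which is one reason learned named entity recognition models are generally preferred for this task.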
Scientific Focus Area: Computational Biology
This page was last updated on Tuesday, August 6, 2024