Introducing RefSeq Functional Elements: A New Dataset Annotated by NCBI

Wednesday, September 13, 2017 — Poster Session II

3:30 p.m. – 5:00 p.m.
FAES Terrace
NLM
GEN-20

Authors

  • CM Farrell
  • T Goldfarb
  • SH Rangwala
  • KD Pruitt
  • TD Murphy
  • RefSeq Development Team

Abstract

Conventional genes, which occupy less than 2% of higher eukaryotic genomes, have long been a major focus for biomedical research and genome annotation projects. However, other functional content is found in non-genic regions involved in processes such as gene regulation, chromosome organization, recombination, or replication. Mutations in those regions can have functional consequences, and many genome-wide association studies (GWASs) show a predominance of disease-associated variation in non-genic regions. Large-scale epigenomic mapping projects (e.g., ENCODE, Roadmap) have predicted gene regulatory elements based on chromatin states, but those maps are not readily accessible to all users. Furthermore, the majority of those elements have not been experimentally validated or reconciled with known functional elements in the literature. To fill that gap and to provide accessible annotated data, NCBI is introducing a new dataset of known non-genic functional elements in human and mouse (www.ncbi.nlm.nih.gov/refseq/functionalelements/). This dataset includes: 1) functional elements that are experimentally validated in the literature; 2) element types that are not readily identifiable by large-scale epigenomic mapping projects, e.g., recombination hotspots; 3) NCBI annotation of each element on the human and mouse reference genomes; 4) richly curated RefSeq records with detailed feature annotation, including experimental evidence and publications; and 5) Gene records with detailed metadata, including summaries and literature-based nomenclature. This dataset is accessible to a wide array of users alongside NCBI's conventional gene annotation, and is expected to be highly useful for the interpretation of non-genic sequence variation. This presentation will provide further details on this new dataset and its current availability.

Category: Genetics and Genomics