RefSeq: Curation of stop codon recoding in vertebrates
Friday, September 16, 2016 — Poster Session IV
- B Rajput
- TD Murphy
- KD Pruitt
The Reference Sequence (RefSeq) database at NCBI is a collection of annotated genomic, transcript and protein sequence records for genomes across a wide taxonomic spectrum. The curated RefSeq transcript and protein dataset is a critical reagent for NCBI’s eukaryotic genome annotation pipeline, and is considered a gold standard by many in the scientific community. To serve the research community, the RefSeq project also undertakes targeted curation of genes, such as those with exceptional biology. Stop codon recoding is a co-translational event in which a stop codon is decoded as an amino acid rather than as a termination signal, a process stimulated by cis-acting regulatory elements. Two examples in eukaryotes are the focus of this poster: (1) selenoproteins, which contain the non-universal amino acid selenocysteine (Sec), encoded by the UGA codon that normally signals translation termination; and (2) stop codon readthrough (SCR), where translation continues beyond the annotated stop codon to an in-frame downstream stop codon, generating a C-terminally extended protein isoform. SCR differs from Sec insertion in that readthrough has been observed at all three stop codons (UGA, UAG and UAA), and the amino acid specified at the readthrough site can be one of several universal amino acids. Conventional computational tools can neither distinguish between the two functions of the UGA codon (termination versus Sec insertion) nor predict SCR; hence, manual curation is essential for accurate representation of these gene products. Manual review resulted in 443 curated selenoprotein records in 32 vertebrate species, and 65 SCR records in 8 model organisms: human, rhesus monkey, cow, mouse, rat, chicken, frog and zebrafish.
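As an illustration of the geometry of stop codon readthrough described above, the following minimal Python sketch scans from an annotated stop codon to the next in-frame downstream stop codon and returns the codons that would make up the C-terminal extension. It is not the RefSeq curation procedure: the function name, its parameters and the toy sequence are hypothetical, and the sketch does not predict whether readthrough actually occurs or which amino acid is inserted at the readthrough site.

```python
# The three stop codons named in the abstract (RNA alphabet).
STOP_CODONS = {"UGA", "UAG", "UAA"}


def find_readthrough_extension(mrna, cds_start, annotated_stop):
    """Locate the next in-frame stop codon downstream of the annotated stop.

    mrna           -- transcript sequence as an RNA string (hypothetical input)
    cds_start      -- 0-based index of the first base of the start codon
    annotated_stop -- 0-based index of the first base of the annotated stop

    Returns (index_of_downstream_stop, extension_codons). The index is None
    when no in-frame stop codon is found before the end of the transcript.
    """
    if (annotated_stop - cds_start) % 3 != 0:
        raise ValueError("annotated stop is not in frame with the CDS start")
    if mrna[annotated_stop:annotated_stop + 3] not in STOP_CODONS:
        raise ValueError("no stop codon at the annotated stop position")

    extension = []
    # Walk codon by codon, starting just after the annotated stop.
    for i in range(annotated_stop + 3, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            return i, extension
        extension.append(codon)
    return None, extension


# Toy transcript: AUG GCU UGA GGA UCC UAA GG
toy_mrna = "AUGGCUUGAGGAUCCUAAGG"
downstream_stop, extension = find_readthrough_extension(toy_mrna, 0, 6)
print(downstream_stop, extension)
```

In the toy transcript, the annotated UGA at index 6 is followed by two sense codons before the in-frame UAA, so a readthrough isoform would be extended by the residue inserted at the UGA plus two further residues.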
Category: Computational Biology