Annotation of structural motifs in the Conserved Domain Database (CDD)

Wednesday, September 24, 2014 — Poster Session IV
10:00 a.m. – 12:00 p.m.	FAES Academic Center	NLM	STRUCTBIO-3

* FARE Award Winner

Authors

N.R. Gonzales
F. Chitsaz
M.K. Derbyshire
M. Gwadz
F. Lu
G.H. Marchler
J.S. Song
N. Thanki
J. Wang
R.A. Yamashita
C. Zheng
S.H. Bryant
A. Marchler-Bauer

Abstract

CDD is a collection of multiple sequence alignments that represent ancient conserved domains, organized into superfamily clusters of evolutionarily related domains. CDD includes NCBI-curated domain models (cd accessions) as well as models obtained from Pfam, SMART, TIGRFAM, NCBI Protein Clusters, and COGs. We define structural motifs as compositionally biased and/or short repetitive regions in proteins, which are difficult to model as functional domains conserved in molecular evolution. These include transmembrane regions, coiled coils, and short repeats with variable copy numbers. We developed models for structural motifs that allow us to annotate these regions efficiently and accurately. In many cases, a few position-specific scoring matrix (PSSM) models suffice to annotate more than 90% of known instances of a specific structural motif. Here, we present the development of structural motif models for several repeat types, including LRR, ARM, HEAT, TPR, and zinc fingers. While the strategy works well for repetitive segments with characteristic sequence signatures, we find that the lack of sequence similarity within coiled-coil regions precludes covering them with only a few generic PSSM models. Instead, we have developed models for coiled-coil regions in the context of specific families and will annotate these motifs in existing and future conserved domain models.
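
As a rough illustration of the general idea of annotating a variable-copy-number repeat with a single PSSM, the sketch below builds a log-odds matrix from a toy alignment and scans a sequence for non-overlapping hits. It is not the CDD pipeline: the alignment, the test sequence, the uniform background, the pseudocount, and the score threshold are all hypothetical values chosen for the example.

```python
import math

# Hypothetical toy alignment of one short repeat unit (not real CDD data).
ALIGNED_REPEATS = ["LGNL", "LSNL", "LGKL", "LGNL"]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)  # uniform background frequency (assumption)


def build_pssm(aligned, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column."""
    pssm = []
    for col in zip(*aligned):
        total = len(col) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((col.count(aa) + pseudocount) / total) / BACKGROUND)
            for aa in ALPHABET
        })
    return pssm


def scan(sequence, pssm, threshold):
    """Return (start, score) for non-overlapping windows scoring >= threshold."""
    width = len(pssm)
    hits = []
    pos = 0
    while pos + width <= len(sequence):
        score = sum(pssm[i][sequence[pos + i]] for i in range(width))
        if score >= threshold:
            hits.append((pos, score))
            pos += width  # jump past the hit so repeat units do not overlap
        else:
            pos += 1
    return hits


pssm = build_pssm(ALIGNED_REPEATS)
# Hypothetical sequence containing three copies of the repeat unit.
hits = scan("AALGNLDELSNLKKLGKLAA", pssm, threshold=4.0)
print([start for start, _ in hits])
```

With these toy inputs, the one model reports three repeat units, starting at positions 2, 8, and 14, regardless of how many copies the sequence contains. A region without a shared sequence signature, such as a generic coiled coil, would give no column-wise preferences for a matrix like this to capture.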
