NIH Research Festival
FAES Terrace
NLM
STRUCTBIO-7
Advances in modern sequencing techniques have produced an explosion of genomic data. Correctly classifying this wealth of information is daunting, not only because of the sheer volume of sequence data but also because erroneous and less-than-ideal names and functional characterizations propagate through current databases and hinder functional classification by sequence similarity alone. We are investigating the extent to which protein domain architecture can be used to define groups of proteins with similar molecular function, and whether corresponding functional “labels” can be derived, starting with some of the most common domain architectures found in bacteria. To this end, we have developed an in-house procedure called SPARCLE ("SPecific ARChitecture Labeling Engine") that lets us track and examine specific, or sub-family, domain architectures. These architectures result from annotating protein sequences with domain footprints provided by the Conserved Domain Database (CDD), which includes hierarchical classifications for many common domain families.
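The idea of grouping proteins by domain architecture can be illustrated with a minimal sketch. This is not the SPARCLE procedure itself, whose internals are not described here; it only assumes that each protein has already been annotated with domain footprints (a domain name plus start and end positions) and treats the N- to C-terminal order of those domains as the architecture. The `DomainHit` type, the function names, and the protein and domain identifiers are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class DomainHit(NamedTuple):
    protein_id: str  # hypothetical protein identifier
    domain: str      # hypothetical domain-family name
    start: int       # first residue covered by the footprint
    end: int         # last residue covered by the footprint


def architecture(hits):
    """Return one protein's domain names ordered from N- to C-terminus."""
    return tuple(h.domain for h in sorted(hits, key=lambda h: (h.start, h.end)))


def group_by_architecture(hits):
    """Map each distinct architecture to the sorted proteins that share it."""
    # Collect the footprints belonging to each protein.
    per_protein = defaultdict(list)
    for hit in hits:
        per_protein[hit.protein_id].append(hit)

    # Proteins with the same ordered domain tuple fall into the same group.
    groups = defaultdict(list)
    for protein_id, protein_hits in per_protein.items():
        groups[architecture(protein_hits)].append(protein_id)
    return {arch: sorted(ids) for arch, ids in groups.items()}


# Made-up example annotations (illustrative only, not real CDD output).
hits = [
    DomainHit("protA", "kinase_like", 10, 250),
    DomainHit("protA", "receiver", 300, 420),
    DomainHit("protB", "receiver", 280, 400),
    DomainHit("protB", "kinase_like", 5, 240),
    DomainHit("protC", "receiver", 1, 120),
]
groups = group_by_architecture(hits)
```

In this toy input, `protA` and `protB` share the same two-domain architecture and land in one group, while `protC` forms its own. A functional label would then be attached per group rather than per sequence; how such labels are chosen is outside the scope of this sketch.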
Scientific Focus Area: Structural Biology
This page was last updated on Friday, March 26, 2021