There seems to be no question that
biologists believe sequence comparison is useful. The
BLAST server at NCBI alone performs over 70,000 database
searches daily and over 120,000 scientific papers refer
to some aspect of biological sequence comparison. Furthermore,
one of the most compelling yet implicit justifications
for the investment in high throughput genome sequencing
projects has been the expectation that many of the gene
products within this growing inventory will match previously
studied proteins.

It was not always so: the first papers describing useful
discoveries from sequence database searches often described
such detection of evolutionary relationships as
"serendipitous" or "unexpected". Subsequent studies of
protein sequences and structures showed that detectable
conservation over hundreds of millions and even billions of
years of evolution is the rule, rather than the exception,
in biology. Extrapolations made by several groups using
different methods suggested that there are only about 1000
basic protein folds and that a complete classification of
all protein families is a realistic goal for the near future.

Though we do not yet know why most proteins evolve so slowly,
it is important to realize that this conservative mode of
protein evolution determines our very ability to make sense
of genome comparisons, and that theoretical and empirical
studies in molecular evolution are directly relevant to the
practical goals of functional genomics. I will review some
notable case studies from the early days of database
searching and our growing understanding of the universe of
protein
families.

Molecular biology today is being transformed by an explosive
growth of data emerging from laboratories worldwide. The
challenge is to transform data into knowledge, knowledge that
will lead to a better understanding of the biological
processes underlying both health and disease. The mission of
the National Center for Biotechnology Information (NCBI) is
to develop new computational techniques for molecular biology
data and to conduct basic research in the analysis of genes
and genomic data. NCBI also serves as a national resource for
the dissemination of data and analytic tools to the research
and medical communities.