NIH Research Festival
FAES Terrace
NLM
COMPBIO-23
A sequence similarity search compares a query to a database of sequences to establish a relationship between the query and some of the database sequences. In many cases, the search can help identify the function of the query. The Basic Local Alignment Search Tool (BLAST) is a popular program for performing similarity searches. Here, we discuss a new implementation of the BLASTP program, which compares a protein query to a protein database. BLASTP implements a number of heuristics that make it very fast, but the popular NCBI nr database doubles in size roughly every two years, so searches against it keep getting more expensive. Additionally, many of the top matches found with a BLASTP search of nr are very strong, which should make them easy to identify. At the same time, high-throughput sequencing of genomes has produced many predicted coding regions that need to be confirmed and/or annotated by alignment to very similar sequences. QuickBLASTP adds an initial step that identifies database sequences ("candidates") that might be similar to the query. The new step uses an alignment-free method that counts the 5-mers in the query sequence and quickly checks the database for sequences with a similar 5-mer profile. QuickBLASTP then performs a standard BLASTP search against only the candidate database sequences. We compare QuickBLASTP to BLASTP and show that QuickBLASTP finds the strongest BLASTP matches well in a fraction of the time.
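The candidate-selection idea described above can be illustrated with a minimal Python sketch. This is not QuickBLASTP's implementation: the abstract does not say how 5-mer profiles are compared or indexed, so the similarity measure (the fraction of query 5-mers also found in the database sequence), the function names, and the 0.3 threshold are all assumptions made for illustration. A real tool would also use a precomputed index rather than scanning every database sequence.

```python
from collections import Counter

K = 5  # word length, matching the 5-mers mentioned in the abstract


def kmer_profile(seq, k=K):
    """Count every overlapping k-mer in a protein sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def profile_similarity(query_profile, subject_profile):
    """Fraction of the query's k-mers (with multiplicity) found in the subject.

    Hypothetical measure chosen for this sketch, not taken from QuickBLASTP.
    """
    total = sum(query_profile.values())
    if total == 0:
        return 0.0  # query shorter than k: no k-mers to compare
    shared = sum(min(count, subject_profile[kmer])
                 for kmer, count in query_profile.items())
    return shared / total


def select_candidates(query, database, threshold=0.3):
    """Return (name, score) pairs at or above the threshold, best first.

    `database` maps sequence names to protein sequences; the threshold
    value is an arbitrary assumption.
    """
    query_profile = kmer_profile(query)
    scored = ((name, profile_similarity(query_profile, kmer_profile(seq)))
              for name, seq in database.items())
    return sorted((pair for pair in scored if pair[1] >= threshold),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRQ"
    database = {
        "near": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",  # contains the query
        "far": "GGGGSGGGGSGGGGSGGGGS",                # unrelated repeat
    }
    for name, score in select_candidates(query, database):
        print(name, round(score, 2))
```

Only the sequences returned by `select_candidates` would then be passed to a standard BLASTP search, which is where the time saving comes from: the expensive alignment step sees a small subset of the database.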
Scientific Focus Area: Computational Biology
This page was last updated on Friday, March 26, 2021