NIH Research Festival
FARE Award Winner
Inferred orthology (i.e., homology via speciation events) among genes is commonly used as a predictor of gene product function. Orthology is also a crucial consideration when classifying genes coherently and consistently across taxa, but many popular ortholog prediction tools are too coarse-grained to resolve multiple clusters of closely related sequences in large gene families. As a result, classification is often left to the discretion of curators following manual inspection of gene trees. In this work, we present a new effort to automate the classification of orthogroups from predefined sets of homologous sequences. In contrast to common ortholog prediction methods, our approach uses AlignMe scores rather than BLASTP E-values as the similarity metric between pairs of sequences. This provides a more refined input for Markov clustering (MCL), a popular method for grouping genes into orthogroups via weighted random walks through an all-by-all similarity graph. MCL, however, is sensitive to user-defined parameters. It is difficult to know a priori which parameters to apply, and if different groups of genes have undergone varying degrees of evolution, no single set of parameters may be appropriate for the entire dataset. To overcome this, we have devised an MCL scoring method and use Metropolis-coupled Markov chain Monte Carlo (MCMCMC) to automate parameter selection. Furthermore, recursive analysis of clusters by subsequent rounds of MCMCMC-MCL accounts for varying evolutionary rates. We have named this new method Recursive Dynamic Markov Clustering (RD-MCL), and it shows improved performance over established methods.
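To make the MCL step described above more concrete, the following is a minimal pure-Python sketch of plain Markov clustering on a small all-by-all similarity matrix. It is not the RD-MCL implementation: the function names (`mcl`, `expand`, `inflate`, `normalize_columns`, `toy_similarity`), the toy scores, the self-loop weight, and the convergence thresholds are all illustrative assumptions, and no AlignMe scoring or MCMCMC parameter search is performed. The `inflation` argument stands in for the kind of user-defined parameter the abstract identifies as hard to choose a priori.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m: Matrix) -> Matrix:
    """Expansion: square the matrix (two steps of the random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m: Matrix, r: float) -> Matrix:
    """Inflation: raise each entry to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(similarity: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Cluster node indices by alternating expansion and inflation."""
    n = len(similarity)
    # Assumption: scores lie in [0, 1], so a self-loop weight of 1.0 is used.
    m = [[1.0 if i == j else float(similarity[i][j]) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Rows that retain flow act as attractors; the columns they attract
    # form one cluster. Identical clusters from several rows are merged.
    eps = 1e-6
    clusters = {frozenset(j for j in range(n) if m[i][j] > eps)
                for i in range(n) if any(x > eps for x in m[i])}
    return sorted(sorted(c) for c in clusters)


def toy_similarity() -> Matrix:
    """Hypothetical scores: two tight triplets joined by one weak link."""
    s = [[0.0] * 6 for _ in range(6)]
    strong = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
    for a, b in strong:
        s[a][b] = s[b][a] = 0.9
    s[2][3] = s[3][2] = 0.1  # weak cross-group similarity
    return s


if __name__ == "__main__":
    print(mcl(toy_similarity(), inflation=2.0))
```

In general, larger inflation values tend to produce finer clusters and smaller values coarser ones, which is why a single fixed setting may not suit gene groups that have diverged to different degrees.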
Scientific Focus Area: Computational Biology
This page was last updated on Friday, March 26, 2021