NIH Research Festival
When a gene evolves to take on an important physiological role, purifying selection will often maintain that function through evolutionary time. As a result, orthology (i.e., homology via speciation) has become a well-accepted predictor of shared gene product function among species, and considerable effort has gone into developing computational methods to identify groups of orthologs and highly similar paralogs (i.e., 'orthogroups'). Graph-based methods that rely on pairwise sequence similarity metrics (e.g., Markov clustering, MCL) are excellent for rapidly processing genomic-scale datasets into orthogroups, but they generally suffer from a lack of precision. This can be compensated for if the phylogenetic relationships among all test species are well established and/or high-quality genomic data are available, but this is rarely the case when working with non-model organisms. In the current study, we present a new method called Recursive Dynamic Markov Clustering (RD-MCL) that increases the resolving power of de novo MCL-based orthogroup assignment through a number of novel enhancements: refinement of the pairwise similarity metrics, dynamic selection of MCL parameters, recursive subdivision of orthogroups, and testing of putative orthogroups for best-hit cliques to maximize resolution. RD-MCL is freely available as an open-source project on GitHub (http://tiny.cc/rd-mcl).
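For readers unfamiliar with Markov clustering, the following is a minimal sketch of the basic MCL loop (expansion followed by inflation) on a small pairwise similarity matrix. It is not the RD-MCL implementation: the function names `mcl`, `normalize_columns`, and `matmul`, the default `inflation` and `iterations` values, and the `1e-6` membership threshold are all assumptions made for illustration. The inflation exponent is one example of an MCL parameter that controls how finely the graph is split.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product, kept dependency-free for illustration."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, iterations=50):
    """Cluster the nodes of a symmetric similarity matrix with basic MCL.

    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(similarity)
    # Add self-loops so every column has non-zero mass (illustrative choice).
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                   # expansion
        m = [[v ** inflation for v in row] for row in m]   # inflation
        m = normalize_columns(m)
    # Each row that retains mass marks a cluster; its non-zero columns are members.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

Raising `inflation` in this sketch generally yields smaller, tighter clusters, while lowering it yields larger ones; a fixed value is used here only to keep the example short.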
Scientific Focus Area: Computational Biology
This page was last updated on Friday, March 26, 2021