NIH Research Festival
Despite the large number of genome databanks, reference sequences are not always a good match for high-throughput analysis. This is especially true in studies of highly heterogeneous populations, such as RNA viruses, where elevated mutation rates are observed; there, reliance on a single reference genome can complicate the interpretation of the results. Current techniques attempt to circumvent this obstacle by applying de novo assembly algorithms, but the process is slow and often fails to produce viable results. Here, we introduce a new technique that decreases the dependency of these pipelines on reference sequences. Our quasispecies spectrum reconstruction algorithm takes advantage of existing alignment data and correlates distant mutations reported by an alignment algorithm. The algorithm makes no statistical assumptions; it extracts information from the overlapping alignments in a step-wise fashion, performing a multiple-reference-assisted assembly. Because noise is random, it is unlikely to recur consistently across overlapping alignments, so the algorithm can be very sensitive and can identify low-frequency haplotypes.
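The idea of correlating distant mutations through overlapping alignments can be illustrated with a small sketch. This is not the algorithm presented in this abstract; it is a simplified greedy illustration under stated assumptions. The `Read` type, the `compatible`, `merge`, and `link_reads` helpers, and the toy positions are all hypothetical, and each read is assumed to arrive already aligned, with its mismatches listed as `(position, base)` pairs.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned read: covered interval plus reported mismatches."""
    start: int        # first reference position covered (inclusive)
    end: int          # last reference position covered (inclusive)
    muts: frozenset   # {(position, base)} mismatches reported by the aligner


def compatible(a: Read, b: Read) -> bool:
    """True if a and b overlap and report identical mutations in the overlap."""
    lo, hi = max(a.start, b.start), min(a.end, b.end)
    if lo > hi:
        return False  # no shared positions, so nothing links the two
    in_a = {m for m in a.muts if lo <= m[0] <= hi}
    in_b = {m for m in b.muts if lo <= m[0] <= hi}
    return in_a == in_b


def merge(a: Read, b: Read) -> Read:
    """Union of the covered intervals and of the reported mutations."""
    return Read(min(a.start, b.start), max(a.end, b.end), a.muts | b.muts)


def link_reads(reads):
    """Greedy step-wise extension (an assumption of this sketch).

    Reads are visited in order of start position; each read extends the
    first candidate haplotype it is compatible with, or starts a new one.
    Returns a list of (haplotype, number_of_supporting_reads).
    """
    haplotypes = []  # list of [Read, support_count]
    for r in sorted(reads, key=lambda r: r.start):
        for h in haplotypes:
            if compatible(h[0], r):
                h[0] = merge(h[0], r)
                h[1] += 1
                break
        else:
            haplotypes.append([r, 1])
    return [(h[0], h[1]) for h in haplotypes]


# Toy input: mutations at positions 5 and 17 never share a read, but both
# co-occur with the mutation at position 12 on overlapping reads.
example_reads = [
    Read(0, 9, frozenset({(5, "A")})),
    Read(4, 14, frozenset({(5, "A"), (12, "G")})),
    Read(10, 19, frozenset({(12, "G"), (17, "T")})),
    Read(0, 9, frozenset()),
    Read(4, 14, frozenset()),
    Read(10, 19, frozenset()),
]

for haplotype, support in link_reads(example_reads):
    print(haplotype.start, haplotype.end, sorted(haplotype.muts), support)
```

In this toy input, the mutations at positions 5 and 17 end up on the same candidate haplotype only because each is linked to position 12 by an overlapping read. A greedy first-match rule like this one cannot resolve reads whose overlap with several candidates contains no mutations, and it does no error filtering; a practical method would need to handle both.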
Scientific Focus Area: Computational Biology
This page was last updated on Friday, March 26, 2021