
Clone discovery algorithm for high-throughput sequencing data

Friday, November 08, 2013 — Poster Session III

10:00 a.m. – 12:00 p.m.

FAES Academic Center (Upper-Level Terrace)




  • K Karagiannis
  • V Simonyan
  • K Chumakov


Although high-throughput sequencing has many applications, alignment of the data is a central step in most related pipelines. Despite the large number of genome data banks, available reference sequences are not always a good match for the sample being analyzed. In virus-related projects in particular, where elevated mutation rates are observed, relying on a reference genome can complicate interpretation of the results. Current techniques attempt to circumvent this obstacle by applying de novo assembly algorithms, but because of the nature of the problem and the size of the input, the process is slow and often fails to produce viable results. Here, we introduce a new technique that reduces the dependency of these pipelines on reference sequences without tackling de novo assembly. Our clone discovery algorithm takes advantage of existing alignment data and correlates distant mutations reported by an alignment algorithm. The algorithm makes no statistical assumptions; it extracts information from overlapping alignments in a step-wise fashion, using the reference coordinates of the aligned reads. Because noise is random, the algorithm can be very sensitive and can identify clones of low coverage. This technique is useful for discovering clones in biological samples and for using them to refine existing alignment results.
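
The idea of correlating distant mutations through overlapping alignments can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give; it is a toy illustration under stated assumptions. Each aligned read is represented, hypothetically, as a set of (reference position, base) mutations; the function names and the `min_support` threshold are inventions for this example, and the threshold is a simplification that the described algorithm may not use. Mutations that repeatedly appear on the same read are linked, and links are chained step by step so that mutations farther apart than one read length end up in the same candidate clone.

```python
from collections import Counter
from itertools import combinations


def cooccurring_mutations(reads, min_support=2):
    """Count pairs of mutations carried by the same aligned read.

    reads: iterable of sets of (ref_pos, base) tuples (hypothetical format).
    min_support: assumed cutoff; pairs seen fewer times are treated as noise.
    """
    pair_counts = Counter()
    for mutations in reads:
        for a, b in combinations(sorted(mutations), 2):
            pair_counts[(a, b)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


def link_clones(supported_pairs):
    """Chain supported pairs into groups of linked mutations (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in supported_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for mutation in parent:
        groups.setdefault(find(mutation), set()).add(mutation)
    return sorted(sorted(group) for group in groups.values())


# Toy input: two overlapping read populations share the mutation at 80,
# which links the mutations at 10 and 150 even though no read spans both.
reads = [
    {(10, "A"), (80, "T")},
    {(10, "A"), (80, "T")},
    {(80, "T"), (150, "G")},
    {(80, "T"), (150, "G")},
    {(10, "A"), (55, "C")},  # (55, "C") appears once: treated as noise
]

clones = link_clones(cooccurring_mutations(reads))
print(clones)
```

On this toy input the three recurring mutations are grouped into one candidate clone, while the singleton at position 55 is dropped. The sketch ignores reads that cover a position but carry the reference base, which a real implementation would need to treat as evidence against a link.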
