NIH Research Festival
Single-molecule sequencing is now routinely used to assemble complete, high-quality microbial genomes, but these assembly methods have not scaled well to large genomes. To address this problem, we previously introduced the MinHash Alignment Process for assembling noisy, single-molecule reads. This approach has enabled reference-grade assemblies of model organisms, revealed novel heterochromatic sequences, and filled low-complexity gaps. We have built on this work to create a new assembler named Canu, optimized for single-molecule sequencing. Canu is a complete refactoring of the Celera Assembler that shrinks the code base by over 70%, lowers coverage requirements, improves reconstruction of diploid genomes, and assembles repetitive sequences more efficiently. Using Oxford Nanopore sequencing, Canu has generated single-chromosome assemblies of E. coli, B. anthracis, B. cereus, and Y. pestis with over 99% identity. Using older chemistry, the assembly of the eukaryote S. cerevisiae exceeds an NG50 of 500 kb, and we predict chromosome-scale assemblies with more recent chemistries. Canu is available at https://github.com/marbl/canu.
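For readers unfamiliar with the NG50 metric cited above, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and NG50 is the length of the contig at which the running total first reaches half of the estimated genome size. The function name `ng50`, the contig lengths, and the genome size are hypothetical values chosen for illustration; this is not code from Canu.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length of the shortest contig in the smallest set of
    longest contigs that together span at least half of the estimated
    genome size. Returns 0 if the contigs never reach that threshold.
    """
    half_genome = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half_genome:
            return length
    return 0


# Hypothetical contig lengths (in bases) and genome size, for illustration only.
contigs = [800_000, 600_000, 400_000, 200_000]
print(ng50(contigs, genome_size=2_400_000))  # prints 600000
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the estimated genome size, so it does not reward an assembly for being incomplete.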
Scientific Focus Area: Computational Biology
This page was last updated on Friday, March 26, 2021