NIH Research Festival
High-throughput sequencing of PCR-amplified microbial "marker genes" (most commonly the prokaryotic 16S rRNA gene) is a powerful tool for studying the composition of microbial communities and has provided insights into the key role of the microbiome in health and disease. However, errors generated during sample preparation and sequencing can inflate estimates of community richness, obscuring true microbial diversity. Though several algorithms exist for removing noise from data generated on the Roche 454 platform, there are currently no tools for denoising marker gene data from other platforms, including the frequently used Illumina MiSeq and HiSeq platforms as well as emerging technologies such as PacBio SMRT. QuAAD (Quality-Aware Amplicon Denoiser) addresses this limitation by rapidly and accurately denoising marker gene data from a variety of high-throughput sequencing platforms. Unlike previous approaches, QuAAD takes advantage of the quality score information provided by the base-calling algorithm. These scores, which encode estimated sequencing error rates, inform QuAAD's model of the random error-generating process. Using this model, QuAAD examines each sequence, matching it to possible originating sequences and determining whether it is likely to be an error. Applying QuAAD to simulated data as well as real data from communities of known composition produced by the Illumina MiSeq, Ion Torrent PGM, and PacBio SMRT technologies, we show that QuAAD effectively identifies and corrects sequencing errors in marker gene data from a variety of sequencing platforms.
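To illustrate how quality scores can encode error rates and inform a comparison between a read and a candidate originating sequence, the following is a minimal sketch and not QuAAD's actual model. It relies on the standard Phred convention, in which a score Q corresponds to an error probability of 10^(-Q/10), and on the common Phred+33 ASCII encoding. Everything else is an assumption made for illustration: the function names are hypothetical, errors are treated as independent substitutions spread evenly over the three other bases, and reads are assumed to be already aligned to candidates of equal length.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a Phred+33 encoded quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def log_likelihood(read, quals, candidate):
    """Log-probability of observing `read` if `candidate` were the true sequence.

    Illustrative assumptions (not taken from the abstract): per-base errors
    are independent, only substitutions occur, and an erroneous base is
    equally likely to be any of the three other bases.
    """
    if not (len(read) == len(quals) == len(candidate)):
        raise ValueError("read, quals, and candidate must have equal length")
    total = 0.0
    for observed, q, true_base in zip(read, quals, candidate):
        # Cap the error rate at 0.75 so that a completely uninformative base
        # is treated as uniform over the four nucleotides (avoids log(0)).
        e = min(phred_to_error_prob(q), 0.75)
        if observed == true_base:
            total += math.log(1.0 - e)
        else:
            total += math.log(e / 3.0)
    return total


if __name__ == "__main__":
    read = "ACGT"
    quals = decode_phred33("IIII")  # 'I' is Q40 under Phred+33
    for candidate in ("ACGT", "ACGA"):
        print(candidate, log_likelihood(read, quals, candidate))
```

Under this convention a score of 20 corresponds to a 1% error rate and a score of 30 to 0.1%, so a mismatch at a high-quality position counts far more strongly against a candidate than a mismatch at a low-quality position.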
Scientific Focus Area: Computational Biology
This page was last updated on Friday, March 26, 2021