Differentiating between sequence and methylation variability in genome methylation data
Friday, September 15, 2017 — Poster Session IV
- BA LaBarre
- VM Hayes
- LL Elnitski
DNA methylation levels are associated with subtle differences in nutrition, environmental exposures and gestation, providing a living history of factors associated with health and longevity. However, measurements are confounded by SNPs at CpG sites, the standard sites of methylation. Researchers therefore exclude data wherever SNPs and methylation sites coincide, which limits false interpretation but discards thousands of data points. Here we show that specific features of methylation data allow SNPs, including novel variants, to be identified. This is useful in under-characterized populations, such as the Kalahari-based San, for whom genotypes are scarce. Detecting DNA methylation requires bisulfite conversion, which turns unmethylated cytosines into uracils that are read as thymines after amplification. This creates ambiguity about the native allele, because a C>T SNP and an unmethylated cytosine yield the same signal. Nonetheless, we find that SNP data show a distinct three-tier pattern of full, hemizygous, or no methylation, whereas methylation values at non-SNP sites show less variation. We compute putative SNP positions from these tiered methylation patterns, which allows known SNPs to be removed and novel sites to be prioritized for verification. Since we first developed this approach, many of the predicted SNPs have appeared in successive versions of dbSNP, supporting our predictions. The identified sites fall in regions not covered by whole-exome sequencing or SNP chips. Thousands of methylation probe positions are routinely discarded (e.g., one group recommends excluding 190,672 sites from the Illumina 450K array, mainly because of SNPs); our method instead offers an orthogonal way to identify SNPs that are absent from genotyping arrays and that create allele-specific methylation. The method can also be used to characterize functionally important regulatory sites, which may behave differently depending on whether methylation or sequence varies.
Category: Computational Biology
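The abstract does not spell out the scoring rule used to detect the three-tier pattern, so the following is only a minimal sketch of the idea, assuming per-sample beta values (methylation fraction between 0 and 1) are available for each CpG probe. It snaps each sample to the nearest of three tier centres (0, 0.5, 1), flags a site as a putative SNP when nearly all samples fall close to a tier and at least two tiers are occupied, and then drops sites already present in a known-SNP set. The tolerance, both thresholds, the function names `classify_site` and `prioritize_novel`, and the probe IDs are hypothetical and are not taken from the authors' method.

```python
from dataclasses import dataclass

# Hypothetical tier centres: no, hemizygous (intermediate), and full methylation.
TIERS = (0.0, 0.5, 1.0)


@dataclass
class SiteCall:
    site_id: str
    tiered: bool
    occupied_tiers: int
    fraction_near_tier: float


def classify_site(site_id, betas, tol=0.1, min_fraction=0.9, min_tiers=2):
    """Flag a CpG site whose per-sample beta values cluster at 0, 0.5 and/or 1.

    tol, min_fraction and min_tiers are illustrative assumptions only.
    """
    if not betas:
        raise ValueError("need at least one sample")
    counts = [0] * len(TIERS)
    near = 0
    for b in betas:
        # Index of the tier centre closest to this sample's beta value.
        idx = min(range(len(TIERS)), key=lambda i: abs(b - TIERS[i]))
        if abs(b - TIERS[idx]) <= tol:
            counts[idx] += 1
            near += 1
    occupied = sum(1 for c in counts if c > 0)
    frac = near / len(betas)
    # Requiring two occupied tiers keeps uniformly methylated or unmethylated
    # sites (one tier only) from being called as SNPs.
    tiered = frac >= min_fraction and occupied >= min_tiers
    return SiteCall(site_id, tiered, occupied, frac)


def prioritize_novel(calls, known_snp_sites):
    """Keep tiered sites that are not already in a known-SNP set (e.g. from dbSNP)."""
    return [c.site_id for c in calls if c.tiered and c.site_id not in known_snp_sites]


if __name__ == "__main__":
    calls = [
        classify_site("cg_a", [0.02, 0.49, 0.97, 0.51, 0.95, 0.03]),  # three tiers
        classify_site("cg_b", [0.31, 0.42, 0.58, 0.66, 0.37, 0.72]),  # continuous
        classify_site("cg_c", [0.98, 0.52, 0.50, 0.96, 0.99, 0.47]),  # tiered, known
    ]
    print(prioritize_novel(calls, known_snp_sites={"cg_c"}))  # ['cg_a']
```

Snapping to fixed centres is the simplest possible tier test; a fuller treatment might fit a three-component mixture per site instead, but that choice is not described in the abstract.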