
Quality Control of Variants Identified in Exome Sequencing Data in a Study of Oral Clefts

Tuesday, October 09, 2012 — Poster Session I

1:00 p.m. – 3:00 p.m.

Natcher Conference Center, Building 45

  • S. Szymczak
  • H. Ling
  • M. M. Parker
  • Q. Li
  • C. D. Cropp
  • T. H. Beaty
  • A. F. Scott
  • J. E. Bailey-Wilson


Next-generation DNA sequencing technologies are a promising tool for identifying rare genetic variants controlling susceptibility to complex diseases. However, sequencing all exons in many individuals usually identifies thousands of rare variants, and very rare variants may actually be sequencing artifacts. Quality control to detect bad genotypes and low-quality variants is therefore mandatory, but research is still needed to determine the best quality control algorithms and to establish cut-offs that control error without excluding more data than necessary. Whole exomes of 121 affected individuals from families with non-syndromic cleft lip with or without cleft palate were sequenced on Illumina HiSeq2000 sequencers. Sample duplicates and two HapMap controls were included. All samples were also genotyped using Illumina OmniExpress BeadArrays. We present a quality control pipeline for single nucleotide variants called separately for each sample with SAMtools. Bad genotypes were set to missing based on measures including genotype quality, read depth and strand bias. Variants were removed if they had high non-reference discrepancy rates, either between duplicate samples within the sequence data or between sequencing and array data. We compared different filtering thresholds to demonstrate how sample-level statistics improve after filtering out potentially false-positive genotypes and variants.
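The two filtering steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `GenotypeCall` record, its field names, and every cut-off (GQ ≥ 20, depth ≥ 10, strand-bias p ≥ 1e-4, NRD ≤ 0.1) are hypothetical. The strand-bias p-value stands in for a value such as the first entry of the PV4 INFO field reported by older SAMtools/bcftools versions. The non-reference discrepancy rate (NRD) follows the commonly used definition that skips pairs with a missing call and concordant homozygous-reference pairs.

```python
from dataclasses import dataclass
from math import isnan
from typing import Dict, List, Optional, Sequence, Tuple

# Genotypes are coded as the number of non-reference alleles:
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, None = missing.


@dataclass
class GenotypeCall:
    gt: Optional[int]     # called genotype
    gq: float             # Phred-scaled genotype quality
    dp: int               # read depth at the site
    strand_bias_p: float  # p-value of a strand-bias test (hypothetical field name)


def filter_genotype(call: GenotypeCall, min_gq: float = 20.0, min_dp: int = 10,
                    min_sb_p: float = 1e-4) -> Optional[int]:
    """Set a genotype to missing (None) if any quality measure fails its (hypothetical) cut-off."""
    if call.gt is None:
        return None
    if call.gq < min_gq or call.dp < min_dp or call.strand_bias_p < min_sb_p:
        return None
    return call.gt


def nrd(calls_a: Sequence[Optional[int]], calls_b: Sequence[Optional[int]]) -> float:
    """Non-reference discrepancy rate between two paired genotype vectors.

    Pairs with a missing call are skipped, as are concordant hom-ref/hom-ref pairs,
    so the rate only counts pairs where at least one call is non-reference.
    Returns NaN when no informative pair remains.
    """
    informative = 0
    discordant = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        if a == 0 and b == 0:
            continue
        informative += 1
        if a != b:
            discordant += 1
    return discordant / informative if informative else float("nan")


def failing_variants(
    pairs_by_variant: Dict[str, Tuple[List[Optional[int]], List[Optional[int]]]],
    max_nrd: float = 0.1,
) -> List[str]:
    """Return IDs of variants whose NRD across paired calls exceeds max_nrd.

    Each value holds two equally long genotype lists for the same samples, e.g.
    (duplicate 1, duplicate 2) or (sequencing, array).
    """
    failed = []
    for variant_id, (calls_a, calls_b) in pairs_by_variant.items():
        rate = nrd(calls_a, calls_b)
        if not isnan(rate) and rate > max_nrd:
            failed.append(variant_id)
    return failed


if __name__ == "__main__":
    # Toy sweep over the GQ cut-off at one variant: sequencing calls vs. array genotypes for four samples.
    seq = [GenotypeCall(1, 45, 30, 0.5), GenotypeCall(2, 12, 25, 0.4),
           GenotypeCall(1, 50, 5, 0.6), GenotypeCall(0, 60, 40, 0.9)]
    array = [1, 1, 2, 0]
    for min_gq in (0, 20):
        filtered = [filter_genotype(c, min_gq=min_gq) for c in seq]
        print(min_gq, filtered, nrd(filtered, array))
    # prints:
    # 0 [1, 2, None, 0] 0.5
    # 20 [1, None, None, 0] 0.0
```

In this toy example, raising the GQ cut-off sets one more genotype to missing and lowers the sequencing-versus-array NRD, illustrating the trade-off between controlling error and excluding data that motivates comparing several thresholds.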
