NIH Research Festival
Since 2009, the NIH Undiagnosed Diseases Program (UDP) has routinely carried out genome-scale genotyping and exome sequencing for patients and their nuclear families, and to date has sequenced and analyzed over 400 families. For more than 100 of these families, however, data were not available for one or both parents, which makes analysis substantially more difficult. While these alternative families make up 25% of UDP cases, they account for less than 5% of diagnosed cases. This suggests that missing parents reduce variant filtration power: the resulting data contain more false positives, which limits practical curation and lowers the subsequent diagnostic rate. Using genotype data from a minimum of 3 siblings and the remaining parent, we can partially reconstruct the sequence of the missing parent. This approach relies on identifying regions within siblings that can only have been co-inherited from the missing parent. With each additional sibling, the portion of the missing parent that can be reconstructed increases asymptotically. The strategy uses two sets of discrete mathematical expressions to define informative loci and to designate regions of the genome as attributable to the missing parent, excludable from the missing parent, or inconclusive. A BED file defining these regions can then be incorporated into subsequent analysis steps. The resulting analysis allows more complete phasing of compound heterozygous variant pairs and an overall reduction of noise during filtration. This technique has the potential to salvage previously difficult cases by striving toward the maximum Mendelian filtration potential.
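The abstract does not give the two sets of expressions it uses, so the following is only a minimal sketch of the general idea, not the UDP implementation. It swaps in a simple per-locus Mendelian consistency check: at a single biallelic site, with genotypes coded as alternate-allele counts (0, 1, 2), it enumerates which missing-parent genotypes are compatible with the remaining parent and every sibling. The function names, the genotype coding, and the category labels are all assumptions made for illustration. The last function illustrates the asymptotic growth with sibling count under the assumption that each child independently inherits either parental haplotype with probability 1/2; it gives the expected fraction transmitted to at least one child, which is an upper bound on what could be reconstructed.

```python
from itertools import product


def transmissible(genotype):
    """Alleles (0 = ref, 1 = alt) that a parent with this alt-allele count can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def consistent_missing_genotypes(known_parent, siblings):
    """Return the set of missing-parent genotypes compatible with every sibling.

    known_parent: alt-allele count (0, 1, 2) of the genotyped parent.
    siblings: list of alt-allele counts, one per sibling.
    """
    consistent = set()
    for candidate in (0, 1, 2):
        # A candidate survives if every child can be formed from one allele
        # of the known parent plus one allele of the candidate.
        if all(
            any(
                a + b == child
                for a, b in product(transmissible(known_parent), transmissible(candidate))
            )
            for child in siblings
        ):
            consistent.add(candidate)
    return consistent


def classify_locus(known_parent, siblings):
    """Hypothetical labels: how much the locus tells us about the missing parent."""
    consistent = consistent_missing_genotypes(known_parent, siblings)
    if not consistent:
        return "mendelian_error"      # no genotype fits; likely a genotyping error
    if len(consistent) == 1:
        return "resolved"             # missing-parent genotype fully determined
    if len(consistent) == 3:
        return "inconclusive"         # nothing can be excluded
    return "partially_excluded"       # exactly one genotype ruled out


def expected_transmitted_fraction(n_siblings):
    """Expected fraction of the missing parent's two haplotypes passed to >= 1 child.

    Assumes each child independently inherits either haplotype with probability 1/2.
    """
    return 1.0 - 0.5 ** n_siblings
```

For example, with a homozygous-reference remaining parent and sibling genotypes 0, 1, 1, only a heterozygous missing parent fits, so the locus is fully informative. With a heterozygous remaining parent and three heterozygous siblings, every missing-parent genotype remains possible and the locus is inconclusive. Under the stated transmission assumption, three siblings carry an expected 87.5% of the missing parent's haplotypes and four carry 93.75%, which matches the diminishing returns described above.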
Scientific Focus Area: Computational Biology
This page was last updated on Friday, March 26, 2021