NIH Research Festival
Personalized medicine aims to use clinical and genomic data to enable informed medical decision making for disease treatment and prevention. This aim resonates with systems biology, which advocates a holistic approach to studying complex biological systems by integrating disparate data sets into predictive models. In this study, we took a systems approach and used decision trees for predictive modeling of an atherosclerosis marker, coronary artery calcification (CAC), using multi-omic data from the ClinSeq® study. Using data from 16 control subjects and 16 cases with advanced coronary calcium levels, we first built classification models employing 43 clinical variables and the genotypes of 57 SNPs compiled from previous genome-wide association studies (GWAS) on CAC. Predictive accuracy was quantified by generating receiver operating characteristic (ROC) curves and computing the area under each curve (AUC). Models built using a combination of clinical and genotype data reached an average AUC of 0.84, whereas models based on clinical data alone or genotype data alone reached AUC values of 0.75-0.79, demonstrating the predictive improvement gained by combining data sets. Next, we identified 56 additional SNPs that led to improved model performance. Finally, we utilized the RNA-Seq expression levels of a panel of gene markers and achieved an AUC of 0.95. By systematically perturbing our decision-tree-based models, we characterized the gene expression space and identified the most predictive genes and key biological processes leading to advanced coronary calcium levels. The biological insights generated by our analysis illustrate the potential of decision-tree-based models as informative multi-omic diagnostic tools.
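As a minimal illustration of the AUC metric used above, the following Python sketch computes the area under the ROC curve as the fraction of case-control pairs in which the case receives the higher predicted score, with ties counted as one half. The function name `roc_auc` and the labels and scores are hypothetical and made up for illustration. They are not ClinSeq® data, and this is not the study's actual analysis code.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = case, 0 = control).

    Uses the pairwise (Mann-Whitney) formulation: the AUC equals the
    probability that a randomly chosen case is scored higher than a
    randomly chosen control, counting ties as one half.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # tie counts as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model outputs for 4 cases and 4 controls (illustrative only).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.5, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to chance-level ranking of cases against controls, and 1.0 to perfect separation, which is the scale on which the values reported above should be read.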
Scientific Focus Area: Systems Biology
This page was last updated on Friday, March 26, 2021