NIH Research Festival
We collected clinical information from a large dataset of patients with suspected or confirmed telomeropathies: 25 with TERC mutations, 67 with TERT mutations, and 46 with very short telomeres but no identified mutation. For each patient, we defined a set of about 40 categorical or numerical predictors from the available clinical data. We implemented a machine learning approach based on decision trees and random forests to classify patients optimally among the different disease subtypes. Using only 3 predictors, we classified TERC vs. unidentified-mutation patients with 83% balanced accuracy. The optimal classification for TERC vs. TERT involved 7 predictors and yielded 77% accuracy, whereas that for TERT vs. unidentified mutation required 4 predictors and reached 69% accuracy. Beyond its relevance to telomeropathies specifically, this study illustrates the synergistic potential of applying sophisticated data mining and machine learning algorithms in the clinical realm.
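Because the subtype groups are unequal in size (for example, 25 TERC vs. 46 unidentified-mutation patients), balanced accuracy is a more informative score than plain accuracy. The abstract does not state how it was computed. The minimal Python sketch below assumes the common definition: the mean of per-class recall. The function name `balanced_accuracy` is hypothetical. The all-"Unidentified" classifier is also hypothetical: it is not a result of this study. It only shows why a majority-class guess scores well on plain accuracy but only 0.5 on balanced accuracy.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true (assumed definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)  # number of patients in each true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # hits per class
    recalls = [correct[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


def plain_accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct, ignoring class sizes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Cohort sizes from the abstract; the predictions are a hypothetical
# classifier that always answers the majority class.
y_true = ["TERC"] * 25 + ["Unidentified"] * 46
y_pred = ["Unidentified"] * len(y_true)

print(round(plain_accuracy(y_true, y_pred), 3))  # 0.648
print(balanced_accuracy(y_true, y_pred))         # 0.5
```

Under this definition, a score such as 83% balanced accuracy cannot come from simply favoring the larger group. Each subtype's recall contributes equally to the average.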
Scientific Focus Area: Computational Biology
This page was last updated on Friday, March 26, 2021