Classification of telomeropathies from clinical data: A machine learning approach

Wednesday, September 16, 2015 — Poster Session I

3:30 p.m. – 5:00 p.m.
FAES Terrace


  • J Candia
  • H Leuva
  • P Scheinberg
  • D Townsley
  • NS Young


We collected clinical information from a large dataset of patients with suspected or confirmed telomeropathies: 25 had TERC mutations, 67 had TERT mutations, and 46 had very short telomeres but no identified mutation. For each patient, we defined a set of about 40 categorical or numerical predictors based on available clinical data. We implemented a machine learning approach based on decision trees and random forests to classify patients among the disease subtypes. Using only 3 predictors, we classified TERC vs. Unidentified mutation patients with 83% balanced accuracy. The optimal classification for TERC vs. TERT involved 7 predictors and yielded 77% accuracy, whereas that for TERT vs. Unidentified mutation required 4 predictors and yielded 69% accuracy. Beyond its relevance to telomeropathies specifically, this study illustrates the potential of applying data mining and machine learning algorithms to clinical data.
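
The subtype groups are of unequal size (for example, 25 TERC vs. 46 Unidentified mutation patients), which is presumably why balanced accuracy is reported. The sketch below is not the authors' code; it only illustrates the metric, taken here in its usual sense as the mean of per-class recall. The class sizes come from the abstract, while the "always predict the majority class" baseline and the function names are hypothetical, introduced purely for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all samples that are labeled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Counter returns 0 for classes with no correct predictions.
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


# Class sizes taken from the abstract: 25 TERC, 46 Unidentified.
y_true = ["TERC"] * 25 + ["Unidentified"] * 46

# Hypothetical baseline (an assumption, not from the study):
# a classifier that always predicts the majority class.
y_pred = ["Unidentified"] * len(y_true)

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)

print(f"plain accuracy:    {plain:.3f}")
print(f"balanced accuracy: {balanced:.3f}")
```

Under this hypothetical baseline, plain accuracy rewards the class imbalance (46 of 71 patients are labeled correctly), whereas balanced accuracy stays at chance level for two classes. A reported balanced accuracy of 83% therefore cannot be explained by group sizes alone.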

Category: Computational Biology