A novel computational deconvolution method to dissect heterogeneity using bulk RNA-seq data

Friday, September 14, 2018 — Poster Session V

12:00 p.m. – 1:30 p.m.
FAES Terrace


  • K Kang
  • Q Meng
  • I Shats
  • DM Umbach
  • L Li


Background: The cell-type composition of biological tissues varies widely across samples. Such heterogeneity hampers efforts to study the tissue microenvironment. Current approaches that address heterogeneity have drawbacks. Experimental techniques, such as fluorescence-activated cell sorting, are expensive. Computational methods, though flexible and promising, often estimate either the sample-specific proportions (SSP) of each cell type or the cell-type-specific gene expression profiles (csGEP), but not both, and require the other as input.

Goal: We developed a deconvolution method that estimates SSP and csGEP simultaneously using only bulk RNA-seq data.

Method: Our Bayesian approach extends the Latent Dirichlet Allocation model by incorporating two features needed to model RNA-seq data from a mixture of cell types: the dependence of gene expression on gene length and the dependence of RNA amount on cell size. We benchmarked our method in two ways using constructed mixtures with known SSP: 1) 40 in silico mixtures of six pure cell lines, generated from RNA-seq data downloaded from the UCSC Genome Browser; 2) 32 experimental mixtures of mRNA isolated from four human cell lines. We compared the performance of our method with that of CIBERSORT and csSAM using the known SSP and the measured csGEP.

Results: Our method outperformed both competitors in both studies. For the in silico and experimental mixtures, respectively, it achieved 77% and 17% lower root-mean-square error (RMSE) on SSP than CIBERSORT (which estimates only SSP), and 64% and 16% lower RMSE on csGEP than csSAM (which estimates only csGEP).

Conclusion: Our method holds promise for computationally deciphering sample heterogeneity using RNA-seq data measured in bulk tissue.
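
The abstract does not give the model equations, so the following is only a minimal sketch of the two ideas it names: a mixture whose expected read fractions depend on gene length and cell size, and RMSE against known SSP. The weighting scheme (RNA contribution proportional to cell proportion times cell size, read share proportional to transcript abundance times gene length) and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math


def rna_fractions(cell_props, cell_sizes):
    """Convert cell-count proportions into RNA-amount proportions.

    Assumption: the RNA a cell type contributes is proportional to
    (proportion of cells) x (relative cell size).
    """
    weighted = [p * s for p, s in zip(cell_props, cell_sizes)]
    total = sum(weighted)
    return [w / total for w in weighted]


def expected_read_fractions(cell_props, cell_sizes, profiles, gene_lengths):
    """Expected share of reads per gene in one bulk sample.

    profiles[k][g] is the relative transcript abundance of gene g in
    cell type k (a hypothetical csGEP; each row sums to 1).
    Assumption: longer genes yield proportionally more reads.
    """
    rna = rna_fractions(cell_props, cell_sizes)
    raw = []
    for g, length in enumerate(gene_lengths):
        mixed = sum(rna[k] * profiles[k][g] for k in range(len(rna)))
        raw.append(length * mixed)
    total = sum(raw)
    return [r / total for r in raw]


def rmse(estimated, truth):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(e - t) ** 2 for e, t in zip(estimated, truth)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(ours, baseline):
    """Fractional reduction in error relative to a baseline (0.25 = 25% lower)."""
    return 1.0 - ours / baseline


# Illustrative values only; none of these numbers come from the study.
true_ssp = [0.4, 0.6]                      # two cell types in one sample
sizes = [1.0, 3.0]                         # second cell type carries 3x the RNA
csgep = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
lengths = [1000, 2000, 4000]               # gene lengths in bases

reads = expected_read_fractions(true_ssp, sizes, csgep, lengths)
error = rmse([0.5, 0.5], true_ssp)         # score a hypothetical SSP estimate
```

In this sketch, ignoring cell size would bias the estimated proportions toward the larger cell type, which is one plausible reason a model might account for it explicitly.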

Category: Computational Biology