A configurable pipeline for RNA-seq data analysis

Tuesday, October 09, 2012 — Poster Session I

1:00 p.m. – 3:00 p.m.

Natcher Conference Center, Building 45




  • A.J. Oler
  • V. Gopalan
  • M. Narayanan
  • D.E. Hurt
  • Y. Huyen


As high-throughput sequencing technology advances, there is an increasing need to automate repetitive computational tasks such as short-read alignment. Combining many different tools into a unified framework that takes reads from the sequencing machine through to differentially expressed genes is a challenge. In collaboration with the Systems Genomics and Bioinformatics Unit (LSB/NIAID), we have developed a GNU Make-based pipeline framework to process one full Illumina HiSeq run (or equivalent), including 1) conversion of base-call files to FASTQ with CASAVA, 2) quality control of FASTQ files, 3) alignment of standards and RNA-seq samples with Bowtie and TopHat, respectively, and 4) assembly of transcripts with Cufflinks. The pipeline takes either base-call or FASTQ files as input and processes each sample as specified in a user-defined sample sheet. All jobs run in parallel, either on a local computer or on a Sun Grid Engine cluster. Dynamic graphical output lets the user visualize pipeline progress and navigate through the pipeline output. Defined commands allow the user to start and stop at particular steps of the pipeline, or to rerun the analysis for a subset of samples. Pipelines can be customized by integrating additional command-line applications to suit a variety of needs.
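
The sample-sheet-driven control flow described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual implementation, which is built on GNU Make; it only models how a per-sample job list might be derived, with optional start and stop steps and a sample subset. The step identifiers, the sample sheet columns (`sample`, `type`), the `QC` placeholder label, and the choice to skip transcript assembly for standards are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical step identifiers, ordered as the four stages named in the abstract.
STEPS = ["fastq_conversion", "quality_control", "alignment", "assembly"]


def read_sample_sheet(text):
    """Parse a CSV sample sheet; the columns 'sample' and 'type' are assumed."""
    return list(csv.DictReader(io.StringIO(text)))


def tool_for(step, sample_type):
    """Return the tool label for a step, or None if the step is skipped."""
    if step == "fastq_conversion":
        return "CASAVA"
    if step == "quality_control":
        return "QC"  # placeholder: the abstract does not name a QC tool
    if step == "alignment":
        # Standards are aligned with Bowtie, RNA-seq samples with TopHat.
        return "Bowtie" if sample_type == "standard" else "TopHat"
    if step == "assembly":
        # Assumption: transcript assembly applies to RNA-seq samples only.
        return None if sample_type == "standard" else "Cufflinks"
    raise ValueError(f"unknown step: {step}")


def plan_jobs(rows, start=None, stop=None, samples=None):
    """List (sample, step, tool) jobs between start and stop, inclusive."""
    first = STEPS.index(start) if start else 0
    last = STEPS.index(stop) if stop else len(STEPS) - 1
    jobs = []
    for row in rows:
        if samples is not None and row["sample"] not in samples:
            continue  # rerun requested for a subset of samples only
        for step in STEPS[first:last + 1]:
            tool = tool_for(step, row["type"])
            if tool is not None:
                jobs.append((row["sample"], step, tool))
    return jobs
```

Calling `plan_jobs(rows, start="alignment")` on a parsed sheet would list only the alignment and assembly jobs, while passing `samples={"liver1"}` restricts the plan to that one hypothetical sample. In a Make-based design the same effect comes from choosing which targets to build, and jobs for different samples can run in parallel because they do not depend on one another.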

