HIVE-SEQ: Sequence Processing/Manipulation Tool of FastA/FastQ Files

Friday, November 08, 2013 — Poster Session IV
2:00 p.m. – 4:00 p.m.	FAES Academic Center (Upper-Level Terrace)	FDA/CBER	COMPBIO-14

Authors

LV Santana-Quintero
H Dingerdissen
K Karagiannis
V Simonyan

Abstract

In bioinformatics, nucleotide sequences are commonly stored in FastA or FastQ files, text-based formats that are easy to understand and to parse in almost any programming language. We have developed a new file processing tool, HIVE-SEQ, to manipulate sequence files according to user preferences. HIVE-SEQ comprises several tools: filters for quality, complexity, and primers; trimmers; cutters; and contig assemblers. It also includes a new tool that generates short random reads from a genome sequence, in which the user can specify the type and amount of noise added to the newly generated sequences. The resulting file of short reads derived from the original genome can be used to measure the quality of other bioinformatics tools, such as those for sequence alignment, genomic assembly, or single nucleotide polymorphism (SNP) discovery. All of these tools are fully integrated into the High-performance Integrated Virtual Environment (HIVE) and therefore take advantage of existing HIVE components, such as nonredundification of sequence files, sequence data compression, and faster access to files and qualities, which significantly improve the performance of the toolkit and efficiently deliver highly accurate results.
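
To make the idea of noisy read generation concrete, the following is a minimal Python sketch. It is not the HIVE-SEQ implementation. The function names (`simulate_reads`, `to_fasta`), the parameters, and the noise model are all assumptions made for illustration: reads start at uniformly random positions, and substitutions are the only noise type, applied independently to each base at a fixed rate.

```python
import random

BASES = "ACGT"


def simulate_reads(genome, read_length, num_reads, substitution_rate=0.0, seed=None):
    """Sample fixed-length reads from a genome and add substitution noise.

    Hypothetical helper for illustration only. Returns a list of
    (start_position, read_sequence) tuples.
    """
    if read_length > len(genome):
        raise ValueError("read_length exceeds genome length")
    rng = random.Random(seed)  # seeded generator for reproducible output
    reads = []
    for _ in range(num_reads):
        # Pick a start position so the read fits entirely inside the genome.
        start = rng.randrange(len(genome) - read_length + 1)
        bases = list(genome[start:start + read_length])
        for i, base in enumerate(bases):
            # With probability substitution_rate, swap in a different base.
            if rng.random() < substitution_rate:
                bases[i] = rng.choice([b for b in BASES if b != base])
        reads.append((start, "".join(bases)))
    return reads


def to_fasta(reads):
    """Format reads as FastA text, recording each read's true origin in the header."""
    return "".join(
        f">read_{i} pos={start}\n{seq}\n" for i, (start, seq) in enumerate(reads)
    )
```

Because each header records the position the read was sampled from, the output of an aligner run on these reads can be compared against the known origin, which is the kind of tool evaluation the abstract describes. Insertions, deletions, or quality-dependent error models could be added in the same loop.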