Genome data at NCBI–easier access, more formats, improved presentation

Wednesday, September 16, 2015 — Poster Session I

3:30 p.m. – 5:00 p.m.
FAES Terrace


  • P Kitts
  • M DiCuccio
  • A Kimchi
  • T Murphy
  • K Pruitt
  • T Tatusova


The National Center for Biotechnology Information (NCBI) databases contain data for over 30,000 genome assemblies. NCBI has recently made several improvements that make it easier for users to find and quickly access genome data of interest, provide more convenient data formats, and enrich the data presented in web pages and reports. We have added new panels to the NCBI Genome Resource for high-profile organisms, such as human and Salmonella enterica. These panels provide quick links for common actions: downloading sequences in FASTA format for genome, transcript, or protein; downloading genome annotation in GFF, GenBank, or tabular format; and running BLAST against genome, transcript, or protein sequences. We have also redesigned the NCBI genomes FTP site to expand content and facilitate data access through a predictable directory hierarchy with consistent file names and formats. The updated FTP site provides greater support for downloading assembled genome sequences and/or the corresponding annotation data. We now provide GFF format consistently for all annotated genome assemblies. We have also instituted the use of accession.version as the primary sequence identifier in both GFF and FASTA files. Having the same identifier in both files supports their use in common RNA-Seq analysis packages and in other analysis pipelines that rely on simple string comparison to match sequence identifiers. Finally, we have enhanced the search functionality of the NCBI Assembly Resource to make it easier to find genome assemblies of interest.
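The identifier matching described above can be illustrated with a minimal Python sketch. It assumes that the FASTA identifier is the first whitespace-delimited token of each header line and that the GFF sequence identifier is the first tab-separated column. The function names and the placeholder identifiers (XX_000001.1 and so on) are hypothetical and are not taken from NCBI files or tools.

```python
import io


def fasta_ids(handle):
    """Collect the first whitespace-delimited token of each FASTA header."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gff_seqids(handle):
    """Collect column 1 (the sequence identifier) of each GFF feature line."""
    ids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section, if present, ends the features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_seqids(fasta_handle, gff_handle):
    """Return GFF identifiers with no exact string match in the FASTA file."""
    return gff_seqids(gff_handle) - fasta_ids(fasta_handle)


# Made-up placeholder data, not real accessions.
demo_fasta = io.StringIO(">XX_000001.1 demo\nACGT\n>XX_000002.1 demo\nGGCC\n")
demo_gff = io.StringIO(
    "##gff-version 3\n"
    "XX_000001.1\tdemo\tgene\t1\t4\t.\t+\t.\tID=gene0\n"
    "XX_000003.1\tdemo\tgene\t1\t4\t.\t+\t.\tID=gene1\n"
)
print(sorted(unmatched_seqids(demo_fasta, demo_gff)))
```

An empty result means every annotated sequence can be found in the FASTA file by plain string equality, which is the property that the shared accession.version identifier is meant to provide.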

Category: Computational Biology