GenBank Expanded Accession Formats Coming December 2018

GenBank Expanded Accession Formats Coming December 2018. NLM Tech Bull. 2018 Sep-Oct;(424):b9.

September 26, 2018 [posted]

[Editor's Note: This is a reprint of an announcement from the National Center for Biotechnology Information (NCBI). To automatically receive the latest news and announcements regarding major changes and updates to NCBI resources and tools, please see the subscribe page.]

In December 2018, GenBank and other International Nucleotide Sequence Database Collaboration (INSDC) members will expand the accession formats used for sequencing projects. Nearly all possible accession numbers using the current, shorter formats have been assigned. Using these longer formats will allow expanded accession ranges and provide greater capacity.

The expanded format for Whole Genome Shotgun (WGS), Transcriptome Shotgun Assembly (TSA), and Targeted Locus Study (TLS) sequencing projects will use a six-letter Project Code prefix and a two-digit Assembly-Version number followed by 7, 8, or 9 digits (for example, AAAAAA020000001).
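As an illustration of adjusting processing code, here is a minimal Python sketch that splits an accession in the expanded WGS/TSA/TLS format into its parts. The regular expression is inferred only from the description above, the function name `parse_expanded_project_accession` is hypothetical, and the assumption that the Project Code uses uppercase ASCII letters is ours, not the announcement's.

```python
import re

# Assumed pattern, taken from the announcement's description:
# six letters (Project Code), two digits (Assembly-Version), then 7 to 9 digits.
EXPANDED_PROJECT_RE = re.compile(r"([A-Z]{6})(\d{2})(\d{7,9})")


def parse_expanded_project_accession(accession):
    """Return (project_code, assembly_version, serial), or None if no match."""
    match = EXPANDED_PROJECT_RE.fullmatch(accession)
    if match is None:
        return None
    project_code, version, serial = match.groups()
    # The serial is kept as a string so its leading zeros are preserved.
    return project_code, int(version), serial


print(parse_expanded_project_accession("AAAAAA020000001"))
# prints ('AAAAAA', 2, '0000001')
```

Because the Assembly-Version field has a fixed width of two digits, the split between version and the trailing digits is unambiguous even though the trailing run varies in length.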

Non-WGS/TLS/TSA nucleotide sequences currently use a "2+6" format: a two-letter prefix followed by six digits. This format will be expanded to a two-letter prefix followed by eight digits.

Protein sequences currently use a "3+5" accession format: a three-letter prefix followed by five digits. By the end of 2018, this format will be expanded to use seven digits.
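A validator that accepts both the current and the expanded nucleotide and protein formats might look like the following sketch. The patterns are inferred from the descriptions above, uppercase ASCII prefixes are an assumption, the function name `classify_accession` and its labels are hypothetical, and the example strings are made up for illustration rather than real accessions.

```python
import re

# Patterns inferred from the announcement; prefixes are assumed to be uppercase.
FORMATS = [
    ("nucleotide 2+6", re.compile(r"[A-Z]{2}\d{6}")),  # current format
    ("nucleotide 2+8", re.compile(r"[A-Z]{2}\d{8}")),  # expanded format
    ("protein 3+5", re.compile(r"[A-Z]{3}\d{5}")),     # current format
    ("protein 3+7", re.compile(r"[A-Z]{3}\d{7}")),     # expanded format
]


def classify_accession(accession):
    """Return a label for the first format that matches the whole string, else None."""
    for label, pattern in FORMATS:
        if pattern.fullmatch(accession):
            return label
    return None


# Made-up strings, used only to exercise each pattern.
for example in ("AB123456", "AB12345678", "ABC12345", "ABC1234567", "AB1234567"):
    print(example, "->", classify_accession(example))
```

Listing the accepted widths explicitly, rather than allowing any run of digits, keeps malformed identifiers such as a two-letter prefix with seven digits from passing silently.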

Please adjust any processing methods to accommodate these new identifier formats. If you have questions about the new formats, write to the NCBI help desk.