FASTA Format

FASTA is a text-based format for storing data about nucleotide or amino acid sequences. Each nucleotide or amino acid is represented by a single ASCII letter.

Sequences are stored in a FASTA text file as follows. A line beginning with > marks the start of a record, and the sequence itself begins on the following line. (The line with the > may contain a name or unique identifier for the sequence.) The sequences themselves consist of single-letter codes, many of which are familiar. For example, in a nucleotide sequence, A indicates the presence of adenine.

Here is an example sequence in FASTA format:

>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
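
The layout described above is simple enough to read with a few lines of Python. What follows is a minimal sketch, not a full FASTA parser: the function name parse_fasta and the two short records in the example string are hypothetical, invented for illustration. It assumes that a record's sequence may span several lines (as in the example above) and that blank lines can be skipped.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from lines of FASTA text.

    Hypothetical helper for illustration only.
    """
    header = None  # text after ">" for the record being read
    chunks = []    # sequence lines collected for that record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are joined end to end.
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


# Made-up records, not real sequence data.
example = """\
>seq1 hypothetical example record
ACGT
TTGA
>seq2
MNSER
"""

records = list(parse_fasta(example.splitlines()))
for header, sequence in records:
    print(header, len(sequence))
```

Because the function accepts any iterable of lines, an open file handle could be passed in place of example.splitlines(). For real work, an established library such as Biopython is likely a better choice than a hand-rolled reader.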

This post is part of a series. The most recent post in the series is “Nanopore Sequencing”.