FASTA format description


A sequence in FASTA format begins with a single-line description, followed by one or more lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended, although not required, that all lines of text be shorter than 80 characters.
An example of three sequences in FASTA format is:

>gb|AE004091|AE004091:483-2027, PA0001
GTGTCCGTGGAACTTTGGCAGCAGTGCGTGGATCTTCTCCGCGATGAGCTGCCGTCCCAACAATTCAACA
CCTGGATCCGTCCCTTGCAGGTCGAAGCCGAAGGCGACGAATTGCGTGTGTATGCACCCAACCGTTTCGT
CCTCGATTGGGTGAACGAGAAATACCTCGGTCGGCTTCTGGAACTGCTCGGTGAACGCGGCGAGGGTCAG
TTGCCCGCGCTTTCCTTATTAATAGGCAGCAAGCGTAGCCGTACGCCGCGCGCCGCCATCGTCCCATCGC
AGACCCACGTGGCTCCCCCGCCTCCGGTTGCTCCGCCGCCGGCGCCAGTGCAGCCGGTATCGGCCGCGCC
CGTGGTAGTGCCACGTGAAGAGCTGCCGCCAGTGACGACGGCTCCCAGCGTGTCGAGCGATCCCTACGAG
CCGGAAGAACCCAGCATCGATCCGCTGGCCGCCGCCATGCCGGCTGGAGCAGCGCCTGCGGTGCGCACCG
AGCGCAACGTCCAGGTCGAAGGTGCGCTGAAGCACACCAGCTATCTCAACCGTACCTTCACCTTCGAGAA
CTTCGTCGAGGGCAAGTCCAACCAGTTGGCCCGCGCCGCCGCCTGGCAGGTGGCGGACAACCTCAAGCAC
GGCTACAACCCGCTGTTCCTCTACGGTGGCGTCGGTCTGGGCAAGACCCACCTGATGCATGCGGTGGGCA
ACCACCTGCTGAAGAAGAACCCGAACGCCAAGGTGGTCTACCTGCATTCGGAACGTTTCGTCGCGGACAT
GGTGAAGGCCTTGCAGCTCAACGCCATCAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCACTGTTG
ATCGACGACATCCAGTTCTTCGCCCGTAAGGAGCGCTCCCAGGAGGAGTTCTTCCACACCTTCAATGCCC
TTCTCGAAGGCGGCCAGCAGGTGATCCTCACCAGCGACCGCTATCCGAAGGAAATCGAAGGCCTGGAAGA
GCGGCTGAAATCCCGCTTCGGCTGGGGCCTGACGGTGGCCGTCGAGCCGCCGGAACTGGAAACCCGGGTG
GCGATCCTGATGAAGAAGGCCGAGCAGGCGAAGATCGAGCTGCCGCACGATGCGGCCTTCTTCATCGCCC
AGCGCATCCGTTCCAACGTGCGTGAACTGGAAGGTGCGCTGAAGCGGGTGATCGCCCACTCGCACTTCAT
GGGCCGGCCGATCACCATCGAGCTGATTCGCGAGTCGCTGAAGGACCTGTTGGCCCTTCAGGACAAGCTG
GTCAGCATCGACAACATCCAGCGCACCGTCGCCGAGTACTACAAGATCAAGATATCCGATCTGTTGTCCA
AGCGGCGTTCGCGCTCGGTGGCGCGCCCGCGCCAGGTGGCCATGGCGCTCTCCAAGGAGCTGACCAACCA
CAGCCTGCCGGAGATCGGCGTGGCCTTCGGCGGTCGGGATCACACCACGGTGTTGCACGCCTGTCGTAAG
ATCGCTCAACTTAGGGAATCCGACGCGGATATCCGCGAGGACTACAAGAACCTGCTGCGTACCCTGACAA
CCTGA
>gb|AE004091|AE004091:2056-3159, PA0002
ATGCATTTCACCATTCAACGCGAAGCCCTGTTGAAACCGCTGCAACTGGTCGCCGGCGTCGTGGAACGCC
GCCAGACATTGCCGGTTCTCTCCAACGTCCTGCTGGTGGTCGAAGGCCAGCAACTGTCGCTGACCGGCAC
CGACCTCGAAGTCGAGCTGGTTGGTCGCGTGGTACTGGAAGATGCCGCCGAACCCGGCGAGATCACCGTA
CCGGCGCGCAAGCTGATGGACATCTGCAAGAGCCTGCCGAACGACGTGCTGATCGACATCCGTGTCGAAG
AGCAGAAACTTCTGGTGAAGGCCGGGCGTAGCCGCTTCACCCTGTCCACCCTGCCGGCCAACGATTTCCC
CACCGTAGAGGAAGGTCCCGGCTCGCTGAACTTCAGCATTGCCCAGAGCAAGCTGCGTCGCCTGATCGAC
CGCACCAGCTTCGCCATGGCCCAGCAGGACGTGCGTTACTACCTCAACGGCATGCTGCTGGAAGTGAACG
GCGGCACCCTGCGCTCCGTCGCCACCGACGGCCACCGACTGGCCATGTGCTCGCTGGATGCGCAGATCCC
GTCGCAGGACCGCCACCAGGTGATCGTGCCGCGCAAAGGCATCCTCGAACTGGCTCGTCTGCTCACCGAG
CAGGACGGCGAAGTCGGCATCGTCCTGGGCCAGCACCATATCCGTGCCACCACTGGCGAATTCACCTTCA
CTTCGAAGCTGGTGGACGGCAAGTTCCCGGACTACGAGCGTGTACTGCCGCGCGGTGGCGACAAGCTGGT
GGTCGGTGACCGCCAGCAACTGCGCGAAGCCTTCAGCCGTACCGCGATCCTCTCCAACGAGAAGTACCGC
GGCATTCGCCTGCAGCTTTCCAACGGTTTGCTGAAAATCCAGGCGAACAACCCGGAGCAGGAAGAGGCCG
AGGAAGAAGTGCAGGTCGAGTACAACGGCGGCAACCTGGAGATAGGCTTCAACGTCAGTTACCTGCTCGA
CGTGCTGGGTGTGATCGGTACCGAGCAGGTCCGCTTCATCCTTTCCGATTCCAACAGCAGCGCCCTGGTC
CACGAGGCCGACAATGACGATTCTGCCTATGTCGTCATGCCGATGCGCCTCTAA
>gb|AE004091|AE004091:3169-4278, PA0003
ATGTCCCTGACCCGCGTTTCGGTCACCGCGGTGCGCAACCTGCACCCGGTGACCCTCTCCCCCTCCCCCC
GCATCAACATCCTCTACGGCGACAACGGCAGCGGCAAGACCAGCGTGCTCGAAGCCATCCACCTGCTGGG
CCTGGCGCGTTCATTCCGCAGTGCGCGCTTGCAGCCGGTGATCCAGTATGAGGAAGCGGCCTGCACCGTA
TTCGGCCAGGTGATGTTGGCCAACGGCATCGCCAGCAACCTGGGGATTTCCCGTGAGCGCCAGGGCGAGT
TCACCATCCGCATCGATGGGCAGAACGCCCGGAGTGCGGCTCAATTGGCGGAAACTCTCCCACTGCAACT
GATCAACCCGGACAGCTTTCGGTTGCTCGAGGGAGCGCCGAAGATCCGGCGACAGTTCCTCGATTGGGGA
GTGTTCCACGTGGAACCTCGGTTTCTGCCCGTCTGGCAGCGCCTGCAGAAGGCGCTGCGCCAGCGGAACT
CCTGGCTCCGGCATGGTAAACTGGACCCCGCGTCGCAAGCGGCCTGGGACCGGGAATTGAGCCTGGCCAG
CGATGAGATCGATGCCTACCGCAGAAGCTATATCCAGGCGTTGAAACCGGTATTCGAGGAAACACTCGCC
GAATTGGTTTCACTGGATGACCTGACCCTTAGCTACTACCGAGGCTGGGACAAGGACCGGGACCTCCTGG
AGGTTCTGGCTTCCAGCCTGTTGCGCGACCAGCAGATGGGCCACACCCAGGCGGGACCGCAGCGTGCGGA
TCTTCGCATACGGTTGGCAGGTCATAACGCCGCGGAGATTCTCTCGCGCGGTCAGCAGAAGCTGGTGGTA
TGCGCCCTGCGCATCGCCCAAGGCCATCTGATCAATCGCGCCAAGCGCGGACAGTGCGTCTACCTGGTGG
ACGACCTGCCCTCGGAACTGGATGAGCAGCATCGAATGGCTCTTTGCCGCTTGCTTGAAGATTTGGGTTG
CCAGGTATTCATCACCTGCGTGGACCCGCAACTATTGAAAGACGGCTGGCGCACGGATACGCCGGTATCC
ATGTTCCACGTGGAACATGGAAAAGTCTCTCAGACCACGACCATCGGGAGTGAAGCATGA
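
The layout described above can be read and written with a few lines of code. The following is a minimal sketch in Python, not a reference implementation: the function names parse_fasta and format_fasta are hypothetical, blank lines are assumed to be ignorable, and the default width of 70 characters is an assumption chosen only to stay under the recommended 80-character limit.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    description = None   # description of the record being read, if any
    chunks = []          # sequence lines collected for that record
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if not line.strip():
            continue  # assumption: blank lines carry no data
        if line.startswith(">"):
            # A ">" in the first column starts a new record.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Any other line is sequence data for the current record.
            chunks.append(line.strip())
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def format_fasta(description, sequence, width=70):
    """Return one FASTA record with the sequence wrapped at `width` characters."""
    lines = [">" + description]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Shortened stand-in for the example above (sequences truncated for brevity).
example = """>gb|AE004091|AE004091:483-2027, PA0001
GTGTCC
CCTGA
>gb|AE004091|AE004091:2056-3159, PA0002
ATGCAT
"""
records = parse_fasta(example)
```

Joining the sequence lines of each record yields one continuous string per description line, so line breaks inside a sequence carry no meaning of their own.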

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes. Lower-case letters are accepted and are mapped to upper case.
The nucleic acid codes supported are:
        A --> adenosine
        C --> cytidine
        G --> guanosine
        T --> thymidine
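
The case mapping and the code list above can be sketched as a small check in Python. This is an illustration under stated assumptions: the name normalize_sequence is hypothetical, only the four nucleic acid codes listed here are treated as valid, and rejecting any other character with an error is a design choice of this sketch rather than documented behaviour.

```python
# Assumption: only the four codes listed above are accepted.
SUPPORTED_NUCLEOTIDES = set("ACGT")


def normalize_sequence(sequence):
    """Map lower-case letters to upper case and check every code is supported."""
    upper = sequence.upper()
    unsupported = sorted(set(upper) - SUPPORTED_NUCLEOTIDES)
    if unsupported:
        raise ValueError("unsupported codes: " + ", ".join(unsupported))
    return upper


normalized = normalize_sequence("gtgtccGTGG")
```

A sequence given in lower or mixed case comes back entirely in upper case; a character outside the listed codes raises ValueError in this sketch.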