FASTA language definition file for gedit and related configurations


This is the gedit language definition XML file for the FASTA sequence format. It is modified (almost completely remade) from another FASTA definition file by Thomas Krahn (2010). Place the files in the matching subfolders of /usr/share/gtksourceview-3.0/: the style file goes in styles/ and the language definition goes in language-specs/.
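The placement rule can be sketched as a small helper. This is a minimal illustration that assumes the usual gtksourceview naming convention (language definitions end in .lang, style schemes in .xml); the file names used below are hypothetical, not the actual names in this repository.

```python
from pathlib import Path

# System-wide gtksourceview 3 data directory named in the text above.
GTKSV_DIR = Path("/usr/share/gtksourceview-3.0")


def destination(filename: str, base: Path = GTKSV_DIR) -> Path:
    """Return the install path for a gtksourceview file.

    Assumes the common convention: *.lang files are language definitions
    (language-specs/), *.xml files are style schemes (styles/).
    """
    name = Path(filename).name
    suffix = Path(name).suffix.lower()
    if suffix == ".lang":
        return base / "language-specs" / name
    if suffix == ".xml":
        return base / "styles" / name
    raise ValueError(f"not a gtksourceview file: {filename}")
```

For example, destination("fasta.lang") gives /usr/share/gtksourceview-3.0/language-specs/fasta.lang. Actually copying there (for instance with shutil.copy) normally needs root privileges.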

Modifications include:

  1. Using the standard style mapping definitions rather than defining the colors used by alignment viewers

  2. Removing colors for individual bases so that protein and nucleotide sequences can be used interchangeably

  3. Defining colors for protein stops, gaps, and irregular characters

  4. Supporting multiple FASTA extensions: .fa, .fasta, .faa

  5. Treating X as the only universally unused letter, since N can be asparagine in protein sequences. Other symbols and numbers that occur in sequences are also highlighted.
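The highlighting categories in the list above can be illustrated with a short Python sketch. It is not the language file itself (gtksourceview uses regex contexts in XML); it assumes '*' marks a stop and '-' marks a gap, and treating a '>' header line as a single token is likewise an assumption for illustration.

```python
import fnmatch

# Globs for the three extensions listed above.
FASTA_GLOBS = ("*.fa", "*.fasta", "*.faa")

# Assumed symbols: '*' for a stop in translated protein, '-' for an alignment gap.
STOP = "*"
GAP = "-"


def is_fasta_name(filename: str) -> bool:
    """Return True if the file name matches one of the FASTA globs (case-insensitive)."""
    name = filename.lower()
    return any(fnmatch.fnmatch(name, glob) for glob in FASTA_GLOBS)


def classify(ch: str) -> str:
    """Assign one sequence character to a highlighting category."""
    if ch == STOP:
        return "stop"
    if ch == GAP:
        return "gap"
    if ch.upper() == "X":
        return "irregular"  # X: the only letter treated as unused in both alphabets
    if ch.isascii() and ch.isalpha():
        return "residue"  # no per-base colors, so DNA and protein look alike
    return "irregular"  # digits and any other symbol


def highlight(line: str) -> list[tuple[str, str]]:
    """Tag each character of one FASTA line; a '>' header line is one token."""
    line = line.rstrip("\n")
    if line.startswith(">"):
        return [(line, "header")]
    return [(ch, classify(ch)) for ch in line]
```

For example, highlight("MK*-X1") tags M and K as residues, * as a stop, - as a gap, and both X and 1 as irregular. Letters other than X stay uncolored, which is what lets the same rules serve nucleotide and protein files.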

In the future, one could flag letters that are neither amino acids nor ambiguous bases. The non-amino-acid letters are BZJOUX, sort of like bijoux, the French word for jewels. B, Z and J are ambiguity codes that each stand for a pair of amino acids (B for D/N, Z for E/Q, J for I/L), while O and U are the non-canonical amino acids pyrrolysine and selenocysteine. This leaves only X, so the utility is questionable. For nucleotides, the letters outside the IUPAC codes are EFIJLOPQXZ; of these, EFILPQ are amino acids and J and Z are protein ambiguity codes, leaving O and X.
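The letter bookkeeping in the paragraph above can be checked with plain set differences. This is a sketch, assuming the 20 standard one-letter amino acid codes and the 16 IUPAC nucleotide codes (A, C, G, T, U plus the ambiguity codes); it only computes the candidate letters and is not something the language file does today.

```python
import string

ALPHABET = set(string.ascii_uppercase)

# The 20 standard one-letter amino acid codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# Protein ambiguity codes: B = D/N, Z = E/Q, J = I/L.
PROTEIN_AMBIGUOUS = set("BZJ")
# Non-canonical amino acids: O = pyrrolysine, U = selenocysteine.
NONCANONICAL = set("OU")
# IUPAC nucleotide codes: bases (including U) plus ambiguity codes.
NUCLEOTIDES = set("ACGTU") | set("RYSWKMBDHVN")


def letters(s: set) -> str:
    """Render a set of letters as a sorted string."""
    return "".join(sorted(s))


non_amino = ALPHABET - AMINO_ACIDS                                  # BJOUXZ
protein_left = non_amino - PROTEIN_AMBIGUOUS - NONCANONICAL         # X
non_nucleotide = ALPHABET - NUCLEOTIDES                             # EFIJLOPQXZ
amino_among_them = non_nucleotide & AMINO_ACIDS                     # EFILPQ
nucleotide_left = non_nucleotide - AMINO_ACIDS - PROTEIN_AMBIGUOUS  # OX

for label, s in [
    ("non-amino-acid letters", non_amino),
    ("left for protein", protein_left),
    ("non-nucleotide letters", non_nucleotide),
    ("of which standard amino acids", amino_among_them),
    ("left for nucleotides", nucleotide_left),
]:
    print(f"{label}: {letters(s)}")
```

The nucleotide case subtracts only the standard amino acids and the protein ambiguity codes, not pyrrolysine, which is why O survives alongside X.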
