This is the gedit language definition XML file for the FASTA sequence format. It is modified (almost completely remade) from another FASTA definition file by Thomas Krahn (2010). The files should be placed in the corresponding subfolders of /usr/share/gtksourceview-3.0/: the style scheme in styles/ and the language definition in language-specs/.
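Installing the two files amounts to copying each into its subfolder. The following is a minimal Python sketch, assuming you pass in whatever file names the repository ships (the example names in the usage line are hypothetical). The `root` parameter is an addition for illustration, so the copy can be tried in a scratch directory before touching /usr/share, which usually requires root.

```python
import shutil
from pathlib import Path

# System-wide GtkSourceView 3 data directory named in this README.
GTKSV_ROOT = Path("/usr/share/gtksourceview-3.0")


def install(lang_file: str, style_file: str, root: Path = GTKSV_ROOT) -> list[Path]:
    """Copy the language definition into language-specs/ and the style scheme into styles/.

    Returns the destination paths. No particular file names are assumed.
    """
    targets = [
        (Path(lang_file), root / "language-specs"),
        (Path(style_file), root / "styles"),
    ]
    installed = []
    for src, dest_dir in targets:
        dest_dir.mkdir(parents=True, exist_ok=True)  # create the subfolder if missing
        dest = dest_dir / src.name
        shutil.copy2(src, dest)  # copy the file, preserving timestamps
        installed.append(dest)
    return installed
```

For example, `install("fasta.lang", "fasta-style.xml")` (hypothetical names), run with sufficient privileges.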
Uses standard style mappings rather than defining the colors used by alignment viewers.
Removes colors for individual bases so that protein and nucleotide sequences can be used interchangeably.
Defines colors for protein stops, gaps, and irregular characters.
Works with several FASTA extensions: .fa, .fasta, and .faa.
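GtkSourceView chooses a language by matching the file name against the glob patterns declared in the language file, typically a semicolon-separated `globs` property. This short Python sketch only illustrates that matching for the three extensions above; it assumes the patterns are written as `*.fa;*.fasta;*.faa` and is not GtkSourceView code.

```python
from fnmatch import fnmatchcase

# Assumed semicolon-separated glob list covering the three FASTA extensions.
FASTA_GLOBS = "*.fa;*.fasta;*.faa"


def is_fasta_file(filename: str, globs: str = FASTA_GLOBS) -> bool:
    """Return True if filename matches any of the semicolon-separated globs (case-sensitive)."""
    return any(fnmatchcase(filename, pattern) for pattern in globs.split(";"))
```

Note that a FASTQ file such as `reads.fastq` does not match any of these patterns.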
X is treated as the only irregular letter, because it is not a specific base or residue in either alphabet; N cannot be flagged because it denotes asparagine in protein sequences. Other symbols and numbers that occur in sequences are also highlighted.
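The highlighting rules can be mimicked outside gedit to see which characters of a line would be colored. This is a minimal Python sketch, not the actual language definition: it assumes `*` marks a protein stop, `-` marks a gap, and X (here also lowercase x, an assumption), digits, and any other non-letter symbol count as irregular. Header lines starting with `>` are assumed to be styled as a whole. Ordinary letters are left unstyled, matching the choice not to color individual bases.

```python
import re

# Assumed token classes. Alternation order matters: '*' and '-' are tried
# before the catch-all irregular class, which would otherwise match them too.
TOKEN_RE = re.compile(r"(?P<stop>\*)|(?P<gap>-)|(?P<irregular>[Xx0-9]|[^A-Za-z\s])")


def classify_line(line: str) -> list[tuple[int, str, str]]:
    """Return (column, character, class) for every character that would be highlighted."""
    if line.startswith(">"):
        # Header line: treated as one unit rather than scanned residue by residue.
        return [(0, line, "header")]
    return [(m.start(), m.group(), m.lastgroup) for m in TOKEN_RE.finditer(line)]


print(classify_line("ACGT-NX1"))
# [(4, '-', 'gap'), (6, 'X', 'irregular'), (7, '1', 'irregular')]
```

N in `ACGT-NX1` is not reported, since it is a valid letter in both alphabets.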
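In the future, one could flag letters that are neither amino acids nor ambiguous bases. The letters outside the 20 canonical amino acids are B, Z, J, O, U, and X, which loosely spell the French word "bijoux" (jewels). B, Z, and J are ambiguity codes for pairs of similar residues (B = D or N, Z = E or Q, J = I or L), while O and U are the non-canonical amino acids pyrrolysine and selenocysteine. This leaves only X, so the utility of such flagging is questionable. For nucleotides, the letters that are not IUPAC codes are E, F, I, J, L, O, P, Q, X, and Z; of these, E, F, I, L, P, and Q are canonical amino acids and J and Z are protein ambiguity codes, leaving O and X.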
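The letter counting in this paragraph can be checked with set differences. This is a short Python sketch; the code tables are the standard IUPAC one-letter codes written out by hand, not read from any library.

```python
from string import ascii_uppercase

CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
PROTEIN_AMBIGUITY = set("BZJ")              # B = D/N, Z = E/Q, J = I/L
NONCANONICAL_AA = set("OU")                 # O = pyrrolysine, U = selenocysteine
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")  # bases plus IUPAC ambiguity codes

ALPHABET = set(ascii_uppercase)

# Letters that are not one of the 20 canonical amino acids.
non_amino = ALPHABET - CANONICAL_AA
# Letters that are not IUPAC nucleotide codes.
non_nucleotide = ALPHABET - NUCLEOTIDE_CODES
# Letters that are neither a nucleotide code nor any protein code.
unused_everywhere = non_nucleotide - CANONICAL_AA - PROTEIN_AMBIGUITY - NONCANONICAL_AA

print("".join(sorted(non_amino)))          # BJOUXZ
print("".join(sorted(non_nucleotide)))     # EFIJLOPQXZ
print("".join(sorted(unused_everywhere)))  # X
```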