tajmone edited this page Oct 7, 2017 · 9 revisions
highlight v3.39 | Lua 5.3

Basic Lua Notions for Highlight Scripting

Some brief reminders about Lua syntax rules to be observed in scripting Highlight language definitions, themes and plugins.



Identifiers

In Lua identifiers (ie: names for variables, functions, etc.) are case sensitive; they can contain letters, digits, and underscores, but they can’t begin with a digit or be a reserved word (eg: for, and).

By convention, in your Lua code you should avoid names that begin with an underscore followed by one or more uppercase letters (eg: _VERSION). In Lua, this naming convention is reserved for internal variables of the language.

Also, avoid naming your identifiers all in uppercase in your Highlight language definitions, themes and plugins since this naming convention is used to represent internal states of the parser.
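The identifier rules above can be illustrated with a short sketch. All the variable names below are hypothetical and chosen only for the example; they are not part of Highlight.

```lua
-- Identifiers are case sensitive: these are three distinct variables.
local keyword = "lower"
local Keyword = "capitalized"
local keyWord = "camel"
print(keyword, Keyword, keyWord)

-- Letters, digits and underscores are allowed, but not a leading digit.
local group_2 = "valid"
-- local 2nd_group = "invalid"   -- syntax error: starts with a digit
-- local for = "invalid"         -- syntax error: 'for' is a reserved word

-- _VERSION is an example of a name reserved by Lua itself.
print(_VERSION)
```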

Strings

There are two types of strings in Lua:

  1. Quoted literal strings
  2. Bracketed literal strings

Both types are useful in writing Highlight language definitions, and you should understand the differences between them to choose the right type of string for each context.

Often Highlight language definitions fail to work as expected due to improper use of Lua literal strings when defining regular expressions.

Quoted Literal Strings

Quoted literal strings (aka short literal strings) are delimited by a pair of matching single (') or double (") quotes, and can contain the following escape sequences:

  • \a — Bell
  • \b — Backspace
  • \f — Form feed
  • \n — Newline
  • \r — Carriage return
  • \t — Horizontal tab
  • \v — Vertical tab
  • \\ — Backslash
  • \" — Double quote
  • \' — Single quote
  • \z — Skip following white-space characters (see below)

Furthermore:

A backslash followed by a line break results in a newline in the string.

The escape sequence \z skips the following span of white-space characters, including line breaks; it is particularly useful to break and indent a long literal string into multiple lines without adding the newlines and spaces into the string contents.

A short literal string cannot contain unescaped line breaks nor escapes not forming a valid escape sequence.
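As a minimal sketch of these rules, assuming Lua 5.2 or later (where \z is available), the following snippet compares a few quoted strings. The variable names are illustrative only.

```lua
-- Single and double quotes delimit the same kind of string.
local a = 'He said "hi"'
local b = "He said \"hi\""
print(a == b)                                   --> true

-- A backslash followed by a line break yields a newline in the string.
local two_lines = "first\
second"
print(two_lines == "first\nsecond")             --> true

-- \z skips the following white space, line breaks included, so a long
-- string can be wrapped and indented without changing its contents.
local keywords = "if then else \z
                  while do end"
print(keywords == "if then else while do end")  --> true
```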

Bracketed Literal Strings

Bracketed literal strings are delimited by long brackets (ie: enclosed in a pair of double square brackets, each pair containing an equal number, zero or more, of equal signs):

  • [[ … ]]
  • [=[ … ]=]
  • [====[ … ]====]

The number of equal signs in a long bracket pair defines its “level”:

… an opening long bracket of level 0 is written as [[, an opening long bracket of level 1 is written as [=[, and so on. A closing long bracket is defined similarly.

This type of literal string has a long format (ie: it can span multiple lines). Any end-of-line sequence (CR, LF, CRLF, LFCR) is converted to a simple newline. The only exception is an end-of-line sequence immediately following the opening bracket, which is stripped away:

For convenience, when the opening long bracket is immediately followed by a newline, the newline is not included in the string.

Escape sequences are not interpreted in this type of string (bracketed strings are treated as raw strings).
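The behaviour of long brackets described above can be checked with a small sketch. The variable names are hypothetical and serve only the example.

```lua
-- The newline right after the opening long bracket is dropped.
local block = [[
line one
line two]]
print(block == "line one\nline two")   --> true

-- Escape sequences are not interpreted: \n stays as two characters.
local raw = [[a\nb]]
print(#raw)                            --> 4
print(raw == "a\\nb")                  --> true

-- A level-1 long bracket may contain "]]" without closing the string.
local nested = [=[x[y[1]]]=]
print(nested)                          --> x[y[1]]
```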

Which Type of String Should I Use?

In the context of Highlight language definitions, you will notice that most existing language files employ bracketed strings for regular expression definitions, and quoted strings for simple text (like Description, or Keywords List entries).

Bracketed strings are better for regular expressions because this type of string doesn’t interpret escape sequences, which are used by regular expressions and are an integral part of their definition. Also, some characters need to be escaped by a \ inside regular expressions (eg: \(.*?\)); in a quoted string Lua would treat these as “escapes not forming a valid escape sequence” (see above) and reject the string.
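The difference can be seen in a short sketch, assuming Lua 5.3 (where load accepts a string and invalid escapes are a syntax error). The variable names are illustrative only.

```lua
-- Bracketed string: the backslashes reach the regex engine untouched.
local regex = [=[ \(.*?\) ]=]
print(regex)               -- prints the pattern with its padding spaces

-- The same pattern in a quoted string needs every backslash doubled.
local quoted = " \\(.*?\\) "
print(regex == quoted)     --> true

-- With single backslashes the quoted form does not even compile,
-- because \( is not a valid escape sequence.
local chunk, err = load([[return " \(.*?\) "]])
print(chunk)               --> nil
print(err)                 -- the message reports an invalid escape sequence
```

Doubling every backslash works, but it makes longer regular expressions hard to read, which is one more reason to prefer the bracketed form.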

Usually, long brackets with a level >= 1 (ie: containing one or more equal signs: [=[, [==[, etc) are safer than level-0 long brackets (ie: [[). A regular expression can often contain two closing square brackets in a row (eg: [[ [\(\)[\]] ]]), which Lua would prematurely interpret as the string’s closing delimiter. It can also end with a square bracket, which Lua would read as the first bracket of an unspaced closing delimiter (eg: [[[a-zA-Z]]]). To avoid the latter, remember to always pad the bracket delimiters with spaces; Highlight will strip leading and trailing whitespace from the string:

If a raw string content starts with “[” or ends with “]”, pad the paranthesis with space to avoid a syntax error. Highlight will strip the string.

(Highlight Wiki » File format)

Besides avoiding the pitfalls just mentioned, padded strings with level >= 1 long brackets are also easier to read for others who will examine your language definitions. Compare the previous examples with their padded level-1 counterparts:

  • [[ [\(\)[\]] ]] vs [=[ [\(\)[\]] ]=]
  • [[[a-zA-Z]]] vs [=[ [a-zA-Z] ]=]
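The following sketch, assuming Lua 5.3, shows why the level-0 forms fail and the padded level-1 forms work. The trim helper is hypothetical: it only imitates the whitespace stripping that Highlight performs on its own, and is not part of Highlight’s API.

```lua
-- Level-0 brackets: Lua ends the string at the first "]]", so the
-- leftover text makes the chunk fail to compile (load returns nil).
local bad_1 = load("return [[ [\\(\\)[\\]] ]]")
print(bad_1)          --> nil
local bad_2 = load("return [[[a-zA-Z]]]")
print(bad_2)          --> nil

-- Padded level-1 brackets: both patterns are read in full.
local class  = [=[ [a-zA-Z] ]=]
local parens = [=[ [\(\)[\]] ]=]

-- Hypothetical helper imitating Highlight's stripping of the padding.
local function trim(s)
  return (s:match("^%s*(.-)%s*$"))
end

print(trim(class))    --> [a-zA-Z]
print(trim(parens))   --> [\(\)[\]]
```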