sexpr GoDoc

xiam/s-expr unit tests status

The sexpr package is a Go library that provides a general-purpose S-expression parser and lexer. The lexer scans a stream of bytes for a finite set of patterns (tokens) and identifies lexemes; the parser takes those lexemes as input and builds abstract syntax trees (ASTs) from them.
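
As a quick sketch of that two-stage pipeline, the following minimal example relies only on the lexer.Tokenize and parser.Parse calls demonstrated further below (the input here is made up):

package main

import (
  "fmt"
  "log"

  "github.com/xiam/s-expr/lexer"
  "github.com/xiam/s-expr/parser"
)

func main() {
  input := []byte(`(add 1 2.5 "three")`)

  // Stage 1: the lexer turns the raw bytes into a flat list of tokens.
  tokens, err := lexer.Tokenize(input)
  if err != nil {
    log.Fatal("lexer.Tokenize: ", err)
  }
  fmt.Printf("lexer: %d tokens\n", len(tokens))

  // Stage 2: the parser builds an AST (here, directly from the same bytes).
  root, err := parser.Parse(input)
  if err != nil {
    log.Fatal("parser.Parse: ", err)
  }
  fmt.Printf("parser: root %s node with %d children\n", root.Type(), len(root.List()))
}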

What's this for?

ASTs can be used for several purposes:

  • Configuration files (like Emacs).
  • Representation of a program (try to build your own Lisp interpreter!).
  • Representation of lambda terms.
  • Building transformational grammar graphs.
  • ...or any expression that can be represented as a tree.

Example

The following bracketed expression:

[S [N John] [VP [V hit] [NP [D the] [N ball]]]]

that corresponds to this constituency-based parse tree:

Constituency-based parse trees

could be processed by sexpr, converted into an AST for easy manipulation and then transformed into something else, like XML:

<list>
  <list>
    <symbol>S</symbol>
    <list>
      <symbol>N</symbol>
      <symbol>John</symbol>
    </list>
    <list>
      <symbol>VP</symbol>
      <list>
        <symbol>V</symbol>
        <symbol>hit</symbol>
      </list>
      <list>
        <symbol>NP</symbol>
        <list>
          <symbol>D</symbol>
          <symbol>the</symbol>
        </list>
        <list>
          <symbol>N</symbol>
          <symbol>ball</symbol>
        </list>
      </list>
    </list>
  </list>
</list>
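
A minimal sketch of that workflow, using only the parser.Parse and ast.Print calls shown later in this README (the XML emission itself is a tree traversal, covered in the "AST traversal" section at the end):

package main

import (
  "log"

  "github.com/xiam/s-expr/ast"
  "github.com/xiam/s-expr/parser"
)

func main() {
  input := `[S [N John] [VP [V hit] [NP [D the] [N ball]]]]`

  // Build an AST out of the bracketed expression.
  root, err := parser.Parse([]byte(input))
  if err != nil {
    log.Fatal("parser.Parse: ", err)
  }

  // Print the tree; emitting XML is a matter of walking it,
  // as shown in the "AST traversal" section below.
  ast.Print(root)
}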

Subpackages

Lexer

When a stream of bytes matches any of the following patterns:

(, ), [, ], {, }, ", #, [a-zA-Z_], :, ., \

a new instance of that token is created and recorded along with the line and column where the match was found.

Example

The following example converts a byte slice into a slice of tokens and prints details about each of them:

package main

import (
  "fmt"

  "github.com/xiam/s-expr/lexer"
)

func main() {
  input := `
    (fn_a # comment
      (fn_b [89 :A :B [67 3.27]])
      (fn_c 66 3 53 "Hello world!")
    )
  `

  tokens, err := lexer.Tokenize([]byte(input))
  if err != nil {
    // ...
  }

  for i, tok := range tokens {
    line, col := tok.Pos()
    lexeme := tok.Text()
    tt := tok.Type().String()

    // Example output:
    // token[53] (type: separator, line: 6, col: 1)
    //   -> "\t"
    fmt.Printf("token[%d] (type: %v, line: %d, col: %d)\n\t-> %q\n\n", i, tt, line, col, lexeme)
  }
}

For more information see the list of token types.
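
The type name reported by tok.Type().String() can also be used to group or filter tokens. For instance, this hypothetical helper (a sketch assuming the same lexer import as the example above) tallies how many tokens of each type were found:

// countByType tallies tokens by the name of their type,
// e.g. {"separator": 12, "string": 1, ...}.
func countByType(input []byte) (map[string]int, error) {
  tokens, err := lexer.Tokenize(input)
  if err != nil {
    return nil, err
  }
  counts := map[string]int{}
  for _, tok := range tokens {
    counts[tok.Type().String()]++
  }
  return counts, nil
}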

Parser

The parser analyzes input from the lexer and tries to build an AST. The AST begins with a root node (of type list) that can branch out into nodes of different types:

| Pattern | Type | Description | Example |
|---------|------|-------------|---------|
| `(...)` | expression | A set of items enclosed between parentheses. | `(fib 1)` |
| `[...]` | list | A set of items enclosed between square brackets. | `[1 2 3]` |
| `{...}` | map | A set of item pairs enclosed between curly brackets. | `{:A 65}` |
| `-?[0-9]*` | int | An integer (64-bit). | `123` |
| `-?[0-9]+\.[0-9]+` | float | A floating point number (64-bit). | `1.23` |
| `[a-zA-Z][a-zA-Z0-9_]+` | symbol | An alphanumeric word. | `hello` |
| `:[a-zA-Z][a-zA-Z0-9_]+` | atom | An alphanumeric word preceded by a colon. | `:hello` |
| `"..."` | string | Any stream of bytes enclosed between double quotes. | `"hello"` |

Some nodes can branch out into children (vector nodes), while others can only hold values (value nodes).
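
For example, a small hypothetical helper can rely on this distinction to count the value nodes in a tree; this sketch assumes the ast import and uses only the IsVector and List methods that also appear in the "AST traversal" section below:

// countValueNodes returns the number of value (leaf) nodes in the tree.
func countValueNodes(node *ast.Node) int {
  if node.IsVector() {
    // Vector nodes (expression, list, map) branch out children.
    total := 0
    for _, child := range node.List() {
      total += countValueNodes(child)
    }
    return total
  }
  // Value nodes (int, float, symbol, atom, string) are leaves.
  return 1
}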

Example

package main

import (
  "log"

  "github.com/xiam/s-expr/ast"
  "github.com/xiam/s-expr/parser"
)

func main() {
  input := `(fn_a
    (fn_b [89 :A :B [67 3.27]])
    (fn_c 66 3 53 "Hello world!" 😊))
  `

  root, err := parser.Parse([]byte(input))
  if err != nil {
    log.Fatal("parser.Parse:", err)
  }

  ast.Print(root)
}

See the list of node types.

AST

The following byte stream:

(fn_a (fn_b [89 :A :B [67 3.27]]) (fn_c 66 3 53 "Hello world!"))

gets transformed into an AST that looks like this:

AST

A text representation of this AST would look like:

[
  (
    fn_a
    (
      fn_b
      [
        89
        :A
        :B
        [
          67
          3.27
        ]
      ]
    )
    (
      fn_c
      66
      3
      53
      "Hello world!"
    )
  )
]

AST traversal

The AST always begins with a node of type list. In order to traverse the AST built from an input stream like:

(fn_a (fn_b [89 :A :B [67 3.27]]) (fn_c 66 3 53 "Hello world!"))

you could start with the root node and visit every node recursively, as in the example below:

func printTree(node *ast.Node) {
  printIndentedTree(node, 0)
}

func printIndentedTree(node *ast.Node, indentationLevel int) {
  indent := strings.Repeat("  ", indentationLevel)
  if node.IsVector() {
    fmt.Printf("%s<%s>\n", indent, node.Type())
    children := node.List()
    for i := range children {
      printIndentedTree(children[i], indentationLevel+1)
    }
    fmt.Printf("%s</%s>\n", indent, node.Type())
    return
  }
  fmt.Printf("%s<%s>%v</%s>\n", indent, node.Type(), node.Value(), node.Type())
}

The example above prints an XML-like tree similar to:

<list>
  <expression>
    <symbol>fn_a</symbol>
    <expression>
      <symbol>fn_b</symbol>
      <list>
        <int>89</int>
        <atom>:A</atom>
        <atom>:B</atom>
        <list>
          <int>67</int>
          <float>3.27</float>
        </list>
      </list>
    </expression>
    <expression>
      <symbol>fn_c</symbol>
      <int>66</int>
      <int>3</int>
      <int>53</int>
      <string>Hello world!</string>
    </expression>
  </expression>
</list>
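
To produce that output, the parser and the helpers above can be wired together in a small program. This is only a sketch; it assumes the printTree and printIndentedTree functions from the previous snippet are defined in the same file:

package main

import (
  "fmt"     // used by printIndentedTree above
  "log"
  "strings" // used by printIndentedTree above

  "github.com/xiam/s-expr/ast"
  "github.com/xiam/s-expr/parser"
)

func main() {
  input := `(fn_a (fn_b [89 :A :B [67 3.27]]) (fn_c 66 3 53 "Hello world!"))`

  root, err := parser.Parse([]byte(input))
  if err != nil {
    log.Fatal("parser.Parse: ", err)
  }

  // printTree is defined in the snippet above.
  printTree(root)
}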

Examples