Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
A META parser generator using LL(1) grammars with s-expressions.
Common Lisp
Tree: b0c5286e8c

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
README
atoms.lisp
meta-sexp.asd
meta-sexp.lisp
packages.lisp
specials.lisp
util.lisp

README

 ,----------.
 | OVERVIEW |
 `----------'

meta-sexp is a META parser generator using LL(1) grammars with
s-expressions. meta-sexp uses in-memory string vectors, instead of
commonly used streams, for efficiently stepping backward and forward
through the input. It is tested on SBCL but should be portable to
other implementations as well.

meta-sexp is implemented using modular transformation processors.
Therefore, besides builtin grammar transformators coming with
meta-sexp, you can easily add your own transformation keywords to
meta-sexp too. (See DEFINE-TRANSFORMATION macro in meta-sexp.lisp.)

Inspired by src/parser.lisp of core-stream project at
http://core.gen.tr/

Idea is based on the META language discussed in `Pragmatic Parsing
in Common Lisp' paper of Henry G. Baker
[ACM Lisp Pointers 4, 2 (Apr/Jun 1991), 3-15]


 ,--------------------.
 | QUICK INTRODUCTION |
 `--------------------'

In most of the time, you'll need to define your own parsers using
CREATE-PARSER-CONTEXT methods and DEFRULE, DEFRENDERER macros.

  create-parser-context ((input string) &key start end attachment)
  create-parser-context ((input string-stream) &key buffer-size start end attachment)

  defrule (name (&rest args) (&optional attachment) &body body)
  defrenderer (name (&rest args) (&optional attachment) &body body)

In a rule or renderer body, if supplied, ATTACHMENT argument will get
bound to ATTACHMENT keyword given to CREATE-PARSER-CONTEXT.

In some certain situations, you may also need to use DEFATOM too. See
atoms.lisp for DEFATOM examples.

Here is a tiny example:

  (defrule integer? (&aux (sign 1) d (num 0)) ()
    (:? (:or (:and "-" (:assign sign -1))
             "+"))
    (:+ (:assign d (:type digit?))
        (:assign num (+ (* num 10)
	                (- (char-code d) #.(char-code #\0)))))
    (:return (* sign num)))

  (integer? (create-parser-context "+123")) ==> 123
  (integer? (create-parser-context "-123")) ==> -123

Here is another example demonstrating the usage of META symbol.

  (defrule in-wonderland? () ()
    "META-SEXP"
    (progn
      (format t "META-SEXP in Wonderland!")
      (meta (:type space?)
	    "in Wonderland!"))
    (:return t))

  (in-wonderland?
   (create-parser-context :data "META-SEXP in Wonderland!"))
  META-SEXP in Wonderland!
  ==> T

  (in-wonderland?
   (create-parser-context :data "META-SEXP in Fooland!"))
  META-SEXP in Wonderland!
  ==> NIL

Here's a complete example with renderers and attachments.

  (defrenderer internal-link! (label &optional text) (attachment)
    (format attachment "<a href='~a'>~a</a>"
            label (if (empty-char-accum-p text) label text)))
  
  (defrule internal-link? (&aux (ref (make-char-accum)) (text (make-char-accum))) ()
    "[["
    (:+ (:not (:or "]]" (:type (or white-space? newline?))))
        (:char-push ref))
    (:? (:* (:type (or white-space? newline?)))
        (:+ (:not "]]")
	    (:char-push text)))
    "]]"
    (:render internal-link! ref text))
  
  (defrule wiki-markup? (&aux c) (attachment)
    (:* (:or (:rule internal-link?)
             (:and (:assign c (:read-atom))
	           (write-char c attachment))))
    (get-output-stream-string attachment))

  (wiki-markup?
   (create-parser-context
    "foo bar [[ref text]] and [[just-ref]] here."
    :attachment (make-string-output-stream)))
  ==> "foo bar <a href='ref'>text</a> and <a href='just-ref'>just-ref</a> here."

What's the role of ATTACHMENT slot given to CREATE-PARSER-CONTEXT (or
specified as a keyword while making an instance of PARSER-CONTEXT
class)? Think it as a state storage unit between passes to defined
rules and renderers. (For instance, in our above example, ATTACHMENT
used as a common output stream.) Yes, it is possible to let this
problem get solved by the programmer via global variables. But this
approach yields to another problem: thread safety. Anyway, that was
the best that I can come up with; if you have any other ideas, I'd be
happy to hear them.


 ,-------------------------.
 | AVAILABLE TYPE CHECKERS |
 `-------------------------'

These functions (and types) are routines introduced using DEFATOM
and operates on character codes. In case of need, you can add your
own type checkers. (See source for examples.)

ALNUM? ALPHA? GRAPHIC? ASCII? BIT? DIGIT? EXTENDED? LOWER? NEWLINE?
SPACE? TAB? UPPER? WHITE-SPACE?


 ,--------------------------.
 | BUILT-IN TRANSFORMATIONS |
 `--------------------------'

(:ICASE FORM FORM ...)
  Make case-insensitive atom comparison in supplied FORMs.

(:CHECKPOINT FORM)
  Sequentially evaluates supplied forms and if any of them fails,
  moves cursor back to its start position :CHECKPOINT began.

(:AND FORM FORM ...)
(:OR FORM FORM ...)

(:NOT FORM)
  Identical to (NOT FORM). (FORM is encapsulated within a :CHECKPOINT
  before getting evaluated.)

(:RETURN VALUE VALUE ...)
  Returns from the rule with supplied VALUEs.

(:RENDER RENDERER ARG ARG ...)
  Calls specified RENDERER (that is defined with DEFRENDERER) with
  supplied arguments.

(:? FORM FORM ...)
  Sequentially evaluates supplied FORMs within an AND scope and
  regardless of the return value of ANDed FORMs, block returns T.
  (Similar to `?' in regular expressions.)

(:* FORM FORM ...)
  Sequentially evaluates supplied FORMs within an AND scope until it
  returns NIL. Regardless of the return value of ANDed FORMs, block
  returns T. (Similar to `*' in regular expressions.)

(:+ FORM FORM ...)
  Sequentially evaluates supplied FORMs within an AND scope, and
  repeats this process till FORMs return NIL. Scope returns T if FORMs
  returned T once or more, otherwise returns NIL. (Similar to `{1,}'
  in regular expressions.)

(:TYPE TYPE-CHECKER)
(:TYPE (OR TYPE-CHECKER TYPE-CHECKER ...))
  Checks type of the atom at the current position through supplied
  function(s).

(:RULE RULE ARG ARG ...)
(:RULE (OR RULE RULE ...) ARG ARG ...)
  Tests input in the current cursor position using specified
  type/form. If any, supplied arguments will get passed to rule.

(:ASSIGN VAR FORM)
(:ASSIGN (VAR1 VAR2 ...) FORM)
  Assigns returned value of FORM to VAR, and returns assigned
  value. (Latter form works similar to MULTIPLE-VALUE-SETQ.)

(:LIST-PUSH ITEM-VAR LIST-ACCUM)
(:CHAR-PUSH CHAR-VAR CHAR-ACCUM)
(:CHAR-PUSH CHAR-ACCUM)
  Pushes supplied ITEM-VAR/CHAR-VAR into specified
  LIST-ACCUM/CHAR-ACCUM. If :CHAR-PUSH is called with only one
  argument, current character gets read and pushed into supplied
  accumulator. (You can use MAKE-LIST-ACCUM and MAKE-CHAR-ACCUM
  functions to initialize new accumulators. Moreover, you'll probably
  need EMPTY-LIST-ACCUM-P and EMPTY-CHAR-ACCUM-P predicates too.)

(:LIST-RESET LIST-ACCUM)
(:CHAR-RESET CHAR-ACCUM)
  Resets supplied accumulators.

(:EOF)
  Returns true when reached to the end of supplied input data.

(:READ-ATOM)
  Reads current atom at the cursor position and returns read atom.

(:DEBUG)
(:DEBUG VAR)
  Prints current character and its position in the input data. If VAR
  is specified, prints the value of the VAR.

If a form doesn't start with any of the above keywords, there're
three possiblities remaining:

  i. This can be a character.
 ii. This can be a string. (Will get expanded into an AND'ed character
     list with an outermost :CHECKPOINT.)
iii. Treat as a custom form. (Will get evaluated as is)

When you're in the third situation, to be able to get your META
s-expressions compiled again, use META keyword. (See the second
example in the Quick Introduction.)
Something went wrong with that request. Please try again.