Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Browse files

Initial import.

  • Loading branch information...
commit 124c97ca31217b855b7cad8d9917c1d3d5ff187f 0 parents
Volkan YAZICI authored
Showing with 532 additions and 0 deletions.
  1. +149 −0 README
  2. +45 −0 meta-sexp.asd
  3. +244 −0 meta-sexp.lisp
  4. +62 −0 packages.lisp
  5. +32 −0 rules.lisp
149 README
@@ -0,0 +1,149 @@
+ ,----------.
+ | OVERVIEW |
+ `----------'
+
+meta-sexp is a META parser generator using LL(1) grammars with
+s-expressions. meta-sexp uses in-memory string vectors, instead of
+commonly used streams, for efficiently stepping backward and forward
+through the input. It is tested on SBCL but should be portable to
+other implementations as well.
+
+Inspired by src/parser.lisp of core-stream project at
+http://core.gen.tr/
+
+Idea is based on the META language discussed in `Pragmatic Parsing
+in Common Lisp' paper of Henry G. Baker
+[ACM Lisp Pointers 4, 2 (Apr/Jun 1991), 3-15]
+
+
+ ,--------------------.
+ | QUICK INTRODUCTION |
+ `--------------------'
+
+In most of the time, you'll need to define your own parsers using
+DEFRULE. But in some certain situations, you may need to DEFATOM
+too. (See builtin functions for further examples.)
+
+(defrule wiki-link? (&aux c (href (make-char-accum)) (text (make-char-accum)))
+ "[["
+ (:+ (:not (:or "]]" (:type white-space?)))
+ (:assign c (:type graphic?))
+ (:char-push c href))
+ (:? (:rule lwsp?)
+ (:+ (:not "]]")
+ (:assign c (:type graphic?))
+ (:char-push c text)))
+ "]]"
+ (:return href text))
+
+(wiki-link? (make-instance 'parser-context :data "[[http://foo.com/]]"))
+==> "http://foo.com/", ""
+
+(wiki-link? (make-instance 'parser-context :data "[[http://foo.com/ bar baz]]"))
+==> "http://foo.com/", "bar baz"
+
+(wiki-link? (make-instance 'parser-context :data "[[]]"))
+==> NIL
+
+(defrule in-wonderland? ()
+ "META-SEXP"
+ (progn
+ (format t "META-SEXP in Wonderland!")
+ (meta (:type space?)
+ "in Wonderland!"))
+ (:return t))
+
+(in-wonderland? (make-instance 'parser-context :data "META-SEXP in Wonderland!"))
+==> META-SEXP in Wonderland!
+T
+(in-wonderland? (make-instance 'parser-context :data "META-SEXP in Fooland!"))
+==> META-SEXP in Wonderland!
+NIL
+
+
+ ,-------------------------.
+ | AVAILABLE TYPE CHECKERS |
+ `-------------------------'
+
+These functions (and types) are routines introduced using DEFATOM
+and operates on character codes. In case of need, you can add your
+own type checkers. (See source for examples.)
+
+ALNUM? ALPHA? GRAPHIC? ASCII? BIT? DIGIT? EXTENDED? LOWER? NEWLINE?
+SPACE? TAB? UPPER? WHITE-SPACE?
+
+
+ ,-----------------.
+ | AVAILABLE RULES |
+ `-----------------'
+
+Rules are parser grammars compiled by COMPILE-GRAMMAR.
+
+LWSP? (Linear White-Space)
+
+
+ ,---------------------------.
+ | AVAILABLE SYNTAX KEYWORDS |
+ `---------------------------'
+
+(:CHECKPOINT FORM)
+ If form returns NIL, cursor will be back-positioned to its old
+ location :CHECKPOINT keyword was used.
+
+(:AND FORM FORM ...)
+(:OR FORM FORM ...)
+
+(:NOT FORM)
+ Besides its normal behaviour, (:NOT ...) expressions
+ automatically get encapsulated in (:CHECKPOINT ...) clauses.
+
+(:RETURN VAR VAR ...)
+ Returns supplied variables using VALUES function.
+
+(:? FORM FORM ...)
+ May appear once. (Similar to `?' in regular expressions.)
+(:* FORM FORM ...)
+ May appear none or more. (Similar to `*' in regular expressions.)
+(:+ FORM FORM ...)
+ Must appear at least once. (Similar to `{1,}' in regular expressions.)
+
+(:TYPE TYPE-CHECKER)
+(:TYPE (OR TYPE-CHECKER TYPE-CHECKER ...))
+(:RULE RULE)
+(:RULE (OR RULE RULE ...))
+ Tests current input from the current cursor position using
+ specified type/form.
+
+(:ASSIGN VAR FORM)
+ Assigns returned value of FORM to VAR, and returns assigned
+ value.
+
+(:LIST-PUSH ITEM-VAR LIST-ACCUM)
+(:CHAR-PUSH CHAR-VAR CHAR-ACCUM)
+(:CHAR-PUSH CHAR-ACCUM)
+ Pushes supplied ITEM-VAR/CHAR-VAR into specified
+ LIST-ACCUM/CHAR-ACCUM. If :CHAR-PUSH is called with only one
+ argument, current character gets read and pushed into supplied
+ accumulator. (You can use MAKE-LIST-ACCUM and MAKE-CHAR-ACCUM
+ functions to initialize new accumulators.)
+
+(:LIST-RESET LIST-ACCUM)
+(:CHAR-RESET CHAR-ACCUM)
+ Resets supplied accumulators.
+
+(:DEBUG)
+(:DEBUG VAR)
+ Prints current character and its position in the input data. If VAR
+ is specified, prints the value of the VAR.
+
+If a form doesn't start with any of the above keywords, there're
+three possiblities remaining:
+
+ i. This can be a character.
+ ii. This can be a string. (Will get expanded into an AND'ed character
+ list with an outermost :CHECKPOINT.)
+iii. Treat as a custom form. (Will get evaluated as is.)
+
+When you're in the third situation, to be able to get your META
+s-expressions compiled again, use META keyword. (See the second
+example in the Quick Introduction.)
45 meta-sexp.asd
@@ -0,0 +1,45 @@
+;;; Copyright (c) 2007, Volkan YAZICI <yazicivo@ttnet.net.tr>
+;;; All rights reserved.
+
+;;; Redistribution and use in source and binary forms, with or without
+;;; modification, are permitted provided that the following conditions
+;;; are met:
+
+;;; - Redistributions of source code must retain the above copyright
+;;; notice, this list of conditions and the following disclaimer.
+;;; - Redistributions in binary form must reproduce the above
+;;; copyright notice, this list of conditions and the following
+;;; disclaimer in the documentation and/or other materials provided
+;;; with the distribution.
+
+;;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+;;; CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+;;; INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+;;; MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+;;; DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+;;; BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+;;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
+;;; TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+;;; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+;;; ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+;;; TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+;;; THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+;;; SUCH DAMAGE.
+
+(in-package :cl-user)
+
+(defpackage :meta-sexp-asd
+ (:use :cl :asdf))
+
+(in-package :meta-sexp-asd)
+
+(defconstant +meta-sexp-version+ "0.1")
+(export '+meta-sexp-version+)
+
+(asdf:defsystem :meta-sexp
+ :serial t
+ :version +meta-sexp-version+
+ :depends-on (:cl-utilities)
+ :components ((:file "packages")
+ (:file "meta-sexp")
+ (:file "rules")))
244 meta-sexp.lisp
@@ -0,0 +1,244 @@
+;;; Copyright (c) 2007, Volkan YAZICI <yazicivo@ttnet.net.tr>
+;;; All rights reserved.
+
+;;; Redistribution and use in source and binary forms, with or without
+;;; modification, are permitted provided that the following conditions
+;;; are met:
+
+;;; - Redistributions of source code must retain the above copyright
+;;; notice, this list of conditions and the following disclaimer.
+;;; - Redistributions in binary form must reproduce the above
+;;; copyright notice, this list of conditions and the following
+;;; disclaimer in the documentation and/or other materials provided
+;;; with the distribution.
+
+;;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+;;; CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+;;; INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+;;; MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+;;; DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+;;; BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+;;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
+;;; TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+;;; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+;;; ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+;;; TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+;;; THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+;;; SUCH DAMAGE.
+
+(in-package :meta-sexp)
+
+
+;;; Parser Context Class & Routines
+
+(defclass parser-context ()
+ ((data
+ :initarg :data
+ :accessor parser-context-data
+ :documentation "Input data getting parsed.")
+ (size
+ :initarg :size
+ :initform nil
+ :documentation "Size of the input data.")
+ (cursor
+ :initarg :cursor
+ :initform 0
+ :accessor parser-context-cursor
+ :documentation "Current location on the input data.")
+ (checkpoints
+ :initform nil
+ :accessor parser-context-checkpoints
+ :documentation "Reversed list of declared checkpoints."))
+ (:documentation "Information about current state of the parsing process."))
+
+(defgeneric parser-context-size (ctx))
+
+(defmethod parser-context-size ((ctx parser-context))
+ (or (slot-value ctx 'size)
+ (setf (slot-value ctx 'size)
+ (length (parser-context-data ctx)))))
+
+(defgeneric peek-atom (ctx))
+(defgeneric read-atom (ctx))
+(defgeneric checkpoint (ctx))
+(defgeneric rollback (ctx))
+(defgeneric commit (ctx))
+
+(define-condition parser-context-error ()
+ ((operation :initarg :operation :accessor parser-context-error-operation)))
+
+(defmethod peek-atom ((ctx parser-context))
+ (if (< (parser-context-cursor ctx) (parser-context-size ctx))
+ (elt (parser-context-data ctx) (parser-context-cursor ctx))))
+
+(defmethod read-atom ((ctx parser-context))
+ (when (< (parser-context-cursor ctx) (parser-context-size ctx))
+ (incf (parser-context-cursor ctx))
+ (elt (parser-context-data ctx) (1- (parser-context-cursor ctx)))))
+
+(defmethod checkpoint ((ctx parser-context))
+ (push (parser-context-cursor ctx) (parser-context-checkpoints ctx)))
+
+(defmethod rollback ((ctx parser-context))
+ (let ((prev-pos (pop (parser-context-checkpoints ctx))))
+ (if prev-pos
+ (setf (parser-context-cursor ctx) prev-pos)
+ (error 'parser-context-error :operation 'rollback))))
+
+(defmethod commit ((ctx parser-context))
+ (if (not (pop (parser-context-checkpoints ctx)))
+ (error 'parser-context-error :operation 'commit)))
+
+
+;;; Atom, Rule & Type Matching
+
+(defun match-atom (ctx atom &aux (c (peek-atom ctx)))
+ (if (and c (char= atom c))
+ (read-atom ctx)))
+
+(defmacro match-type (ctx type)
+ `(if (typep (peek-atom ,ctx) ',type)
+ (read-atom ,ctx)))
+
+(defmacro match-rule (ctx rule args)
+ `(,rule ,@(nconc (list ctx) args)))
+
+
+;;; Accumulators
+
+(defun make-char-accum (&key (size 512))
+ (make-array size :element-type 'character :adjustable t :fill-pointer 0))
+
+(defun char-accum-push (char accum)
+ (vector-push-extend char accum))
+
+(defun reset-char-accum (accum)
+ (setf (fill-pointer accum) 0))
+
+(defun make-list-accum ()
+ nil)
+
+(defmacro list-accum-push (item accum)
+ `(push ,item ,accum))
+
+(defmacro reset-list-accum (accum)
+ `(setf ,accum nil))
+
+
+;;; Grammar Compiler
+
+(defun compile-grammar (ctx form)
+ (labels ((compile-exprs (form &optional (in-meta t))
+ (mapcar #'(lambda (form) (compile-expr form in-meta)) form))
+ (compile-expr (form &optional (in-meta t))
+ (if in-meta
+ (cond
+ ((and (consp form) (keywordp (car form)))
+ (ecase (car form)
+ (:checkpoint
+ (with-gensyms (ret)
+ `(progn
+ (checkpoint ,ctx)
+ (let ((,ret ,(compile-expr (cadr form))))
+ (if ,ret
+ (commit ,ctx)
+ (rollback ,ctx))
+ ,ret))))
+ (:and `(and ,@(compile-exprs (cdr form))))
+ (:or `(or ,@(compile-exprs (cdr form))))
+ (:not (compile-expr `(:checkpoint (not ,(compile-expr (cadr form))))))
+ (:return `(return-from rule-block (values ,@(cdr form))))
+ (:? `(prog1 t ,(compile-expr `(:and ,@(cdr form)))))
+ (:* `(not (do () ((not ,(compile-expr `(:and ,@(cdr form))))))))
+ (:+ (compile-expr `(:and ,@(cdr form) (:* ,@(cdr form)))))
+ (:type `(match-type ,ctx ,(cadr form)))
+ (:rule
+ (if (and (consp (cadr form))
+ (eql 'or (caadr form)))
+ (compile-expr
+ `(:or ,@(mapcar #'(lambda (form) `(:rule ,form)) (cdadr form))))
+ `(match-rule ,ctx ,(cadr form) ,(cddr form))))
+ (:assign `(setq ,(cadr form) ,(compile-expr (caddr form))))
+ (:list-push `(list-accum-push ,(cadr form) ,(caddr form)))
+ (:list-reset `(reset-list-accum ,(cadr form)))
+ (:char-push
+ (if (cddr form)
+ `(char-accum-push ,(cadr form) ,(caddr form))
+ `(char-accum-push (read-atom ,ctx) ,(cadr form))))
+ (:char-reset `(reset-char-accum ,(cadr form)))
+ (:debug
+ `(prog1 t
+ ,(if (cadr form)
+ `(format t "DEBUG: ~a: ~a~%" ',(cadr form) ,(cadr form))
+ `(format t "DEBUG: cursor: [~a] `~a'~%"
+ (parser-context-cursor ,ctx)
+ (elt (parser-context-data ,ctx)
+ (parser-context-cursor ,ctx))))))))
+ ((characterp form) `(match-atom ,ctx ,form))
+ ((stringp form) (compile-expr `(:checkpoint (:and ,@(coerce form 'list)))))
+ (t (compile-expr form nil)))
+ (cond
+ ((and (consp form) (eql 'meta (car form)))
+ (format t "will get compiled: ~a~%" `(:and ,@(cdr form)))
+ (compile-expr `(:and ,@(cdr form))))
+ ((consp form) (compile-exprs form nil))
+ (t form)))))
+ (compile-expr form)))
+
+
+;;; Atom Definitions
+
+(defmacro defatom (name &body body)
+ `(progn
+ (defun ,name (c) (when c ,@body))
+ (deftype ,name () `(satisfies ,',name))))
+
+(defatom ascii?
+ (typep c 'standard-char))
+
+(defatom extended?
+ (typep c 'extended-char))
+
+(defatom alpha?
+ (alpha-char-p c))
+
+(defatom alnum?
+ (alphanumericp c))
+
+(defatom graphic?
+ (graphic-char-p c))
+
+(defatom upper?
+ (upper-case-p c))
+
+(defatom lower?
+ (lower-case-p c))
+
+(defatom digit?
+ (digit-char-p c))
+
+(defatom bit?
+ (or (char= c #\0)
+ (char= c #\1)))
+
+(defatom space?
+ (char= c #\space))
+
+(defatom newline?
+ (char= c #\newline))
+
+(defatom tab?
+ (char= c #\tab))
+
+(defatom white-space?
+ (or (space? c)
+ (tab? c)))
+
+
+;;; Rule Definitions
+
+(defmacro defrule (name (&rest args) &body body)
+ (with-gensyms (ctx)
+ `(defun ,name ,(nconc (list ctx) args)
+ (block rule-block
+ ,(compile-grammar ctx `(:checkpoint (:and ,@body)))))))
62 packages.lisp
@@ -0,0 +1,62 @@
+;;; Copyright (c) 2007, Volkan YAZICI <yazicivo@ttnet.net.tr>
+;;; All rights reserved.
+
+;;; Redistribution and use in source and binary forms, with or without
+;;; modification, are permitted provided that the following conditions
+;;; are met:
+
+;;; - Redistributions of source code must retain the above copyright
+;;; notice, this list of conditions and the following disclaimer.
+;;; - Redistributions in binary form must reproduce the above
+;;; copyright notice, this list of conditions and the following
+;;; disclaimer in the documentation and/or other materials provided
+;;; with the distribution.
+
+;;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+;;; CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+;;; INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+;;; MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+;;; DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+;;; BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+;;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
+;;; TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+;;; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+;;; ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+;;; TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+;;; THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+;;; SUCH DAMAGE.
+
+(in-package :cl-user)
+
+(defpackage :meta-sexp
+ (:documentation "LL(1) parser generator in META using s-expressions.")
+ (:use :cl :cl-utilities)
+ (:export :defatom
+ :defrule
+ :parser-context
+ :parser-context-error
+ :make-char-accum
+ :char-accum-push
+ :reset-char-accum
+ :make-list-accum
+ :list-accum-push
+ :reset-list-accum
+ :compile-grammar
+ :grammar-error
+ :meta
+ ;; Builtin Type-Checkers
+ :alnum?
+ :alpha?
+ :ascii?
+ :bit?
+ :digit?
+ :extended?
+ :graphic?
+ :lower?
+ :newline?
+ :space?
+ :tab?
+ :upper?
+ :white-space?
+ ;; Builtin Rules
+ :lwsp?))
32 rules.lisp
@@ -0,0 +1,32 @@
+;;; Copyright (c) 2007, Volkan YAZICI <yazicivo@ttnet.net.tr>
+;;; All rights reserved.
+
+;;; Redistribution and use in source and binary forms, with or without
+;;; modification, are permitted provided that the following conditions
+;;; are met:
+
+;;; - Redistributions of source code must retain the above copyright
+;;; notice, this list of conditions and the following disclaimer.
+;;; - Redistributions in binary form must reproduce the above
+;;; copyright notice, this list of conditions and the following
+;;; disclaimer in the documentation and/or other materials provided
+;;; with the distribution.
+
+;;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+;;; CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+;;; INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+;;; MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+;;; DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+;;; BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+;;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
+;;; TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+;;; DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+;;; ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+;;; TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+;;; THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+;;; SUCH DAMAGE.
+
+(in-package :meta-sexp)
+
+(defrule lwsp? ()
+ (:+ (:type white-space?)))
Please sign in to comment.
Something went wrong with that request. Please try again.