Permalink
Switch branches/tags
Nothing to show
Find file
Fetching contributors…
Cannot retrieve contributors at this time
212 lines (130 sloc) 5.62 KB

REX: Ruby Lex for Racc

About

Lexical Scanner Generator used with Racc for Ruby

Usage

rex [options] grammarfile
-o  --output-file  filename   designate output filename
-s  --stub                    append stub main for debugging
-i  --ignorecase              ignore character case
-C  --check-only              only check syntax
    --independent             independent mode
-d  --debug                   print debug information
-h  --help                    print usage
    --version                 print version
    --copyright               print copyright

Default Output Filename

The destination file for foo.rex is foo.rex.rb.
To use, include the following in the Ruby source code file.
  require 'foo.rex'

Grammar File Format

A definition consists of a header section, a rule section, 
and a footer section.  The rule section includes one or more clauses.
Each clause starts with a keyword.
Summary:
  [Header section]
  "class" Foo
  ["option"
    [options] ]
  ["inner"
    [methods] ]
  ["macro"
    [macro-name  regular-expression] ]
  "rule"
    [start-state]  pattern  [actions]
  "end"
  [Footer section]

Grammar File Description Example

class Foo
macro
  BLANK         \s+
  DIGIT         \d+
rule
  {BLANK}
  {DIGIT}       { [:NUMBER, text.to_i] }
  .             { [text, text] }
end

Header Section ( Optional )

All of the contents described before the definitions in the rule section are 
copied to the beginning of the output file.

Footer Section ( Optional )

All the contents described after the definitions in the rule section are 
copied to the end of the output file.

Rule Section

The rule section starts at the line beginning with the "class" keyword 
and ends at the line beginning with the "end" keyword.
The class name is specified after the "class" keyword.
If a module name is specified, the class will be included in the module.
A class that inherits Racc::Parser is generated.

Example of Rule Section Definition

class Foo
class Bar::Foo

Option Section ( Optional )

This section begins with the "option" keyword.
  "ignorecase"  ignore the character case when pattern matching
  "stub"        append stub main for debugging
  "independent" independent mode, do not inherit Racc.

Inner Section for User Code ( Optional )

This section begins with the "inner" keyword.
The contents defined here are defined by the contents of the class 
of the generated scanner.

Macro Section ( Optional )

This section begins with the "macro" keyword.
A name is assigned to one regular expression.
A space character (0x20) can be included by using a backslash \ to escape.

Example of Macro Definition

DIGIT         \d+
IDENT         [a-zA-Z_][a-zA-Z0-9_]*
BLANK         [\ \t]+
REMIN         \/\*
REMOUT        \*\/

Rule Section

This section begins with the "rule" keyword.
[state]  pattern  [actions]

state: Start State ( Optional )

A start state is indicated by an identifier beginning with ":", a Ruby symbol.
If uppercase letters follow the ":", the state becomes an exclusive start state.
If lowercase letters follow the ":", the state becomes an inclusive start state.
The initial value and the default value of a start state are nil.

pattern: String Pattern

A regular expression specifies a character string.
A regular expression description may include a macro definition enclosed
by curly braces { }.
A macro definition is used when the regular expression includes whitespace.

actions: Processing Actions ( Optional )

An action is executed when the pattern is matched.
The action defines the process for creating the appropriate token.
A token is a two-element array containing a type and a value, or is nil.
The following elements can be used to create a token.
  lineno    Line number     ( Read Only )
  text      Matched string  ( Read Only )
  state     Start state     ( Read/Write )
The action is a block of Ruby code enclosed by { }.
Do not use functions that exit the block and change the control flow.
( return, exit, next, break, ... )
If the action is omitted, the matched character string is discarded, 
and the process advances to the next scan.

Example of Rule Section Definition

      {REMIN}                 { state = :REM ; [:REM_IN, text] }
:REM  {REMOUT}                { state = nil ; [:REM_OUT, text] }
:REM  (.+)(?={REMOUT})        { [:COMMENT, text] }
      {BLANK}
      -?{DIGIT}               { [:NUMBER, text.to_i] }
      {WORD}                  { [:word, text] }
      .                       { [text, text] }

Comments ( Optional )

Any text following a "#" to the end of the line becomes a comment.

Using the Generated Class

scan_setup( str )

Initializes the scanner with the str string argument.
This is redefined and used.

scan_str( str )

Parses the string described in the defined grammar.
The tokens are stored internally.

scan_file( filename )

Reads in a file described in the defined grammar.
The tokens are stored internally.

next_token

The tokens stored internally are extracted one by one.
When there are no more tokens, nil is returned.

Notice

This specification is provisional and may be changed without prior notice.