torch2424/assemblyscript-regex

 
 

assemblyscript-regex

A regex engine for AssemblyScript.

AssemblyScript is a new language, based on TypeScript, that compiles to WebAssembly. AssemblyScript has a lightweight standard library, but lacks support for regular expressions. This project fills that gap!

This project exposes an API that mirrors the JavaScript RegExp class:

const regex = new RegExp("fo*", "g");
const str = "table football, foul";

let match: Match | null = regex.exec(str);
while (match != null) {
  // first iteration
  //   match.index = 6
  //   match.matches[0] = "foo"

  // second iteration
  //   match.index = 16
  //   match.matches[0] = "fo"
  match = regex.exec(str);
}
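
Because the API mirrors the JavaScript RegExp class, the same loop can be tried in plain TypeScript against the built-in RegExp. The sketch below is only a point of comparison, not code from this project: `collectMatches` and `Found` are hypothetical names, and the built-in engine exposes the matched text as `match[0]` where the example above uses `match.matches[0]`.

```typescript
// Runs against the built-in JavaScript RegExp, not assemblyscript-regex.
interface Found {
  index: number;
  text: string;
}

// collectMatches is a hypothetical helper name used only for this sketch.
function collectMatches(pattern: string, flags: string, input: string): Found[] {
  const regex = new RegExp(pattern, flags);
  const found: Found[] = [];
  let match: RegExpExecArray | null = regex.exec(input);
  while (match !== null) {
    found.push({ index: match.index, text: match[0] });
    // Without the "g" flag lastIndex never advances, so stop after one match.
    if (!regex.global) break;
    // Step past zero-length matches so the loop always terminates.
    if (match[0].length === 0) regex.lastIndex++;
    match = regex.exec(input);
  }
  return found;
}

// Two matches, as in the comments above: "foo" at index 6 and "fo" at index 16.
console.log(collectMatches("fo*", "g", "table football, foul"));
```

The loop relies on the global flag: each call to `exec` resumes from where the previous match ended, which is why the same regex instance is reused across iterations.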

Project status

The initial focus of this implementation has been feature support and functionality over performance. It currently supports a sufficient number of regex features to be considered useful, including most character classes, common assertions, groups, alternations, capturing groups and quantifiers.

The next phase of development will focus on more extensive testing and performance. The project currently has reasonable unit test coverage, focussed on positive and negative test cases on a per-feature basis. It also includes a more exhaustive test suite with test cases borrowed from another regex library.

Feature support

Based on the classification within the MDN cheatsheet:

Character classes

  • .
  • \d
  • \D
  • \w
  • \W
  • \s
  • \S
  • \t
  • \r
  • \n
  • \v
  • \f
  • [\b]
  • \0
  • \cX
  • \xhh
  • \uhhhh
  • \u{hhhh} or \u{hhhhh}
  • \

Assertions

  • ^
  • $
  • \b
  • \B

Other assertions

  • x(?=y) Lookahead assertion
  • x(?!y) Negative lookahead assertion
  • (?<=y)x Lookbehind assertion
  • (?<!y)x Negative lookbehind assertion

Groups and ranges

  • x|y
  • [xyz] / [a-c]
  • [^xyz] / [^a-c]
  • (x) capturing group
  • \n back reference
  • (?<Name>x) named capturing group
  • (?:x) Non-capturing group
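
As a rough illustration of how some of these constructs differ, the sketch below runs a few patterns through the built-in JavaScript RegExp (an assumption made for convenience; it does not exercise this library). `captureSlots` is a hypothetical helper that returns the full match followed by each captured group.

```typescript
// Runs against the built-in JavaScript RegExp, used here only to illustrate the syntax.
// captureSlots is a hypothetical helper name.
function captureSlots(pattern: string, input: string): string[] | null {
  const match = new RegExp(pattern).exec(input);
  // Slot 0 is the whole match; slots 1..n are the capturing groups in order.
  return match === null ? null : match.slice();
}

// (x) records what it matched: ["10-25", "10", "25"]
console.log(captureSlots("(\\d+)-(\\d+)", "pages 10-25"));

// (?:x) groups without recording: ["10-25", "25"]
console.log(captureSlots("(?:\\d+)-(\\d+)", "pages 10-25"));

// x|y inside a group picks one alternative: ["grey", "e"]
console.log(captureSlots("gr(a|e)y", "a grey cat"));

// [^a-c] matches any character outside the range: ["xyz"]
console.log(captureSlots("[^a-c]+", "abcxyz"));
```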

Quantifiers

  • x*
  • x+
  • x?
  • x{n}
  • x{n,}
  • x{n,m}
  • x*? / x+? / ...
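
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below uses the built-in JavaScript RegExp as a stand-in (an assumption; results from this library are not shown), and `firstMatch` is a hypothetical helper name.

```typescript
// Runs against the built-in JavaScript RegExp, used here only to illustrate the syntax.
// firstMatch is a hypothetical helper name.
function firstMatch(pattern: string, input: string): string | null {
  const match = new RegExp(pattern).exec(input);
  return match === null ? null : match[0];
}

console.log(firstMatch("a+", "caaandy"));      // greedy: "aaa"
console.log(firstMatch("a+?", "caaandy"));     // lazy: "a"
console.log(firstMatch("a{2}", "caaandy"));    // exactly two: "aa"
console.log(firstMatch("a{2,}", "caaandy"));   // two or more: "aaa"
console.log(firstMatch("a{1,2}", "caaandy"));  // between one and two: "aa"
console.log(firstMatch("<.*>", "<b>bold</b>"));   // greedy: "<b>bold</b>"
console.log(firstMatch("<.*?>", "<b>bold</b>"));  // lazy: "<b>"
```

A greedy quantifier takes as many repetitions as it can while still letting the rest of the pattern match; appending `?` makes it take as few as possible.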

RegExp

  • global
  • case insensitive
  • multiline
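
To show what each flag changes, the sketch below compares results with and without it, again using the built-in JavaScript RegExp as a stand-in for illustration (an assumption; this library's own output is not shown). `matchedTexts` is a hypothetical helper name, and the flag letters `g`, `i` and `m` are the standard JavaScript ones.

```typescript
// Runs against the built-in JavaScript RegExp, used here only to illustrate the flags.
// matchedTexts is a hypothetical helper name.
function matchedTexts(pattern: string, flags: string, input: string): string[] {
  // With "g", match() returns every match; without it, only the first
  // (followed by any capture groups, which these patterns do not use).
  const result = input.match(new RegExp(pattern, flags));
  return result === null ? [] : result.slice();
}

// global: first match only vs. all matches
console.log(matchedTexts("fo*", "", "table football, foul"));   // ["foo"]
console.log(matchedTexts("fo*", "g", "table football, foul"));  // ["foo", "fo"]

// case insensitive: letter case is ignored
console.log(matchedTexts("assembly", "g", "Assembly ASSEMBLY"));   // []
console.log(matchedTexts("assembly", "gi", "Assembly ASSEMBLY"));  // ["Assembly", "ASSEMBLY"]

// multiline: ^ and $ also match at line breaks
console.log(matchedTexts("^\\w+", "g", "one\ntwo"));   // ["one"]
console.log(matchedTexts("^\\w+", "gm", "one\ntwo"));  // ["one", "two"]
```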

Testing

Currently passes 190 of the 217 tests from the Rust regex test suite:

https://raw.githubusercontent.com/att/ast/2012-08-01-master/src/cmd/re/basic.dat
