matchbox

A tiny regular expression evaluator based on Abstract Syntax Tree (AST) in Python.

This project provides a simple regex engine that first parses a regular expression into an AST and then evaluates it against a given text.

Supported Syntax

. : Any single character
* : Zero or more repetitions of the preceding character
+ : One or more repetitions of the preceding character
? : Zero or one repetition of the preceding character
| : Alternation (OR)
(): Grouping
^ : Start of the string anchor
$ : End of the string anchor

How to Run

1. Command Line Interface (Recommended)

You can execute the regex match from the command line using the matchbox module. It supports both full matching and searching.

# Full match (the entire text must match the pattern)
python -m matchbox <pattern> <text>

# Search (find a substring that matches the pattern)
python -m matchbox <pattern> <text> --search

Examples:

# Full match
python -m matchbox "a(b|c)*d" "acccbd"
# ✅ Matched

# Search
python -m matchbox "a*b" "xxaaabyy" --search
# ✅ Matched

Tracing the Evaluation

Use the --trace option to visualize the backtracking process of the AST evaluator. This is useful for debugging your patterns. The output is colorized using the rich library.

python -m matchbox "a(b|c)d" "acd" --trace

2. Interactive Mode

You can also run an interactive prompt.

python main.py

Example:

Mini Regex Engine (matchbox)
Enter regex: a*b
Enter text: aaab
✅ MATCH!

Key Concepts

This regex engine is composed of three main components:

Lexer (lexer.py): The lexer takes a regular expression string and breaks it down into a sequence of tokens (e.g., CHAR, *, |).
Parser (parser.py): The parser consumes the token stream from the lexer and builds an Abstract Syntax Tree (AST). The nodes of this tree (ast_nodes.py) represent the structure of the regular expression (e.g., ConcatNode, OrNode, RepeatNode).
Evaluator (evaluator.py): The evaluator traverses the AST to determine if the pattern matches the input text. The fullmatch and search functions are the entry points, which recursively call the match method on each node. This AST-based approach allows for handling complex nested structures and provides clear evaluation paths, which can be traced for debugging.

Directory Structure

mini-regex/
├── main.py               # Entry point for interactive mode
├── requirements.txt      # Dependencies
├── matchbox/
│   ├── __main__.py       # Entry point for CLI mode
│   ├── lexer.py          # Lexical analysis
│   ├── parser.py         # AST construction
│   ├── ast_nodes.py      # AST node definitions
│   └── evaluator.py      # AST-based regex evaluation
└── tests/
    ├── test_evaluator.py # Tests for the AST-based evaluator
    ├── test_parser.py    # Tests for the parser
    └── test_matchbox.py  # Legacy tests (can be merged or removed)

How to Test

To run the test suite, install the dependencies and run pytest.

pip install -r requirements.txt
pytest

Code Quality

This project uses black for code formatting and ruff for linting to ensure code quality and consistency.

Running Manually

# Format code with black
.venv/bin/black .

# Lint and auto-fix with ruff
.venv/bin/ruff check --fix .

Configuration

Currently, both tools run with their default settings as there is no specific configuration file.

To customize their behavior, you can create a pyproject.toml file in the project root. For example:

# pyproject.toml

[tool.black]
line-length = 88

[tool.ruff]
line-length = 88
select = ["E", "F", "I"] # Enable specific linting rules

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
matchbox		matchbox
tests		tests
.gitignore		.gitignore
README.md		README.md
comment.md		comment.md
learning.md		learning.md
main.py		main.py
package-lock.json		package-lock.json
pytest.ini		pytest.ini
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

matchbox

Supported Syntax

How to Run

1. Command Line Interface (Recommended)

Tracing the Evaluation

2. Interactive Mode

Key Concepts

Directory Structure

How to Test

Code Quality

Running Manually

Configuration

About

Uh oh!

Releases

Packages

Languages

velengel/mini-regex

Folders and files

Latest commit

History

Repository files navigation

matchbox

Supported Syntax

How to Run

1. Command Line Interface (Recommended)

Tracing the Evaluation

2. Interactive Mode

Key Concepts

Directory Structure

How to Test

Code Quality

Running Manually

Configuration

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages