MathConstraint

MathConstraint is a solver-verified benchmark for evaluating language models on combinatorial constraint-satisfaction problems. Each instance is generated from a formal PyCSP3 or SAT-modulo-symmetries backend, and submitted answers are checked by the same solver stack used during generation.

Datasets

Our datasets are available on Hugging Face:

This repository also includes frozen local copies:

mathconstraint/: harder benchmark split
mathconstraint_easy/: smaller/easier split with hinted and unhinted instances

Each dataset contains instances/*.json, problems.jsonl, and manifest.json.
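As a minimal sketch of reading one of the local copies, the snippet below loads a JSON Lines file such as problems.jsonl with the standard library only. It assumes the usual JSONL convention of one JSON value per line. The helper name read_jsonl is hypothetical and not part of the math_constraint package, and nothing is assumed about the fields inside each record.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return a list with one parsed JSON value per non-empty line of the file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                records.append(json.loads(line))
    return records
```

From the repository root, read_jsonl("mathconstraint_easy/problems.jsonl") would return the records of the easier split. For evaluation runs, the package's own load_dataset is the supported entry point.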

Installation

Clone with submodules:

git clone --recurse-submodules git@github.com:vireshpati/Math-Constraint.git
cd Math-Constraint

If you already cloned without submodules:

git submodule update --init --recursive

Create an environment with Python 3.10+:

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]" -e "third_party/sat-modulo-symmetries"

PySMS graph problems also require the smsg binary:

cd third_party/sat-modulo-symmetries
./build-and-install.sh --local
export PATH="$HOME/.local/bin:$PATH"
cd ../..

The smsg build requires CMake, a C++ compiler, and Boost.
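Before running solver-backed problems, it can help to confirm that the required executables are visible on PATH. The sketch below uses only shutil.which from the standard library. The helper names find_tools and missing_tools are hypothetical, and the check cannot detect Boost, which is a library rather than an executable.

```python
import shutil


def find_tools(names):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the subset of names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]
```

For example, missing_tools(["cmake", "smsg"]) returning an empty list suggests that both CMake and the smsg binary are reachable from the current shell.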

Evaluate A Model

Set an API key for your provider:

export OPENROUTER_API_KEY=...

Run a no-tool evaluation:

python -m math_constraint eval \
  --provider openrouter \
  --model openai/gpt-oss-120b:free \
  --dataset mathconstraint_easy \
  --output results/gpt-oss-120b/easy/no_tools.json \
  --workers 4 \
  --max-tokens 4096 \
  --temperature 0

Run with tool use:

python -m math_constraint eval \
  --provider openrouter \
  --model openai/gpt-oss-120b:free \
  --dataset mathconstraint \
  --output results/gpt-oss-120b/hard/tools.json \
  --enable-tools \
  --max-tool-rounds 8 \
  --workers 4 \
  --max-tokens 4096 \
  --temperature 0

Evaluation writes an aggregate JSON file plus per-instance traces under the output directory.

Python API

from math_constraint import load_dataset
from math_constraint.eval import EvalConfig, Evaluator, compute_metrics
from math_constraint.llm import create_client

problems = load_dataset("mathconstraint_easy")[:10]

config = EvalConfig.from_provider(
    provider="openrouter",
    model="openai/gpt-oss-120b:free",
    temperature=0.0,
    max_tokens=4096,
    enable_tools=False,
)

client = create_client(config.provider, config.model, base_url=config.base_url)
evaluator = Evaluator(config, client)

results = [evaluator.evaluate(problem) for problem in problems]
metrics = compute_metrics(results)

print(metrics.accuracy)

Answer Format

Models should answer with JSON:

{
  "satisfiable": true,
  "solution": [],
  "reasoning": "brief explanation"
}

For UNSAT instances, use null for the solution:

{
  "satisfiable": false,
  "solution": null,
  "reasoning": "brief explanation"
}
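The shape of these answers can be checked before submission. The following is a sketch only: parse_answer is a hypothetical helper, not the evaluator's own parser, and it checks structure alone, whereas the benchmark verifies solutions with the solver stack. Rejecting a SAT answer whose solution is null is an assumption drawn from the examples above.

```python
import json


def parse_answer(text):
    """Parse a model answer and check it against the documented JSON shape.

    Returns (satisfiable, solution); raises ValueError on malformed input.
    """
    try:
        answer = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"answer is not valid JSON: {exc}") from exc
    if not isinstance(answer, dict):
        raise ValueError("answer must be a JSON object")

    satisfiable = answer.get("satisfiable")
    if not isinstance(satisfiable, bool):
        raise ValueError('"satisfiable" must be true or false')

    solution = answer.get("solution")
    if not satisfiable and solution is not None:
        raise ValueError('UNSAT answers must set "solution" to null')
    if satisfiable and solution is None:
        # Assumption: a SAT claim is only useful with a concrete solution.
        raise ValueError('SAT answers must include a "solution"')
    return satisfiable, solution
```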

When --enable-tools is set, the evaluator exposes two tools:

execute_python(code, timeout_seconds): run short helper Python in an isolated subprocess.
submit_answer(satisfiable, solution, reasoning): submit the final structured answer.
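To illustrate what running helper code in a subprocess with a timeout can look like, here is a minimal sketch built on subprocess.run. It is not the evaluator's implementation: the function name run_python, the default timeout, and the returned dictionary are assumptions made for this example. A separate process limits interference with the caller but is not a security sandbox.

```python
import subprocess
import sys


def run_python(code, timeout_seconds=10.0):
    """Run code in a fresh interpreter and return its exit status and output."""
    try:
        completed = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising on timeout.
        return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {
        "ok": completed.returncode == 0,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

A call such as run_python("print(sum(range(10)))") returns the captured standard output, while code that exceeds timeout_seconds is terminated and reported as a failure.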

Tests

Lightweight tests:

MCNST_SKIP_SOLVER_TESTS=1 python -m pytest -q

Full tests require PyCSP3, PySMS, and smsg on PATH:

python -m pytest -q

License

This repository and the MathConstraint datasets are released under the Creative Commons Attribution 4.0 International License.

Acknowledgments

MathConstraint builds on PyCSP3, pycsp3-models, and sat-modulo-symmetries.
