HyperGuide

A two-stage method that steers a frozen LLM through reasoning trees with hyperbolic geometric guidance.

Stage 1 trains a small head that embeds reasoning-tree states into a Poincaré ball so that distance to the origin tracks proximity to a solution: the fewer steps remain to reach the target, the closer the point lies to the origin.
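As an illustration of the quantity the Stage-1 head is trained to shape, here is a minimal sketch of the distance from the origin in a Poincaré ball. The function name, the curvature parameter c, and the clamping constant eps are assumptions made for this example and are not taken from the repository.

```python
import math

def poincare_dist_to_origin(z, c=1.0, eps=1e-7):
    """Hyperbolic distance from the origin to z in a Poincare ball of curvature -c.

    Uses the closed form d(0, z) = (2 / sqrt(c)) * artanh(sqrt(c) * ||z||).
    Hypothetical helper for illustration only.
    """
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(x * x for x in z))
    # Clamp the norm so the point stays strictly inside the ball (artanh diverges at 1).
    norm = min(norm, (1.0 - eps) / sqrt_c)
    return 2.0 / sqrt_c * math.atanh(sqrt_c * norm)

# A state that is close to a solution should embed nearer the origin
# than a state that still needs many steps.
near_solution = [0.1, 0.0]
far_from_solution = [0.8, 0.3]
print(poincare_dist_to_origin(near_solution) < poincare_dist_to_origin(far_from_solution))  # True
```

Because the distance grows without bound as a point approaches the boundary of the ball, small differences in Euclidean norm near the boundary translate into large differences in hyperbolic distance.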
Stage 2 trains a fresh LoRA adapter and a small up-projector on top of the frozen base LLM using DAgger with a tree oracle. The current policy rolls out trajectories, the oracle labels the winning operations at each state reached, and a cross-entropy loss supervises the LoRA on those labels. The geometric embedding z from the frozen Stage-1 head is injected as a virtual token at every step boundary.
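The DAgger loop described above can be sketched on a toy problem. This is not the repository's trainer: a lookup table stands in for the LoRA policy, an integer walk toward 0 stands in for the reasoning tree, and every name below (dagger, fit, act, oracle, step) is hypothetical. The point it shows is that the policy's own actions drive the rollout while the oracle supplies the labels.

```python
import random

def dagger(fit, act, oracle, step, is_terminal, start_states, n_iters=4, max_steps=8):
    """Toy DAgger: roll out the current policy, label reached states with the oracle, refit."""
    dataset = []  # aggregated (state, oracle_action) pairs across all iterations
    model = {}    # empty policy at the start
    for _iteration in range(n_iters):
        for state in start_states:
            for _t in range(max_steps):
                if is_terminal(state):
                    break
                dataset.append((state, oracle(state)))   # oracle labels the reached state
                state = step(state, act(model, state))   # policy action drives the rollout
        model = fit(dataset)  # supervised fit on everything collected so far
    return model, dataset

# Toy task (an assumption for this sketch): walk an integer to 0 with +1 / -1 moves.
rng = random.Random(0)

def oracle(state):
    return -1 if state > 0 else +1

def step(state, action):
    return state + action

def is_terminal(state):
    return state == 0

def fit(dataset):
    # Tabular stand-in for cross-entropy training: remember the label for each state.
    return dict(dataset)

def act(model, state):
    # Follow the table where it has an entry, otherwise act randomly.
    return model[state] if state in model else rng.choice((-1, +1))

def greedy_rollout(model, state, max_steps=8):
    path = [state]
    while state != 0 and len(path) <= max_steps:
        state = step(state, model[state])
        path.append(state)
    return path

model, dataset = dagger(fit, act, oracle, step, is_terminal, start_states=[3, -3])
print(greedy_rollout(model, 3))  # [3, 2, 1, 0]
```

Each iteration extends the set of labelled states by at least one step along the oracle's path, which is why a handful of iterations is enough here. In the real setting the fit step would be a gradient update on the LoRA rather than a table rebuild.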

Setup

pip install -r requirements.txt

End-to-end pipelines

1. Game-of-24

1.1 Build the data

python data/generate_24_splits.py
python data/download_24_tot_test.py

1.2 Pre-cache the hidden-state trees (Stage-1 input)

bash scripts/run_gen_tree_data_g24.sh

1.3 Stage 1 — train + evaluate the hyperbolic head

bash scripts/run_train_head.sh configs/head_qwen14b_origin_ranking.yaml

1.4 Stage 2 — DAgger trainer + inference + accuracy eval

bash scripts/run_train_stage2_dagger_g24.sh z     1234   # main HyperGuide

# Ablations
bash scripts/run_train_stage2_dagger_g24.sh noz   1234   # ablation
bash scripts/run_train_stage2_dagger_g24.sh randz 1234   # noise control

1.5 ToT baseline (Yao et al. 2023, paper-faithful)

bash scripts/run_tot_g24.sh

Defaults: n_generate=1, n_evaluate=3, n_select=5, T=0.7, single-model (same base for propose + evaluate), chat-template prompts.

1.6 Planning Tokens baseline (Wang et al. 2023)

# 1. Build PT-augmented training trajectories
python data/prepare_pt_g24_data.py

# 2. SFT a LoRA
bash scripts/run_train_pt_sft.sh configs/sft_pt_24_qwen14b.yaml

# 3. Evaluate + score
python -m src.eval_pt_g24 \
    --lora_adapter checkpoints/sft_pt_24_qwen14b \
    --test_data data/24_test_tot.jsonl \
    --output results/pt_sft_24/generations.jsonl
python -m src.score_ood --task g24 --input results/pt_sft_24/generations.jsonl

2. Rulechain

2.1 Build the data

python data/generate_data_rulechain.py

2.2 Pre-cache rulechain trees

bash scripts/run_gen_tree_data_rulechain.sh

The script writes per-problem tree metadata and hidden-state memmaps to data/rulechain_trees_qwen14b/{train,val,test}/.

2.3 Stage 1 — train the rulechain head

bash scripts/run_train_head.sh configs/head_rulechain_qwen14b.yaml

2.4 Stage 2 — DAgger trainer + inference + scoring

bash scripts/run_train_stage2_dagger_rulechain.sh z      # main HyperGuide
bash scripts/run_train_stage2_dagger_rulechain.sh noz    # ablation
bash scripts/run_train_stage2_dagger_rulechain.sh randz  # noise control

The launcher trains with DDP across all detected GPUs, runs inference via src.eval_ood_generic, and scores with src.score_ood.

2.5 ToT baseline

bash scripts/run_tot_rulechain.sh
python -m src.score_ood --task rulechain --input results/tot_rulechain/generations.jsonl

The rulechain ToT adapter proposes one forward-chaining rule application per step, scores each candidate as sure/likely/impossible, and selects the top-5 by value sum.
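The selection step described above can be sketched as follows. The label weights mirror the ones used in the public ToT Game-of-24 code (sure=20, likely=1, impossible=0.001); whether the rulechain adapter in this repository uses the same numbers is an assumption, and the function and candidate names are hypothetical.

```python
# Assumed label weights, borrowed from the public ToT Game-of-24 value map.
VALUE_MAP = {"sure": 20.0, "likely": 1.0, "impossible": 0.001}

def value_sum(labels):
    """Sum the weights of the evaluator's labels for one candidate (unknown labels count as 0)."""
    return sum(VALUE_MAP.get(label, 0.0) for label in labels)

def select_top(votes, n_select=5):
    """Return the n_select candidates with the highest value sum, best first."""
    ranked = sorted(votes, key=lambda cand: value_sum(votes[cand]), reverse=True)
    return ranked[:n_select]

# One entry per proposed rule application, with n_evaluate=3 evaluator labels each.
votes = {
    "A": ["sure", "sure", "likely"],
    "B": ["likely", "likely", "likely"],
    "C": ["impossible", "impossible", "impossible"],
    "D": ["sure", "likely", "impossible"],
    "E": ["likely", "impossible", "impossible"],
    "F": ["sure", "sure", "sure"],
    "G": ["likely", "likely", "impossible"],
}
print(select_top(votes))  # ['F', 'A', 'D', 'B', 'G']
```

With these weights a single "sure" vote outweighs any number of "likely" votes that fit in three evaluations, so the ranking is dominated by how often the evaluator is confident.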

2.6 Planning Tokens baseline

# 1. Build PT-augmented training trajectories
python data/prepare_pt_rulechain_data.py

# 2. SFT a LoRA
bash scripts/run_train_pt_sft.sh configs/sft_pt_rulechain_qwen14b.yaml

# 3. Evaluate + score
python -m src.eval_pt_ood \
    --task rulechain \
    --lora_adapter checkpoints/sft_pt_rulechain_qwen14b \
    --test_data data/rulechain_test.jsonl \
    --output results/pt_sft_rulechain/generations.jsonl
python -m src.score_ood --task rulechain --input results/pt_sft_rulechain/generations.jsonl
