RRRL — Structured Reasoning & Step-Aware Selection for Best-of-N Inference

RRRL is a research project investigating whether structured reasoning generation, combined with step-aware reward-based selection, improves best-of-N inference on dual-head language models (DHRD-style). It covers both classification and mathematical reasoning tasks.

Repository layout

.
├── docs/
│   ├── proposal.md             # Project proposal (title / description / main goal)
│   └── experiment_design.md    # Phase 1 experiment design (2×2 factorial) + Phase 2 sketch
├── paper/
│   ├── c2.pdf                  # C2 paper
│   ├── dual-head.pdf           # DHRD paper
│   ├── step_wise.pdf           # SRaR paper
│   ├── c2_code/C2/             # C2 reference code
│   └── step_wise_code/SRaR/    # SRaR reference code (fork of verl)
└── README.md

Reference code & licenses

This repository includes two third-party codebases, both Apache-2.0 (their original LICENSE files are preserved in place):

C2 — Cooperative-Critical Reward Modeling — upstream: https://github.com/asahi-research/C2
SRaR — Step-wise Rubrics as Rewards — upstream: https://github.com/akarinmoe/SRaR

Status

Phase 1 (minimum viable demo): see docs/experiment_design.md. Phase 1 trains no models: the generator is frozen, and only inference and selection are evaluated in a 2×2 factorial design (structured vs. unstructured CoT × outcome-only vs. step-aware selection).
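
To make the 2×2 design concrete, here is a minimal Python sketch of the four conditions and the two selection rules. It is an illustration only, not the project's implementation: the candidate fields (`answer`, `outcome_reward`, `step_rewards`), the function names, and in particular the min-based aggregation used for step-aware selection are all assumptions. The actual scoring rules are defined in docs/experiment_design.md.

```python
from itertools import product
from typing import Callable, Dict, List

# Hypothetical candidate record: one sampled chain of thought with an
# outcome (final-answer) reward and a list of per-step rewards.
Candidate = Dict[str, object]

# The four cells of the 2x2 factorial: CoT format x selection rule.
CONDITIONS = list(product(("structured", "unstructured"),
                          ("outcome_only", "step_aware")))


def outcome_only_score(cand: Candidate) -> float:
    """Score a candidate by its outcome reward alone."""
    return float(cand["outcome_reward"])


def step_aware_score(cand: Candidate) -> float:
    """Score a candidate by its weakest link (assumed aggregation rule):
    the minimum over all step rewards and the outcome reward."""
    outcome = float(cand["outcome_reward"])
    steps = [float(s) for s in cand["step_rewards"]]
    return min(steps + [outcome])


def best_of_n(candidates: List[Candidate],
              score_fn: Callable[[Candidate], float]) -> Candidate:
    """Return the highest-scoring candidate among N samples."""
    return max(candidates, key=score_fn)


# Toy candidates with made-up rewards, for illustration only.
candidates = [
    {"answer": "A", "outcome_reward": 0.9, "step_rewards": [0.9, 0.2, 0.8]},
    {"answer": "B", "outcome_reward": 0.8, "step_rewards": [0.7, 0.8, 0.9]},
]

picked_outcome = best_of_n(candidates, outcome_only_score)["answer"]
picked_step = best_of_n(candidates, step_aware_score)["answer"]
```

With these toy values, outcome-only selection picks candidate A, which has the highest final reward. Step-aware selection picks B, because A contains one low-scoring step. The frozen generator would supply the candidates in practice; only the selection rule changes between cells.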
