wenqi-l/rrrm

RRRL — Structured Reasoning & Step-Aware Selection for Best-of-N Inference

Research project investigating whether structured reasoning generation combined with step-aware reward-based selection improves best-of-N inference on dual-head language models (DHRD-style), across both classification and mathematical reasoning tasks.

Repository layout

.
├── docs/
│   ├── proposal.md             # Project proposal (title / description / main goal)
│   └── experiment_design.md    # Phase 1 experiment design (2×2 factorial) + Phase 2 sketch
├── paper/
│   ├── c2.pdf                  # C2 paper
│   ├── dual-head.pdf           # DHRD paper
│   ├── step_wise.pdf           # SRaR paper
│   ├── c2_code/C2/             # C2 reference code
│   └── step_wise_code/SRaR/    # SRaR reference code (fork of verl)
└── README.md

Reference code & licenses

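This repository includes two third-party codebases, the C2 reference code (paper/c2_code/C2/) and the SRaR reference code (paper/step_wise_code/SRaR/, a fork of verl). Both are Apache-2.0, and their original LICENSE files are preserved in place.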

Status

Phase 1 (minimum viable demo) is specified in docs/experiment_design.md. Phase 1 trains no models: the generator is frozen, and only inference and selection are evaluated in a 2×2 factorial design (structured vs. unstructured CoT × outcome-only vs. step-aware selection).
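
The two selection rules in the 2×2 design can be illustrated with a minimal sketch. It is not taken from this repository or from the reference codebases: the function names, the toy scorers, and the choice of `min` as the step-score aggregator are all assumptions made for illustration, and the actual protocol is the one in docs/experiment_design.md.

```python
from itertools import product
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical representation: a candidate is a list of reasoning steps,
# with the final answer as the last step.
Candidate = Sequence[str]


def select_outcome_only(
    candidates: Sequence[Candidate],
    outcome_score: Callable[[Candidate], float],
) -> Candidate:
    """Best-of-N using a single score per candidate (ties keep the first)."""
    return max(candidates, key=outcome_score)


def select_step_aware(
    candidates: Sequence[Candidate],
    step_score: Callable[[str], float],
    aggregate: Callable[[Iterable[float]], float] = min,  # assumption: weakest step decides
) -> Candidate:
    """Best-of-N using per-step scores reduced to one value per candidate."""
    return max(candidates, key=lambda c: aggregate(step_score(s) for s in c))


def factorial_cells() -> List[Tuple[str, str]]:
    """Enumerate the four (generation, selection) conditions."""
    return list(product(("structured", "unstructured"),
                        ("outcome_only", "step_aware")))


# Toy scorers standing in for real reward heads (purely illustrative).
def toy_outcome_score(candidate: Candidate) -> float:
    return 1.0 if candidate[-1] == "answer: 4" else 0.0


def toy_step_score(step: str) -> float:
    return 0.0 if "=5" in step else 1.0


candidates = [
    ["2+2=5", "answer: 4"],  # right answer, flawed intermediate step
    ["2+2=4", "answer: 4"],  # right answer, sound intermediate step
]

picked_outcome = select_outcome_only(candidates, toy_outcome_score)
picked_step = select_step_aware(candidates, toy_step_score)
```

In this toy case both candidates tie on the outcome score, so outcome-only selection keeps the first one, while step-aware selection prefers the candidate with no flawed step. Other aggregators (mean, product) can be passed through the `aggregate` parameter.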

About

rubric-based reasoning reward model
