Bug Description
When I try to solve the adaptive-rejection-sampler task, I cannot reproduce even a 1/5 pass rate, while the leaderboard reports 5/5 for it. Am I using the tool incorrectly?
The failures are usually caused by the ars function signature being underspecified in the instruction. How can the model be sure that every run produces exactly the signature that the code in the test folder expects?
Is there any chance the agent is accessing the test file from the internet? If so, is that fair for the benchmark? And how can we make sure there is no such leakage during evaluation? Thanks!
Steps to Reproduce
- Create instruction.md
- Run `forge --verbose -p "$(cat instruction.md)" 2> forge-verbose.log`
Expected Behavior
Pass the verifier.
Actual Behavior
The test often fails for the generated app/ars.R because the generated signature does not match the call in the test.
Signature of ForgeCode's output:
function (f, n, lower = -Inf, upper = Inf, init = NULL, max_points = 500L, max_iter = 1e+06)
Call from test's code:
ars(normal_density, c(-5, 5), n = 1000)
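To illustrate why this call does not fit the generated signature, here is a minimal sketch that relies only on R's standard argument matching. `ars_generated` is a hypothetical stand-in that copies the signature above and only reports how its arguments were bound. The body of `normal_density` is an assumption, since the test's own definition is not shown here.

```r
# Hypothetical stand-in with the same signature as the generated app/ars.R.
# It does not sample anything; it only returns the bound arguments.
ars_generated <- function(f, n, lower = -Inf, upper = Inf, init = NULL,
                          max_points = 500L, max_iter = 1e+06) {
  list(n = n, lower = lower, upper = upper)
}

# Placeholder density (assumption: the test uses something similar).
normal_density <- function(x) dnorm(x)

# Same call shape as the test. `n` is matched by name first, so the
# unnamed c(-5, 5) lands in the next free positional slot, `lower`.
bound <- ars_generated(normal_density, c(-5, 5), n = 1000)
str(bound)
```

In this sketch `lower` receives the whole vector `c(-5, 5)` and `upper` keeps its default of `Inf`, so the test's intended bounds are not what the generated function sees.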
Forge Version
2.12.7
Operating System & Version
macOS
AI Provider
OpenRouter
Model
anthropic/claude-opus-4-6
Installation Method
npx forgecode@latest
Configuration