Pull requests: EleutherAI/lm-evaluation-harness

Pull requests list

Add LM-SynEval Benchmark
#3184 opened Jul 24, 2025 by jmichaelov
Update MMLU-ProX task
#3174 opened Jul 22, 2025 by weihao1115 (3 of 6 tasks)
feat: Add CLIcK task
#3173 opened Jul 22, 2025 by shing100
Add eqbench tasks in Spanish and Catalan
#3168 opened Jul 21, 2025 by priverabsc
Add EsBBQ and CaBBQ tasks
#3167 opened Jul 21, 2025 by valleruizf
Bugfix: update hellaswag ds path
#3158 opened Jul 18, 2025 by marawangamal
Add task dynamic_ifeval
#3149 opened Jul 15, 2025 by davideguidobene
Add DETAILS.md for improved documentation
#3141 opened Jul 12, 2025 by ginylil-tech
Add tasklist
#3133 opened Jul 11, 2025 by baberabb
Update README.md for mlqa
#3117 opened Jul 7, 2025 by newme616
set repeat metrics from config
#3109 opened Jul 5, 2025 by baberabb
add metric configs
#3105 opened Jul 4, 2025 by baberabb
add strip_thinking param
#3087 opened Jun 26, 2025 by baberabb
Fix Typo in Answer Explanation
#3071 opened Jun 19, 2025 by leopardracer
improve include-path precedence handling
#3068 opened Jun 18, 2025 by parkhs21