EleutherAI / lm-evaluation-harness Public

Fork 2.6k
Star 9.7k

Pull requests: EleutherAI/lm-evaluation-harness

146 Open 1,570 Closed

Pull requests list

Leverage vllm's tokenizer_info endpoint to avoid manual duplication

#3185 opened Jul 25, 2025 by m-misiura

Add LM-SynEval Benchmark

#3184 opened Jul 24, 2025 by jmichaelov

Update MMLU-ProX task

#3174 opened Jul 22, 2025 by weihao1115

3 of 6 tasks

feat: Add CLIcK task

#3173 opened Jul 22, 2025 by shing100

Remove generate_until (multiple_target and doc_to_choice indexing) logic in ConfigurableTask.process_results

#3169 opened Jul 21, 2025 by baberabb

Add eqbench tasks in Spanish and Catalan

#3168 opened Jul 21, 2025 by priverabsc

Add EsBBQ and CaBBQ tasks

#3167 opened Jul 21, 2025 by valleruizf

Bugfix: set default SamplingParams based on generation_config

#3160 opened Jul 18, 2025 by cuttle-fish-my

Bugfix: update hellaswag ds path

#3158 opened Jul 18, 2025 by marawangamal

Feat/add permutation benchmark/task to lm-evaluation-harness

#3157 opened Jul 18, 2025 by BeeGass

Add task dynamic_ifeval

#3149 opened Jul 15, 2025 by davideguidobene

Add DETAILS.md for improved documentation

#3141 opened Jul 12, 2025 by ginylil-tech

Fix mmlu_continuation subgroup names to fit Readme and other variants

#3137 opened Jul 11, 2025 by lamalunderscore

Add tasklist

#3133 opened Jul 11, 2025 by baberabb

Update README.md for mlqa

#3117 opened Jul 7, 2025 by newme616

set repeat metrics from config

#3109 opened Jul 5, 2025 by baberabb

add metric configs

#3105 opened Jul 4, 2025 by baberabb

Add support for OpenVINO text2text generation models

#3101 opened Jul 3, 2025 by nikita-savelyevv • Draft

Adds Anthropic/discrim-eval to lm-evaluation-harness

#3091 opened Jun 27, 2025 by Helw150

feat(api_models): add enable_thinking param in chat_template_kwargs

#3088 opened Jun 27, 2025 by johnsonafool

add strip_thinking param

#3087 opened Jun 26, 2025 by baberabb

Refactor ConfigurableTask.process_results into modular helpers

#3085 opened Jun 25, 2025 by mfisher35

[Proposal] Change hyphens in n-shot and n-samples to underscores

#3084 opened Jun 24, 2025 by kiersten-stokes

Fix Typo in Answer Explanation

#3071 opened Jun 19, 2025 by leopardracer

improve include-path precedence handling

#3068 opened Jun 18, 2025 by parkhs21
