EleutherAI / lm-evaluation-harness Public

Pull requests: EleutherAI/lm-evaluation-harness

149 open, 1,581 closed

Pull requests list

Fix add_bos_token not updated for Gemma tokenizer

#3206 opened Aug 4, 2025 by DarkLight1337

Support for DDP+MP with native torch and no accelerate

#3205 opened Aug 3, 2025 by xgal

feat: COT trace response handling in evaluator and model classes

#3204 opened Aug 3, 2025 by hhh2210

Add new task: kmmlu_pro, kmmlu_redux

#3198 opened Aug 1, 2025 by jeonghodot

Add xnli_va dataset

#3194 opened Jul 30, 2025 by FranValero97

Fixed #2552: Improve answer extraction for hendrycks_math

#3192 opened Jul 30, 2025 by JoonYong-Park

refactor registry

#3189 opened Jul 28, 2025 by baberabb

Leverage vllm's tokenizer_info endpoint to avoid manual duplication

#3185 opened Jul 25, 2025 by m-misiura

Add LM-SynEval Benchmark

#3184 opened Jul 24, 2025 by jmichaelov

Update MMLU-ProX task

#3174 opened Jul 22, 2025 by weihao1115

3 of 6 tasks

feat: Add CLIcK task

#3173 opened Jul 22, 2025 by shing100

Remove generate_until (multiple_target and doc_to_choice indexing) logic in ConfigurableTask.process_results

#3169 opened Jul 21, 2025 by baberabb

Add eqbench tasks in Spanish and Catalan

#3168 opened Jul 21, 2025 by priverabsc

Add EsBBQ and CaBBQ tasks

#3167 opened Jul 21, 2025 by valleruizf

Bugfix: set default SamplingParams based on generation_config

#3160 opened Jul 18, 2025 by cuttle-fish-my

Bugfix: update hellaswag ds path

#3158 opened Jul 18, 2025 by marawangamal

Feat/add permutation benchmark/task to lm-evaluation-harness

#3157 opened Jul 18, 2025 by BeeGass

Add task dynamic_ifeval

#3149 opened Jul 15, 2025 by davideguidobene

Add DETAILS.md for improved documentation

#3141 opened Jul 12, 2025 by ginylil-tech

Add tasklist

#3133 opened Jul 11, 2025 by baberabb

set repeat metrics from config

#3109 opened Jul 5, 2025 by baberabb

add metric configs

#3105 opened Jul 4, 2025 by baberabb

Add support for OpenVINO text2text generation models

#3101 opened Jul 3, 2025 by nikita-savelyevv • Draft

Adds Anthropic/discrim-eval to lm-evaluation-harness

#3091 opened Jun 27, 2025 by Helw150

feat(api_models): add enable_thinking param in chat_template_kwargs

#3088 opened Jun 27, 2025 by johnsonafool
