Pull requests: EleutherAI/lm-evaluation-harness

Pull requests list

docs: Fix typos in README.md
#2778 opened Mar 8, 2025 by ruivieira
Add GSM8K Platinum
#2771 opened Mar 7, 2025 by Qubitium
Add INCLUDE tasks
#2769 opened Mar 6, 2025 by agromanou
Fix for mc2 calculation
#2768 opened Mar 6, 2025 by kdymkiewicz
paws-x fix formatting
#2759 opened Mar 5, 2025 by baberabb
New benchmark: CaselawQA
#2739 opened Feb 26, 2025 by RicardoDominguez
Allow writing config to wandb
#2736 opened Feb 25, 2025 by ksurya
Capture gen_kwargs from CLI in squad_completion
#2727 opened Feb 23, 2025 by ksurya
Add support for sequence labeling
#2718 opened Feb 20, 2025 by jogonba2
New healthcare benchmark: careqa
#2714 opened Feb 19, 2025 by PabloAgustin
Add AIBE task and utilities
#2712 opened Feb 18, 2025 by parimalthakre01
Add Task (Financial mmlu ko)
#2699 opened Feb 14, 2025 by choics2623
add audio modality (qwen2 audio only)
#2689 opened Feb 12, 2025 by artemorloff
Add generation variants of some tasks
#2688 opened Feb 11, 2025 by baberabb
Convert gen tasks to multiple_choice
#2670 opened Feb 4, 2025 by baberabb Draft
[hf-multimodal] pass kwargs to self.processor
#2667 opened Jan 31, 2025 by baberabb
Add from dataframe
#2655 opened Jan 25, 2025 by AMindToThink
humaneval instruct
#2650 opened Jan 22, 2025 by baberabb
Include all test files in sdist
#2634 opened Jan 19, 2025 by booxter
Add loncxt tasks
#2629 opened Jan 17, 2025 by baberabb Draft