Pulse · EleutherAI/lm-evaluation-harness

March 2, 2025 – March 9, 2025

fix verbosity typo
#2765 merged Mar 6, 2025
Bugfix
#2762 merged Mar 5, 2025
Sae steered
#2750 merged Mar 5, 2025
fix mmlu (generative) metric aggregation
#2761 merged Mar 5, 2025
increment version to 0.4.8
#2760 merged Mar 5, 2025
add debug log
#2757 merged Mar 4, 2025
Add test for a simple Unitxt task
#2742 merged Mar 4, 2025
Enable steering HF models
#2749 merged Mar 4, 2025
fix doc: generate_until only outputs the generated text!
#2755 merged Mar 3, 2025
Groundcocoa
#2724 merged Mar 3, 2025

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

New benchmark: CaselawQA
#2739 commented on Mar 7, 2025 • 2 new comments
Smooth landing errors during post processing
#2751 commented on Mar 3, 2025 • 0 new comments
'NoneType' object is not callable!
#2752 commented on Mar 4, 2025 • 0 new comments
An error occurred: 'choices' (in openai chat completion)
#2740 commented on Mar 5, 2025 • 0 new comments
HOW TO ADD NEW TASK?
#2745 commented on Mar 5, 2025 • 0 new comments
Infer time by use library's external api is much longer than script
#2291 commented on Mar 8, 2025 • 0 new comments
Error loading MMLU 'prehistory' config: BuilderConfig not found (available: ['default'])
#2743 commented on Mar 8, 2025 • 0 new comments
add llama3 tasks
#2556 commented on Mar 3, 2025 • 0 new comments
Add loncxt tasks
#2629 commented on Mar 7, 2025 • 0 new comments
humaneval instruct
#2650 commented on Mar 6, 2025 • 0 new comments
Convert gen tasks to multiple_choice
#2670 commented on Mar 3, 2025 • 0 new comments
New healthcare benchmark: careqa
#2714 commented on Mar 3, 2025 • 0 new comments