Insights: EleutherAI/lm-evaluation-harness
Overview
1 Release published by 1 person
- v0.4.8 (published Mar 5, 2025)
10 Pull requests merged by 6 people
- fix verbosity typo (#2765, merged Mar 6, 2025)
- Bugfix (#2762, merged Mar 5, 2025)
- Sae steered (#2750, merged Mar 5, 2025)
- fix mmlu (generative) metric aggregation (#2761, merged Mar 5, 2025)
- increment version to 0.4.8 (#2760, merged Mar 5, 2025)
- add debug log (#2757, merged Mar 4, 2025)
- Add test for a simple Unitxt task (#2742, merged Mar 4, 2025)
- Enable steering HF models (#2749, merged Mar 4, 2025)
- fix doc: generate_until only outputs the generated text! (#2755, merged Mar 3, 2025)
- Groundcocoa (#2724, merged Mar 3, 2025)
7 Pull requests opened by 7 people
- paws-x fix formatting (#2759, opened Mar 5, 2025)
- Fix for mc2 calculation (#2768, opened Mar 6, 2025)
- Add INCLUDE tasks (#2769, opened Mar 6, 2025)
- Add GSM8K Platinum (#2771, opened Mar 7, 2025)
- Consistency Fix: Filter new leaderboard_math_hard dataset to "Level 5" only (#2773, opened Mar 7, 2025)
- improvement: Use yaml.CLoader to load yaml files when available. (#2777, opened Mar 8, 2025)
- docs: Fix typos in README.md (#2778, opened Mar 8, 2025)
1 Issue closed by 1 person
- verbosity or verbostiy? (#2763, closed Mar 6, 2025)
11 Issues opened by 10 people
- mmlu_pro bug in fewshot + chat_template (#2780, opened Mar 9, 2025)
- CUDA Out of Memory (#2779, opened Mar 9, 2025)
- Multi-NPU evaluation supported? (#2776, opened Mar 7, 2025)
- Evaluating Pretrained LM always need few-shot example? (#2775, opened Mar 7, 2025)
- Deviation in HumanEval Benchmark Results (#2774, opened Mar 7, 2025)
- MMLU COT Giving less accuracy (#2770, opened Mar 7, 2025)
- Layer-by-layer inference evaluation of large models. (#2767, opened Mar 6, 2025)
- Add AIME 2024 and LiveCodeBenchmark to the gold standard evaluation harness (#2766, opened Mar 6, 2025)
- cluade sonnet 3.5 and 3.7 humaneval 0% (#2764, opened Mar 6, 2025)
- HF `batch_size=auto` unreliable (#2758, opened Mar 4, 2025)
- API Model: Custom handling of refused prompt (#2756, opened Mar 4, 2025)
12 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- New benchmark: CaselawQA (#2739, commented on Mar 7, 2025, 2 new comments)
- Smooth landing errors during post processing (#2751, commented on Mar 3, 2025, 0 new comments)
- 'NoneType' object is not callable! (#2752, commented on Mar 4, 2025, 0 new comments)
- An error occurred: 'choices' (in openai chat completion) (#2740, commented on Mar 5, 2025, 0 new comments)
- HOW TO ADD NEW TASK? (#2745, commented on Mar 5, 2025, 0 new comments)
- Infer time by use library's external api is much longer than script (#2291, commented on Mar 8, 2025, 0 new comments)
- Error loading MMLU 'prehistory' config: BuilderConfig not found (available: ['default']) (#2743, commented on Mar 8, 2025, 0 new comments)
- add llama3 tasks (#2556, commented on Mar 3, 2025, 0 new comments)
- Add loncxt tasks (#2629, commented on Mar 7, 2025, 0 new comments)
- humaneval instruct (#2650, commented on Mar 6, 2025, 0 new comments)
- Convert gen tasks to multiple_choice (#2670, commented on Mar 3, 2025, 0 new comments)
- New healthcare benchmark: careqa (#2714, commented on Mar 3, 2025, 0 new comments)