Actions: EleutherAI/lm-evaluation-harness

Showing runs from all workflows
4,964 workflow runs

docs: Fix typos in README.md
Unit Tests #4385: Pull request #2778 opened by ruivieira
March 8, 2025 20:46 Action required ruivieira:patch-1
docs: Fix typos in README.md
Tasks Modified #4413: Pull request #2778 opened by ruivieira
March 8, 2025 20:46 Action required ruivieira:patch-1
New benchmark: CaselawQA
Tasks Modified #4411: Pull request #2739 synchronize by RicardoDominguez
March 7, 2025 14:15 Action required socialfoundations:caselawqa
New benchmark: CaselawQA
Unit Tests #4383: Pull request #2739 synchronize by RicardoDominguez
March 7, 2025 14:15 Action required socialfoundations:caselawqa
Consistency Fix: Filter new leaderboard_math_hard dataset to "Level 5" only
Unit Tests #4382: Pull request #2773 opened by perlitz
March 7, 2025 09:29 Action required perlitz:patch-1
Consistency Fix: Filter new leaderboard_math_hard dataset to "Level 5" only
Tasks Modified #4410: Pull request #2773 opened by perlitz
March 7, 2025 09:29 Action required perlitz:patch-1
FIX: Filter new MATH replacement dataset to "Level 5" only
Tasks Modified #4409: Pull request #2772 opened by perlitz
March 7, 2025 09:02 Action required perlitz:patch-1
FIX: Filter new MATH replacement dataset to "Level 5" only
Unit Tests #4381: Pull request #2772 opened by perlitz
March 7, 2025 09:02 Action required perlitz:patch-1
Add GSM8K Platinum
Unit Tests #4380: Pull request #2771 synchronize by Qubitium
March 7, 2025 08:26 8m 57s ModelCloud:gms8k-platinum
Add GSM8K Platinum
Tasks Modified #4408: Pull request #2771 synchronize by Qubitium
March 7, 2025 08:26 2m 8s ModelCloud:gms8k-platinum
Add GSM8K Platinum
Tasks Modified #4407: Pull request #2771 synchronize by Qubitium
March 7, 2025 08:06 2m 15s ModelCloud:gms8k-platinum
Add GSM8K Platinum
Unit Tests #4379: Pull request #2771 synchronize by Qubitium
March 7, 2025 08:06 8m 58s ModelCloud:gms8k-platinum
Add GSM8K Platinum
Tasks Modified #4406: Pull request #2771 synchronize by Qubitium
March 7, 2025 08:05 1m 37s ModelCloud:gms8k-platinum
Add GSM8K Platinum
Unit Tests #4378: Pull request #2771 synchronize by Qubitium
March 7, 2025 08:05 7m 54s ModelCloud:gms8k-platinum
Add GSM8K Platinum
Tasks Modified #4405: Pull request #2771 opened by Qubitium
March 7, 2025 07:38 1m 46s ModelCloud:gms8k-platinum
Add GSM8K Platinum
Unit Tests #4377: Pull request #2771 opened by Qubitium
March 7, 2025 07:38 8m 41s ModelCloud:gms8k-platinum
Add loncxt tasks
Tasks Modified #4404: Pull request #2629 synchronize by baberabb
March 7, 2025 03:29 1m 58s longcxt
Add loncxt tasks
Unit Tests #4376: Pull request #2629 synchronize by baberabb
March 7, 2025 03:29 4m 16s longcxt
Add loncxt tasks
Tasks Modified #4403: Pull request #2629 synchronize by baberabb
March 6, 2025 16:51 1m 43s longcxt
Add loncxt tasks
Unit Tests #4375: Pull request #2629 synchronize by baberabb
March 6, 2025 16:51 9m 20s longcxt
Add INCLUDE tasks
Tasks Modified #4402: Pull request #2769 opened by agromanou
March 6, 2025 12:22 20m 55s agromanou:include
Add INCLUDE tasks
Unit Tests #4374: Pull request #2769 opened by agromanou
March 6, 2025 12:22 8m 50s agromanou:include
Fix for mc2 calculation
Tasks Modified #4401: Pull request #2768 opened by kdymkiewicz
March 6, 2025 11:52 2m 15s kdymkiewicz:truthfulqa_mc2_fix_v2