feat: Add LMEval Tier 1 tasks #68

christinaexyou · 2025-07-16T18:55:33Z

Adds a list of Tier 1 tasks and their group descriptions.

ruivieira

LGTM, thanks!

ruivieira · 2025-07-16T19:52:57Z

docs/modules/ROOT/pages/component-lm-eval.adoc

+These tasks are fully supported by TrustyAI with guaranteed fixes and maintenance. They have been tested, validated, and monitored in the CI for reliability and reproducibility. They are selected according their presence on the OpenLLM leaderboard or their popularity (>10,0000 downloads on HuggingFace).
+[cols="1,2a", options="header"]
+|===
+|Name |https://github.com/opendatahub-io/lm-evaluation-harness/tree/release-0.4.9rc0/lm_eval/tasks[Task Group Description]


Suggested change

|Name |https://github.com/opendatahub-io/lm-evaluation-harness/tree/release-0.4.9rc0/lm_eval/tasks[Task Group Description]

|Name |https://github.com/opendatahub-io/lm-evaluation-harness/tree/incubation/lm_eval/tasks[Task Group Description]

ruivieira · 2025-07-16T19:55:19Z

docs/modules/ROOT/pages/component-lm-eval.adoc

+TrustyAI supports a subset of LMEval tasks to ensure reproducibility and reliability of the evaluation results. Tasks are categorized into three tiers based on our level of support: *Tier 1*, *Tier 2*, and *Tier 3*.
+
+=== Tier 1 Tasks
+These tasks are fully supported by TrustyAI with guaranteed fixes and maintenance. They have been tested, validated, and monitored in the CI for reliability and reproducibility. They are selected according their presence on the OpenLLM leaderboard or their popularity (>10,0000 downloads on HuggingFace).


Suggested change

These tasks are fully supported by TrustyAI with guaranteed fixes and maintenance. They have been tested, validated, and monitored in the CI for reliability and reproducibility. They are selected according their presence on the OpenLLM leaderboard or their popularity (>10,0000 downloads on HuggingFace).

These tasks are fully supported by TrustyAI with guaranteed fixes and maintenance. They have been tested, validated, and monitored in the CI for reliability and reproducibility. (footnote:[Tier 1 tasks were selected according their presence on the OpenLLM leaderboard or their popularity (>10,0000 downloads on HuggingFace).])

christinaexyou requested a review from ruivieira July 16, 2025 18:55

ruivieira approved these changes Jul 16, 2025

feat: Add LMEval Tier 1 tasks

35f68ea

christinaexyou force-pushed the add-lmeval-tier1-tasks branch from 99f4290 to 35f68ea Compare July 16, 2025 22:15

christinaexyou merged commit 5da46d8 into trustyai-explainability:main Jul 16, 2025
