Conversation

@christinaexyou
Contributor

Adds list of Tier 1 tasks and their group description

@christinaexyou christinaexyou requested a review from ruivieira July 16, 2025 18:55
Member

@ruivieira ruivieira left a comment

LGTM, thanks!

These tasks are fully supported by TrustyAI with guaranteed fixes and maintenance. They have been tested, validated, and monitored in the CI for reliability and reproducibility. They are selected according to their presence on the OpenLLM leaderboard or their popularity (>10,0000 downloads on HuggingFace).
[cols="1,2a", options="header"]
|===
|Name |https://github.com/opendatahub-io/lm-evaluation-harness/tree/release-0.4.9rc0/lm_eval/tasks[Task Group Description]
Suggested change
|Name |https://github.com/opendatahub-io/lm-evaluation-harness/tree/release-0.4.9rc0/lm_eval/tasks[Task Group Description]
|Name |https://github.com/opendatahub-io/lm-evaluation-harness/tree/incubation/lm_eval/tasks[Task Group Description]

TrustyAI supports a subset of LMEval tasks to ensure reproducibility and reliability of the evaluation results. Tasks are categorized into three tiers based on our level of support: *Tier 1*, *Tier 2*, and *Tier 3*.

=== Tier 1 Tasks
These tasks are fully supported by TrustyAI with guaranteed fixes and maintenance. They have been tested, validated, and monitored in the CI for reliability and reproducibility. They are selected according to their presence on the OpenLLM leaderboard or their popularity (>10,0000 downloads on HuggingFace).
Suggested change
These tasks are fully supported by TrustyAI with guaranteed fixes and maintenance. They have been tested, validated, and monitored in the CI for reliability and reproducibility. They are selected according to their presence on the OpenLLM leaderboard or their popularity (>10,0000 downloads on HuggingFace).
These tasks are fully supported by TrustyAI with guaranteed fixes and maintenance. They have been tested, validated, and monitored in the CI for reliability and reproducibility. (footnote:[Tier 1 tasks were selected according to their presence on the OpenLLM leaderboard or their popularity (>10,0000 downloads on HuggingFace).])

@christinaexyou christinaexyou force-pushed the add-lmeval-tier1-tasks branch from 99f4290 to 35f68ea Compare July 16, 2025 22:15
@christinaexyou christinaexyou merged commit 5da46d8 into trustyai-explainability:main Jul 16, 2025