Add QA support to Evals #210

Merged
merged 6 commits into from
Jun 26, 2025

Conversation

vballoli
Contributor

No description provided.

@vballoli vballoli changed the title from "Add QA support w/ SimpleQA" to "Add QA support to Evals" on Jun 20, 2025
@vballoli vballoli force-pushed the QA branch 2 times, most recently from 5e0b5e3 to bcb0c90 on June 23, 2025 22:11
@vballoli vballoli requested a review from husseinmozannar June 23, 2025 22:14
@vballoli vballoli marked this pull request as ready for review June 23, 2025 22:14
@vballoli vballoli force-pushed the QA branch 2 times, most recently from f29fda2 to 76ff94d on June 23, 2025 23:19
@husseinmozannar husseinmozannar left a comment

Had to make quite a few changes to get it to run, but all good now

@husseinmozannar
Contributor

@tylerpayne fyi, adding GPQA and SimpleQA

@vballoli
Contributor Author

@husseinmozannar This looks great, thank you for the changes!

@husseinmozannar husseinmozannar merged commit 2db1d95 into main Jun 26, 2025
9 checks passed
@husseinmozannar husseinmozannar deleted the QA branch June 26, 2025 18:21
stefanoamorelli pushed a commit to stefanoamorelli/magentic-ui that referenced this pull request Jun 29, 2025
Co-authored-by: Hussein Mozannar <hssein.mzannar@gmail.com>
Co-authored-by: Hussein Mozannar <hmozannar@microsoft.com>
2 participants