POC: endpoint to get evals info #246
base: main
Conversation
Sentry detected 2 potential issues in your recent changes:
* Suspicion of an AttributeError in apps/codecov-api/api/public/v2/evals/views.py:99. Calling .get() on the testrun.properties JSONField could fail if the field is None, potentially causing unexpected behavior.
* Suspicion of a potential AttributeError crash in the evals viewset if testrun.properties is None when attempting to access its attributes.
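A minimal sketch of the defensive access Sentry is suggesting, assuming properties is a nullable JSONField on Testrun (the "score" key below is a hypothetical example, not taken from the diff):

# Hedged sketch: guard against testrun.properties being None before calling .get().
# The "score" key is a hypothetical example, not confirmed by the diff.
props = testrun.properties or {}
score = props.get("score")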
These changes add 2 new endpoints in the REST API (hidden behind a feature flag) with the objective of being a POC for reporting eval info saved in Test Analytics. Probably not what anyone else had in mind, but a POC, mind you. I didn't want to deal with building a frontend. Also, the current way _we_ use evals is not in the CI, so I didn't think it made much sense to mess around with the PR comment either. Anyway, the endpoints are:
* summary - aggregates eval data (that is saved per test-run).
* compare - aggregates per commit and compares.
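Very roughly, the two endpoints could be exercised like this (a hedged sketch; the paths, the compare parameters, and the base URL layout are illustrative guesses, not the actual routes in this PR):

import requests

# Hedged sketch: illustrative paths only; the real routes sit behind a feature
# flag in the public v2 API and may be shaped differently.
owner, repo = "<owner>", "<repo>"
base = f"https://api.codecov.io/api/v2/github/{owner}/repos/{repo}/evals"
summary = requests.get(f"{base}/summary", params={"commit": "<commit sha>"})
compare = requests.get(f"{base}/compare", params={"base": "<base sha>", "head": "<head sha>"})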
Force-pushed from 0e49c05 to f4b990f
Apparently there was a custom field lookup in the database router that was being used in the API. I moved to the shared one, because the API router didn't know about the TA_TIMESERIES database and the shared one didn't know about the special field. I think it's OK to move to just one, and that should probably be the shared one? Hopefully. At least tests pass now :E
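For context, a minimal sketch of what a shared database router that knows about a TA_TIMESERIES alias might look like (the app label and database alias below are assumptions, not taken from this PR):

# Hedged sketch of a Django database router; the "ta_timeseries" app label and
# database alias are assumptions for illustration only.
class SharedRouter:
    def db_for_read(self, model, **hints):
        if model._meta.app_label == "ta_timeseries":
            return "ta_timeseries"
        return None

    def db_for_write(self, model, **hints):
        if model._meta.app_label == "ta_timeseries":
            return "ta_timeseries"
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label == "ta_timeseries":
            return db == "ta_timeseries"
        return None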
Codecov Report
Attention: Patch coverage is …
✅ All tests successful. No failed tests found.
Additional details and impacted files
@@ Coverage Diff @@
## main #246 +/- ##
==========================================
+ Coverage 94.21% 94.23% +0.01%
==========================================
Files 1214 1214
Lines 45022 45089 +67
Branches 1435 1435
==========================================
+ Hits 42418 42488 +70
+ Misses 2303 2300 -3
Partials 301 301
Flags with carried forward coverage won't be shown.
CodSpeed Performance Report: Merging #246 will not alter performance.
lgtm, but I would also recommend a review from someone else who knows this db router stuff better than me.
@Field.register_lookup
class IsNot(Lookup):
I’m surprised this does not exist out of the box?
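For reference, a custom lookup like this is typically only a few lines in Django; a minimal sketch of what a registered IsNot lookup could look like (the actual shared implementation may differ):

from django.db.models import Field, Lookup

# Hedged sketch of a custom "isnot" lookup, following Django's documented
# custom-lookup pattern; the shared implementation in this repo may differ.
@Field.register_lookup
class IsNot(Lookup):
    lookup_name = "isnot"

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        return f"{lhs} IS NOT {rhs}", lhs_params + rhs_params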
def get_queryset(self):
    return Testrun.objects.filter(
        repo_id=self.repo.repoid, properties__isnull=False
this looks reasonable to me, but it assumes we only want to return testruns that have something to do with evals
well... the name of the endpoint is "evals" after all :P
        repo_id=self.repo.repoid, properties__isnull=False
    )
def _aggregate_testruns(self, testruns):
can you type this as list[Testrun] please? that should make it clearer that you are indeed fetching all the testruns into memory first. I think that's fine for a first POC, but otherwise we should maybe offload this computation to Postgres. That might be a bit fiddly to write, but should perform better.
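A rough sketch of what pushing the aggregation into Postgres could look like, assuming a numeric score is stored under a hypothetical "score" key in the properties JSONField (the key name and variable names are assumptions, not taken from the diff):

from django.db.models import Avg, Count, FloatField
from django.db.models.fields.json import KeyTextTransform
from django.db.models.functions import Cast

# Hedged sketch: aggregate in the database instead of iterating testruns in
# Python. The "score" key inside the JSONField is a hypothetical example.
aggregates = (
    Testrun.objects.filter(repo_id=repo_id, properties__isnull=False)
    .annotate(score=Cast(KeyTextTransform("score", "properties"), FloatField()))
    .aggregate(avg_score=Avg("score"), runs=Count("pk"))
)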
score_avgs = {}
score_counts = {}
having two dicts might perform a bit better than having one dict and nested tuples, but nested tuples would be easier to read I believe.
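To make the trade-off concrete, a purely illustrative sketch of the two shapes being compared (values are made up):

# Option A: two parallel dicts keyed by score name (the current approach).
score_avgs = {"accuracy": 0.91}
score_counts = {"accuracy": 42}

# Option B: one dict of (average, count) tuples, arguably easier to read.
scores = {"accuracy": (0.91, 42)}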
always happy to see that we consolidate duplicated code
    OpenApiParameter.QUERY,
    description="commit SHA for which to return evaluation summary",
),
OpenApiParameter(
what does the classname param mean?
failedItems: int
scores: dict[str, float]
are the eval reports sharing the same table Testrun as TA related reports? How does the queryset only pull eval reports?
    else 0
)

# Calculate score averages for passed items
I think it would be useful to have the failed items have a score of 0 instead of being ignored, or maybe a separate metric for that. Just looking at a 100% score without looking at the number of failed items seems deceiving, unless it failed because the evals code errored. Ideally we have one number that paints as much of the picture as possible.
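A sketch of the kind of combined number being suggested here, where failed items drag the average down instead of being ignored (the names and signature are illustrative, not the author's code):

# Hedged sketch: treat each failed item as contributing a score of 0 so a
# single number reflects both score quality and failure rate.
def overall_score(passed_scores: list[float], failed_count: int) -> float:
    total = len(passed_scores) + failed_count
    if total == 0:
        return 0.0
    return sum(passed_scores) / total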
Looks like the report builder from the evals takes in a score function, so I think we shouldn't add logic in here to do any form of computation and just leave it to the report builder to do it.