
POC: endpoint to get evals info #246


Open · wants to merge 4 commits into main from gio/ta-evals-poc

Conversation

Contributor

@giovanni-guidini commented Jun 18, 2025

These changes add 2 new endpoints in the REST API (hidden behind a feature flag) with the objective of being a POC for reporting Eval info saved in Test Analytics.

Probably not what anyone else had in mind, but it's a POC, mind you. I didn't want to deal with building a frontend. Also, the current way we use evals is not in the CI, so I didn't think it made much sense to mess around with the PR comment either.

Anyway the endpoints are:

  • summary - aggregates eval data (that is saved per test-run).
  • compare - aggregates per commit and compares across commits.

⚠️ I had to change the DB router for the API for tests to pass.
It contained a custom field lookup, which was moved to the shared DB router.


seer-by-sentry bot commented Jun 18, 2025

Sentry detected 2 potential issues in your recent changes

Suspicion of an AttributeError in apps/codecov-api/api/public/v2/evals/views.py:99. Calling .get() on the testrun.properties JSONField could fail if the field is None, potentially causing unexpected behavior.
  • Description: The Testrun model's 'properties' field is defined as JSONField(null=True), allowing it to store NULL in the database, which Django represents as None in Python. The code at apps/codecov-api/api/public/v2/evals/views.py:99 attempts to call .get("eval", {}) directly on testrun.properties. If testrun.properties is None, calling .get() on it will raise an AttributeError: 'NoneType' object has no attribute 'get'. This specific pattern of calling .get() directly on a potentially None JSONField appears to be unique to this new code path. While the queryset filters for properties__isnull=False, there might be edge cases or race conditions where a None value could still be processed, leading to an unexpected server crash when this view is accessed for such a Testrun object.
  • Code location: apps/codecov-api/api/public/v2/evals/views.py:99~104
  • Suggested fix: Add a check to ensure testrun.properties is not None before attempting to call .get() on it. For example, use eval_data = (testrun.properties or {}).get("eval", {}).
Suspicion of a potential `AttributeError` crash in the evals viewset if `testrun.properties` is `None` when attempting to access its attributes.
  • Description: The code in _aggregate_testruns accesses testrun.properties.get("eval", {}) without checking if testrun.properties is None. The Testrun model defines properties as a JSONField(null=True), meaning it can store NULL in the database, which translates to None in Python. If a Testrun object is retrieved where properties is None, attempting to call .get() on it will raise an AttributeError: 'NoneType' object has no attribute 'get'. This would halt the request processing and lead to unexpected behavior for the user. While the get_queryset method includes a filter properties__isnull=False, there's a technical risk that due to race conditions, ORM caching, or other edge cases, a Testrun object with properties=None could still potentially reach the aggregation logic, triggering the AttributeError. For example, if a record's properties were set to NULL between the query execution and the data processing loop, or if caching returned a stale object state. This specific code path in the evals viewset is affected.
  • Code location: apps/codecov-api/api/public/v2/evals/views.py:98~119
  • Suggested fix: Add a check if testrun.properties is not None: before accessing testrun.properties.get("eval", {}) in the _aggregate_testruns method.
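
A minimal sketch of the defensive access both findings suggest, assuming the loop shape inside `_aggregate_testruns` (variable and key names are illustrative):

```python
def _aggregate_testruns(self, testruns):
    for testrun in testruns:
        # Guard against properties being NULL/None before calling .get() on it.
        eval_data = (testrun.properties or {}).get("eval", {})
        if not eval_data:
            continue
        # ... aggregate eval_data as before ...
```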


Apparently there was a custom field lookup in the database router that was being used by the API.
I moved it to the shared router, because the API router didn't know about the TA_TIMESERIES database
and the shared one didn't know about the special field lookup.

I think it's OK to consolidate to just 1 router, and that should probably be the shared one?
Hopefully. At least tests pass now :E
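
A rough sketch of the kind of consolidated router described here, just to make the idea concrete; the class name, app label, and database alias below are assumptions, not the actual implementation:

```python
class SharedDatabaseRouter:
    """Single router aware of both the default DB and the TA timeseries DB."""

    # App label assumed for models stored in the TA_TIMESERIES database.
    TA_TIMESERIES_APPS = {"ta_timeseries"}

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.TA_TIMESERIES_APPS:
            return "ta_timeseries"
        return "default"

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label in self.TA_TIMESERIES_APPS:
            return db == "ta_timeseries"
        return db == "default"
```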

codecov bot commented Jun 18, 2025

Codecov Report

Attention: Patch coverage is 95.49550% with 5 lines in your changes missing coverage. Please review.

Project coverage is 94.23%. Comparing base (90251ff) to head (30767a3).
Report is 3 commits behind head on main.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...s/shared/shared/django_apps/db_routers/__init__.py | 55.55% | 4 Missing ⚠️ |
| apps/codecov-api/api/public/v2/evals/views.py | 98.97% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
+ Coverage   94.21%   94.23%   +0.01%     
==========================================
  Files        1214     1214              
  Lines       45022    45089      +67     
  Branches     1435     1435              
==========================================
+ Hits        42418    42488      +70     
+ Misses       2303     2300       -3     
  Partials      301      301              
| Flag | Coverage Δ |
| --- | --- |
| apiunit | 96.53% <99.01%> (+0.05%) ⬆️ |
| sharedintegration | 39.75% <55.55%> (+0.01%) ⬆️ |
| sharedunit | 88.13% <55.55%> (-0.03%) ⬇️ |
| workerintegration | 61.61% <ø> (ø) |
| workerunit | 90.62% <ø> (ø) |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.



codspeed-hq bot commented Jun 18, 2025

CodSpeed Performance Report

Merging #246 will not alter performance

Comparing gio/ta-evals-poc (30767a3) with main (6efc689)

Summary

✅ 9 untouched benchmarks

Contributor

@Swatinem left a comment


lgtm, but I would also recommend a review from someone else who knows these db router things better than me.



@Field.register_lookup
class IsNot(Lookup):
Contributor


I’m surprised this does not exist out of the box?
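
For reference, this is roughly how Django's documented custom-lookup registration works; the body below is a generic sketch (the lookup name and SQL are assumptions, not necessarily what this PR implements):

```python
from django.db.models import Field, Lookup


@Field.register_lookup
class IsNot(Lookup):
    # Assumed lookup name, for illustration only.
    lookup_name = "isnot"

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        return f"{lhs} IS NOT {rhs}", [*lhs_params, *rhs_params]
```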


def get_queryset(self):
return Testrun.objects.filter(
repo_id=self.repo.repoid, properties__isnull=False
Contributor


this looks reasonable to me, but assumes we only want to return testruns that have something to do with evals

Contributor Author


well... the name of the endpoint is "evals" after all :P

repo_id=self.repo.repoid, properties__isnull=False
)

def _aggregate_testruns(self, testruns):
Contributor


can you type this as list[Testrun] please? that should make it clearer that you are indeed fetching all the testruns into memory first.

I think that's fine for a first POC, but otherwise we should maybe offload this computation to Postgres. This might be a bit fiddly to write, but should perform better.
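
If that offloading ever happens, a hedged sketch of what it could look like with ORM annotations; the nested "eval" → score key layout, the commit_sha field, and the final grouping are assumptions:

```python
from django.db.models import Avg, Count, FloatField
from django.db.models.fields.json import KeyTextTransform, KeyTransform
from django.db.models.functions import Cast

# Assumes Testrun is imported from its models module and repo_id is in scope.
# Extract and average a single eval score per commit inside Postgres instead
# of iterating over all testruns in Python.
aggregates = (
    Testrun.objects.filter(repo_id=repo_id, properties__isnull=False)
    .annotate(
        eval_score=Cast(
            KeyTextTransform("score", KeyTransform("eval", "properties")),
            FloatField(),
        )
    )
    .values("commit_sha")
    .annotate(avg_score=Avg("eval_score"), runs=Count("pk"))
)
```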

Comment on lines 79 to 80
score_avgs = {}
score_counts = {}
Contributor


having two dicts might perform a bit better than having one dict and nested tuples, but nested tuples would be easier to read I believe.
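
To make the trade-off concrete, a tiny illustrative sketch of the two shapes (score names and values are made up):

```python
# Option A: two parallel dicts keyed by score name, as in the PR.
score_avgs: dict[str, float] = {}
score_counts: dict[str, int] = {}

# Option B: one dict of (running_sum, count) tuples, arguably easier to read.
score_totals: dict[str, tuple[float, int]] = {}
running_sum, count = score_totals.get("accuracy", (0.0, 0))
score_totals["accuracy"] = (running_sum + 0.87, count + 1)
```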

Contributor


always happy to see that we consolidate duplicated code

OpenApiParameter.QUERY,
description="commit SHA for which to return evaluation summary",
),
OpenApiParameter(
Contributor


what does the classname param mean?

failedItems: int
scores: dict[str, float]


Contributor


are the eval reports sharing the same Testrun table as TA-related reports? How does the queryset only pull eval reports?

else 0
)

# Calculate score averages for passed items
Contributor

@JerrySentry Jun 18, 2025


I think it would be useful to have the failed items get a score of 0 instead of being ignored. Maybe a separate metric for that. Because just looking at a 100% score without looking at the number of failed items seems deceiving, unless it failed because the evals code errored. Ideally we have 1 number that paints as much of the picture as possible.

Looks like the report builder from the evals takes in a score function, so I think we shouldn't add logic in here to do any form of computing and should just leave it to the report builder to do it.
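
Purely as an illustration of the single-number idea above (the function and names are hypothetical, and per the reply this kind of computation may belong in the report builder instead):

```python
def combined_score(scores: list[float], failed_items: int) -> float:
    """Average where every failed item contributes a score of 0."""
    total_items = len(scores) + failed_items
    if total_items == 0:
        return 0.0
    return sum(scores) / total_items
```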
