Skip to content

Conversation

@jaredoconnell
Copy link
Collaborator

Summary

This PR fixes type and quality errors in the data package.

This brings the total type error count down to 124.

Some assumptions had to be made, so it would be a good idea to look for accidental changes in logic during your reviews.

Testing

Run benchmarks with various data input types to ensure they all work.


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
@jaredoconnell jaredoconnell force-pushed the features/refactor/type-fixes-5 branch from 7564645 to ad41a66 Compare October 21, 2025 20:28
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
sjmonson
sjmonson previously approved these changes Oct 21, 2025
Copy link
Collaborator

@sjmonson sjmonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not test but looks good

break # Found one that works. Continuing could overwrite it.

if dataset is None and len(errors) > 0:
raise DataNotSupportedError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message can get a bit unruly here, can we distinguish between not supported errors and dataset errors?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

) + num_chars
if (num_words := stats.get("num_words")) is not None:
input_metrics.text_words = (input_metrics.text_words or 0) + num_words
input_metrics.text_characters = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep the counting logic unified in the text_stats function. We could update it so that it pushes to a UsageMetrics instance directly / make that a functoin of the UsageMetrics class, but this code is duplicated and if we add anymore stats, it will make it hard to keep updated

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I think the new code is a much more elegant solution.

Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
@markurtz markurtz merged commit 4d6f9d4 into main Oct 23, 2025
12 of 17 checks passed
@markurtz markurtz deleted the features/refactor/type-fixes-5 branch October 23, 2025 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants