New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error message when decoding a non-item revision from Wikidata #517
Conversation
revscoring/errors.py
Outdated
def __init__(self, text, arg=None): | ||
super().__init__("Expected entity JSON " | ||
"but the following content can't be parsed: {}" | ||
.format(text)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best not to pre-fill the text of the error to be about JSON given that this is named as a general error for unexpected content. Or you could make an "EntityContentTypeError" class that extends "UnexpectedContentType" and pre-fills this error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! One more note. text
can be very long so it would be good to trim it down to the first N chars. 20 or 50 chars should be plenty for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, lemme know if the new version is better!
try: | ||
return json.loads(text) | ||
except json.decoder.JSONDecodeError: | ||
raise UnexpectedContentType(text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I recommend saying something specific about "Entity JSON" here.
OR raising an error that is more specific to Entity JSON parsing that extends UnexpectedContentType
Given that I think this is the only place we might throw such an error, I'm more in favor of the former, but I could see value in the latter. It's nice to have the consistent parts of error text defined in errors.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, I added a specific parameter to the Exception, lemme know what you think about it.
843943f
to
f513d39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Nice work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just caught something that might be confusing down the road.
revscoring/errors.py
Outdated
@@ -75,6 +75,12 @@ def __init__(self, datasources, info=None, arg=None): | |||
super().__init__("Query failed ({0}:{1})" | |||
.format(datasources, info)) | |||
|
|||
class UnexpectedContentType(DependencyError): | |||
def __init__(self, text, content_type, arg=None): | |||
super().__init__("Expected entity of content type {} " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approved this but I just noticed that "Entity" in this message. Entity is jargon for a specific type of wiki content. Entity JSON is the format that Items, Properties, and Lexemes are built on top of for Wikidata. Maybe this could be "Expected content of type {}" where the value could be "Entity JSON".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, amended, let me know if it is better now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a minor issue when I ran pytest for this
@@ -75,6 +75,12 @@ def __init__(self, datasources, info=None, arg=None): | |||
super().__init__("Query failed ({0}:{1})" | |||
.format(datasources, info)) | |||
|
|||
class UnexpectedContentType(DependencyError): | |||
def __init__(self, text, content_type, arg=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to fix the issue when testing exceptions picklability:
uct = UnexpectedContentType("Test", "JSON")
> pickle.loads(pickle.dumps(uct))
E TypeError: __init__() missing 1 required positional argument: 'content_type'
tests/test_errors.py:75: TypeError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep the content_type as required field, if we default to None then there could be the possibility to have exception msgs like "Expected content of type None, etc..". Is it a limitation of pickle? Or maybe something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the testing code should be:
uct = UnexpectedContentType(Dependent("Test"), "JSON")
I just updated the code change, lemme know if it fixes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't work. It shows a new error:
self = UnexpectedContentType(<dependent.Test>, 'JSON'), text = <dependent.Test>
content_type = 'JSON', arg = None
def __init__(self, text, content_type, arg=None):
super().__init__("Expected content of type {0}, "
"but the following can't be parsed "
"(max 50 chars showed): {1}"
> .format(content_type, text[:50]))
E TypeError: 'Dependent' object is not subscriptable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be a problem of pickle.. if I remove the line pickle.loads(pickle.dumps(uct))
and run the test, it will pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok makes sense, there may be some error in that Dependent("Test") class, we should figure out what's best to put in the test use case to make everything working.
11cdb64
to
a04c4e5
Compare
Revscoring returns a very verbose and cryptic JSONDecodeError stacktrace when it tries to score a non-item Wikidata revision with the item quality model (for example, a User/Talk page change, etc..). This change adds a new exception meant to capture this kind of error, to wrap it in a clearer error message. Co-authored-by: halfak <aaron.halfaker@gmail.com>, Aiko Chou <achou@wikimedia.org> Bug: T302851
@elukey I found a solution to keep the
I tested it and it works! |
Closing in favor of #518 |
Revscoring returns a very verbose and cryptic JSONDecodeError stacktrace
when it tries to score a non-item Wikidata revision with the item
quality model (for example, a User/Talk page change, etc..).
This change adds a new exception meant to capture this kind of error,
to wrap it in a clearer error message.
Co-authored-by: halfak aaron.halfaker@gmail.com, Aiko Chou achou@wikimedia.org
Bug: T302851