Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message when decoding a non-item revision from Wikidata #517

Closed
wants to merge 1 commit into from

Conversation

elukey
Copy link
Contributor

@elukey elukey commented Mar 10, 2022

Revscoring returns a very verbose and cryptic JSONDecodeError stacktrace
when it tries to score a non-item Wikidata revision with the item
quality model (for example, a User/Talk page change, etc..).
This change adds a new exception meant to capture this kind of error,
to wrap it in a clearer error message.

Co-authored-by: halfak aaron.halfaker@gmail.com, Aiko Chou achou@wikimedia.org
Bug: T302851

def __init__(self, text, arg=None):
super().__init__("Expected entity JSON "
"but the following content can't be parsed: {}"
.format(text))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best not to pre-fill the text of the error to be about JSON given that this is named as a general error for unexpected content. Or you could make an "EntityContentTypeError" class that extends "UnexpectedContentType" and pre-fills this error message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! One more note. text can be very long so it would be good to trim it down to the first N chars. 20 or 50 chars should be plenty for debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, lemme know if the new version is better!

try:
return json.loads(text)
except json.decoder.JSONDecodeError:
raise UnexpectedContentType(text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I recommend saying something specific about "Entity JSON" here.

OR raising an error that is more specific to Entity JSON parsing that extends UnexpectedContentType

Given that I think this is the only place we might throw such an error, I'm more in favor of the former, but I could see value in the latter. It's nice to have the consistent parts of error text defined in errors.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, I added a specific parameter to the Exception, lemme know what you think about it.

@elukey elukey force-pushed the master branch 3 times, most recently from 843943f to f513d39 Compare March 15, 2022 09:20
Copy link
Member

@halfak halfak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Nice work.

Copy link
Member

@halfak halfak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just caught something that might be confusing down the road.

@@ -75,6 +75,12 @@ def __init__(self, datasources, info=None, arg=None):
super().__init__("Query failed ({0}:{1})"
.format(datasources, info))

class UnexpectedContentType(DependencyError):
def __init__(self, text, content_type, arg=None):
super().__init__("Expected entity of content type {} "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approved this but I just noticed that "Entity" in this message. Entity is jargon for a specific type of wiki content. Entity JSON is the format that Items, Properties, and Lexemes are built on top of for Wikidata. Maybe this could be "Expected content of type {}" where the value could be "Entity JSON".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, amended, let me know if it is better now :)

Copy link
Contributor

@AikoChou AikoChou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a minor issue when I ran pytest for this

@@ -75,6 +75,12 @@ def __init__(self, datasources, info=None, arg=None):
super().__init__("Query failed ({0}:{1})"
.format(datasources, info))

class UnexpectedContentType(DependencyError):
def __init__(self, text, content_type, arg=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to fix the issue when testing exceptions picklability:

        uct = UnexpectedContentType("Test", "JSON")
>       pickle.loads(pickle.dumps(uct))
E       TypeError: __init__() missing 1 required positional argument: 'content_type'

tests/test_errors.py:75: TypeError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep the content_type as required field, if we default to None then there could be the possibility to have exception msgs like "Expected content of type None, etc..". Is it a limitation of pickle? Or maybe something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the testing code should be:

uct = UnexpectedContentType(Dependent("Test"), "JSON")

I just updated the code change, lemme know if it fixes!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't work. It shows a new error:

self = UnexpectedContentType(<dependent.Test>, 'JSON'), text = <dependent.Test>
content_type = 'JSON', arg = None

    def __init__(self, text, content_type, arg=None):
        super().__init__("Expected content of type {0}, "
                         "but the following can't be parsed "
                         "(max 50 chars showed): {1}"
>                        .format(content_type, text[:50]))
E       TypeError: 'Dependent' object is not subscriptable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be a problem of pickle.. if I remove the line pickle.loads(pickle.dumps(uct)) and run the test, it will pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok makes sense, there may be some error in that Dependent("Test") class, we should figure out what's best to put in the test use case to make everything working.

revscoring/errors.py Outdated Show resolved Hide resolved
revscoring/errors.py Outdated Show resolved Hide resolved
@elukey elukey force-pushed the master branch 2 times, most recently from 11cdb64 to a04c4e5 Compare March 21, 2022 14:47
Revscoring returns a very verbose and cryptic JSONDecodeError stacktrace
when it tries to score a non-item Wikidata revision with the item
quality model (for example, a User/Talk page change, etc..).
This change adds a new exception meant to capture this kind of error,
to wrap it in a clearer error message.

Co-authored-by: halfak <aaron.halfaker@gmail.com>,
Aiko Chou <achou@wikimedia.org>
Bug: T302851
@AikoChou
Copy link
Contributor

@elukey I found a solution to keep the content_type as a required argument and remain pickleable. According to https://stackoverflow.com/questions/16244923/how-to-make-a-custom-exception-class-with-multiple-init-args-pickleable, we could override the __reduce__ method in the exception:

def __reduce__(self):
    return (UnexpectedContentType, (self.text, self.content_type))

I tested it and it works!

@elukey
Copy link
Contributor Author

elukey commented Mar 30, 2022

Closing in favor of #518

@elukey elukey closed this Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants