-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vdk-core: separate error logging, error reporting and error classification #2666
Conversation
63f8d1d
to
58d0644
Compare
projects/vdk-core/src/vdk/internal/builtin_plugins/ingestion/ingester_base.py
Show resolved
Hide resolved
45e573f
to
355e35a
Compare
3238b5f
to
a4f095e
Compare
bd6ca9f
to
741c8fd
Compare
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
741c8fd
to
297e191
Compare
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
297e191
to
d9ff95a
Compare
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
ad1c057
to
72f6e3b
Compare
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I made some comments about the code but they are minor and non-blocking.
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
e4633fe
to
907acae
Compare
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
907acae
to
4e55cbe
Compare
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
4e55cbe
to
ab183c7
Compare
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
ab183c7
to
88145e3
Compare
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
88145e3
to
9de1e8a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! I put some comments but they are not much important
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
9de1e8a
to
502a891
Compare
Details: #2666 Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
502a891
to
2a70bff
Compare
Why? Tests started failing after #2666 Most likely due to not wrapping errors in UserCodeError anymore What? Pass the dag name to the dag validation test fixture Expect just one request in the dag, as we do a GET request to fetch the job info before doing validation How was this tested? Ran test suite locally What kind of change is this? Bugfix Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
Why? Tests started failing after #2666 Most likely due to not wrapping errors in UserCodeError anymore What? Pass the dag name to the dag validation test fixture Expect just one request in the dag, as we do a GET request to fetch the job info before doing validation How was this tested? Ran test suite locally What kind of change is this? Bugfix Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
Why? Tests started failing after #2666 Most likely due to not wrapping errors in UserCodeError anymore What? Pass the dag name to the dag validation test fixture Expect just one request in the dag, as we do a GET request to fetch the job info before doing validation How was this tested? Ran test suite locally What kind of change is this? Bugfix Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
Why? Tests started failing after #2666 Most likely due to not wrapping errors in UserCodeError anymore What? Pass the dag name to the dag validation test fixture Expect just one request in the dag, as we do a GET request to fetch the job info before doing validation How was this tested? Ran test suite locally What kind of change is this? Bugfix Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
## Why? Tests started failing after #2666 Most likely due to not wrapping errors in UserCodeError anymore ## What? Pass the dag name to the dag validation test fixture Expect just one request in the dag, as we do a GET request to fetch the job info before doing validation ## How was this tested? Ran test suite locally ## What kind of change is this? Bugfix Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
Why
As part of the run logs initiative, we should add classification attributes to exceptions that are handled in vdk. These attributes are used for easier error classification and will help us enrich the vdk exception class family, so we can pass more info to our users.
This is not possible without separating the error logging and error reporting mechanisms, so some refactoring was required.
What
The main changes are in
errors.py
. All other changes (tests, other modules) are a result of changes inerrors.py
. The changes for specific plugins are out of scope for this PR due to compatibility issues.Add attributes to exceptions
Base vdk exceptions now have two new attributes -
to_be_fixed_by
andtype
(names subject to change). These help the error classification functions determine who to blame when an exception is raised. They're also displayed when pretty-printing the vdk exceptions. UserCodeError, VdkConfigurationError and PlatformService error are preserved as convenience classes that hardcode these attributes, based on who we decide is responsible for these exceptions. They can be extended into more specific exceptions in subsequent PRs, related to #2572Introduce a reporting API
The report functions adds any base exception (not necessarily a vdk one) to the resolvable context. It determines who to blame for the error and sets the
to_be_fixed_by
andtype
attributes of the exception based on the error classification.report_and_throw is used for convenience. It adds the vdk exception that was passed to the resolvable context and throws it.
report_and_rethrow just calls report and throws the exception that was passed.
Finally, we have the
log_exception(log, exception: BaseException, *lines)
function. It builds a message from the lines we pass and logs it at the warn level. It then logs the exception that was passed.The result is that we've effectively decoupled logging from the error reporting and error classification functions. We can choose to report an exception and not log it, or log and exception and not report it, depending on how we want to handle it. This provides a certain amount of freedom if we want to eliminate log statements we consider unnecessary.
Change the way vdk exceptions are represented
Override the str and repr methods in BaseVdkException. The constructor takes a variable number of lines that are used to create the error message and exception representation. We print a nice box that tells us what kind of exception occurred and who should fix it, along with the details.
The constructor can still take an ErrorMessage or dict instance for compatibility reasons. That logic should be removed when we remove ErrorMessage from the plugins.
What was removed
Passing
ErrorMessage
to vdk exceptions is not necessary anymore, so the pattern has been removed wherever it's used for instantiation in vdk-core.Formatting the error message also had separate functions, which are no longer needed, so they've been removed.
The decorator for error messages for some special cases has been removed and the logic added to the only function that actually used it.
DomainError,
log_and_throw
andlog_and_rethrow
are kept for compatibility reasons, but they should be phased out in all plugins that call them as separate PRs.Resulting user experience
Before: https://pastebin.com/raw/YsgzpUj4
Configuration error is printed with verbose stack trace
After: https://pastebin.com/raw/MUsLcjB2
Configuration error is printed with shorter stack trace
Before: https://pastebin.com/raw/qbfRfAUn
Exception coming from user code is wrapped in UserCodeError
After: https://pastebin.com/raw/1HGfDqye
Exception coming from user code is not wrapped, but attributes are set correctly, e.g. blamee is displayed as user error in the subsequent result.
Alternative: https://pastebin.com/raw/AsjDeyHZ
Alternatively, we can wrap non-vdk exceptions on re-throw, but this means we should re-visit some of our logging statements, because the logs become too verbose. Since logging exceptions is now separate from reporting them, we can dive deeper into the issue.
Folow-up
#2677
#2678
#2679
#2680
#2681
#2682
#2683
#2684
How was this tested
Ran locally with jobs that cause errors, ran the vdk-core tests locally as well. For plugin tests, we rely on CI.
What kind of change is this
Potentially breaking change