Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vdk-core: separate error logging, error reporting and error classification #2666

Merged
merged 1 commit into from
Sep 21, 2023

Conversation

DeltaMichael
Copy link
Contributor

@DeltaMichael DeltaMichael commented Sep 15, 2023

Why

As part of the run logs initiative, we should add classification attributes to exceptions that are handled in vdk. These attributes are used for easier error classification and will help us enrich the vdk exception class family, so we can pass more info to our users.

This is not possible without separating the error logging and error reporting mechanisms, so some refactoring was required.

What

The main changes are in errors.py. All other changes (tests, other modules) are a result of changes in errors.py. The changes for specific plugins are out of scope for this PR due to compatibility issues.

Add attributes to exceptions

Base vdk exceptions now have two new attributes - to_be_fixed_by and type (names subject to change). These help the error classification functions determine who to blame when an exception is raised. They're also displayed when pretty-printing the vdk exceptions. UserCodeError, VdkConfigurationError and PlatformService error are preserved as convenience classes that hardcode these attributes, based on who we decide is responsible for these exceptions. They can be extended into more specific exceptions in subsequent PRs, related to #2572

Introduce a reporting API

report(error_type, exception: BaseException)
report_and_throw(exception: BaseVdkException)
report_and_rethrow(error_type, exception: BaseException)

The report functions adds any base exception (not necessarily a vdk one) to the resolvable context. It determines who to blame for the error and sets the to_be_fixed_by and type attributes of the exception based on the error classification.

report_and_throw is used for convenience. It adds the vdk exception that was passed to the resolvable context and throws it.

report_and_rethrow just calls report and throws the exception that was passed.

Finally, we have the log_exception(log, exception: BaseException, *lines) function. It builds a message from the lines we pass and logs it at the warn level. It then logs the exception that was passed.

The result is that we've effectively decoupled logging from the error reporting and error classification functions. We can choose to report an exception and not log it, or log and exception and not report it, depending on how we want to handle it. This provides a certain amount of freedom if we want to eliminate log statements we consider unnecessary.

Change the way vdk exceptions are represented

Override the str and repr methods in BaseVdkException. The constructor takes a variable number of lines that are used to create the error message and exception representation. We print a nice box that tells us what kind of exception occurred and who should fix it, along with the details.

The constructor can still take an ErrorMessage or dict instance for compatibility reasons. That logic should be removed when we remove ErrorMessage from the plugins.

What was removed

Passing ErrorMessage to vdk exceptions is not necessary anymore, so the pattern has been removed wherever it's used for instantiation in vdk-core.

Formatting the error message also had separate functions, which are no longer needed, so they've been removed.

The decorator for error messages for some special cases has been removed and the logic added to the only function that actually used it.

DomainError, log_and_throw and log_and_rethrow are kept for compatibility reasons, but they should be phased out in all plugins that call them as separate PRs.

Resulting user experience

Before: https://pastebin.com/raw/YsgzpUj4

Configuration error is printed with verbose stack trace

After: https://pastebin.com/raw/MUsLcjB2

Configuration error is printed with shorter stack trace

Before: https://pastebin.com/raw/qbfRfAUn

Exception coming from user code is wrapped in UserCodeError

After: https://pastebin.com/raw/1HGfDqye

Exception coming from user code is not wrapped, but attributes are set correctly, e.g. blamee is displayed as user error in the subsequent result.

Alternative: https://pastebin.com/raw/AsjDeyHZ

Alternatively, we can wrap non-vdk exceptions on re-throw, but this means we should re-visit some of our logging statements, because the logs become too verbose. Since logging exceptions is now separate from reporting them, we can dive deeper into the issue.

Folow-up

#2677
#2678
#2679
#2680
#2681
#2682
#2683
#2684

How was this tested

Ran locally with jobs that cause errors, ran the vdk-core tests locally as well. For plugin tests, we rely on CI.

What kind of change is this

Potentially breaking change

@DeltaMichael DeltaMichael force-pushed the person/mdilyan/error-attr-fields branch 5 times, most recently from 63f8d1d to 58d0644 Compare September 15, 2023 13:18
@DeltaMichael DeltaMichael force-pushed the person/mdilyan/error-attr-fields branch 2 times, most recently from 45e573f to 355e35a Compare September 16, 2023 00:49
@DeltaMichael DeltaMichael changed the title vdk-core: introduce attributes to error handling vdk-core: separate error logging, error reporting and error classification Sep 16, 2023
@DeltaMichael DeltaMichael force-pushed the person/mdilyan/error-attr-fields branch 2 times, most recently from 3238b5f to a4f095e Compare September 17, 2023 10:49
@DeltaMichael DeltaMichael changed the base branch from person/mdilyan/remove-log-and-rethrow to main September 17, 2023 10:50
@DeltaMichael DeltaMichael force-pushed the person/mdilyan/error-attr-fields branch 4 times, most recently from bd6ca9f to 741c8fd Compare September 17, 2023 12:43
DeltaMichael pushed a commit that referenced this pull request Sep 17, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael pushed a commit that referenced this pull request Sep 17, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
@DeltaMichael DeltaMichael marked this pull request as ready for review September 17, 2023 13:06
DeltaMichael pushed a commit that referenced this pull request Sep 18, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael pushed a commit that referenced this pull request Sep 18, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael pushed a commit that referenced this pull request Sep 18, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
Copy link
Collaborator

@antoniivanov antoniivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I made some comments about the code but they are minor and non-blocking.

projects/vdk-core/src/vdk/internal/core/errors.py Outdated Show resolved Hide resolved
projects/vdk-core/src/vdk/internal/core/errors.py Outdated Show resolved Hide resolved
projects/vdk-core/src/vdk/internal/core/errors.py Outdated Show resolved Hide resolved
projects/vdk-core/src/vdk/internal/core/errors.py Outdated Show resolved Hide resolved
projects/vdk-core/src/vdk/internal/core/errors.py Outdated Show resolved Hide resolved
DeltaMichael pushed a commit that referenced this pull request Sep 19, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael pushed a commit that referenced this pull request Sep 19, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael pushed a commit that referenced this pull request Sep 19, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael pushed a commit that referenced this pull request Sep 19, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael pushed a commit that referenced this pull request Sep 21, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
Copy link
Collaborator

@duyguHsnHsn duyguHsnHsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! I put some comments but they are not much important

DeltaMichael pushed a commit that referenced this pull request Sep 21, 2023
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
Details: #2666

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
@DeltaMichael DeltaMichael enabled auto-merge (squash) September 21, 2023 11:53
@DeltaMichael DeltaMichael merged commit cdb51dc into main Sep 21, 2023
8 checks passed
@DeltaMichael DeltaMichael deleted the person/mdilyan/error-attr-fields branch September 21, 2023 12:33
DeltaMichael pushed a commit that referenced this pull request Sep 28, 2023
Why?

Tests started failing after #2666
Most likely due to not wrapping errors in UserCodeError anymore

What?

Pass the dag name to the dag validation test fixture

Expect just one request in the dag, as we do a GET request
to fetch the job info before doing validation

How was this tested?

Ran test suite locally

What kind of change is this?

Bugfix

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael pushed a commit that referenced this pull request Sep 28, 2023
Why?

Tests started failing after #2666
Most likely due to not wrapping errors in UserCodeError anymore

What?

Pass the dag name to the dag validation test fixture

Expect just one request in the dag, as we do a GET request
to fetch the job info before doing validation

How was this tested?

Ran test suite locally

What kind of change is this?

Bugfix

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael pushed a commit that referenced this pull request Sep 28, 2023
Why?

Tests started failing after #2666
Most likely due to not wrapping errors in UserCodeError anymore

What?

Pass the dag name to the dag validation test fixture

Expect just one request in the dag, as we do a GET request
to fetch the job info before doing validation

How was this tested?

Ran test suite locally

What kind of change is this?

Bugfix

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael pushed a commit that referenced this pull request Sep 28, 2023
Why?

Tests started failing after #2666
Most likely due to not wrapping errors in UserCodeError anymore

What?

Pass the dag name to the dag validation test fixture

Expect just one request in the dag, as we do a GET request
to fetch the job info before doing validation

How was this tested?

Ran test suite locally

What kind of change is this?

Bugfix

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
DeltaMichael added a commit that referenced this pull request Sep 28, 2023
## Why?

Tests started failing after
#2666 Most likely due
to not wrapping errors in UserCodeError anymore

## What?

Pass the dag name to the dag validation test fixture

Expect just one request in the dag, as we do a GET request to fetch the
job info before doing validation

## How was this tested?

Ran test suite locally

## What kind of change is this?

Bugfix

Signed-off-by: Dilyan Marinov <mdilyan@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants