Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example of proposed XrdOss advanced error interface usage #1915

Closed
wants to merge 3 commits into from

Conversation

bbockelm
Copy link
Contributor

This PR builds on #1913 to demonstrate how it might be used. With this, the PSS returns the following when the cache is unable to authenticate with the origin:

[ERROR] Server responded with an error: [3030] Unable to open /user/ligo/test_access/access_ligo; invalid exchange (Failed to open file in cache: invalid exchange)

Note that it generates a lousy message ("invalid exchange") because EAUTH maps to a EBADE on Linux. I'll fix that in a follow-up PR.

With this approach, we could continue digging through the XrdPosix layer to generate a better error message there as well. However, I just wanted to illustrate what the new XrdOss API usage would look like.

@osschar

While XrdOss has a simple POSIX-like interface, in reality the
underlying implementation can be quite complicated -- XrdPfc, for
example.  Accordingly, we should allow for a richer error message
interface between the OSS and its callers.  This adds a new virtual
function which the SFS interface can invoke to get additional error
information.

The intent is this will be used by XrdPfc and allow clients to
differentiate between "the cache gave you a permission denied" and
"the origin gave the cache a permission denied".

*A note on ABI compatibility*: Technically, this is an ABI change
and hence would need to wait until XRootD 6 to be deployed.  However,
this takes advantage of a quirk of GCC: the ordering of the virtual
functions are preserved.  Hence, the new functions are placed out-of-order.
This adds an example of using the new OSS advanced error reporting
for an open call.  In this case, the underlying error message is
concatenated to the standard OFS error message.
This adds an example of using the new advanced error message generation
for the PSS object.  With this, opening a file in the cache where the
file is not found by the origin will result in something like this:

```
[ERROR] Server responded with an error: [3011] Unable to open /test_access;
  no such file or directory (Failed to open file in cache: No such file or directory)
```

(line break added for readability)
Copy link
Member

@abh3 abh3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea here is good, though somewhat limited as we can only get extended information from file objects not from generic methods. I suppose that's better than not but not completely satisfying. Of course, all of this is historic so at the moment it seems the best that can be done. All that said, in our meeting today we decided that this change really needs to go into R6 as it breaks ABI. Fortunately, that allows simplification because the whole XRDOSS_HASAERR checking can be eliminated. It's expected that the "extended information" is either implemented or returns nothing. So, code should simply check for extended error information. We are also worried on what the composite message would look like. Can you give some examples? If it becomes largely unreadable (think composite OpenSSL messages) this this patch would be of limited value. I know there is some concern about when R6 will be cut but it looks very likely to be in 2023. In the mean time this does open up the possibility to perhaps address the non-file based methods that return nothing more than an errno type message to make this feature complete.

@amadio amadio added this to the 6.0 milestone Mar 3, 2023
@abh3 abh3 marked this pull request as draft April 14, 2023 05:19
@abh3
Copy link
Member

abh3 commented Oct 18, 2023

Just an update. We've accepted the whole concept but the particular pull request here does not address the underlying problem. Most of the issues come from the POSIX interface which has practically no smarts when reporting errors. We already have an architectural design in place and will be implementing this as quickly as possible. Suffice it to say, your example was very helpful in spurring this activity as well as saving us much time. I'll close this pull request when we have the complete changes in place.

@bbockelm
Copy link
Contributor Author

Hi Andy!

Just to help clean things up -- I'll go ahead and close out this PR. Will work to revise it once the new error propagation framework is available.

Brian

@bbockelm bbockelm closed this Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants