POSC doesn't work well with XrdOss plugins #830

Closed
bbockelm opened this issue Sep 26, 2018 · 6 comments

@bbockelm
Contributor

If uploads are still marked as in progress when Xrootd starts, the POSC subsystem attempts to clean them up during startup. The code is here:

https://github.com/xrootd/xrootd/blob/master/src/XrdOfs/XrdOfsPoscq.cc#L141

There are two problems here:

  1. I'm not sure that POSC really applies here -- seems that if Xrootd shares a distributed filesystem, there's no way to guarantee that some other Xrootd server didn't attempt this transfer on the same filesystem.
  2. There is no environment or security entity object passed. This causes XrdOss to interact with the underlying filesystem as the default user, which is nobody for HDFS. Needless to say, nobody doesn't have permission to unlink the file.
  • Failure to unlink the file subsequently causes the daemon startup to fail.

I'm reluctant to have the default user be root, just from a least privilege point-of-view. What's the best way to proceed here? Should I patch the OFS to make a "fake" security object?
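
To make the failure mode above concrete, here is a minimal model of it in Python. This is only a sketch, not the XrdOfsPoscq code: the names `cleanup_pending_uploads`, `StartupError`, and `unlink_as_nobody` are hypothetical, and the only behavior taken from the description is that cleanup calls unlink without any client identity and that a failed unlink aborts startup.

```python
import os
import tempfile


class StartupError(RuntimeError):
    """Raised when a leftover upload cannot be removed (hypothetical name)."""


def cleanup_pending_uploads(pending_paths, unlink=os.unlink):
    """Remove every leftover in-progress upload; abort on the first failure.

    `unlink` stands in for the storage plugin's unlink call. It receives the
    path only, so the plugin has no client identity to act as and falls back
    to whatever its default user is.
    """
    removed = []
    for path in pending_paths:
        try:
            unlink(path)
        except OSError as exc:
            # In this model any unlink failure is fatal for startup.
            raise StartupError(f"cannot remove {path}: {exc}") from exc
        removed.append(path)
    return removed


def unlink_as_nobody(path):
    """Simulates a backend whose default user lacks permission to unlink."""
    raise PermissionError(13, "Permission denied", path)
```

With the default `os.unlink` the leftover file is removed and startup continues; with `unlink_as_nobody` the first leftover entry raises `StartupError`, which is the restart failure described above.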

@abh3
Member

abh3 commented Sep 27, 2018

I guess I don't understand how you got yourself in this predicament in the first place unless you are using a setfsuid plugin. That said,

  1. We don't support multiple servers writing into the same namespace residing in a distributed file system. The reasons should be obvious, as we can't guarantee consistency. We expect people to partition the namespace in some way between the servers to avoid servers clobbering other servers' files. So, my guess is that this isn't being done. Yes, it will work most of the time but not all the time.

  2. We don't recommend using nobody for the xrootd userid since xrootd must be able to control its own files. Of course, if you are using setfsuid and xrootd has setfsuid capabilities then it would work, but not in the POSC case without some additional work (note we don't support setfsuid, well, not yet). If you use setfsuid then you certainly know who owns the file to be deleted and could assume that identity in order to unlink the file. As you note, the code is not there to do that because there is no native support for setfsuid.

Failing initialization when a POSC file cannot be deleted is the correct response in order to guarantee "poscness".

I would not give xrootd root privileges. Even if you did, you would have to change the start-up options, and the code would not work anyway, as it reverts to a non-root uid when it starts running. Passing a fake security object wouldn't work either, as the clean-up happens during initialization and there is no reference to a security object at that point (xrootd assumes it can clean up files it created, notwithstanding a setfsuid plugin).

Here are some options (assuming I understand what is going on here):
a) If a setfsuid is being used, the plugin should check if the file will be created using POSC. If so, the file's ownership needs to belong to xrootd until a successful close occurs. At that point the ownership can be changed. Of course, that introduces a small window where the server may crash and the file's ownership is wrong.
b) If the file system supports ACLs, then set the ACL to allow the normal xrootd userid full privileges. This is only workable if ACLs are recursive in nature.
c) Run an external program prior to start-up or launch one before POSC cleanup is invoked to do the actual cleanup. While that makes you dependent on the current implementation, at least it's isolated to one external program.
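
Option (a) can be pictured roughly as follows. This is a sketch under stated assumptions, not plugin code: the class `PoscUpload` and its parameters are hypothetical, and `chown` is injectable so the example runs without privileges (a real plugin would use its backend's own ownership call instead of `os.chown`).

```python
import os
import tempfile


class PoscUpload:
    """Model of option (a): the server owns the file until close succeeds.

    All names here are hypothetical and exist only for illustration.
    """

    def __init__(self, path, client_uid, client_gid, chown=os.chown):
        self.path = path
        self.client_uid = client_uid
        self.client_gid = client_gid
        self._chown = chown
        # Created under the server's own identity, so a restart can unlink it.
        self._fh = open(path, "wb")

    def write(self, data):
        self._fh.write(data)

    def close(self):
        """Flush the data, then hand ownership over to the client."""
        self._fh.flush()
        os.fsync(self._fh.fileno())
        self._fh.close()
        # The window mentioned in (a): a crash between the close above and
        # the chown below leaves a complete file with the server's ownership.
        self._chown(self.path, self.client_uid, self.client_gid)
```

Until `close()` returns, the file stays removable by the server's own userid, which is what the start-up cleanup relies on.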

If you give me details on how this situation occurs in the first place, perhaps I can provide additional suggestions.

@bbockelm
Contributor Author

This is actually the HDFS backend - but, logically, it works the same way as setfsuid. The HDFS client is set up to be able to impersonate any HDFS user.

Honestly, I don't want to support POSC since, as you point out, it really isn't plausible to "do it right" in this configuration.

Rather than trying to fix something that is fundamentally unfixable, is it possible to return an error or otherwise disable POSC? For me, the bigger problem is someone who starts a POSC transfer can cause my server to be unable to restart.

@abh3
Member

abh3 commented Oct 1, 2018 via email

@bbockelm
Contributor Author

bbockelm commented Oct 2, 2018

Ah, great!

It's not clear from the docs: when I do ofs.persist off, what happens when a client requests POSC behavior? Is it silently ignored or does the server return an error?

@abh3
Member

abh3 commented Oct 3, 2018 via email

@abh3
Member

abh3 commented Oct 7, 2018

I believe this issue has been addressed with the persist configuration option. If not, please reopen it.
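
For reference, the setting discussed earlier in this thread would look like this in the server configuration file. This fragment only restates the directive as spelled above; how the server responds to a client that still requests POSC is not covered by the comments preserved here.

```
ofs.persist off
```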

@abh3 abh3 closed this as completed Oct 7, 2018