WebDAV XML parsing error due to ampersand in file names #1263

Closed
InfinityTotality opened this issue Jul 24, 2020 · 14 comments

@InfinityTotality

InfinityTotality commented Jul 24, 2020

I apologize if this is already addressed somewhere, but I'm not having any luck searching the issues or for documentation on the http protocol plugin. I'm attempting to use the XrdHttp protocol as a WebDAV server, but every client I use reports an XML parsing error. WinSCP in particular reports line numbers of the error, and I was able to track the issue down to file names that include an ampersand being returned verbatim between the <D:href></D:href> tags. Is it possibly a configurable option to escape these somewhere? I am using the xrootd package on CentOS 8.1, version 4.12.2
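The failure can be reproduced without a server. A minimal sketch, assuming a simplified single-element response (a real PROPFIND reply has more structure) and a hypothetical file name `a&b.txt`; Python's standard XML parser stands in for the WebDAV clients here:

```python
import xml.etree.ElementTree as ET

# Hypothetical href as it would appear if the server copies the path verbatim.
RAW = '<D:href xmlns:D="DAV:">/tmp/a&b.txt</D:href>'
# The same href with the ampersand written as an XML entity.
ESCAPED = '<D:href xmlns:D="DAV:">/tmp/a&amp;b.txt</D:href>'


def try_parse(document):
    """Return the element text, or None if the document is not well-formed."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


raw_result = try_parse(RAW)
escaped_result = try_parse(ESCAPED)
print("verbatim:", raw_result)
print("escaped: ", escaped_result)
```

A bare `&` starts an entity reference in XML, so the verbatim form is rejected, while the escaped form parses and yields the original name.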

@abh3
Member

abh3 commented Jul 25, 2020 via email

@InfinityTotality
Author

I've looked through the code a bit, and the only reference to any sort of URL encoding I can find is the quote function in XrdHttpUtils.cc, but that doesn't handle ampersands, and it doesn't seem to be applied to the directory names in the XML responses as none of the other replacements are done either. This seems to match what I see in the code that builds the responses. I can't say I fully understand how iovP or resource are built, but the paths seem to be copied directly out of them into the XML response without any further processing.

@bbockelm
Contributor

@ffurano - can you take a look at this? I haven't had the time to look into the details here but missing some escaping in filenames sounds plausible.

Thanks!

@ffurano
Contributor

ffurano commented Jul 27, 2020

Hi,

Having a look. However, in general the compatibility with "popular" clients is pretty limited, as these clients do not support 30x responses.

@InfinityTotality
Author

Is there a recommendation regarding what client should be used? Or is the intent here not to be a general purpose WebDAV server? I'm looking for a solution for high performance bulk file transfer over WAN a la GridFTP or bbcp. It seemed like this project was the correct solution with what's going on with GridFTP, but I'm having difficulty finding any good client utilities with support for simple file browsing and easy multi-file transfers. Particularly for Windows clients. That is understandable, of course; neither of the other two have Windows support, but standards like WebDAV are generally well supported across platforms.

On a different note, I decided to test what would happen if I named a file "<D:href", and the text returned in the XML replaced the "<" with ef 80 a3 and the ":" with ef 80 a2. WinSCP simply ignores that file with no error. Windows explorer displays it as "Dhref". There is apparently some code attempting to ensure the response is valid XML, but it does not touch the ampersands.
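The byte sequences quoted above are valid UTF-8. A small sketch decoding them shows that each is a single code point in Unicode's Private Use Area, which is legal character data in XML and would be consistent with the clients parsing that response without an error. Which layer performs the substitution is not established here; the sketch only decodes the reported bytes.

```python
import unicodedata

# Byte sequences reported in the comment above, as hex strings.
observed = {"<": "ef80a3", ":": "ef80a2"}

decoded = {}
for original, hex_bytes in observed.items():
    char = bytes.fromhex(hex_bytes).decode("utf-8")
    decoded[original] = char
    # Category "Co" is Unicode's "private use" general category.
    print(f"{original!r} -> U+{ord(char):04X}, category {unicodedata.category(char)}")
```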

@bbockelm
Contributor

> Is there a recommendation regarding what client should be used? Or is the intent here not to be a general purpose WebDAV server?

Personally, I haven't tested this much. I use various command-line clients (e.g., davix-cp) and do not view it as a filesystem.

That said, providing invalid XML seems, erm, a pretty simple fix.

@InfinityTotality - if you avoid filenames with metacharacters in them, does everything else work (I live in a special environment where no one does such a thing)? As Fabrizio notes, if they don't support 30x-style codes (redirects), it may not be super-useful for your use case.

@ffurano
Contributor

ffurano commented Jul 28, 2020

Hi,

I tried URL-quoting the filenames in the listing response, and the browser (and davix-ls) clearly do not expect them to be quoted, so I reverted that change.

Maybe it just needs XML escaping, with `&amp;`; this is going to be my next try.
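The difference between the two approaches can be sketched with Python's standard library. This is an illustration only, not the code used in XrdHttp; the path is taken from the test listing further down the thread, and the single `<D:href>` element is a simplified stand-in for a full response.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

path = "/tmp/stupid&dir/double>stuupid&file"

# URL (percent) quoting: a client that does not percent-decode the href
# would see different characters in the name.
url_quoted = quote(path)
# XML escaping only changes the serialized form; a parser restores the name.
xml_escaped = escape(path)

element = ET.fromstring(f'<D:href xmlns:D="DAV:">{xml_escaped}</D:href>')

print(url_quoted)
print(xml_escaped)
print(element.text == path)
```

With XML escaping, the parsed text of the element is identical to the original path, so a client needs no extra decoding step beyond normal XML parsing.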

To answer the question about clients: we built davix to have a client with just the right bits in it, including redirection and TPC support, so I suggest davix too for simplicity. Of course everything can be done with curl, at the cost of more complexity.

About popular clients or mounting with davfs... they work only if the xrootd service is a single server (i.e. no redirections), so their usefulness is pretty limited.

More news later...

@ffurano
Contributor

ffurano commented Jul 28, 2020

I think it's OK now. For the record, here's a representative response:

$ davix-ls -l 'http://littlexrdhttp.cern.ch:1094/tmp/stupid&dir/' --trace body
DAVIX(body): Body block (303 bytes):
[<D:propfind xmlns:D="DAV:" xmlns:L="LCGDM:"><D:prop><D:displayname/><D:getlastmodified/><D:creationdate/><D:getcontentlength/><D:quota-used-bytes/><D:resourcetype><D:collection/></D:resourcetype><L:mode/><D:owner></D:owner><D:group></D:group></D:prop></D:propfind>]

DAVIX(body): Read block (1102 bytes):
[
<D:multistatus xmlns:D="DAV:" xmlns:ns1="http://apache.org/dav/props/" xmlns:ns0="DAV:">
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/tmp/stupid&dir/</D:href>
<D:propstat>
<D:prop>
<lp1:getcontentlength>33</lp1:getcontentlength>
<lp1:getlastmodified>Tue, 28 Jul 2020 08:12:40 GMT</lp1:getlastmodified>
<lp1:resourcetype><D:collection/></lp1:resourcetype>
<lp1:iscollection>1</lp1:iscollection>
<lp1:executable>T</lp1:executable>
<lp1:iscollection>1</lp1:iscollection>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/tmp/stupid&dir/double>stuupid&file</D:href>
<D:propstat>
<D:prop>
<lp1:getcontentlength>0</lp1:getcontentlength>
<lp1:getlastmodified>Tue, 28 Jul 2020 08:12:40 GMT</lp1:getlastmodified>
<lp1:iscollection>0</lp1:iscollection>
<lp1:executable>F</lp1:executable>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
</D:multistatus>
]

-rwxrwxrwx 0 0 2020-07-28 10:12:40 double>stuupid&file

@ffurano
Contributor

ffurano commented Jul 28, 2020

Fixed by #1264

... please let me know

@ffurano ffurano closed this as completed Jul 29, 2020
@ffurano
Contributor

ffurano commented Jul 29, 2020

Just for the record... I realized I pasted the wrong example. Here is how it behaves now, and it seems correct to me:

$ davix-ls 'http://littlexrdhttp.cern.ch:1094/tmp/stupid&dir' --trace body
DAVIX(body): Body block (303 bytes):
[<D:propfind xmlns:D="DAV:" xmlns:L="LCGDM:"><D:prop><D:displayname/><D:getlastmodified/><D:creationdate/><D:getcontentlength/><D:quota-used-bytes/><D:resourcetype><D:collection/></D:resourcetype><L:mode/><D:owner></D:owner><D:group></D:group></D:prop></D:propfind>]

DAVIX(body): Read block (1101 bytes):
[
<D:multistatus xmlns:D="DAV:" xmlns:ns1="http://apache.org/dav/props/" xmlns:ns0="DAV:">
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/tmp/stupid&dir</D:href>
<D:propstat>
<D:prop>
<lp1:getcontentlength>33</lp1:getcontentlength>
<lp1:getlastmodified>Tue, 28 Jul 2020 08:12:40 GMT</lp1:getlastmodified>
<lp1:resourcetype><D:collection/></lp1:resourcetype>
<lp1:iscollection>1</lp1:iscollection>
<lp1:executable>T</lp1:executable>
<lp1:iscollection>1</lp1:iscollection>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/tmp/stupid&dir/double>stuupid&file</D:href>
<D:propstat>
<D:prop>
<lp1:getcontentlength>0</lp1:getcontentlength>
<lp1:getlastmodified>Tue, 28 Jul 2020 08:12:40 GMT</lp1:getlastmodified>
<lp1:iscollection>0</lp1:iscollection>
<lp1:executable>F</lp1:executable>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
</D:multistatus>
]

double>stuupid&file

@ffurano
Contributor

ffurano commented Jul 29, 2020

Ouch, now I see... GitHub removes the XML escaping when pasting.

@ffurano
Contributor

ffurano commented Jul 29, 2020

`$ davix-ls 'http://littlexrdhttp.cern.ch:1094/tmp/stupid&dir' --trace body
DAVIX(body): Body block (303 bytes):
[<D:propfind xmlns:D="DAV:" xmlns:L="LCGDM:"><D:prop><D:displayname/><D:getlastmodified/><D:creationdate/><D:getcontentlength/><D:quota-used-bytes/><D:resourcetype><D:collection/></D:resourcetype><L:mode/><D:owner></D:owner><D:group></D:group></D:prop></D:propfind>]

DAVIX(body): Read block (1101 bytes):
[
<D:multistatus xmlns:D="DAV:" xmlns:ns1="http://apache.org/dav/props/" xmlns:ns0="DAV:">
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/tmp/stupid&dir</D:href>
<D:propstat>
<D:prop>
<lp1:getcontentlength>33</lp1:getcontentlength>
<lp1:getlastmodified>Tue, 28 Jul 2020 08:12:40 GMT</lp1:getlastmodified>
<lp1:resourcetype><D:collection/></lp1:resourcetype>
<lp1:iscollection>1</lp1:iscollection>
<lp1:executable>T</lp1:executable>
<lp1:iscollection>1</lp1:iscollection>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/tmp/stupid&dir/double>stuupid&file</D:href>
<D:propstat>
<D:prop>
<lp1:getcontentlength>0</lp1:getcontentlength>
<lp1:getlastmodified>Tue, 28 Jul 2020 08:12:40 GMT</lp1:getlastmodified>
<lp1:iscollection>0</lp1:iscollection>
<lp1:executable>F</lp1:executable>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
</D:multistatus>
]

double>stuupid&file`

@ffurano
Contributor

ffurano commented Jul 29, 2020

Well... apparently there's no way to paste verbatim on GitHub. Please trust me that the response is properly escaped now :-P

@InfinityTotality
Author

InfinityTotality commented Jul 29, 2020

Haha. I will take your word for it. I'll have to sit down and see if I can build from source at some point to test. Thanks for addressing this so quickly!

@bbockelm It does seem to work well on directories with no problematic filenames, using either Windows Explorer or WinSCP.
