Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System dependencies in p:urify() #934

Closed
xml-project opened this issue Jan 2, 2020 · 6 comments
Closed

System dependencies in p:urify() #934

xml-project opened this issue Jan 2, 2020 · 6 comments

Comments

@xml-project
Copy link
Member

Currently we say:

Otherwise, if the operating system’s file separator equals the backslash “\”, every backslash in the argument will be replaced with a forward slash.

I take this to mean, that the replacement does only takes place on those systems, e.g. on Windows.

IMHO this is odd and might lead to unexpected consequences: It's odd, because a test suite running on a UNIX system can never test, whether the implementation (if it would run on Windows) returns the correct results. For the unexpected results, consider:

<p:load href="p:urify('folder\folder\file.xml')" />

On a Windows system would work (provided the file exists in the CWD). But if you run this on Unix an XD0064 would be raised (even if the file exists) because '\' is not replaced and therefor the URI is not valid.
Did I miss something? Do we really want this? Why is the replacement relative to the system's file separator? Shouldn't we replace '\' in all cases to '/'?

@ndw
Copy link
Contributor

ndw commented Jan 2, 2020

RFC 1738 (the old URI spec) used to describe “\” as among the “unsafe” characters. I thought there was a production called “unwise” at one point, but I can’t find it. RFC 3986 doesn’t seem to describe the unsafe characters anymore, I’m not sure why. However, RFC 3987 still says that “\” is not allowed in a URI. (Unfortunately, it says this in passing, in prose, and not in a formal way, AFAICT.)

I’ve certainly internalized the idea that un-percent-escaped “\” characters are not allowed in URIs. I propose that p:urify should always turn “\” into “/”.

@gimsieke
Copy link
Contributor

gimsieke commented Jan 2, 2020

On Linux I can create files whose names contain backslashes:

gerrit@edna:~$ uname -a
Linux edna 3.2.0-4-amd64 #1 SMP Debian 3.2.96-2 x86_64 GNU/Linux
gerrit@edna:~$ echo test > 'folder\folder\file.txt'
gerrit@edna:~$ file folder\\folder\\file.txt
folder\folder\file.txt: ASCII text

On Unix, p:urify() needs to percent-encode the backslash, so the URI will be like file:///…/folder%5Cfolder%5Cfile.txt.

This is not purely hypothetical, we already saw backslashes in file names, although inserted unintentionally. I wouldn’t, for example, be able to run p:file-info on such a file if p:urify() blindly assumed backslashes were equivalent to forward slashes.

We need to have a test result comparison that takes into account on which platform the processor runs / what the file system separator is. See xproc/3.0-test-suite#132 and https://gist.github.com/gimsieke/7a9362f47aa8b38dc1b2fc4bced79bf4

@ndw
Copy link
Contributor

ndw commented Jan 2, 2020

Righto. Percent escape them on Unix, turn them into "/" on Windows. So be it.

@xml-project
Copy link
Member Author

OK, I missed the the URL-escaping for Linux. But ...
To my reading then we have to live with the fact that

<p:load href="p:urify('folder\folder\file.xml')" />

leads to different results if I run it on Windows or Linux?

And: To test this, I think we need to be able to fake the actually used OS and say: Whatever you are: Run this test as you were a Windows / Linux. Right?

@gimsieke
Copy link
Contributor

gimsieke commented Jan 2, 2020

OK, I missed the the URL-escaping for Linux. But ...
To my reading then we have to live with the fact that

<p:load href="p:urify('folder\folder\file.xml')" />

leads to different results if I run it on Windows or Linux?

Yes

And: To test this, I think we need to be able to fake the actually used OS and say: Whatever you are: Run this test as you were a Windows / Linux. Right?

We can discuss what to do in the upcoming call. I’d prefer not to fake them but to use what is actually reported as OS and separator and just accept that also the result is platform-dependent. I tried to anticipate this in the Gist: If the OS does not identify as Windows, an error is expected for a path such as //www.acme.com/lib/acme.js, for Windows it’s file://www.acme.com/lib/acme.js

@xml-project
Copy link
Member Author

OK. I think we can close this. Any objection?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants