Introduce a file system path → URI normalization function? #451

Open · opened this issue Jul 13, 2018 · 8 comments

4 participants

gimsieke commented Jul 13, 2018

In light of the discussions in #325 and #449, I'd like to propose a function p:urify() as described in #325 (comment). It should essentially convert file system paths to file: URIs.

This function is useful because users often want to supply file system paths instead of URIs as options, while most steps with an href or similar option only accept URIs. The issue has become more pressing since we introduced space-separated href values with #449: when heuristically accepting file system paths instead of proper URIs, a processor must not interpret spaces in the argument as part of a (file system) name. Instead, each space must be regarded as separating (possibly relative) URIs. We should relieve users of the burden of percent-escaping file system paths, prepending file:, and so on.

p:urify() should by and large be idempotent if the argument is already recognized as an absolute URI. It could, however, perform additional normalizations on URIs, such as eliminating consecutive duplicate slashes or, if the argument is determined to represent a directory, appending a trailing slash.

When passing user-supplied paths to a step that accepts space-separated URIs, a pipeline author could simply apply p:urify() to the paths (assuming that a sequence of URIs will be flattened to a space-separated string that is then tokenized, with each token cast as a URI).

Relative paths should be resolved against the current working directory (as currently returned by pos:info), not against the pipeline's base URI. Path separators should be interpreted according to the system property reported by pos:info, with one exception: if the separator is reported as \, then / should also be interpreted as a separator.

We have implemented a similar function as a step, tr:file-uri. It does a bit more than the suggested function, namely optional catalog resolution and optional retrieval of HTTP resources (stored in a temporary directory).
It returns an XML structure with the local URI, the current working directory, the last path component, etc. I see some value in having a function, though, because it can be used in AVTs and in XPath expressions in general, without the need to insert another step into the subpipeline. It would still be nice to be able to use a catalog resolver even for URIs that don't refer to XML resources. Maybe p:urify() could accept an optional parameter map that provides implementation-defined functionality, such as { 'use-xml-catalogs': true(), 'fetch-http': true() }.
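The rules in the proposal can be sketched as a pure string-to-string function. The Python below is a minimal, non-authoritative sketch under stated assumptions: the name urify, its cwd and sep parameters (standing in for the values pos:info would report), and the scheme-detection heuristic are all hypothetical. Absolute URIs are returned unchanged, and directory detection is left out because it would need file system access; a trailing slash is kept only when the input already ends with a separator.

```python
import posixpath
import re
from urllib.parse import quote

# Heuristic (an assumption of this sketch): a URI scheme must have at least
# two characters, so a Windows drive letter such as "c:" is not taken for one.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]+:")
_DRIVE = re.compile(r"^[A-Za-z]:")


def _to_uri_path(path: str, sep: str) -> str:
    # If the reported separator is a backslash, accept both '\' and '/',
    # and turn a drive-letter path "c:/x" into the URI path "/c:/x".
    if sep == "\\":
        path = path.replace("\\", "/")
        if _DRIVE.match(path):
            path = "/" + path
    return path


def urify(arg: str, cwd: str, sep: str = "/") -> str:
    """Hypothetical p:urify(): map a file system path to a file: URI."""
    if _SCHEME.match(arg):
        return arg  # already an absolute URI: returned unchanged
    path = _to_uri_path(arg, sep)
    had_trailing_slash = path.endswith("/")
    if not path.startswith("/"):
        # Relative path: resolve against the working directory,
        # not against the pipeline's base URI.
        base = _to_uri_path(cwd, sep)
        path = base.rstrip("/") + "/" + path
    # Resolve "." and ".." and collapse repeated slashes
    # (POSIX normpath keeps a leading "//").
    path = posixpath.normpath(path)
    if had_trailing_slash and not path.endswith("/"):
        path += "/"  # normpath drops the trailing slash; restore it
    # Percent-escape spaces and other unsafe characters; keep '/' and ':'.
    return "file://" + quote(path, safe="/:")
```

Joining several results with spaces then yields a value suitable for a space-separated href: `" ".join(urify(p, "/home/u") for p in ["a b.xml", "c.xml"])` gives `file:///home/u/a%20b.xml file:///home/u/c.xml`, because the space inside the first path has become %20.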

eriksiegel commented Jul 14, 2018

 Like Gerrit, I've written these kinds of functions many times in my XSLT and XQuery programs. So yes, good idea.

ndw commented Jul 14, 2018

 👍 The name p:urify() doesn't fill my heart with joy, but we can fuss about that later.

gimsieke commented Jul 14, 2018

 It's a bit of a p:un, but we can certainly agree on a different name, such as p:file-uri.

ndw commented Jul 15, 2018

 🤦‍♂️ I totally didn’t see ‘p:urify’. That is good. 😆


ndw commented Sep 5, 2018

 Some examples:

 c:\path\to\file => file:/c:/path/to/file
 \\hostname\path\to\file => file://///hostname/path/to/file
 (probably others)
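The two mappings in this comment can be reproduced with a small sketch. The function name windows_path_to_uri is hypothetical, and the sketch handles only absolute drive-letter and UNC paths. It deliberately emits exactly the forms listed above (the minimal file:/c:/ form and the five-slash UNC form), although other spellings of the same resources are in use.

```python
import re
from urllib.parse import quote

_DRIVE = re.compile(r"^[A-Za-z]:/")


def windows_path_to_uri(path: str) -> str:
    """Hypothetical mapping of an absolute Windows path to a file: URI."""
    p = path.replace("\\", "/")  # accept both '\' and '/' as separators
    if p.startswith("//"):
        # UNC path //host/share/...: keep the leading "//" after an empty
        # authority, which yields five slashes after "file:".
        return "file:///" + quote(p, safe="/")
    if _DRIVE.match(p):
        # Drive path c:/...: use the minimal "file:/" form with no authority.
        return "file:/" + quote(p, safe="/:")
    raise ValueError(f"not an absolute Windows path: {path!r}")
```

Spaces are escaped along the way, so `windows_path_to_uri(r"c:\a b")` returns `file:/c:/a%20b`.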

eriksiegel commented Sep 5, 2018

 a b ==> a%20b
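A minimal sketch of why this escaping matters for the space-separated href values from #449: once each path is percent-escaped, a space can only occur between tokens, so splitting the joined value recovers one URI per path. join_hrefs and split_hrefs are hypothetical helper names, and Python's urllib.parse.quote stands in for whatever escaping a processor would actually apply.

```python
from urllib.parse import quote


def join_hrefs(paths: list[str]) -> str:
    # quote() turns "a b" into "a%20b", so the literal space can safely
    # serve as the separator between tokens.
    return " ".join(quote(p, safe="/:") for p in paths)


def split_hrefs(value: str) -> list[str]:
    # str.split() with no argument splits on runs of whitespace.
    return value.split()
```

For example, `join_hrefs(["a b", "dir/c d.xml"])` gives `a%20b dir/c%20d.xml`, which splits back into exactly two tokens.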


gimsieke commented Sep 5, 2018

 Spec this out (as an operation from a string to a string)

dmj commented Sep 5, 2018

 Documents to consider: RFC 8089, The "file" URI Scheme, including Appendix D (File URIs in Windows)