Introduce a file system path → URI normalization function? #451

Open
gimsieke opened this Issue Jul 13, 2018 · 8 comments

gimsieke commented Jul 13, 2018

In the light of the discussions in #325 and #449, I’d like to propose a function p:urify() as described in #325 (comment).

It should basically convert file system paths to file: URIs.

This function is useful because users often like to submit file system paths instead of URIs as options, while most steps with an href or similar option only accept URIs.
The issue has become more pressing because with #449 we introduced space-separated href values. When heuristically accepting file system paths instead of proper URIs, a processor must not interpret spaces in the argument as part of the (file system) name. Instead, each space must be regarded as separating (possibly relative) URIs.
We should relieve users of the burden of percent-escaping file system paths, prepending file: to them, and so on.

p:urify() should by and large be idempotent if the argument is already recognized as an absolute URI. It could, however, perform additional normalizations on URIs, such as eliminating consecutive duplicate slashes or, if the argument is determined to represent a directory, appending a trailing slash.
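
To make the idea concrete, here is a rough Python sketch (not XProc, and not a proposed implementation). The names `is_absolute_uri` and `normalize_uri` are made up for illustration. It assumes that a scheme has at least two characters, so that a Windows drive letter like `c:` is not mistaken for a scheme, and it only shows the duplicate-slash normalization, not the trailing slash for directories.

```python
import re

# Assumption: a scheme is a letter followed by at least one more scheme
# character, then ':'. A single letter (e.g. 'c:') is treated as a drive.
_SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]+:')


def is_absolute_uri(s):
    """Return True if s looks like an absolute URI rather than a path."""
    return _SCHEME.match(s) is not None


def normalize_uri(uri):
    """Collapse runs of slashes in the path of an absolute URI.

    The slashes directly after 'scheme:' are kept as they are, and any
    query or fragment part is left untouched.
    """
    scheme, rest = uri.split(':', 1)
    lead, path, suffix = re.match(r'^(/*)([^?#]*)(.*)$', rest, re.S).groups()
    return scheme + ':' + lead + re.sub(r'/{2,}', '/', path) + suffix


print(normalize_uri('file:///tmp//a///b.xml'))  # file:///tmp/a/b.xml
```

Applying `normalize_uri` to its own output changes nothing, which is the sense of "idempotent" intended above.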

When passing user-supplied paths to a step that accepts space-separated URIs, a pipeline author could just apply p:urify to the paths:

  <p:document href="{$paths ! p:urify()}"/>

(assuming that a sequence of URIs will be flattened to a space-separated string that is then tokenized and each token is cast as a URI).
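
A small Python sketch of why the escaping matters once values are space-separated. `to_href` and `tokenize_href` are hypothetical helpers; the percent-escaping stands in for what p:urify() would do, and resolution to absolute file: URIs is left out here.

```python
from urllib.parse import quote


def to_href(paths):
    """Percent-escape each path and join the results with single spaces."""
    return ' '.join(quote(p, safe='/:') for p in paths)


def tokenize_href(href):
    """Split a space-separated href value into its individual URIs."""
    return href.split()


paths = ['my docs/a.xml', 'b.xml']

# Without escaping, the space inside the first path creates a bogus token.
print(' '.join(paths).split())        # ['my', 'docs/a.xml', 'b.xml']

# With escaping, the round trip yields one token per original path.
print(tokenize_href(to_href(paths)))  # ['my%20docs/a.xml', 'b.xml']
```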

Relative paths should be resolved against the current working directory (as currently returned by pos:info), not against the pipeline’s base URI.

Path separators should be interpreted as per the system property reported by pos:info, with the exception that if the separator is reported as \, / should also be interpreted as a separator.
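
The two rules above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a spec: `path_to_file_uri` is a made-up name, and the `cwd` and `sep` parameters stand in for the values a processor would take from pos:info. UNC paths and already-absolute URIs are not handled, and the single-slash `file:/` output form is just one possible choice.

```python
import posixpath
from urllib.parse import quote


def path_to_file_uri(path, cwd, sep='/'):
    """Turn a file system path into a file: URI.

    cwd -- current working directory (hypothetical stand-in for pos:info)
    sep -- the system's path separator (hypothetical stand-in for pos:info)
    """
    if sep == '\\':
        # Where '\' is the separator, '/' is accepted too: unify on '/'.
        path = path.replace('\\', '/')
        cwd = cwd.replace('\\', '/')
    has_drive = (sep == '\\' and len(path) >= 2
                 and path[0].isalpha() and path[1] == ':')
    if not (path.startswith('/') or has_drive):
        # Relative path: resolve against the working directory.
        path = cwd.rstrip('/') + '/' + path
    path = posixpath.normpath(path)   # removes './', '../', duplicate '/'
    if not path.startswith('/'):
        path = '/' + path             # drive-letter path: c:/x -> /c:/x
    return 'file:' + quote(path, safe='/:')


print(path_to_file_uri('a b.xml', '/home/user'))
# file:/home/user/a%20b.xml
print(path_to_file_uri('c:\\path\\to\\file', 'c:\\work', sep='\\'))
# file:/c:/path/to/file
```

Note that with `sep='/'` a backslash is not treated as a separator; it is an ordinary character in the name and gets percent-escaped as `%5C`.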

We have implemented a similar function as a step, tr:file-uri. It does a bit more than the suggested function, namely optional catalog resolution and optional retrieval of HTTP resources (stored in a temporary directory). It returns an XML structure with the local URI, the current working directory, the last path component, etc.

I see some value in having a function though because it can be used in AVTs and XPath expressions in general, without the need to insert another step into the subpipeline.

It would still be nice to be able to use a catalog resolver even for URIs that don’t refer to XML resources. Maybe p:urify() can accept an optional parameter map that provides implementation-defined functionality, such as { 'use-xml-catalogs': true(), 'fetch-http': true() }.

eriksiegel commented Jul 14, 2018

Like Gerrit, I've written these kinds of functions many times in my XSLT and XQuery programs. So yes, good idea.

ndw commented Jul 14, 2018

👍

The name p:urify() doesn't fill my heart with joy, but we can fuss about that later.

gimsieke commented Jul 14, 2018

It's a bit of a p:un, but we can certainly agree on a different name, such as p:file-uri

ndw commented Jul 15, 2018

🤦‍♂️ I totally didn’t see ‘p:urify’. That is good. 😆

@gimsieke gimsieke added the core-spec label Sep 5, 2018

ndw commented Sep 5, 2018

Some examples:

  • c:\path\to\file => file:/c:/path/to/file
  • \\hostname\path\to\file => file://///hostname/path/to/file
  • probably others

eriksiegel commented Sep 5, 2018

a b ==> a%20b

@gimsieke gimsieke self-assigned this Sep 5, 2018

gimsieke commented Sep 5, 2018

Spec this out (as an operation from a string to a string)

dmj commented Sep 5, 2018

Documents to consider:
