Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification on p:directory-list required #89

Closed
xml-project opened this issue May 17, 2019 · 12 comments

Comments

Projects
None yet
3 participants
@xml-project
Copy link
Contributor

commented May 17, 2019

After reading the changed specs, some question came up:

  1. What is the precise behaviour if recursive=true and the directory-name is excluded: I think this means, that also all entries of this directory are excluded. Right?

  2. On the other hand (recursive=true) if the directory passes the regex-test, it does not mean that all entries of this directory are included, but they have to pass the regex-test too. Right?

  3. I thought recursion and path-relative matching should be used on p:directory-list-load too. Is this just an oversight or did I miss something?

  4. The output port signature of p:load-directory-list ("application/xml") was not changed to "any". Why?

@gimsieke

This comment has been minimized.

Copy link
Contributor

commented May 17, 2019

  1. I thought about another thing: The slash that will be prepended to a relative path prior to matching. It is unnecessary because we can use a start-of-string anchor, ^, in the regex. So if the relative path is a/a/b/file.txt, ^a/ will match just fine.

After reading the changed specs, some question came up:

1. What is the precise behaviour if recursive=true and the directory-name is excluded: I think this means, that also all entries of this directory are excluded. Right?

Yes, we should state explicitly that if a directory is excluded, all of its content is excluded, too.

2. On the other hand (recursive=true) if the directory passes the regex-test, it does not mean that all entries of this directory are included, but they have to pass the regex-test too. Right?

This needs to be clarified, too. So ^a/ will match a/a/b/file.txt (among other things), while ^a/$ will only match the directory a/. The regex (^|/)a/$ will match a/ and a/a/, while /a/$ will only match a/a/. I think this is why Norm proposed (in London last year) that we prepend a slash to the relative path. Then /a/ will match any of the a directories, without the need for special treatment of the first directory.

In addition, we should say for recursive="true" that if a relative path is matched by an include filter, all its ancestor directories (but not their content if not included explicitly) will be included, too. This means that file.txt$ will produce

<c:directory xml:base="file:/home/jane/` name="jane">
  <c:directory xml:base="a/" name="a">
    <c:directory xml:base="a/a/" name="a">
      <c:directory xml:base="a/a/b/" name="b">
        <c:file xml:base="a/a/b/file.txt" name="file.txt"/>
      </c:directory>
    </c:directory>
  </c:directory>
</c:directory>

excluding all other files and subdirectories below the given directories.

3. I thought recursion and path-relative matching should be used on p:directory-list-load too. Is this just an oversight or did I miss something?

4. The output port signature of p:load-directory-list ("application/xml") was not changed to "any". Why?

Due to oversight and lack of time yesterday, I didn’t change anything about p:load-directory-list yet. Will amend it today.

Final question: Is an option max-depth useful in conjunction with recursive="true"? It might improve performance, compared to, for ex., exclude-filter="^([^/]+/){3,}", where the exclude filter is only applied when all infinite-depth relative paths have been computed.

@xml-project

This comment has been minimized.

Copy link
Contributor Author

commented May 25, 2019

Since we now match (parts) of the entry's path and no longer its name (as in XProc 1.0) I think the complete URI should appear on c:directory, c:file and c:special.

@gimsieke

This comment has been minimized.

Copy link
Contributor

commented May 25, 2019

Matching will be performed against a relative path. I don’t think having the full URI on an element will help with matching since you will always need to subtract the base directory from this path prior to matching.
I’d rather see @xml:base on each entry as sketched above, relative to the absolute @xml:base that is attached to the base directory element. This way, you can just use base-uri() on any sub-element to obtain its absolute path. It will be resolved against the absolute base URI of the top-level c:directory element.

@xml-project

This comment has been minimized.

Copy link
Contributor Author

commented May 26, 2019

@gimsieke Actually you can trust me, that I know my problems and I think about my proposals before I file a PR. Having the absolute URI on each element makes it super easy to load the respective document. There is no way to doubt this.
Apart from this, I do not agree with your approach to withhold data because they might be reconstructed in some other (complex) way. Having the URI and a relative path does not exclude each other.

@gimsieke

This comment has been minimized.

Copy link
Contributor

commented May 26, 2019

What is difficult about invoking base-uri(.) on a c:file attribute?
We already use @xml:base for the absolute URI of the top-level element. If we attach the relative paths as @xml:base to each other element down the tree, the absolute URI for each item is just what base-uri() returns.
This then already provides enough information to do both without complex computations: find out the absolute URI of an element (using base-uri()) and match the relative path of an element against a pattern (using the relative @xml:base attributes).
An additional benefit is that it doesn’t bloat the document with largely redundant common paths.

@xml-project

This comment has been minimized.

Copy link
Contributor Author

commented May 26, 2019

  1. Just write down your XPath expression and you will see how different it is from
    p:load href="{c:file/@uri}".
  2. I proposed a PR, while your approach is not specified in the specs yet.
  3. "Bloating" an intermediate doc like c:directory does not sound like a good argument to me.

But anyway: As you saw I withdraw my proposal, so you do not need to argue anymore.

@gimsieke

This comment has been minimized.

Copy link
Contributor

commented May 26, 2019

1. Just write down your XPath expression and you will see how different it is from
   `p:load href="{c:file/@uri}"`.

p:load href="{c:file/base-uri(.)}"

@xml-project

This comment has been minimized.

Copy link
Contributor Author

commented May 26, 2019

No! Check it! It does not work!

@gimsieke

This comment has been minimized.

Copy link
Contributor

commented May 26, 2019

Ah, sorry. When I checked it a couple of day ago in oXygen, I (to my surprise) received the base URI file:/home/jane/a/a/b/file.txt for the c:file in the example above.
I was surprised because I thought that each base URI is resolved against the base URI of the surrounding node rather than to the last absolute base URI when looking upwards. I don’t know how I managed to produce this unexpected result, but I somehow then believed this was in accordance with the spec.
When I checked again in oXygen, I receive file:/home/jane/a/a/a/a/a/b/a/a/b/file.txt for the example above, which is in line with what I previously expected.
So I agree that @xml:base doesn’t solve the problem of computing the absolute path – unless we use the @name values for @xml:base, with trailing slashes.
So unless we use @xml:base in this way (which would be quite redundant with @name), we can indeed use an attribute for the complete path.
For this path, I’d rather use a relative path in relation to the base URI, as it is in the zip directory, c:zipfile. In c:zipfile, it is called @name and contains the relative path (c:zipfile is a flat structure). I wouldn’t propose to use relative paths for the @name attributes in the result of p:directory list though.
So we can now discuss whether we want to have an attribute for full or relative URIs on each element.
Sorry again for the confusion.

gimsieke added a commit to gimsieke/3.0-steps that referenced this issue Jun 10, 2019

ndw added a commit that referenced this issue Jun 10, 2019

Merge pull request #105 from gimsieke/directory-list
address issues discussed in #89

@ndw ndw changed the title Clarification on p:directory-list and p:load-directory-list required Clarification on p:directory-list required Jun 10, 2019

@ndw

This comment has been minimized.

Copy link
Collaborator

commented Jun 10, 2019

  • Instead of ‘recursive’, use a ‘max-depth’ attribute with the
    value ‘unbounded’ or a non-negative integer. Default value is
    “1”, the immediate members of the directory. A value of 0
    returns information about the directory, but none of its
    children.
  • Each of the c:file and c:directory (and c:other) children will
    have an xml:base attribute so that the full URI of the item can
    be easily determined.

gimsieke added a commit to gimsieke/3.0-steps that referenced this issue Jun 10, 2019

@gimsieke gimsieke referenced this issue Jun 10, 2019

Merged

Directory list #111

@xml-project

This comment has been minimized.

Copy link
Contributor Author

commented Jun 14, 2019

@gimsieke : Can we close this issue?

@gimsieke

This comment has been minimized.

Copy link
Contributor

commented Jun 14, 2019

I think so

@gimsieke gimsieke closed this Jun 14, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.