Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p:split-sequence for non-xml documents #362

Open
xml-project opened this issue Feb 27, 2020 · 11 comments
Open

p:split-sequence for non-xml documents #362

xml-project opened this issue Feb 27, 2020 · 11 comments
Assignees

Comments

@xml-project
Copy link
Contributor

@xml-project xml-project commented Feb 27, 2020

While twiddling with p:unarchive on a word document yesterday, I was quite surprised that the following does not work:

<p:unarchive>
  <p:with-input href="some-document" />
</p:unarchive>

<p:split-sequence test="ends-with(p:document-property(., 'base-uri'), 'word/document.xml')" />

I expected this to work and I do think, it would be quite useful to split sequences on the base of document properties. However the port sourceon p:split-sequence is defined as "xml html". So it does not work if there are other documents on the port.

Is there any reason I do not see why we need the restriction on the input port? Would it not make sense to have content-type="any" for the input and the output ports?

@gimsieke

This comment has been minimized.

Copy link
Contributor

@gimsieke gimsieke commented Feb 27, 2020

I don’t see a reason to restrict it to xml html, either.

@xml-project

This comment has been minimized.

Copy link
Contributor Author

@xml-project xml-project commented Feb 27, 2020

@ndw @xatapult Objections?

@ndw

This comment has been minimized.

Copy link
Collaborator

@ndw ndw commented Feb 27, 2020

No objections from me. What is the consequence of attempting to evaluate an XPath expression on a non-XML document in this case? Suppose the test is count(//*) gt 3 for example? Is the step required to raise an error (a particular error?) or does the test simply return false?

@gimsieke

This comment has been minimized.

Copy link
Contributor

@gimsieke gimsieke commented Feb 27, 2020

For JSON documents (represented as maps, for example), a static error “XPTY0020: the context item is not a node” will be raised if you try to refer to the context, at least this is what Saxon does.

For other documents, we say that they are “empty XDM documents”. What does that mean? If they are represented by an otherwise empty document node, //* will return the empty sequence.

@xml-project

This comment has been minimized.

Copy link
Contributor Author

@xml-project xml-project commented Feb 27, 2020

Right: For count(//*) gt 3 you will get an error, if it is a JSON document. But this is a special (bad?) example because there are a lot of XPath expressions making sense here.

I think we do not need to change anything in the inner logic of p:split-sequence: The prose says thats the document is taken as the context item and the decision is made based on the effective boolean value of the test-expression.
Pipeline authors therefore could use any XPath expression they like. Some, like the one @ndw mentioned, will raise an error if I carelessly suppose that all documents will have a document node. So the right way for the count-example would probably be:

<p:split-sequence test="if (. instance of node()) then count(//*)=3 else false()" />
@xatapult

This comment has been minimized.

Copy link
Contributor

@xatapult xatapult commented Feb 27, 2020

No objections (at all). On the contrary: all for any

@xml-project

This comment has been minimized.

Copy link
Contributor Author

@xml-project xml-project commented Mar 4, 2020

After playing a bit with the XSLT 3.0 rule, that an error in a pattern is a non-match, I have to say, I realy like it.
Therefor I changed my mind and would propose, that the XPath expression in p:split-sequence is applied to the documents in turn: If the result is an effective-boolean-true, the document appears on the matchport, if it is an effective-boolean-false or an error is raised, the document appears on the non-match port.
Does this make sense to anyone?
Do we need an additional attribute fail-on-error (false by default), that make the step explicitly raise an error instead of sorting the document into the non-matched sequence? I think this is a no-brainer but I wonder whether the additional complexity is worth it.
Thoughts?

@gimsieke

This comment has been minimized.

Copy link
Contributor

@gimsieke gimsieke commented Mar 4, 2020

I’m a bit hesitant because in all other cases where we match by XPathExpression or XSLTSelectionPattern (for example, p:add-attribute/@match), we don’t treat errors as non-match but as errors.

And I don’t think we should extend the XSLT rule that an error in a pattern (or in an expression, in our case here) is a non-match to all other XPathExpression and XSLTSelectionPattern attributes. (I’m not implying that you suggested this, I’m just pondering whether this is a rule that should be generalized.)

Apart from split-sequence, it might make sense to treat errors as false() in p:wrap-sequence/@group-adjacent so that two adjacent documents with either erroneous or false()-returning expressions will be grouped (because deep-equal(false(),false())=true()).

If we do so for both p:split-sequence and p:wrap-sequence, a fail-on-error option might be useful. If you as an implementor are willing to accept the additional complexity, that is.

@xml-project

This comment has been minimized.

Copy link
Contributor Author

@xml-project xml-project commented Mar 4, 2020

To my understanding all XSLTSelectionPattern are defined by XSLT 3.0, so an error is a non-match by definition. Is there are ruling in XProc saying the opposite?
Apart from that: Expanding it to XPathExpressions is a point where I am also not sure, we want to do this. Pro: Shorter XPath-Expressions because I do not need a if () to prevent an error (if I do not want one). Con: It would be OUR rule and might lead to confusion.

Don't know. May be we should call it all of and leave p:split-sequence XML/HTML only. If you want to split a mixed sequence, you could always use a p:for-each with a p:choose as child.

@gimsieke

This comment has been minimized.

Copy link
Contributor

@gimsieke gimsieke commented Mar 4, 2020

You are right, apart from certain areas such as AVTs or when we refer to context items (as in count(//*)) in conditionals or when declaring variables we don’t say that an error must be thrown if an XPath expression is erroneous.
But we shouldn’t leave it to an implementation to decide whether an XPath error is thrown or whether the expression will be tacitly assumed as false() in certain contexts.

This and the original question is maybe something for our next call.

@xml-project

This comment has been minimized.

Copy link
Contributor Author

@xml-project xml-project commented Mar 4, 2020

But we shouldn’t leave it to an implementation to decide whether an XPath error is thrown or whether the expression will be tacitly assumed as false() in certain contexts.

Definitely not!

@ndw ndw self-assigned this Mar 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.