Connect URL reading requests inside of p:xslt to results of other steps #47

Open
nkutsche opened this Issue Feb 11, 2019 · 11 comments

nkutsche commented Feb 11, 2019

In 1.0 I often have the problem that existing XSLTs use the fn:doc() function to point to a file on disk, but I would like to provide that document from an XProc step.

We had a discussion about this in Prague and noticed that it is a bit asymmetrical to allow catching write requests (with the secondary port) but not read requests.

Is there any update in 3.0?


gimsieke commented Feb 11, 2019

I don’t think there is any change wrt what was already available in 1.0. In particular, there is no standard way to make another step’s results available to fn:doc() under a special URI.
In order for the XProc processor to be able to calculate the dependency graph, the other step’s output needs to somehow connect to an input of p:xslt. A magic connection from a URI to another step’s result IMHO would need a concept of implicit connections. An XProc processor would need to analyze which URIs are being called (dynamically) by fn:doc() in the XSLT step and then see whether there is a step that provides the output that is associated with that URI. This way, the processor cannot build the dependency graph statically (at compile time). Therefore, unless I failed to understand your requirement correctly, this doesn’t fit into the XProc processing model.
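To illustrate the point about the statically computed dependency graph, here is a minimal Python sketch. It is not XProc and does not describe how any particular processor is implemented. The step names `foo`, `bar` and `xslt` mirror the example later in this comment; the `connections` table and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Explicit port connections, known before the pipeline runs:
# each step maps to the set of steps whose output it reads.
connections = {
    "foo": set(),
    "bar": set(),
    "xslt": {"foo", "bar"},
}


def execution_order(graph):
    """Return one valid execution order for the statically known graph."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(connections)
# A doc('foo-result.xml') call inside the stylesheet adds no entry to
# `connections`: the URI only becomes known while the stylesheet runs,
# so a scheduler could not use it to place "foo" before "xslt".
```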

What you can already do, also in 1.0: Connect the output of other steps to the source port of the XSLT step. Assuming that the other step’s output may be unambiguously identified by its base URI, top-level-element name or namespace, attributes, or other means, you can just use the default collection in order to access this document.

If you intend to use your XSLT standalone with fn:doc() and in XProc, one way to go is to use xsl:import. Suppose your base XSLT contains

<xsl:variable name="other-step-result" as="document-node(element(foo))?" 
  select="if (doc-available('foo-result.xml')) then doc('foo-result.xml') else ()"/>

you can import this stylesheet from an XProc-enabled stylesheet that overrides this variable:

<xsl:variable name="other-step-result" as="document-node(element(foo))?" 
  select="collection()[ends-with(base-uri(), 'foo-result.xml')]"/>

(assuming that its base URI ends in 'foo-result.xml', which it will do automatically if it was produced by another XSLT step by <xsl:result-document href="foo-result.xml">…).

You need to connect the foo output like this:

<p:xslt>
  <p:with-input port="stylesheet">…</p:with-input>
  <p:with-input port="source">
    <p:pipe port="secondary" step="foo"/>
    <p:pipe port="result" step="bar"/>
  </p:with-input>
</p:xslt>

The secondary port of the foo XSLT step gives you all the documents that have been written using xsl:result-document from the foo step.

Does this kind of answer your question?

Collaborator

ndw commented Feb 11, 2019

I don't think @gimsieke's solution is a general solution to the problem that @nkutsche is describing.
Every time the question of a document manager comes up, it turns out to be very difficult.
One problem is that the base URI doesn't change as documents flow through many steps, and the order of execution of steps is not always 100% determinate, so there may be many documents "in the pipeline" with a particular base URI and you won't always get the same one.
I wonder if a "p:cache()" step that operates like the identity step but caches the base URI of the documents that flow through it so that they can be retrieved later by the resolver would be a good idea?
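A minimal Python sketch of the ambiguity described above, and of what a caching step might record. The `DocumentCache` class and its methods are hypothetical, documents are reduced to `(base_uri, content)` pairs, and nothing here is part of XProc or of any processor's API.

```python
class DocumentCache:
    """Hypothetical cache: remembers the latest document seen per base URI."""

    def __init__(self):
        self._by_base_uri = {}

    def passthrough(self, base_uri, content):
        # Behaves like an identity step, but records the document first.
        self._by_base_uri[base_uri] = content
        return base_uri, content

    def lookup(self, base_uri):
        # Returns None when no document with that base URI was cached.
        return self._by_base_uri.get(base_uri)


def run(order):
    """Feed documents through the cache in the given order, then look one up."""
    cache = DocumentCache()
    for base_uri, content in order:
        cache.passthrough(base_uri, content)
    return cache.lookup("file:/in.xml")


# The same input after two different steps: the base URI has not changed.
after_step_a = ("file:/in.xml", "<doc stage='a'/>")
after_step_b = ("file:/in.xml", "<doc stage='b'/>")

# If the processor may run the two steps in either order, a lookup by
# base URI alone depends on which one happened to finish last.
result_ab = run([after_step_a, after_step_b])
result_ba = run([after_step_b, after_step_a])
```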

Author

nkutsche commented Feb 11, 2019

Thanks for the answers. So it seems to be more complex than I thought. Maybe it helps if I describe how I would expect it to work:

<p:identity name="secondary-src">
    <p:input port="source">
        <p:inline>
            <root xml:base="https://github.com/xproc/foo.xml"/>
        </p:inline>
    </p:input>
</p:identity>
<p:xslt>
    <p:input port="source"><!-- ... --></p:input>
    <p:input port="stylesheet">
        <p:inline>
            <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
                <xsl:template match="/">
                    <xsl:copy-of select="doc('https://github.com/xproc/foo.xml')"/>
                </xsl:template>
            </xsl:stylesheet>
        </p:inline>
    </p:input>
    <p:input port="secondary-src">
        <p:pipe port="result" step="secondary-src"/>
    </p:input>
</p:xslt>

I had thought this could be managed with just a URI resolver for Saxon, using the following rule: first look in the secondary-src port for a document whose base URI matches the requested URI.
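The rule could be sketched roughly as follows. This is a language-neutral illustration in Python, not Saxon's resolver API; `make_resolver`, `port_documents`, `fallback` and `fake_fetch` are hypothetical names, and documents are reduced to `(base_uri, content)` pairs.

```python
from urllib.parse import urljoin


def make_resolver(port_documents, fallback):
    """Build a resolver that prefers documents supplied on a port.

    port_documents: iterable of (base_uri, content) pairs, standing in
        for the documents arriving on the proposed secondary-src port.
    fallback: callable taking an absolute URI, used when no supplied
        document matches (for example the processor's normal retrieval).
    """
    by_base_uri = dict(port_documents)

    def resolve(href, base=""):
        # Make the requested URI absolute before comparing base URIs.
        absolute = urljoin(base, href)
        if absolute in by_base_uri:
            return by_base_uri[absolute]
        return fallback(absolute)

    return resolve


fetched = []  # records which URIs were passed on to the fallback


def fake_fetch(uri):
    # Stand-in for real retrieval; assumed for illustration only.
    fetched.append(uri)
    return "<fetched/>"


resolve = make_resolver(
    [("https://github.com/xproc/foo.xml", "<root/>")],
    fake_fetch,
)
```

With this setup, a request for `https://github.com/xproc/foo.xml` is answered from the port, while any other URI is handed to the fallback.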

Collaborator

ndw commented Feb 11, 2019

Oh, that's an interesting idea. It never occurred to me that the documents you might want to refer to would be explicitly passed in on a port. That's much more tractable, though it does raise the question of binding for that port.
I wonder if we could do this with a p:documents parameter that takes as its value an array of documents. Or is that too much trouble for the author?

Author

nkutsche commented Feb 11, 2019

I'm a bit confused now. Can't this be just an input port which expects a sequence which is by default empty?

Collaborator

ndw commented Feb 11, 2019

It can, but it'll be awfully inconvenient in what I think is the overwhelmingly more common case where you don't want/need any such documents. In that case you'll have to explicitly bind that port to empty. (Or it'll get bound to the default readable port which might have unanticipated consequences.)

Author

nkutsche commented Feb 11, 2019

That's maybe my confusion. For me, a default binding means that I can skip the binding completely, because there is a default. Doesn't the spec say that the default readable port is only connected automatically to primary ports? An unbound secondary port would use the binding of its p:input, wouldn't it? And this could contain a p:empty...

Contributor

xml-project commented Feb 12, 2019

I am not sure whether I missed something important. If I understand the problem correctly, we want a hard-coded URI for a document to be used when the stylesheet runs standalone or when no value is provided for it on p:xslt. If a document is provided, that document should be used instead.
I had a similar problem last year and I came up with this solution: The port source on p:xslt is a sequence port, where the first document is the context item for the transformation and the whole sequence is the default collection.

In the stylesheet I can decide whether to use the hard-coded document with doc() or the supplied value with collection().

<xsl:value-of select="if (count(collection())=2) 
  then collection()[2] 
  else doc('some-url')" />

Given this, when I call p:xslt with a single document on source, the hard-coded URI is used, but if I bind two documents to source, the second one is used in the stylesheet.

Does this solve the problem?

Collaborator

ndw commented Feb 12, 2019

Only in a limited case. You might have a stylesheet which refers to several documents and it might not be straightforward to edit it. Consider trying to change the DocBook stylesheets so that a particular doc() reference can be replaced by a generated document.

For me, the canonical example for this feature is XInclude. I have a document that XIncludes a file and in this particular circumstance I want to use a document that I've previously generated in the pipeline to fulfill the request for that document.

I think it's a valuable feature, but it's not easy to specify.

Collaborator

ndw commented Feb 12, 2019

@nkutsche all input ports have to have bindings. So if there's a port it either will be bound to the default readable port or it will be an error to leave it not explicitly bound. That's the trouble.

Author

nkutsche commented Feb 12, 2019

@ndw: ok, thanks. Sounds a bit strange, but this is not the right place to open a discussion about it. :-)

> I wonder if we could do this with a p:documents parameter that takes as its value an array of documents. Or is that too much trouble for the author?

If an input port does not provide a solution, I think a parameter would be great too. If I understand it correctly, it would take as much effort as providing a document to an XSLT parameter, wouldn't it? That should be sufficient. As you pointed out, in 90+% of the use cases you don't need this. And if you do need it, you will be glad to have the possibility and will not complain that it is too much work.
