Media type of atomic step results #761

Open
xml-project opened this Issue Feb 11, 2019 · 19 comments

5 participants
@xml-project
Contributor

xml-project commented Feb 11, 2019

@Conal-Tuohy wrote on issue #529
On a different matter, I was wondering if there was a need to coin the new media type application/vnd.xproc+atomic, or if application/json could be used instead. Initially I thought this would be kosher, but then checked up by reading the registration document for application/json and I saw that a "JSON text" would need to be either a JSON object or JSON array (=XDM map or XDM array) and that literal types were not allowed. So I deleted my original comment.

But since then, I read https://tools.ietf.org/html/rfc8259#page-5 and realise that this restriction has been relaxed.

A JSON text is a serialized value. Note that certain previous specifications of JSON constrained a JSON text to be an object or an array.

So if literals like "foo" are now valid instances of application/json, I'd like to suggest dropping application/vnd.xproc+atomic, and instead treating these atomic values as part of the same case as XDM maps and arrays.
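The relaxed rule is easy to check with any RFC 8259-style parser. A minimal sketch using Python's standard `json` module (chosen here only for illustration; it is not part of XProc) shows that bare literals parse as complete JSON texts:

```python
import json

# RFC 8259 allows any JSON value as a JSON text, not only objects and arrays.
scalar_texts = ['"foo"', '42', 'true', 'null']

# Parse each bare literal as a complete JSON document.
parsed = [json.loads(text) for text in scalar_texts]

for text, value in zip(scalar_texts, parsed):
    print(text, "->", repr(value))
```

Each literal is accepted on its own, which is the behaviour the suggestion above relies on.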

@eriksiegel

Contributor

eriksiegel commented Feb 11, 2019

Is that true for all atomic types? dates/times? durations? If not we have to make a distinction.

@xml-project

Contributor Author

xml-project commented Feb 11, 2019

@eriksiegel
I totally agree with you: taking XDM atomics as JSON is lossless only for xs:string (not for its subtypes) -> JSON string and xs:boolean -> JSON boolean. For all numeric types in XDM there would be a loss when mapping to a JSON number.

All other types would have to be stringified to be represented in JSON, thereby losing their type information.
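The loss of type information can be sketched outside XProc. The following Python example is an analogy only: `datetime` stands in for xs:dateTime, and the sample date is arbitrary. The value survives the round trip as text, but the consumer gets back a plain string:

```python
import json
from datetime import datetime, timezone

# Stand-in for an xs:dateTime value (arbitrary sample date).
original = datetime(2019, 2, 11, 12, 0, 0, tzinfo=timezone.utc)

# JSON has no date/time type, so the value must be written as a string.
serialized = json.dumps(original.isoformat())

# Reading it back yields a plain string; nothing marks it as a date/time.
round_tripped = json.loads(serialized)

print(serialized)
print(type(round_tripped).__name__)
```

The lexical form is preserved, so a consumer that already knows the intended type can re-parse it, but that knowledge has to come from somewhere other than the JSON itself.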

@Conal-Tuohy

Conal-Tuohy commented Feb 12, 2019

You are right that subtypes of string would lose their type annotation if they were serialized as JSON, since they'd have to be converted to a plain JSON string.

JSON has a single number type, and most consumers read it as floating point. There's no formal limit to the precision, though the spec warns that specific JSON consumers may not accept very large or very high-precision numbers. Also, positive and negative infinity and NaN are not allowed.
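How much precision survives depends on the consumer. As a rough illustration in Python (the digit string is an arbitrary example, and `Decimal` is only an analogy for xs:decimal), the default parser narrows the number to a binary double, while an opt-in hook keeps every digit:

```python
import json
from decimal import Decimal

# A number with more digits than a binary double can hold.
text = "0.12345678901234567890123456789"

# Default behaviour: the number is read as a Python float (binary double).
as_float = json.loads(text)

# Opt-in behaviour: parse_float lets the caller keep all digits.
as_decimal = json.loads(text, parse_float=Decimal)

print(repr(as_float))
print(as_decimal)
```

The JSON text is the same in both cases; only the consumer's choice of number representation differs.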

But isn't this lossiness already also the case when a step returns an XDM map or array, which are supposed to be represented as JSON documents? What happens if my XSLT returns an array of xs:dateTime values? Am I missing something?

@xml-project

Contributor Author

xml-project commented Feb 12, 2019

Good point. That raises the even more general question of whether the following is an error or not:

<p:identity>
  <p:with-input>
    <p:inline content-type="application/json">{map{{'time': current-dateTime()}} }</p:inline>
  </p:with-input>
</p:identity>

The XPath expression and the map constructed are both valid, but they are not proper JSON, right?

@Conal-Tuohy

Conal-Tuohy commented Feb 13, 2019

The resulting document can be serialized as JSON, though. That's the main point, is it not? i.e. that they are JSON-compatible.

@Conal-Tuohy

Conal-Tuohy commented Feb 14, 2019

It seems to me that an XProc document's media type is best seen as a serialization hint, rather than a hard constraint on the format of the data while it remains unserialized in the XProc pipeline.

Incidentally, I wonder if a similar situation with regard to type annotations exists also with XProc documents which are XML documents? If an XSLT step were to output an XML document containing typed nodes such as e.g. attributes of type xs:dateTime, but without providing a link to a schema document, then would that document not also lose those type annotations when it is eventually serialized?

@ndw

Contributor

ndw commented Feb 21, 2019

Per 21 Feb editorial meeting: the content type of all atomic values is application/json, even if the value type is something like xs:dateTime, which isn't really JSON. This is consistent with allowing steps to produce maps that contain xs:dateTime values and calling them application/json.

@ndw ndw self-assigned this Feb 21, 2019

@Conal-Tuohy

Conal-Tuohy commented Feb 22, 2019

Presumably it would just be a runtime error to attempt to serialize such a "JSON" document if it contained NaN?

@gimsieke

Contributor

gimsieke commented Feb 22, 2019

As you suggested, the content-type document property will only be fully accounted for when serializing the document, and this SHOULD happen in accordance with what the XSLT and XQuery Serialization 3.1 spec says about application/json. And there it is:

If the numeric value cannot be represented in the JSON grammar (such as Infinity or NaN), then the serializer MUST signal a serialization error.

So I think implementers will probably raise a dynamic error when serializing NaN. But if they have good reasons not to, they can decide to serialize it otherwise, because they only SHOULD serialize according to the XQuery/XSLT serialization spec.
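The same choice between strict and lenient behaviour exists in other JSON serializers. As an illustration only (Python's `json` module, not an XProc processor), the encoder is lenient by default and has to be told to reject NaN:

```python
import json
import math

# Lenient default: emits the token NaN, which is not valid JSON.
lenient = json.dumps(math.nan)

# Strict mode: refuses values the JSON grammar cannot represent.
try:
    json.dumps(math.nan, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc

print(lenient)
print(type(strict_error).__name__)
```

The strict mode corresponds to the MUST in the serialization spec quoted above; the lenient mode is the sort of deviation that a SHOULD-level requirement leaves room for.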

@Conal-Tuohy

Conal-Tuohy commented Feb 25, 2019

Regarding the existing section 3.2 Creating documents from XDM step results there's a question in my mind still about whether the programmer can determine the content-type of documents produced by XSLT and XQuery steps in the stylesheet or query itself using the relevant serialization method (i.e. in the usual way that XSLT and XQuery scripts define the content type of their output(s), when run outside of an XProc pipeline). Currently section 3.2 rules that out in favour of a fixed set of rules based on the data type of the output, which means that even if I have an XSLT that produces a map and has an <xsl:output media-type="application/ld+json"/>, my p:xslt would actually output a document whose content-type document property was application/json.

@gimsieke

Contributor

gimsieke commented Feb 25, 2019

Yes, now that implementers can probably use the XSLT or XQuery serialization parameters for a given result document, we should amend 3.2 to say that if an implementation has access to the intended serialization parameters, it must use them. This might have some interoperability issues. I don’t know whether we can stipulate that each implementation be able to find out the XSLT/XQuery serialization parameters that pertain to a given result document.

@xml-project

Contributor Author

xml-project commented Feb 25, 2019

I think using XSLT's and/or XQuery's serialization parameters for normal document output is misleading, since neither XSLT nor XQuery is serializing anything. Having those parameters temporarily available in one implementation does not make the situation better IMHO.
If you want XSLT to serialize your results, call fn:serialize() and return a text document to XProc.
BTW 3.2 is the general ruling for all steps, not just the ones that happen to have internal serialization parameters. I would therefore argue for sticking to our consensus from last Thursday.

@Conal-Tuohy

Conal-Tuohy commented Feb 25, 2019

I don't necessarily want a text/plain document, though; I might want, for instance, an application/ld+json document or an application/tei+xml document, or text/csv, or something. If the XSLT serialization properties are ignored, I can't see how one could have an XSLT that produced a specific document type. It seems to me that I would need to use p:cast-content-type to fix the document's type, after generating it, and that it would be impossible for an XSLT to decide on the document type dynamically. Is that right?

@Conal-Tuohy

Conal-Tuohy commented Feb 25, 2019

Some context about the use case I have in mind: I have written XProc 1 pipelines which use XSLT transforms to perform HTTP content negotiation. These XSLT transforms return c:body documents with content-type attributes (whose value may vary), as wrappers around the actual content. It had seemed to me that this extra wrapper would not be needed in XProc 3 because the document-properties() function would allow such metadata to be attached to the document, but if the XSLT cannot specify the output content type then I would need to use a wrapper element, I think.

@gimsieke

Contributor

gimsieke commented Feb 25, 2019

Yes, because XSLT doesn’t serialize your document, XProc does.

I doubt that a default value of text/plain and a value text/csv that your XSLT stylesheet provided makes a difference for the resulting document. The same holds for text/plain and application/ld+json. Once they are serialized, they are indistinguishable from plain text.

But there may be other parameters such as encoding, html-version, omit-xml-declaration etc. that will be ignored by the XProc processor if only specified in xsl:output (or by other XSLT means). I see that it is desirable to use the stylesheet-provided params when invoking the XSLT from an XProc pipeline.

So we have two concurrent “interop” requirements here. Yours is that you don’t want XProc to serialize XSLT output differently than standalone XSLT would do (and you don’t want to repeat yourself by specifying all parameters in the XProc again). The other interop requirement is that the results of a pipeline should be the same no matter what processor you use. One processor might be able to preserve the XSLT-intended serialization params, another might not.

Due to possible limitations of the underlying XSLT and XQuery processors, we cannot require all implementations to use the serialization options of p:xslt and p:xquery result documents though. On the other hand, if an implementation can make use of this information, I wouldn’t want to tell pipeline authors that they still have to specify all serialization parameters in the pipeline. It would be more convenient if p:xslt or p:xquery attached the requested serialization parameters in a serialization document property. Then a pipeline author can decide to prefer these parameters:

<p:variable name="default-serialization"
  select="map{'media-type': 'text/plain', 'encoding': 'cp1252'}"/>
…
<p:store href="{base-uri()}" 
  serialization="map:merge((p:document-property(., 'serialization'), $default-serialization), 
                           map{'duplicates': 'use-first'})"/>

So p:store wouldn’t magically use the parameters that are attached as document properties. The pipeline author needs to consciously use them. It should also be noted that we wouldn’t change the content-type document property to, for example, application/ld+json from the original text/plain that a resulting standalone text node will get from p:xslt, as per what we decided. p:xslt, p:xquery and maybe others would optionally use a different document property, serialization, to convey the serialization intent that the XProc processor might know from the XSLT or XQuery processor that it invoked.
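The precedence that `map:merge` with `'duplicates': 'use-first'` gives can be sketched in Python. This is an analogy, not XProc: `merge_use_first` is a hypothetical helper, and the `attached` map stands for whatever a step might place in the proposed serialization document property.

```python
def merge_use_first(*maps):
    """Merge dicts so that the first map to define a key wins."""
    merged = {}
    for current in maps:
        for key, value in current.items():
            # setdefault keeps an existing entry, so earlier maps win.
            merged.setdefault(key, value)
    return merged


# Pipeline-level defaults, as in the p:variable above.
default_serialization = {"media-type": "text/plain", "encoding": "cp1252"}

# Hypothetical parameters attached by a step such as p:xslt.
attached = {"media-type": "application/ld+json"}

# Attached parameters come first, so they override the defaults.
effective = merge_use_first(attached, default_serialization)

# With nothing attached, the defaults apply unchanged.
fallback = merge_use_first({}, default_serialization)

print(effective)
print(fallback)
```

Listing the attached map first means step-provided values take priority, while any key the step leaves out falls back to the pipeline default.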

In order to make it even more interoperable, we could introduce a function like p:step-available() that reports whether a given step will attach serialization properties:

p:attaches-serialization-parameters($step-type as xs:string) as xs:boolean

(perhaps with a third value don’t know). A processor that will attach serialization parameters by default might provide a configuration option to switch serialization map attachment off.
This way, pipeline authors can raise an error or revert to a default serialization if the processor reports that it won’t attach any serialization maps to the result documents.

@xml-project

Contributor Author

xml-project commented Feb 25, 2019

@gimsieke I am in no way opposed to making the serialization parameters of the stylesheet available to XProc processing. But I do not think that this is Conal's point.

Having said that: if we all look at the topic of this issue, it isn't about XSLT and XQuery at all. Last time it took me nearly an hour to move the comment to the step repo.
Please use the step repo for further discussions about p:xslt / p:xquery.

@gimsieke

Contributor

gimsieke commented Feb 25, 2019

It is an overarching issue that potentially also affects steps other than XQuery and XSLT (SPARQL queries come to mind). Maybe it is better to say something along these lines in the core spec:

If underlying functionality that a step wraps (like XSLT processing wrapped in p:xslt or XQuery processing wrapped in p:xquery) uses compatible serialization parameters and if these parameters are accessible to the XProc processor, the XProc processor SHOULD attach a map with these parameters as a document property called serialization.

@Conal-Tuohy

Conal-Tuohy commented Feb 26, 2019

@gimsieke

you don’t want XProc to serialize XSLT output differently than standalone XSLT would do (and you don’t want to repeat yourself by specifying all parameters in the XProc again)

Yes, and in particular I would like the XSLT to be able to dynamically compute content-type values.

I think your suggestion of a serialization document property would satisfy my requirement. If I understand your suggestion correctly, if I had an XSLT which produced documents of different types, and I wanted it to be able to set the content-type property of its output documents, I could just marshal the document-properties(/)?serialization?content-type into the document's own content-type property using a subsequent p:cast-content-type step.

<p:cast-content-type content-type="{document-properties(/)?serialization?content-type}"/>

Correct?

@Conal-Tuohy

Conal-Tuohy commented Feb 26, 2019

@gimsieke

I doubt that a default value of text/plain and a value text/csv that your XSLT stylesheet provided makes a difference for the resulting document. The same holds for text/plain and application/ld+json. Once they are serialized, they are indistinguishable from plain text.

I think text/csv files are very much distinguishable from text/plain. They are only the same in terms of the XDM (which treats them equally, as text). But the point of XProc using IANA media types to classify documents is surely to be able to support use cases which rely on those finer distinctions. For example, HTTP servers written in XProc need to be able to report content types to HTTP clients, using the HTTP Content-Type header.
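A small Python sketch may make the distinction concrete. It assumes a hypothetical consumer function, `interpret`, that dispatches on the Content-Type value; the sample body is invented for illustration:

```python
import csv
import io


def interpret(body: str, content_type: str):
    """Interpret a text body according to its declared media type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/csv":
        # Tabular reading: a list of rows, each a list of fields.
        return list(csv.reader(io.StringIO(body)))
    # Anything else is returned as undifferentiated text.
    return body


body = "name,year\r\nXProc,2019\r\n"

as_csv = interpret(body, "text/csv")
as_plain = interpret(body, "text/plain; charset=utf-8")

print(as_csv)
print(repr(as_plain))
```

The characters are identical in both calls; only the declared media type tells the consumer whether to treat them as rows and fields or as opaque text.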
