Raw Minutes, 6 Feb

Achim Berndzen edited this page Feb 14, 2018 · 4 revisions

Tuesday, 6 Feb

Present: Martin, Gerrit, Erik, David, Achim, Christophe, Romain, Matthieu, Jim, Bert, Ari, Jirka, Geert (afternoon)

Prioritized Agenda

Tuesday

  • Status update

  • Review London minutes

  • Document model for non-XML documents

  • JSON documents

  • Text documents

  • Binary

  • Serialization of non-XML documents

  • Default values for options on steps?

  • Maps with QNames as keys

Wednesday

  • What are we presenting?

  • On Thursday

  • The pre-conference day workshop

  • On Saturday

  • The status report to the community

  • Action items: who’s doing the things we decided?

  • Next meeting?

  • XML London?

  • XML Amsterdam?

  • Balisage?

If we have time agenda:

  • Detailed questions from other agenda page

  • p:value-available()

  • Issues list

  • Specifics of importing functions from XSLT/XQuery

  • Step library discussion

  • Review XML London minutes

  • Validation report

  • Java API for extension steps

  • Glue code for multiple implementations?

  • Test suite

Status update

  • Norm: Made some progress on spec and implementation. This whole day job thing is a huge distraction. Unprepared to commit to a release date, but will make best efforts for summer.

  • Achim: We have made some progress, I have a machine that does some XProc. Three or four steps implemented; guessing it’s 60% of the core functionality. Did some work on the test suite, ~120 tests. Thought I’d have a machine by now, now hoping for summer.

  • We had workshop meetings at Prague, London, and Aachen. A busy year!

  • Erik: Working on my book; it’s going well. Need an implementation to make significant further progress. Unfinished copies available for anyone interested in teaching XProc, just get in touch: erik@xatapult.nl

Some discussion of a "what’s new" document. A good candidate for something to write after we feel that the spec is nailed down.

Some discussion about the fact that XSLT stylesheets mash attribute values so upgrade and compatibility transformations produce difficult-to-read results. C’est la vie.

  • Martin: I’ve written specifications for some steps for the step library. They’re in issues in the issues list for the steps repository.

  • David: I’ve started using XProc more and more. It’s my go-to tool for data integration problems. It replaces a lot of PHP and other tools that I used in the past. Thinking of writing a paper for a future XML conference.

Review Aachen minutes

  • High level: we accomplished many of the goals we set; a few to discuss again today.

Document model for non-XML documents

  • Achim: Every document will show up as a document node in some XPath environments. We called out text and binary documents in relation to MIME types, left everything else undefined. The spec says that a select expression that selects only text nodes is a text document. So the text document isn’t implementation defined.

  • Norm: I think a text document should be a single text node inside a document node. So I guess it isn’t implementation defined.

  • Achim: Yes.

Added: This is issue 285

  • Norm: We still have binary documents as implementation defined. But we may need to have functions to convert these implementation-defined binary documents explicitly into hexBinary/base64 nodes.

  • Norm: What about JSON documents?

  • Achim: I think we should have standard JSON documents.

  • Norm: Yes. XSLT 3.x/XPath 3.x have ways of dealing with JSON, we should do the same thing.

  • Achim: It’s to and from maps; it’s not implementation defined. It should be possible to construct this in p:inline.

  • Norm: Uhm…​

` <p:inline document-properties="map { 'content-type': 'application/json' }" expand-text="false"> {"test": "foo"} </p:inline> `

  • Norm: We agree that should work, right?

Added: This is Issue 286

Some discussion of the curly brace problem: expand-text defaults to true.

  • Norm: We can make expand-text sensitive to the content-type, but should we?

  • Erik: Maybe the default for expand-text should be false?

  • Achim: I’ve used it a lot recently and I’ve gotten used to TVTs just working.

  • David: Maybe we should have p:inline and p:inline-template and remove the attribute?

  • Achim: I like it.

  • Norm: I think making the expand-text conditional on the content-type is a bad idea.

  • Christophe: I think this has a small benefit for the weight of the feature.

Consensus: leave it the way it is, just point out that you have to set the expand-text attribute; you can do it globally if you have predominantly JSON inlines…​

  • Achim: I have reservations about the fact that expand-text inherits.

  • Norm: Yeah, but that’s already the way it is.
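
The curly brace problem can be illustrated with a toy model. This is only a sketch: the function `expand_tvt` is hypothetical, and it looks up plain variable names where a real processor would evaluate XPath expressions. It shows why a JSON body fails when expand-text is true, and how either setting expand-text to false or doubling the braces avoids that.

```python
import re

def expand_tvt(text, variables, expand_text=True):
    """Toy text value template expansion (assumption: {name} is a variable lookup)."""
    if not expand_text:
        return text  # braces are ordinary characters

    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"  # doubled brace stands for a literal brace
        if token == "}}":
            return "}"
        name = match.group(1).strip()
        if name not in variables:
            raise ValueError(f"cannot evaluate expression: {name!r}")
        return str(variables[name])

    return re.sub(r"\{\{|\}\}|\{([^{}]*)\}", replace, text)

json_body = '{"test": "foo"}'
print(expand_tvt(json_body, {}, expand_text=False))   # body passes through untouched
print(expand_tvt('{{"test": "foo"}}', {}))            # escaped braces survive expansion
```

With expand-text left at its default, the unescaped JSON body is treated as an expression and the call raises an error, which is the situation the consensus asks authors to watch for.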

Serialization of non-XML documents

  • Achim: We don’t say what happens when non-XML documents appear on p:store or p:http-request

  • Norm: I think we want it to be the case that serializing XML only serializes valid XML, serializing text serializes as a plain text file, serializing JSON does the right thing, and serializing binary does, uh, the binary thing.

  • Matthieu: But XSLT lets you serialize things that aren’t well-formed XML.

Norm checks: indeed, XSLT will serialize two root elements even when the output method is XML.

  • Romain: And what about HTML?

  • Norm: Uhhhhh…​should we back off and say that if you attempt to serialize as XML, it works as long as you start with an XDM? (As opposed to an implementation-defined form of binary document.)

  • Erik: There are a lot of consequences here, we could stick with well-formed XML.

  • Ari: But there’s an XPath serialization error for this case, it should be an error.

Some discussion of serialization options and the sad fact that you can’t tell what the options were on an XSLT document that came from a stylesheet. You have to put the serialization options in the XProc.

  • Norm: So where are we?

Consensus: Leave the status quo: method=xml means that well-formed XML is required; it remains an error to attempt to serialize as XML something that is not a well-formed XML document.
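
A minimal sketch of the status quo rule, under the assumption that the document is available as a string (a real processor would work on an XDM tree, and the function name `serialize` is hypothetical). It uses Python's standard XML parser only as a well-formedness check.

```python
import xml.etree.ElementTree as ET

def serialize(content, method="xml"):
    """Return bytes to write; method=xml refuses anything that is not well-formed XML."""
    if method == "xml":
        try:
            ET.fromstring(content)  # parse only to check well-formedness
        except ET.ParseError as err:
            raise ValueError(f"not a well-formed XML document: {err}") from err
    return content.encode("utf-8")

print(serialize("<doc><p>hi</p></doc>"))
print(serialize("<a/><b/>", method="text"))  # two roots are fine as plain text
```

Calling `serialize("<a/><b/>")` with the default method raises an error, matching the case of two root elements discussed above.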

Detailed questions from the agenda page

(We’ll do as many of these as we can before lunch!)

p:value-available()

  • Erik: I think we can get rid of this. The XPath spec says that an option always has a value, it’s () if it isn’t specified.

  • Achim: In XProc, you can have an option that doesn’t have a value. I opened #262 about this.

After some discussion, consensus appears to be that the optional option feature is more difficult than its value justifies.

Consensus: Remove p:value-available() and the feature it supports. The default value for options is the empty sequence.

Added: This is Issue 287
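
As a rough sketch of what this consensus means for option binding, the following models sequences as Python tuples, with `()` standing for the empty sequence. The function `resolve_options` and its argument shapes are assumptions made for illustration only.

```python
EMPTY_SEQUENCE = ()

def resolve_options(declared_defaults, supplied):
    """Bind every declared option; an unspecified option without a default gets ()."""
    resolved = {}
    for name, default in declared_defaults.items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif default is not None:
            resolved[name] = default
        else:
            resolved[name] = EMPTY_SEQUENCE
    return resolved

options = resolve_options({"debug": None, "limit": (10,)}, {"limit": (5,)})
print(options)
```

Because every option is always bound, a pipeline can test for the empty sequence instead of asking whether a value is available.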

Shouldn’t the p:declare-step/@xpath-version be an xs:decimal?

  • Norm: Maybe. That makes versions like 3.1.1 or 4.2-5 difficult.

  • Erik: When I raised this question, there were inconsistencies, but those have been resolved. However, p:xpath-version-available() and p:version-available() both have decimal parameters.

  • Norm: I think that’s a bug.

Consensus: fix those functions so that they take strings.

Added: This is Issue 288
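
The trouble with decimal parameters can be sketched as follows. The helper names and the list of supported versions are hypothetical; the point is only that version strings such as 3.1.1 or 4.2-5 are not decimals, and that distinct strings can collapse to the same decimal value.

```python
from decimal import Decimal, InvalidOperation

def as_decimal(version):
    """Return the version as a Decimal, or None if it is not a valid decimal."""
    try:
        return Decimal(version)
    except InvalidOperation:
        return None

def version_available(version, supported=("1.0", "3.0", "3.1")):
    """Hypothetical string-based check: exact match against supported versions."""
    return version.strip() in supported

print(as_decimal("3.1.1"))                       # not a decimal at all
print(as_decimal("3.10") == as_decimal("3.1"))   # distinct strings, same decimal
print(version_available("3.1"))
```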

Allow variables in the "prolog"?

You need maps in a step’s prolog for specifying output serialization. But you can’t use variables there, so these will be “fixed” maps, specified in full at the location where they’re needed. Would it be an idea to allow variable declarations in between p:input/p:output/p:option elements?

  • Achim: As long as they don’t have a context item, I think it’s ok for them to be allowed.

  • Erik: There’s no default readable port there so that’s ok.

  • Matthieu: What about named outputs like XSLT?

  • Norm: We don’t have that yet.

  • Christophe: Can we declare character maps?

  • Norm: We don’t have them yet.

  • Gerrit: I miss them.

Consensus: Allow p:variable to appear in the "prolog"; the context item is explicitly empty in this context. It cannot use a pipeline connection to establish a context item.

Added: This is Issue 289

Character maps?

  • Norm: Implementation and syntax to be determined, does anyone object to adding them?

Consensus: Someone to write up a proposal.

Added: This is Issue 290

Note that they are only a serialization feature, so trying to cheat and passing the buck back to XSLT won’t work.

Why is the @parameters on the p:load an AVT?

Overtaken by events, it’s no longer an AVT.

Primary output ports?

I thought it was decided that all steps have a primary output port (so also p:store). This is not (yet?) in the specs, or am I mistaken?

  • Norm: The p:store step has only one output so it’s primary by default.

  • Erik: Ok.

Default values for options on steps?

  • Martin: I often have to have long p:with-option options on each step because we have common options. What if I could inherit these somehow? Maybe we can have an option to make this behavior explicit?

  • Erik: Options you declare in your prolog are automatically available to the steps you use.

  • Martin: What about doing this at the group level as well?

  • David: What happens if you import steps from someone else? Doesn’t this introduce the possibility of declaring random options on pipelines?

  • Martin: You could still do it explicitly.

  • Erik: My concern is that it’s something strange; it’s not something you see in other languages.

  • Gerrit: I think a similar requirement was met by parameter ports in 1.0.

  • Achim: You can control options much better because you have lexical values for them. It’s hard to find a misspelled key name.

  • Erik: If you have an option X in your main step, if the second step expects an option X then it gets the inherited value.

  • Norm: I had thought that we would do this for just the option named 'parameters', not for all option names.

  • Martin: I’d be ok if this was just from the pipeline prolog.

  • Gerrit: I don’t want to have to create a map just because it was designed to accept a parameter option.

  • Erik: Why don’t we create a with-option-map element that expands a map into a set of options.

  • Gerrit: It’s the opposite of p:in-scope-variables (or whatever that’s called).

  • Norm: I’m very worried about unintended consequences if we do this for all option names; what if a pipeline has an initial-template option, now that suddenly applies to all p:xslt?

  • Gerrit: I often use tunneling in XSLT, maybe something like that would be helpful.

Some discussion of how XSLT tunnelling works.

```
<p:declare-step>
  <p:option name="debug-uri" tunnel="true"/>
  <p:option name="debug" tunnel="true"/>
  …

  <p:declare-step type="my:step">
    <p:option name="debug-uri" tunnel="true"/>
    …
    <!-- $debug is not in scope here -->

    <p:declare-step type="my:other-step">
      <p:option name="debug-uri" tunnel="true"/>
      <p:option name="debug" tunnel="true"/>
      …
    </p:declare-step>

    <my:other-step/> <!-- gets $debug-uri, but not $debug -->
  </p:declare-step>

  <my:step/> <!-- $debug-uri is inherited from the scope -->
  ...
</p:declare-step>
```

Some discussion of "tunneling". Consensus appears to be that, unlike XSLT, options do not have to "tunnel" through step declarations that do not mention the tunnel parameters (e.g., $debug on my:other-step, above).

  • Gerrit: Maybe we can call the option pass-down?

  • Achim: We don’t want to call it tunnel?

  • Gerrit/Martin/Erik: I’d be happy to call it tunnel.

  • Ari: Couldn’t we just specify the default on the root element?

Some discussion of this idea.

  • Achim: I think we should have to specify it everywhere.

  • Martin: It’s easier to check.

  • Norm: We could come back to making it more global later.

Consensus: We’ll attempt to spec and implement a 'tunnel' option on p:option. If a called step has declared an option as 'tunnel' and the calling step does not specify a value for that option, the in-scope value is inherited.

Achim observes that option typing complicates things a bit.

Consensus: If a tunnelled option has a declared type that differs from the declared type in the outer scope, it is an error. Working definition of "differs": the values of the two 'as' attributes are different when compared as strings after space normalization.

Added: This is Issue 291

A tunneled option does not satisfy the constraint required=true on a declared option.
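
The tunnel rules agreed above can be sketched roughly as follows. All names (`OptionDecl`, `bind_options`, the shape of `in_scope`) are hypothetical, and the sketch assumes that the type check happens at the moment a value is inherited, which the minutes do not specify.

```python
from dataclasses import dataclass

@dataclass
class OptionDecl:
    name: str
    as_type: str = "item()*"
    tunnel: bool = False
    required: bool = False

def same_type(a, b):
    """Working definition from the minutes: compare 'as' values after space normalization."""
    return " ".join(a.split()) == " ".join(b.split())

def bind_options(callee_decls, explicit, in_scope):
    """in_scope maps option name -> (declaration, value) visible in the calling step."""
    bound = {}
    for decl in callee_decls:
        if decl.name in explicit:
            bound[decl.name] = explicit[decl.name]
        elif decl.required:
            # A tunnelled value does not satisfy required=true.
            raise ValueError(f"required option not specified: {decl.name}")
        elif decl.tunnel and decl.name in in_scope:
            outer_decl, value = in_scope[decl.name]
            if not same_type(decl.as_type, outer_decl.as_type):
                raise TypeError(f"declared type differs for tunnelled option: {decl.name}")
            bound[decl.name] = value
        else:
            bound[decl.name] = ()  # default: the empty sequence
    return bound

scope = {"debug-uri": (OptionDecl("debug-uri", "xs:anyURI", tunnel=True), "file:/tmp/debug")}
callee = [OptionDecl("debug-uri", " xs:anyURI ", tunnel=True),
          OptionDecl("debug", "xs:boolean", tunnel=True)]
print(bind_options(callee, {}, scope))
```

In this run the called step inherits debug-uri but not debug, because debug is not in scope at the call site, which mirrors the my:other-step case in the example above.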

Maps with QNames as keys?

  • Achim: Many maps are defined in XProc as QName→String. (E.g., the serialization options map.) XML Calabash 2 has a magic feature where strings get treated as unqualified QNames.

  • Norm: If we can live with the dynamic cast to QName…​

  • Erik: But this deviates from XPath.

Norm cries silently.

Today, with magic:

` <p:inline document-properties="map { 'content-type': 'application/xml' }"/> `

If we say that the keys are QNames:

` <p:inline document-properties="map { q{}content-type: 'application/xml' }"/> `

  • Norm: I guess that works. It’s going to be a little rough on newcomers, but I guess it works.

On review of the spec, we find that the document-properties and parameters maps are QName→String, whereas the serialization option is String→String. Is that an error? (Yes, #268).
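
The "magic" treatment of string keys can be pictured with a small sketch. The `QName` tuple and `to_qname_keys` function are hypothetical stand-ins, assuming that a plain string key is read as a QName in no namespace.

```python
from typing import NamedTuple

class QName(NamedTuple):
    namespace: str
    local: str

def to_qname_keys(properties):
    """Cast plain string keys to QNames in no namespace; keep QName keys as they are."""
    result = {}
    for key, value in properties.items():
        qname = key if isinstance(key, QName) else QName("", key)
        if qname in result:
            raise ValueError(f"duplicate key after cast: {qname}")
        result[qname] = value
    return result

print(to_qname_keys({"content-type": "application/xml"}))
```

One consequence of such a cast is that a string key and an equivalent QName key collide, which this sketch reports as an error.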

Specifics of importing functions from XSLT/XQuery

There was an initial concern that this only works in Saxon EE. Christophe indicates that he tested it and it works in Saxon HE.

The proposal is in #42.

` <p:import-functions href="foo.xsl" type="application/xslt+xml" namespace="http://example.com/foo"/> `

Erik observes that "should implement" would be better expressed as "are encouraged to implement" to avoid RFC 2119.

  • Matthieu: What about embedding the function in the XProc pipeline directly as a data island? I do this a lot with Schematron.

  • Achim: Let’s defer to XProc 3.1

  • Norm: I think a combination of self-reference URIs and p:pipeinfo might be coerced into doing that, not that I plan to implement it real soon!

In the case of XSLT, all of the functions, in the specified namespace, declared in that stylesheet (and any imported/included stylesheets) are imported. If the href identifies an XSLT package, then only the functions declared public are imported.

In the case of XQuery, only public functions in the specified namespace are imported.

In the case of other languages, the mechanisms are implementation-defined.
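
These import rules can be summarized in a short sketch. The `FunctionDecl` record and the `imported_functions` helper are hypothetical; a real processor would obtain the function list from the XSLT or XQuery engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FunctionDecl:
    namespace: str
    local_name: str
    public: bool = True

def imported_functions(functions, namespace, content_type, is_package=False):
    """Select the functions made available, following the rules stated above."""
    in_namespace = [f for f in functions if f.namespace == namespace]
    if content_type == "application/xslt+xml":
        if is_package:
            return [f for f in in_namespace if f.public]  # packages: public only
        return in_namespace  # stylesheets: everything in the namespace
    if content_type == "application/xquery":
        return [f for f in in_namespace if f.public]
    raise NotImplementedError("implementation-defined for other languages")

ns = "http://example.com/foo"
library = [FunctionDecl(ns, "a"), FunctionDecl(ns, "b", public=False),
           FunctionDecl("http://example.com/other", "c")]
print([f.local_name for f in imported_functions(library, ns, "application/xslt+xml")])
```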

The p:import-functions can only occur where p:import can occur.

  • Achim: Could we just use p:import and overload it, instead of having a new element?

  • Norm: We could, but I don’t think we’d be helping our users much.

  • Erik: I think it would be harder to explain because it has different attributes.

  • Norm: Let’s not for now.

  • Bert: Why doesn’t p:import have a type? Morgana has a content type to be able to import Java steps.

  • Norm: How does that work?

  • Achim: The declaration is magically added with p:import. If the content-type is application/xquery, it loads the functions. If it’s application/xslt+xml, it loads the functions. If it’s application/java-archive, it loads the functions and/or steps depending on what it finds in the JAR file.

  • Bert: I wasn’t thinking of overloading it for functions, but it might still be a good idea to provide a type attribute on p:import so that we can import Java.

Consensus: we should add a type attribute to p:import (independent of the discussion of overloading it to load functions).

This is Issue 42

Java API?

  • Norm: Presumably the goal is for users to implement steps in Java (or at least on the JVM) that will work on all processors (or at least more than one).

  • Achim: This is distinct from calling Pipelines from, for example, Java. We discussed that in London last year.

  • Norm: Yes, I think so.

Norm shows his current interface for steps and mumbles about it a bit.

```
trait XmlStep {
  def inputSpec: XmlPortSpecification
  def outputSpec: XmlPortSpecification
  def bindingSpec: BindingSpecification
  def setConsumer(consumer: XProcDataConsumer)
  def setLocation(location: Location)
  def receiveBinding(variable: QName, value: XdmValue, context: ExpressionContext)
  def receive(port: String, item: Any, metadata: XProcMetadata)
  def initialize(config: RuntimeConfiguration)
  def run(context: StaticContext)
  def reset()
  def abort()
  def stop()
}
```

Achim shows his current interface for steps and mumbles about it a bit.

```
public interface MessageSupplier {
    public MoPLMessage getMessages(final String inboxName);
    public MessageChunk getFirstMessage(final String inboxName);
    public void putMessages(final String outboxName, final MoPLMessage messages);
    public void putMessage(final String outboxName, final MessageChunk message);
    public void raiseError(final MoPLRuntimeException message) throws SuspendExecution;
    public MoPLLocator getLocator();
    public DynamicMoPLContext getContext();
    public String getStepName();
}
```

FYI: Neither Achim nor Norm intends publication of these APIs as any sort of declaration of stability. Your mileage may vary. Here be dragons. Use at your own risk. Your gun, your bullet, your foot. Caveat emptor. Etc.

  • Norm: So very much the same on one level and very different on another.

  • Erik: Is there any possibility of consensus?

  • Norm: Maybe. I think we’d need a third thing.

  • Achim: Yes. We need an abstract layer on the two systems.

  • Norm: I think the abstraction will be less efficient, but it should be possible to have an abstraction that works.

  • Geert: Is this really a requirement?

  • Martin: Yes. We have extension steps for our processing and other clients and other service vendors that use our framework would like to be able to run them on different processors.

  • Achim: It is a requirement where you want to distribute steps but you don’t want to have a vendor lock in.

  • Norm: Ok, I think that was useful.

  • Bert: What I was thinking about most was how you read and write documents from a port. Norm, you have Saxon, right? (Yes) Achim, what do you use?

  • Achim: I have an abstraction on top of XOM. For some operations like recurring XSLT, I will also have an implementation for the TinyNode tree in Saxon. This isn’t so much about implementing a step, it’s about accessing data.

  • Norm: We’d have to pick a format for this putative adapter class.

  • Bert: We have a bunch of steps. Some use DOM as an input port. Some create a DOM tree. Some use the XMLStreamReader interface. Some use the XMLStreamWriter interface for writing. And then there are some that use Saxon’s model.

  • Norm: I pass trees around for the actual documents so the streaming XML APIs wouldn’t gain anything for you.