Minutes for day two: 8 February 2017

Norman Walsh edited this page Feb 8, 2017 · 1 revision

XProc Workshop Minutes, Day 2, 8 February 2017

Background links:

Agenda / Minutes

  • Present: Norm, Romain, Achim, Christophe, Matthieu, David, Geert, Gerrit, Martin, Henry, Liam

New Action Items

  • Martin to write a proposal for @p:message and how to extend the messaging to log levels and other features, possibly using p:pipeinfo or a new p:message element.

  • Nic and Ari to discuss the status of the community group and report back on xproc-dev by 15 Feb 2017.

  • Norm to propose a template for the extension steps repo.

Time-boxed review of this issue

  • About requirements, not proposals
  • How do documents figure into our relations with our XPath interface
  • We need precision in the requirements

Henry begins by projecting a document: https://rawgit.com/xproc/Workshop-2017-02/master/docs/docReq.html

  • Liam: XPath is very primary; but we’re talking about doing other kinds of expressions. Yesterday, I wondered if handles could be functions. Now I’m wondering if the expression language could be pluggable.
  • Henry: That’s been in-and-out. I think for 3.0, it’s a big step to introduce, at the language level, multiple expression languages.
  • Norm: I think it’s a question of how to mix the languages.


Observation: ./function() is conceptually the same as function($x)

  1. The output of a step can go in a variable, $x.
  2. I must be able to get the properties of the variable, $x.
  3. $x must have identity
  4. select="p:document-properties($x)"

  • David: Why a document node? Why not a string?
  • Norm: It could be a string.
  • Henry: The problem of collision arises.
  • Norm: A string would work, but string-length($x) would be 5 or 19.

Norm: We could also just make it a view: an XPath document, a JSON object, a text node, a hexBinary node.

  • Romain: We need access to the content from the empty document node.
  • Norm: So you want to be able to use the binary functions on the hexBinary.
  • Romain: Yes.
  • Henry: Stipulate the empty document node solution for a moment, and the existence of a way to ask for the type of something. If the answer isn’t XDM or JSON, you’d like to be able to say coerce to XDM (string, hexBinary).

It would also be useful to be able to go the other way, coerce a hexBinary into an image/jpeg.

  • Achim: If I get a text/plain document, do I get a string or a blob?
  • Romain: You get an empty document node; you can coerce it into a string.
  • Achim: Users are going to find that confusing. But we can tell users that they have to explicitly take the risk of attempting to get a large amount of data.
  • Henry: We also talked about trying to put some logic into p:load.

  • Norm: We’re 50 minutes into the hour.
  • Henry: Here’s the procedural question: do we have enough of an answer to proceed? It feels to me like we do.
  • Norm: I think we do. I propose that we proceed with some spec drafting using the empty document node answer and see if it works or raises other problems on closer inspection.

Any objections? None heard.

  • David: For me it’s hard to talk about coercing media types. What it means is to process an entity in a different way. It’s not coercion.
  • Norm: It’s not really coercion; it’s changing the label.
  • David: If you have the load step and you say you want to load it as text/plain. If you use an HTTP URI, then this would mean that the implementation would send an Accept: header of text/plain.
  • Norm: Alternatively, you just treat the octet stream returned as text/plain.
  • David: The other way around is just a claim.
  • Norm: I agree, “coercion” isn’t the right word.


  • Matthieu: I’d like to be able to override steps on p:import, is that something we could consider?
  • Norm: Yes, but I suggest we have an issue about that and have the discussion when we’ve had a chance to think about it.
  • Gerrit: We solve this by dynamically generating the pipeline and then importing that.

Review of open issues

On reflection, trying to do this in the face-to-face seems like it won’t end well.

Proposals for resolving two issues

  • Norm: The depends attribute, PR #33.
  • Henry: Remind me, what’s it for?
  • Norm: It’s for the case where you have a step with a side-effect which isn’t manifest in the data flow: don’t start step B until step A has finished, irrespective of what you think you know. … It’s been implemented and appears to be sufficient.
  • Liam: If you had called it ‘wait-for’ instead of ‘depends’, it would have been clearer.
  • Henry: The simplicity of this depends on the answer to my question about streaming tomorrow: no.
  • Christophe: I prefer ‘wait-for’.
  • Henry: I prefer ‘depends’ because it leaves open the question of what exactly it means; we may need that space in the future.

Some discussion of the semantic variance between “depends” and “wait-for”.

Liam: I don’t think we really have to find an answer here. Norm: Let’s try taking this one to email.

Proposal: Merge the PR. Any objections?

None heard.
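
The ordering constraint behind the depends attribute can be pictured as an extra edge in the step graph, alongside the edges implied by port connections. The following is a minimal Python sketch of that idea, not XProc and not any processor's actual scheduler: the `schedule` helper and the step names are hypothetical, and it assumes steps run one at a time in a simple topological order.

```python
from graphlib import TopologicalSorter


def schedule(data_flow, depends):
    """Return one valid execution order for the steps.

    Both arguments map a step name to the set of steps that must finish
    before it starts. Data-flow edges and explicit "depends" edges are
    merged into a single graph and sorted together.
    """
    graph = {}
    for edges in (data_flow, depends):
        for step, prerequisites in edges.items():
            graph.setdefault(step, set()).update(prerequisites)
            for prerequisite in prerequisites:
                graph.setdefault(prerequisite, set())
    return list(TopologicalSorter(graph).static_order())


# Hypothetical pipeline: "store" writes a file as a side effect and "load"
# reads it back, but no port connects the two steps.
data_flow = {"store": {"transform"}, "report": {"load"}}
depends = {"load": {"store"}}
order = schedule(data_flow, depends)
```

Without the `depends` entry, nothing in the data flow would stop "load" from being scheduled before "store"; with it, every valid order places "store" first.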

  • Norm: Allow attribute value templates in extension attributes, PR #32.
  • Achim: Some of the extension attributes are handled in static analysis and some dynamically. It only makes sense for attributes that are being evaluated dynamically.
  • Norm: What attributes are evaluated statically?
  • Achim: depends are extension attributes today.
  • Norm: Errmm…can we finesse this by saying that processors are free to forbid AVTs in any extension attributes that they wish? (Added comment.)

Some discussion of what the semantics of forbidding might mean. It could mean curly braces are not allowed, or it could mean they are not interpreted as an AVT. Up to the implementation.

Proposal: Merge the PR with that note, at editor’s discretion. Any objections?

None heard.

Proposed list of issues that warrant face-to-face time:

How to improve debugging (issue 18)

  • Norm: The observation that implementations should do better with error messages is a point taken.
  • Achim: What do I get if an error occurs is one question. Another question is what do I get on p:catch? How good is the error vocabulary? Today we only get the name of the error. Maybe we could improve that.
  • Norm: It’s very difficult to define error output. I think a proposal in this area would be a very good thing.
  • Achim: The question is how useful would this be to users?
  • Martin: Sometimes it would be very useful. We’re currently using Schematron that we extended with spans to handle reporting where the error occurred.
  • Achim: Step names or step types both seem like they’d be valuable. Just more information.
  • Norm: Someone should make a proposal.
  • David: I find it hard to get the error messages at all. It was never a problem to figure out where the error was, but getting the message is hard.
  • Norm: Yeah, my implementation sucks.
  • David: A tutorial on p:log would be good.
  • Norm: Yes, it’s a short tutorial but it would be good to have.

@cx:message

  • Norm mutters on about @cx:message…proposes @p:message to avoid stepping on the “message” option name.
  • Martin: What about adding a terminate attribute?
  • Gerrit: Isn’t this p:error?
  • Norm: Do you want to do this conditionally?
  • Henry: The advantage of having it as an attribute on a step is that it’s much simpler vis-a-vis the plumbing.
  • Norm: So a step with p:terminate on it runs the step and then aborts.
  • Henry: I propose p:abort with a message.
  • Romain: Can we add a severity to p:message so that it works like proper logging?
  • Gerrit: Maybe we can have p:error work like logging and messaging.
  • Norm: Risking design on the fly: if p:message is a single string then print it. If it’s a sequence of two strings then the first string is a log level and the second string is the message. What the processor does with the log level is implementation defined.
  • Achim: What about a function that returns the log level?
  • Geert: I’ve been experimenting with overloaded steps. We could put the message and other options in the step content. That makes the mechanism extensible.
  • Norm: Yes. We also have p:pipeinfo, you could use that.
  • Henry: Maybe pipeinfo is a better way to do this altogether.
  • Norm: I’m still attracted to @p:message for the simplicity of printing a message, and maybe p:pipeinfo for log levels and such.
  • Romain: Or p:message element for more complicated messages.
  • Norm: Right. So this wasn’t simple. I think we need a proposal.

    ACTION: Martin to write a proposal
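
Norm's on-the-fly rule for @p:message (a single string is printed as-is; a sequence of two strings is a log level followed by the message) could be sketched as follows. This is an illustration in Python rather than XProc, written ahead of any actual proposal: `interpret_message` is a hypothetical helper name, and rejecting longer sequences is an assumption, since the discussion did not say what should happen in that case.

```python
def interpret_message(value):
    """Split a message value into (level, text).

    A single string is a plain message with no level. A sequence of
    exactly two strings is read as (log level, message). What a processor
    does with the level is left implementation-defined.
    """
    if isinstance(value, str):
        return (None, value)
    items = list(value)
    if len(items) == 1:
        return (None, items[0])
    if len(items) == 2:
        return (items[0], items[1])
    # Assumption: anything longer is treated as an error.
    raise ValueError("expected one string or a (level, message) pair")


plain = interpret_message("Transforming chapter 3")
leveled = interpret_message(["warn", "No schema found; skipping validation"])
```

Keeping the level as an opaque string mirrors the suggestion that its interpretation is up to the processor.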

Return to yesterday’s discussion for a moment

  • Achim: There’s one more point in the specs dealing with XML vs. non-XML documents that I find inconvenient. If I have a heterogeneous sequence, when this sequence hits a select expression, an error is thrown. … I’d prefer to ignore the select expression for non-XML.

  • Henry: We now have so many things flowing through pipelines that I think this kind of defaulting behavior will be surprising. Note also that under Norm’s proposal, these will be documents and you’ll get the empty sequence.
  • Achim: Ah, yes. We’ll just have to live with it. Never mind.
  • Norm: We might be able to provide less verbose solutions with, for example, a function that takes a sequence of nodes and an XPath expression and does the right thing.
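
One possible reading of the helper Norm describes, a function that takes a sequence of items and an expression and "does the right thing", is sketched below in Python. It assumes that the right thing is to apply the expression to XML items and silently skip everything else, which is Achim's stated preference rather than an agreed design. The `select_from` name is hypothetical, and the standard library's ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET


def select_from(items, path):
    """Apply `path` to every XML item in the sequence and skip the rest."""
    matches = []
    for item in items:
        if isinstance(item, ET.Element):
            matches.extend(item.findall(path))
        # Non-XML items (bytes, strings, ...) are ignored, not an error.
    return matches


# A heterogeneous sequence: two XML documents and one binary stand-in.
docs = [
    ET.fromstring("<doc><title>One</title></doc>"),
    b"\x89PNG",
    ET.fromstring("<doc><title>Two</title></doc>"),
]
titles = [title.text for title in select_from(docs, "title")]
```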

Reopening the question from this morning

  • Henry: The fact that a sequence of pipe content types can be heterogeneous means we may need to think about new tooling. Dispatching on type will be more common so we might like to make that easy.

Important workflows (for publishing use cases)

  • Henry: I was just wondering if there were obvious mismatches between the functionality of 1.0 (steps or architectural) from the perspective of publishing workflows. If you regularly say “oh rats, it’s that problem again”…
  • Romain: In our publishing workflows, we’re dealing with file sets. The way we do that is we have the in-memory documents that can be processed by XProc and we have an XML representation of the filesets. A directory structure, for example. A lot of our steps have these two things as inputs and outputs. We have to connect them explicitly everywhere.
  • Henry: Two things?
  • Romain: The in-memory documents and the fileset description. It would be convenient if some (more) connections could be implicit. If outputs of a particular type were automatically connected to specific inputs, for example.

Some discussion of how a zip step is wrapped for this purpose.

  • Romain: More implicit connections would be useful.
  • Gerrit: So the connections are grouped automatically? You want to have multiple primary ports? How are they connected?
  • Norm mumbles something about using media types to connect ports.
  • Romain: There are other ideas here: both implicitly connecting from preceding steps and for grouping connections together.
  • Martin: So if you have a p:xslt step and it’s preceded by two steps that produce XML and XSLT, they’d both be connected?
  • Romain: If I have a step that produces HTML and binary images and the following step receives an HTML document and binary images, I want them to be connected implicitly.
  • Norm: I think this is an interesting idea; but it’s complicated and we need a proposal to review.
  • Romain: That’s one thing that we do often. It depends on how we define document sets.
  • Henry: That (re)raises the question of whether we need the concept of document sets. Whether this is the same as the document collection idea or whether it’s more pipeline appropriate, I’m not sure. But in any case, “in the publishing workflow we often move document collections around” is worth considering. That’s not something we directly support in XProc. Whether the idea of document sets as I have them in my head from the Markup Pipeline from 15 years ago is actually what’s needed, I’m not sure. But thinking hard about sets is worth doing. I don’t think it’s for 3.0. It’s too big a change. It raises a whole bunch of questions. Whether you have to have steps designed to work with document sets or whether there’s a story about default plumbing is unclear.
  • Romain: We might be able to leverage non-XML document ideas to solve this.
  • Henry: If you want to change the composition of a set, you need the whole set, and doing that on a sequence representation of the set will be very un-intuitive. … So we could imagine having document collections flowing through pipes, not sequences of documents but collections to which you have random access.
  • Norm: Uh, er, maybe. :-)
  • Gerrit: It’s hard to say if there are other things because we have worked around them. We have, for example, a catalog resolver for non-XML types so that we can refer to fonts and things. It’s an XProc step, we don’t need it anymore. We created extension steps to do image-metadata extraction, resizing, etc. Maybe one can have EXProc steps at some point for processing images. We have an extension step that does unzip a bit differently than the proposal for pxp:unzip. It extracts the whole archive to disk and then other steps are able to work on the files (on disk). This could be improved with the new concept that you have binary data flowing through steps.
  • Martin: What we currently use is a step called file-uri to work with URIs and operating system paths.
  • Gerrit: This is also encapsulated in a step.
  • Romain: At the language level, there’s not much missing in XProc 3.0. There are a variety of utility steps that could be standardized or not.
  • Henry: What about interfaces to databases?
  • Gerrit: I once wrote an issue about whether it could be a good idea to have implicit validation. Could you read the xml-model PI and do the right thing? You’d also want to have an easy way to prepend the PIs to documents. And you’d want to have a way to produce an SVRL report for validated documents.
  • Norm: So in addition to a general p:validate step, this includes the idea of, for example, a @validate attribute on p:output to say “validate any document with an xml-model PI”.

Some discussion of the xml-model PI: https://www.w3.org/TR/2011/NOTE-xml-model-20110811/

  • Jirka: It is now also an ISO standard.

Probably lunch. Probability, 100%.

What’s the status of the W3C community group?

Nic Gibson joins us by telepresence.

  • Norm: What about the community group?
  • Nic: Ari and I both think it would be good if we tried to do something. Ari in particular seems interested. The question is, what value is there in having both a mailing list and a community group?

  • Liam: I don’t think they’re mutually exclusive?
  • Ari: That’s the question.
  • Liam: There are a couple of lists.
  • Norm: I think there’s only one, xproc-dev.
  • Liam: I think the community group has a mailing list as well. I think we should keep using the xproc-dev list.

  • Norm: What’s the title of the community group these days?
  • Nic: Data Pipeling Use Cases.
  • Norm: Does keeping a moribund community group open help us, hurt us, or is it neutral?
  • Nic: The question is, do Ari or I (or anyone else) have time to make it not moribund?
  • Norm: Is that something you can answer now?
  • Nic: I think it would be good to talk to Ari about it. And I have some actions that I never got around to finishing: mailing various lists and anyone who’s ever posted to xproc-dev.
  • Norm: You’d want to craft the message carefully.
  • Nic: I reckon that Ari and I can chat by the time the XML Prague conference is over.

ACTION: Nic and Ari to discuss and report back on xproc-dev by next week.

  • Norm: Anything else?
  • Liam: At this point “no”; eventually we’ll want to see some sort of a draft of use cases.
  • Achim: The description of what the community group does is out of date with respect to our current plans. We need to say that XProc is still alive even if the working group no longer exists. We need to assert that the community group is the center of XProc activities.
  • Norm: That is an interesting point. It sounds like we need to rewrite the description. Maybe change the name.
  • Liam: I’m not sure if you can change the name of a community group.
  • Norm: Ok. Let’s see what Nic and Ari come up with and then consider next steps.

What’s our thinking on the “resource manager”?

  • What are the semantics of pipelines?
    • What are the lowest-level abstractions needed to describe/discuss pipelines?
  • There’s metadata flowing through the pipe on the one hand and a resource manager for local copies of things being fetched and stored. And then variables are right in the middle.

On reflection, we all feel that we’ve covered these items sufficiently earlier today or yesterday. Henry may come back with a simplified proposal after further consideration.

Any other business?

  • Achim: We should discuss how we can encourage the community to suggest new steps.
  • Norm: Couldn’t we just use the ‘extensions’ XProc repo?
  • Romain: Yes, and we could make a custom template for that.

ACTION: Norm to propose a template for the extension steps repo

  • Norm: After we have the template, let’s avoid “blank page syndrome” by populating the repo with the existing exproc steps. Maybe then have the exproc.org redirect there.
  • Romain: And then have PRs to add them to the step spec.

Some discussion of how to organize the specs and repos. Must have a single entry point for the user.

  • Achim: If we put the exproc.org steps there, we should all read them again and see if we can clarify them. Norm and I have interpreted some of them differently.
  • Romain: Should the extension steps target XProc 3.0 or 1.0 or what?
  • Achim: I think we should target 3.0.
  • Norm: Yeah…
  • Romain: When is a step considered ready to be implemented? And how can I tell if the implementation is conformant with the spec?
  • Achim: We should have test cases.
  • Romain: Something like semver perhaps.
  • Norm: So if you want version 1.3.5 of a step, you look to see if the implementor claims to support 1.3.5.
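
The semver idea can be made concrete with a small Python sketch. The exact-match check follows Norm's description (look for the version in the implementor's claims). The compatibility check is an added assumption based on the usual semver convention (same major version, and not older) and was not discussed at the meeting. All function names are hypothetical.

```python
def parse_version(text):
    """Turn "1.3.5" into (1, 3, 5) so versions compare numerically."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def claims_support(claimed_versions, wanted):
    """Exact check: does the implementor list the wanted version?"""
    return parse_version(wanted) in {parse_version(v) for v in claimed_versions}


def is_compatible(implemented, wanted):
    """Assumed semver-style rule: same major version, and not older."""
    have, want = parse_version(implemented), parse_version(wanted)
    return have[0] == want[0] and have >= want


exact = claims_support(["1.2.0", "1.3.5"], "1.3.5")
newer_minor = is_compatible("1.4.0", "1.3.5")
```

Under the exact rule an implementor has to enumerate every version supported; the compatibility rule would let a claim of 1.4.0 cover a request for 1.3.5.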

Next steps

  • How do we do this?
  • Henry: The best way this works in open source projects is BDFL.
  • Achim: We should divide the work up, the test suite, the documentation could be done separately.
  • Norm: Achim and I seem to be signed up to do the spec editing.
  • Achim: It would be nice to have one more editor who isn’t an implementor.
  • Norm: Henry, you’re the obvious candidate.
  • Henry: I’ll say yes, but you have to tell me if I’m doing a good job.
  • Achim: It would be nice to have a user.
  • Norm: Yeah. You have someone in mind?
  • Gerrit: I’ll work on editing too.
  • Henry: This is the SGML working group model: editors, a core group, and a broader group.

Work items

  • A spec: Norm, Achim, Gerrit, Henry as time permits
  • A test suite: David
  • Step proposal curator: Geert
  • Documentation: Christophe, Matthieu

Status updates

  • Monthly reports to xproc-dev on the second Tuesday of each month, starting on 14 March 2017.

Where do we start?

  • The documents at spec.xproc.org are the current head of development.

How long do we expect this to take?

  • Goal: approaching functional completeness by XML Prague 2018 (Henry proposes beta release)

Next meeting?

  • XML Prague 2018?
  • Henry: Having one at the beginning or end of the summer might be useful, but we don’t know yet.
  • Maybe XML Amsterdam?

Thank our hosts

  • Thanks to Jirka, XML Prague, and the University of Economics.