Problematic/confusing algorithm text re: the handling of URL @context values #265
Context Processing step 5.2.1 says:
This seems rather verbose, as "the document containing the
Steps 5.2.3 through 5.2.5 say:
These steps are confusing for a few reasons:
The wording, while verbose, is consistent with how we have described IRI resolution since 1.0. What's changed is to make it clear that it is the base IRI of the document, rather than any
I don't think that it's necessary to burden the algorithm with more options for describing the location of the document being processed, as that location is available given that the document was dereferenced. Implementations may, of course, use any internal mechanisms they need to remember this.
It is implicit that implementations maintain such a memory, without being overly explicit on how they do it.
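One way to sketch that implicit memory (names here are hypothetical, not from the spec): keep the URL a remote context was fetched from alongside its parsed value, so that relative IRI references inside it can be resolved later per RFC 3986.

```python
from urllib.parse import urljoin

class RemoteContext:
    """A dereferenced context together with the URL it was fetched from.

    Hypothetical sketch of the implicit state discussed above; the spec
    does not mandate this shape, only that the base IRI of the containing
    document be available for resolution.
    """
    def __init__(self, document_url, context):
        self.document_url = document_url   # final URL after any redirects
        self.context = context             # the parsed @context value

    def resolve(self, relative_iri):
        # Resolve a relative context reference against the base IRI of the
        # document that contained it (RFC 3986 resolution).
        return urljoin(self.document_url, relative_iri)

remote = RemoteContext("https://example.com/contexts/a.jsonld",
                       {"@context": "./b.jsonld"})
print(remote.resolve("./b.jsonld"))   # https://example.com/contexts/b.jsonld
```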
Sounds like a good optimization would be to only retain the @context
It could have been 184.108.40.206, but it doesn't change the control flow.
IMO, the text is clear as written and implementable without being overly prescriptive.
But the URL in question isn't the one that was just dereferenced, it's the one that may have been dereferenced to pass the
Or have I misunderstood how it's being used? I thought I had it sorted as my current understanding led me to an implementation that does pass the test...
It's not always clear in the text where data is going to be used (either elsewhere in the same algorithm or in other algorithms). This is especially true for implicit state data like this (and somewhat related to the discussion in #260, where at least you can visualize repeated uses of explicit variables in the EDs). Without explicit text, I'm not sure as an implementor that I'd be confident based on just the text in 5.2 that I could only store the @context
My issue with 5.2.5 is that what seems like the primary action of the step (setting
I can live with this as it is if there's resistance to changing it, but I reported an issue because the text as written tripped me up and only walking through tests and stepping through the algorithms with @azaroth42 got me to a point of understanding (a benefit independent implementors who come along after REC probably won't have!).
Could you perhaps recommend some specific text that could go into an informative note such that it wouldn't cause a normative change to the spec but would offer the clarifications you're seeking? For example:
"Note: In order to prevent being overly prescriptive, this algorithm makes no statement about how internal state is kept for dereferenced context documents and broadly applies error handling checks. Provided that the output of the algorithm is unchanged, implementations may employ optimizations such as retaining only the @context of dereferenced documents."
I think your "retaining only the @context part " language might be reasonable.
I do not think the "broadly [applying] error handling checks" language is appropriate. The thing that needs to be checked in 5.2.5 does not even happen in some invocations of the algorithm. The previous invocation which passed through 5.2.4 carried out the error checking, but the current invocation which passes through 5.2.3 does not dereference or transform anything into an internal representation. I really do think this error checking language should be in 5.2.4, but given how recent updates to my other issues have been handled, I don't see why this would be considered a normative change.
Regarding the IRI resolution using an out-of-band base IRI, however, I do think this needs more bookkeeping steps (as new input variables to Context Processing and Create Term Definition, in the related invocations in Context Processing steps 5.2.6 and 5.13, and in Create Term Definition 23.3). I would have thought that these new variables/steps would represent a normative change, but since it seems the consensus is that changes that reinforce the expected output of the test suite are not normative, perhaps this is merely editorial.
@kasei I added a note, based on @dlongley's suggested text, and a couple of others to provide some non-prescriptive guidance to implementers. It's true that this could have been handled by adding parameters to the algorithm, but would have had a broad impact, and is not the only way that it can be implemented, given that it's contained in the RemoteDocument which is being processed as a context.
Please let us know if this works, or suggest changes that would make it more acceptable.
The note about retaining only the @context is helpful.
I'm still not convinced about this provenance tracking issue. Agreed that it's "not the only way that it can be implemented," but surely that's true of the entire document? My concern is that (to the best of my understanding) the provenance tracking here is required for code paths that handle both dereferenced content and non-dereferenced content. In some cases it's important to keep track of the provenance, and in others it's not. I don't think that's a good situation to be in while assuming that it's clear how such tracking can/should be done.
My belief is that these changes (some of these were mentioned above) are sufficient to perform the tracking:
While these changes are certainly not trivial, I think they are relatively contained, and (assuming they are correct) provide explicit information on where provenance tracking must occur and where the resulting data is used.
This issue was discussed in a meeting.
Rob Sanderson: link: #265
Rob Sanderson: the reason we’ve called this one out in particular
… is that the two Greg(g)’s have different views on what the exact wording should be
Gregg Kellogg: yes. we went back and forth a bit
… and what Greg was pushing for seemed like a very large rewrite
… and dlongley chimed in a bit also
… and that size change didn’t seem warranted
… the core question is how to resolve a relative context URL
… that text is certainly more vague
… the base of the document is used to resolve that
… but that may not be explicitly passed in
… so dlongley suggested we add a note that we were avoiding being overly prescriptive
… and a few more notes to clarify the intent and use throughout that section
… and without changing the API signature
… given where we are in the publishing process, I don’t think a heavier change is warranted
Rob Sanderson: it wouldn’t be a normative change would it be?
Gregg Kellogg: it depends on what you believe the boundary of the change is
… if we’re changing API signatures and data that’s being passed around
… then that would be a normative change
Pierre-Antoine Champin: and that we might introduce new bugs…
Gregg Kellogg: there are ways implementations can work around this without our changing the API
Rob Sanderson: any questions?
… k. I would suggest that we reject this issue on the grounds that we believe this would be a breaking change
… while it might be clearer to do it the way Greg suggests, the current way works without needing to change API signatures
Gregg Kellogg: no. it wouldn’t actually change the API signature, but the algorithm signature
… but nonetheless it would have far reaching API ramifications
Rob Sanderson: pchampin what are your thoughts on this one?
Pierre-Antoine Champin: I’m with Greg on this one
… with gkellogg on this one
… the potential for new bugs is quite high
Proposed resolution: Reject #265 as overreaching at the current stage of the process; changing algorithm signatures has too far-reaching consequences for a clarity change (Rob Sanderson)
Ivan Herman: +1
Gregg Kellogg: +1
Rob Sanderson: +1
Pierre-Antoine Champin: +1
Tim Cole: +1
Benjamin Young: +1
Jeff Mixter: +1
Resolution #2: Reject #265 as overreaching at the current stage of the process; changing algorithm signatures has too far-reaching consequences for a clarity change
Gregg Kellogg: I did mark the issue as rejected
… but I didn’t know if there was more to do to close this up
Rob Sanderson: we should call it out as an issue that was not accepted
… and point to this resolution as the rationale for the rejection
… we’ve discussed it, considered it, and rejected the full extent of the change
… but we plan to add notes to clarify things to avoid future confusion
Ivan Herman: I checked with Ralph about what triggers a full CR or not
… and it’s largely under the control of the WG
… there’s no algorithm for when it should go back to the Director
… this one does seem to be on the borderline
… and it would restructure the algorithm heavily
… and all the other implementers would have to cross check things against the new changes
… if it’s a bug, then it would be justifiable
Rob Sanderson: it’s not wrong to implement it the way gkellogg has described it
Ivan Herman: no. of course not
… no one said they have to follow these algorithm lines specifically
… they just have to pass the tests
Gregg Kellogg: yes. we say that explicitly actually
Ivan Herman: and if they pass it without following all the algorithm steps, that’s fine
Rob Sanderson: any more on this one?
Can you please explain how other ways of implementing this could happen without the approach I've described? To keep track of the provenance of downloaded contexts, mustn't the current algorithm implicitly be passing around this information as I've explicitly described?
As this has been rejected by the WG, can you at least confirm for me that the approach I've described will handle the tracking of context provenance as intended? Given that this happens implicitly in recursive code paths that only sometimes require this tracking, I am not entirely confident that what I've described will cover the intention of the spec. If the spec text can't be changed, I'd be much more comfortable if somebody from the WG could at least confirm that I've understood the intention correctly.
For the second part, we did discuss that your approach is not only correct but even a better way to do it ... but that the extent of the change was too high, given that the algorithm is not wrong, just not as clear or transparent.
Explaining the current text I leave to @gkellogg :)
@kasei It's entirely reasonable for you to use the approach you've outlined. It's not necessary for implementations to precisely follow the algorithms, and this is a case where developer initiative is necessary to meet the intention of the algorithm. There may be other state or options that an implementation may find useful for implementing the algorithms which aren't strictly spelled out.
My own implementation keeps a
The algorithm text makes it clear that relative IRIs for contexts are loaded relative to the location of the document containing the reference, so these mechanisms help to maintain that. Another implementation may do it differently.
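The resolution rule stated above can be illustrated directly with RFC 3986 resolution (the URLs below are made up for illustration): a relative context reference resolves against the URL of the document it appears in, not against the URL of the top-level document being processed.

```python
from urllib.parse import urljoin

top_level_doc = "https://example.org/data/input.jsonld"
remote_context_doc = "https://cdn.example.com/contexts/v1.jsonld"

# A relative reference "../shared.jsonld" appearing *inside* the remote
# context document resolves against that document's own URL:
print(urljoin(remote_context_doc, "../shared.jsonld"))
# -> https://cdn.example.com/shared.jsonld  (not under example.org)
```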
It was felt that the changes to the algorithm signature, and all the places it is invoked could have a destabilizing effect, and weren't strictly necessary for implementations to be able to interpret the spec and pass tests.