
Updates to processing algorithm and other fixes #116

Merged
merged 7 commits, Oct 18, 2019

Conversation

mattgarrish
Member

@mattgarrish mattgarrish commented Oct 16, 2019

This PR makes the following changes:

  • updates the processing algorithm to split it into more atomic functions where appropriate (makes the main algorithm clearer and also makes it easier for profiles to extend specific functions instead of step numbers).
  • updates description and accessibilitySummary to arrays of localizable strings.
  • standardizes the name "value category".
  • adds a generic object category and updates accessModeSufficient to reference it.
  • adds prose for reading systems to inspect reading order for unknown profiles and react accordingly.

It should fix #101 #106 #108 #109 #110 #113 and #115

I'm going to update the experimental processor code next just to make sure I didn't break anything splitting up the algorithm.
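The refactoring described above can be sketched roughly as follows. This is a minimal illustration, not the spec's actual algorithm: every name here (processManifest, normalizeData, validateData) is a hypothetical stand-in for the atomic functions the PR introduces, which profiles could override individually instead of referencing step numbers.

```javascript
// Hypothetical sketch: the main algorithm delegates to small named
// functions that an extension can replace one at a time.

function normalizeData(term, value) {
  // e.g. wrap a single value in an array where the term expects one
  return Array.isArray(value) ? value : [value];
}

function validateData(term, value) {
  // drop obviously invalid entries; the real algorithm consults the
  // term's value category
  return value.filter(v => v !== null && v !== undefined);
}

function processManifest(manifest) {
  const internal = {};
  for (const [term, value] of Object.entries(manifest)) {
    internal[term] = validateData(term, normalizeData(term, value));
  }
  return internal;
}
```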



updates description and accessibilitySummary to arrays of localizable strings;
standardizes on "value category";
adds a generic object category and updates accessModeSufficient to reference it;
adds prose for reading systems to inspect reading order for unknown profiles and react accordingly
@mattgarrish mattgarrish changed the title Updates to processing algroithm and other fixes Updates to processing algorithm and other fixes Oct 16, 2019
@mattgarrish
Member Author

I also split the processing section into three, as it was getting bloated. There are now introduction and error handling subsections.

@iherman
Member

iherman commented Oct 16, 2019

To repeat what I said in #101 (comment): the algorithm yields the profile value, which is then part of the generated data structure. As a consequence, the WebIDL should have a profile member, and no conformsTo one...

Member

@iherman iherman left a comment


All in all, this is a big improvement!

@iherman iherman requested a review from rdeltour October 16, 2019 16:58
@iherman
Member

iherman commented Oct 16, 2019

I wonder whether it is possible to add explanation blocks to the "functions", mainly the last ones. By the time readers get there, they may be a bit lost as to the goals of these and how they fit into the full picture.

@mattgarrish
Member Author

the algorithm yield the profile value, that is then part of the generated data structure. As a consequence, the WebIDL should have a profile member, and no conformsTo one...

Adding another member is easy enough, but we shouldn't get rid of conformsTo since it's not exclusively about setting a profile. It's also a means of identifying conformance with WCAG or other possible standards, so that information should still be retained.

Member

@rdeltour rdeltour left a comment


Nice work!
I've not reviewed all the changes in detail, but did review the general approach and read the algo. It's a clear improvement IMO, which fixes some ambiguities.

only major change is to move inserting of html defaults to the end of algorithm so title and reading order don't become empty too late to harvest
@mattgarrish
Member Author

I've just made a few more changes to fix up the algorithm. The only prominent one is to move the harvesting of html defaults to the last step. Until the data is processed and validated (and invalid data removed), there's no way to know if name and readingOrder could benefit from this step (i.e., it can be useful both if values are intentionally omitted or are invalid).
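The reordering described here can be sketched as below. It's a rough illustration of the idea, not the spec's prose: the function name and the shape of the data are assumptions, but the logic matches the comment, in that HTML defaults are applied only after validation, so they fill terms that were either omitted or emptied when invalid values were removed.

```javascript
// Hypothetical sketch: apply embedding-document defaults as the last
// step, after invalid data has been stripped.
function applyHtmlDefaults(processed, htmlDefaults) {
  // fill name/readingOrder only if missing or emptied by validation
  if (!processed.name || processed.name.length === 0) {
    processed.name = htmlDefaults.name;
  }
  if (!processed.readingOrder || processed.readingOrder.length === 0) {
    processed.readingOrder = htmlDefaults.readingOrder;
  }
  return processed;
}
```

If this ran before validation instead, a manifest whose name was present but invalid would end up with no name at all.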

@iherman
Member

iherman commented Oct 17, 2019

Adding another member is easy enough, but we shouldn't get rid of conformsTo since it's not exclusively about setting a profile. It's also a means of identifying conformance with wcag or other possible standards, so that information shouldn't still be retained.

Oops, I forgot about this. You are right

@mattgarrish
Member Author

I wonder whether it is possible to add explanation blocks to the "functions"

I'm going to do this now that it seems stable, but I also just made a commit that groups the functions together by how they're used. The first level are the ones called by the main processing algorithm, and then beneath each of those are the definitions for the specific functions they call.

@iherman
Member

iherman commented Oct 17, 2019

The context specific nature of some of the terms (like url) is not reflected in the "normalize data" function. Are the values of term and value enough as input arguments for that function? Shouldn't we also forward a value of a "context", ie, the value of manifest when calling from the "processing a manifest" and the value of normalized when recursively invoked from (8)?

@iherman
Member

iherman commented Oct 17, 2019

The normalization step does not seem to take care of the possibility of having, as an expected value, a single Localizable String (and of turning a simple value into one). Or do we want to remove that category (this is related to #115)?

@mattgarrish
Member Author

The normalization step does not seem to take care of the possibility of having, as an expected value, a single Localizable String

Yes, that's what I was asking in #115. When would you restrict a localizable string to only one locale at the specification level? I haven't been able to think of a use case, as it seems to impact i18n if you do. If you have a localizable string, you seem to require an array.

We do convert single strings into an array with only the one localized string, but never into a term with just a localizablestring object as its value.

We could perhaps make clear in the localizable string definition that whenever you use one you have to allow an array.
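The conversion being discussed (a bare string always becoming an array of localizable strings) can be sketched as follows. The property names and function name are illustrative assumptions, not the spec's exact shapes; the point is that a single string is wrapped into an array containing one localizable-string object that inherits language and direction from the context.

```javascript
// Hypothetical sketch: normalize a localizable-string value to an
// array of objects, wrapping bare strings.
function normalizeLocalizable(value, lang, dir) {
  const arr = Array.isArray(value) ? value : [value];
  return arr.map(item =>
    typeof item === "string"
      ? { value: item, language: lang, direction: dir } // inherit context
      : item
  );
}
```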

@dauwhe

dauwhe commented Oct 17, 2019

If manifest is not a map, fatal error, return failure.

Is it possible for a syntactically valid JSON file to not be a map?

@mattgarrish
Member Author

Is it possible for a syntactically valid JSON file to not be a map?

Not that I'm aware of. A JSON object is a set of key/value pairs, which is what a map represents.

I checked the translation to infra types with the WebDriver spec to try and match up the conversion prose.

@mattgarrish
Member Author

The context specific nature of some of the terms (like url) is not reflected in the "normalize data" function.

Blah, right, I forgot to go back to that.

Yes, we need to send in if there is a parent context. But how far down the potential rabbit hole do we want to follow this? Context seemingly could change anything, but right now it would only affect whether url is converted to an array or not. I can handle this by changing the start to:

If, depending on context, term expects an array ...

But, related to the other question about single localizable strings, when would you not allow more than one entity, or disallow more than one linked resource? We don't currently have any use cases for these, so do we make the algorithm account for scenarios that may never play out?

@mattgarrish
Member Author

Yes, we need to send in if there is a parent context.

Although, trying to think this through, what meaningful context can you send to the function? What distinguishes url coming in in the initial call from url being encountered recursively? Maybe if a type is set you'll have some context for the nested case, but otherwise what system do we use to differentiate?

@mattgarrish
Member Author

otherwise what system do we use to differentiate?

Do we just keep sending through the initial key into each recursive call, so if you find url anywhere inside of readingOrder it has to be a single value?

@dauwhe

dauwhe commented Oct 17, 2019

Is the preview still accurate? I worry that this is invalid:

{
  "@context": ["https://www.w3.org/ns/pub-context", "https://schema.org"],
  "type": "Audiobook",
...
}

It's even hard to research issues like the ordering of multiple contexts in JSON-LD because this seems to be a rather advanced usage.

@mattgarrish
Member Author

I worry that this is invalid:

Yes, it is, but the algorithm is only checking what the spec requires: https://w3c.github.io/pub-manifest/#manifest-context

I believe the order is necessary because our context inherits from schema.org, but @iherman can speak better to that. (But a new issue would really help me track what's left to do here :)

@iherman
Member

iherman commented Oct 17, 2019

If manifest is not a map, fatal error, return failure.
Is it possible for a syntactically valid JSON file to not be a map?

Yes. It can be an array.
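This is easy to see directly: both of the documents below parse as valid JSON, but only the first yields a map, so the second would hit the fatal-error step quoted above. The isMap helper is just an illustrative check, not spec text.

```javascript
// Valid JSON whose top level is a map (object)...
const asMap = JSON.parse('{"name": "Moby-Dick"}');
// ...and valid JSON whose top level is not a map.
const asArray = JSON.parse('["not", "a", "map"]');

// Illustrative check corresponding to "if manifest is not a map"
const isMap = v => typeof v === "object" && v !== null && !Array.isArray(v);
```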

@iherman
Member

iherman commented Oct 17, 2019

If, depending on context, term expects an array ...

But, related to the other question about single localizable strings, when would you not allow more than one entity, or disallow more than one linked resource? We don't currently have any use cases for these, so do we make the algorithm account for scenarios that may never play out?

The only place it came up, afaik, is with url. It does not make any sense to have multiple url-s for a linked resource, but it seems to be o.k. to have several url-s for a publication.

The only (very weak!) use case I have for the latter is the duality of short name based URI-s for a W3C TR document and its full blown, dated one. Although I would probably encode it using id for the short named one...

@iherman
Member

iherman commented Oct 17, 2019

Although, trying to think this through, what meaningful context can you send to the function? What distinguishes url coming in in the initial call from url being encountered recursively? Maybe if a type is set you'll have some context for the nested case, but otherwise what system do we use to differentiate?

In my (coming, but slowly...) Typescript implementation the only information you need is its type (in the Typescript sense). I am not sure how to put that information into infra in general...

@iherman
Member

iherman commented Oct 17, 2019

I believe the order is necessary because our context inherits from schema.org, but @iherman can speak better to that. (But a new issue would really help me track what's left to do here :)

I think the order is relevant; there are some schema.org terms for which we define some additional information in the context. Typical case is to make it clear that the order of the authors is significant (in JSON-LD parlance we add a "@container:@list" to the reference). If the schema.org context came second, it would wipe this out.
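A toy model of that ordering concern, assuming (as for JSON-LD term redefinitions) that a later context overrides an earlier one for the same term. The term definitions here are simplified stand-ins, not the real schema.org or publication contexts.

```javascript
// Simplified stand-ins for the two contexts: the publication context
// refines schema.org's author term with "@container": "@list".
const schemaOrgContext = { author: { "@id": "schema:author" } };
const pubContext = { author: { "@id": "schema:author", "@container": "@list" } };

// Toy merge: later contexts override earlier ones per term.
function mergeContexts(...contexts) {
  return Object.assign({}, ...contexts);
}

// schema.org first, publication context second: the @list refinement survives
const good = mergeContexts(schemaOrgContext, pubContext);
// reversed: schema.org's plain definition wipes the refinement out
const bad = mergeContexts(pubContext, schemaOrgContext);
```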

@mattgarrish
Member Author

The only place it came up, afaik, is with url. It does not make any sense to have multiple url-s for a linked resource, but it seems to be o.k. to have several url-s for a publication.

Right, that's why I'm wondering if we should just put in the array step that whether to convert a value is context-dependent but leave the rest of the algorithm alone (i.e., for linkedresource, entity and localizablestring, keep the expectation that they will be arrays).

There are probably other cases where single/array may factor into processing, but it seems like something we can leave for extension steps to cover.

@iherman
Member

iherman commented Oct 17, 2019

You mean we make an exception for href or for the array? But either way, the information whether that step should be applied is context dependent...

I guess I do not really understand what you mean:-(

@mattgarrish
Member Author

You mean we make an exception for href or for the array?

We make an exception for whether url gets converted to an array. If the context is url (the top-level instance), it allows an array. If the context is readingOrder, resources, links or alternate, then there is no conversion to an array (we'd have to similarly check this at the value category step, too).

But after the step to convert the incoming data to an array, we have three steps that are array-dependent (for an "array of linkedresources", etc.). I say we leave these alone and not make them, for example, "a linkedresource or array of linkedresources" and have to handle both cases in the algorithm. We don't have any cases where a property accepts single values of these.
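The exception being proposed can be sketched like this. The list of contexts and the function shape are illustrative assumptions drawn from the comment above, not spec text: the only context-dependent behavior is whether url gets wrapped in an array.

```javascript
// Contexts under which a url is a single value (per the comment above)
const singleUrlContexts = ["readingOrder", "resources", "links", "alternate"];

// Hypothetical sketch of the array-conversion step with a context exception
function maybeArrayify(term, value, context) {
  if (term === "url" && singleUrlContexts.includes(context)) {
    return value; // a linked resource's url stays a single value
  }
  return Array.isArray(value) ? value : [value];
}
```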

@mattgarrish
Member Author

Another sad realization I'm having while looking at adding explanations is that it no longer makes a lot of sense to use javascript examples when we're talking about infra types. They should probably be swapped to the new syntax... :'(

add profile term to internal rep;
merge recursion steps and add intro prose;
improve some function names;
add explanations for most steps;
convert json examples to infra;
add dfn for objects
@mattgarrish
Member Author

The last commit has a couple of major changes:

  • it adds context variables to function calls and also adds a definition at the start of the algorithm so the purpose of the context doesn't repeatedly have to be explained.
  • it adds a profile term to internal rep;

I don't like reusing "context", but can't think of a better name. Having the definition, and linking to it from the function calls, helps a bit, though.

And with the change to note that LocalizableString always becomes an array, I think I'm now up to date with all the comments in this thread. Let me know if there are still issues with the changes I've made, though.

@iherman iherman self-requested a review October 18, 2019 04:26
@iherman
Member

iherman commented Oct 18, 2019

I would say merge this, it is a great equilibrium point.

As I go ahead with my typescript toy implementation I may hit issues, but we can treat them as they come.

B.t.w.:

Another sad realization I'm having while looking at adding explanations is that it no longer makes a lot of sense to use javascript examples when we're talking about infra types. They should probably be swapped to the new syntax... :'(

I see you have made the changes and I am not sure it was necessary... I still find the JS/JSON syntax way more readable than infra so, as a reader, the examples may be less helpful to me now than they were. But maybe this is because I am an old-skool person who has not yet grasped the beauties of infra...

@iherman
Member

iherman commented Oct 18, 2019

Minor: 4.3.1./3 (setting entities) if the value is an Array, it should also be a validation error. Actually, this may be the case for (4) and (5), too

@iherman
Member

iherman commented Oct 18, 2019

Actually, this may be the case for (4) and (5), too

Sorry, it is there for (4) and (5) (the terminology is 'list' not 'array' :-)

@mattgarrish
Member Author

But maybe this is because I am an old-skool person who has not yet grasped the beauties of infra...

The use of guillemets with both lists and maps makes it more confusing than it probably needs to be, but you get used to it.

Try writing it without a french keyboard or an arrow character, though... :(

@mattgarrish
Member Author

(the terminology is 'list' not 'array' :-)

Normally I'd agree with you, but not in the normalization function. We're referring to our value categories when talking about what a term expects, so it's clearer to use "array" than link "list" to our non-infra definition.

But, yes, a list in a list of entities should be an error. I'll add that.
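The check being agreed on here could look something like the sketch below. The function name, entity shape, and error message are illustrative: when a term expects an array of entities, an item that is itself an array is flagged as a validation error and removed rather than kept.

```javascript
// Hypothetical sketch: reject nested arrays inside an entity array.
function validateEntityArray(value) {
  const errors = [];
  const kept = value.filter(item => {
    if (Array.isArray(item)) {
      errors.push("validation error: nested array in entity list");
      return false; // drop the offending item, keep processing
    }
    return true;
  });
  return { kept, errors };
}
```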

@iherman
Member

iherman commented Oct 18, 2019

Point 8 in the normalization function is, I believe, wrong. It says:

if normalized is a list, for each item of normalized that is a map, for each key → keyValue of normalized, set key to the result of running normalize data given key, keyValue, lang, dir, base and context.

I believe it should say

if normalized is a list, for each item of normalized that is a map, for each key → keyValue of item, set key to the result of running normalize data given key, keyValue, lang, dir, base and item.

  • the key/keyValue should refer to the item (in the list) and not to normalized
  • the context set for the recursive normalization should be item (ie the map in the list), and not the context that was used when calling normalization

Also, the return value of the normalization may be a 'failure'. Just as for the top level, this should be spelled out somehow: when an item's normalization fails, that item should be removed from the list.

(When the value is a map, that means that 'normalized' will get the failure value, which will return failure to the upper level caller. Although it may be good to spell this out.)

@mattgarrish
Member Author

  • the key/keyValue should refer to the item (in the list) and not to normalized

Yes, that's wrong.

  • the context set for the recursive normalization should be item (ie the map in the list), and not the context that was used when calling normalization

The context should be key, I believe, not the whole object, as that way you always have the owner term to compare against (e.g., readingOrder would be the context for all the linkedresources in its array, and then alternate would become the context when you begin processing all the linkedresources within that term).
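The recursion as described here can be sketched as follows. The function shape is an illustrative assumption: when a list item is a map, each of its keys is normalized recursively with the owning term passed as the new context, so every url inside readingOrder sees "readingOrder" as its context (and alternate becomes the context for the linked resources nested under it).

```javascript
// Hypothetical sketch: recursive normalization where the owning term
// becomes the context for nested keys. Leaves record the context they
// saw so the behavior is easy to inspect.
function normalize(term, value, context) {
  if (Array.isArray(value)) {
    return value.map(item => normalize(term, item, context));
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, keyValue] of Object.entries(value)) {
      // the recursive call passes the current term as the new context
      out[key] = normalize(key, keyValue, term);
    }
    return out;
  }
  return { term, value, context }; // leaf: record what context it saw
}
```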

fix #117 by removing @context during normalization
@mattgarrish
Member Author

I've made the latest batch of fixes in the last commit and also added a check to remove @context during the normalization step.

I'm going to merge this now, as it's grown beyond its useful life as a PR. Time to get back to more atomic commits.

Successfully merging this pull request may close these issues.

Lack of conformsTo too draconian?