
Updates to processing algorithm and other fixes #116

Merged
merged 7 commits, Oct 18, 2019

Conversation

mattgarrish
Member

@mattgarrish mattgarrish commented Oct 16, 2019

This PR makes the following changes:

  • updates the processing algorithm to split it into more atomic functions where appropriate (makes the main algorithm clearer and also makes it easier for profiles to extend specific functions instead of step numbers).
  • updates description and accessibilitySummary to arrays of localizable strings.
  • standardizes the name "value category".
  • adds a generic object category and updates accessModeSufficient to reference it.
  • adds prose for reading systems to inspect reading order for unknown profiles and react accordingly.

It should fix #101 #106 #108 #109 #110 #113 and #115

I'm going to update the experimental processor code next just to make sure I didn't break anything splitting up the algorithm.
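The refactoring described above can be sketched roughly as follows. This is a minimal illustration, not the spec's actual algorithm: every name here (processManifest, normalizeData, validateData) is a hypothetical stand-in for the atomic functions the PR introduces, which profiles could override individually instead of referencing step numbers.

```javascript
// Hypothetical sketch: the main algorithm delegates to small named
// functions that an extension can replace one at a time.

function normalizeData(term, value) {
  // e.g. wrap a single value in an array where the term expects one
  return Array.isArray(value) ? value : [value];
}

function validateData(term, value) {
  // drop obviously invalid entries; the real algorithm consults the
  // term's value category
  return value.filter(v => v !== null && v !== undefined);
}

function processManifest(manifest) {
  const internal = {};
  for (const [term, value] of Object.entries(manifest)) {
    internal[term] = validateData(term, normalizeData(term, value));
  }
  return internal;
}
```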



updates description and accessibilitySummary to arrays of localizable strings;
standardizes on "value category";
adds a generic object category and updates accessModeSufficient to reference it;
adds prose for reading systems to inspect reading order for unknown profiles and react accordingly
@mattgarrish mattgarrish changed the title Updates to processing algroithm and other fixes Updates to processing algorithm and other fixes Oct 16, 2019
@mattgarrish
Member Author

I also split the processing section into three, as it was getting bloated. There are now introduction and error handling subsections.

@iherman
Member

iherman commented Oct 16, 2019

To repeat what I said in #101 (comment): the algorithm yields the profile value, which is then part of the generated data structure. As a consequence, the WebIDL should have a profile member, and no conformsTo one...

Member

@iherman iherman left a comment


All in all, this is a big improvement!

@iherman iherman requested a review from rdeltour October 16, 2019 16:58
@iherman
Member

iherman commented Oct 16, 2019

I wonder whether it is possible to add explanation blocks to the "functions", mainly the last ones. By the time readers get there, they may be a bit lost as to the goals of these and how they fit into the full picture.

@mattgarrish
Member Author

the algorithm yield the profile value, that is then part of the generated data structure. As a consequence, the WebIDL should have a profile member, and no conformsTo one...

Adding another member is easy enough, but we shouldn't get rid of conformsTo since it's not exclusively about setting a profile. It's also a means of identifying conformance with WCAG or other possible standards, so that information should still be retained.

Member

@rdeltour rdeltour left a comment


Nice work!
I've not reviewed all the changes in detail, but did review the general approach and read the algo. It's a clear improvement IMO, which fixes some ambiguities.

only major change is to move inserting of html defaults to the end of algorithm so title and reading order don't become empty too late to harvest
@mattgarrish
Member Author

I've just made a few more changes to fix up the algorithm. The only prominent one is to move the harvesting of html defaults to the last step. Until the data is processed and validated (and invalid data removed), there's no way to know if name and readingOrder could benefit from this step (i.e., it can be useful both if values are intentionally omitted or are invalid).
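The reordering described here can be sketched as below. It's a rough illustration of the idea, not the spec's prose: the function name and the shape of the data are assumptions, but the logic matches the comment, in that HTML defaults are applied only after validation, so they fill terms that were either omitted or emptied when invalid values were removed.

```javascript
// Hypothetical sketch: apply embedding-document defaults as the last
// step, after invalid data has been stripped.
function applyHtmlDefaults(processed, htmlDefaults) {
  // fill name/readingOrder only if missing or emptied by validation
  if (!processed.name || processed.name.length === 0) {
    processed.name = htmlDefaults.name;
  }
  if (!processed.readingOrder || processed.readingOrder.length === 0) {
    processed.readingOrder = htmlDefaults.readingOrder;
  }
  return processed;
}
```

If this ran before validation instead, a manifest whose name was present but invalid would end up with no name at all.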

@iherman
Member

iherman commented Oct 17, 2019

Adding another member is easy enough, but we shouldn't get rid of conformsTo since it's not exclusively about setting a profile. It's also a means of identifying conformance with wcag or other possible standards, so that information shouldn't still be retained.

Oops, I forgot about this. You are right

@mattgarrish
Member Author

I wonder whether it is possible to add explanation blocks to the "functions"

I'm going to do this now that it seems stable, but I also just made a commit that groups the functions together by how they're used. The first level are the ones called by the main processing algorithm, and then beneath each of those are the definitions for the specific functions they call.

@iherman
Member

iherman commented Oct 17, 2019

The context specific nature of some of the terms (like url) is not reflected in the "normalize data" function. Are the values of term and value enough as input arguments for that function? Shouldn't we also forward a value of a "context", ie, the value of manifest when calling from the "processing a manifest" and the value of normalized when recursively invoked from (8)?

@iherman
Member

iherman commented Oct 17, 2019

The normalization step does not seem to take care of the possibility of having, as an expected value, a single Localizable String (and of turning a simple value into one). Or do we want to remove that category (this is related to #115)?

@mattgarrish
Member Author

The normalization step does not seem to take care of the possibility of having, as an expected value, a single Localizable String

Yes, that's what I was asking in #115. When would you restrict a localizable string to only one locale at the specification level? I haven't been able to think of a use case, as it seems to impact i18n if you do. If you have a localizable string, you seem to require an array.

We do convert single strings into an array with only the one localized string, but never into a term with just a localizablestring object as its value.

We could perhaps make clear in the localizable string definition that whenever you use one you have to allow an array.
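The conversion being discussed (a bare string always becoming an array of localizable strings) can be sketched as follows. The property names and function name are illustrative assumptions, not the spec's exact shapes; the point is that a single string is wrapped into an array containing one localizable-string object that inherits language and direction from the context.

```javascript
// Hypothetical sketch: normalize a localizable-string value to an
// array of objects, wrapping bare strings.
function normalizeLocalizable(value, lang, dir) {
  const arr = Array.isArray(value) ? value : [value];
  return arr.map(item =>
    typeof item === "string"
      ? { value: item, language: lang, direction: dir } // inherit context
      : item
  );
}
```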

@dauwhe

dauwhe commented Oct 17, 2019

If manifest is not a map, fatal error, return failure.

Is it possible for a syntactically valid JSON file to not be a map?

@mattgarrish
Member Author

Is it possible for a syntactically valid JSON file to not be a map?

Not that I'm aware of. A JSON object is a set of key/value pairs, which is what a map represents.

I checked the translation to infra types with the WebDriver spec to try and match up the conversion prose.

@mattgarrish
Member Author

The context specific nature of some of the terms (like url) is not reflected in the "normalize data" function.

Blah, right, I forgot to go back to that.

Yes, we need to send in if there is a parent context. But how far down the potential rabbit hole do we want to follow this? Context seemingly could change anything, but right now it would only affect whether url is converted to an array or not. I can handle this by changing the start to:

If, depending on context, term expects an array ...

But, related to the other question about single localizable strings, when would you not allow more than one entity, or disallow more than one linked resource? We don't currently have any use cases for these, so do we make the algorithm account for scenarios that may never play out?

@mattgarrish
Member Author

Yes, we need to send in if there is a parent context.

Although, trying to think this through, what meaningful context can you send to the function? What distinguishes url coming in in the initial call from url being encountered recursively? Maybe if a type is set you'll have some context for the nested case, but otherwise what system do we use to differentiate?

@mattgarrish
Member Author

otherwise what system do we use to differentiate?

Do we just keep sending through the initial key into each recursive call, so if you find url anywhere inside of readingOrder it has to be a single value?

@dauwhe

dauwhe commented Oct 17, 2019

Is the preview still accurate? I worry that this is invalid:

{
  "@context": ["https://www.w3.org/ns/pub-context", "https://schema.org"],
  "type": "Audiobook",
...
}

It's even hard to research issues like the ordering of multiple contexts in JSON-LD because this seems to be a rather advanced usage.

@mattgarrish
Member Author

I worry that this is invalid:

Yes, it is, but the algorithm is only checking what the spec requires: https://w3c.github.io/pub-manifest/#manifest-context

I believe the order is necessary because our context inherits from schema.org, but @iherman can speak better to that. (But a new issue would really help me track what's left to do here :)

@iherman
Member

iherman commented Oct 17, 2019

If manifest is not a map, fatal error, return failure.
Is it possible for a syntactically valid JSON file to not be a map?

Yes. It can be an array.
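This is easy to see directly: both of the documents below parse as valid JSON, but only the first yields a map, so the second would hit the fatal-error step quoted above. The isMap helper is just an illustrative check, not spec text.

```javascript
// Valid JSON whose top level is a map (object)...
const asMap = JSON.parse('{"name": "Moby-Dick"}');
// ...and valid JSON whose top level is not a map.
const asArray = JSON.parse('["not", "a", "map"]');

// Illustrative check corresponding to "if manifest is not a map"
const isMap = v => typeof v === "object" && v !== null && !Array.isArray(v);
```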

@iherman
Member

iherman commented Oct 17, 2019

If, depending on context, term expects an array ...

But, related to the other question about single localizable strings, when would you not allow more than one entity, or disallow more than one linked resource? We don't currently have any use cases for these, so do we make the algorithm account for scenarios that may never play out?

The only place it came up, afaik, is with url. It does not make any sense to have multiple url-s for a linked resource, but it seems to be o.k. to have several url-s for a publication.

The only (very weak!) use case I have for the latter is the duality of short name based URI-s for a W3C TR document and its full blown, dated one. Although I would probably encode it using id for the short named one...

@iherman
Member

iherman commented Oct 17, 2019

Although, trying to think this through, what meaningful context can you send to the function? What distinguishes url coming in in the initial call from url being encountered recursively? Maybe if a type is set you'll have some context for the nested case, but otherwise what system do we use to differentiate?

In my (coming, but slowly...) Typescript implementation the only information you need is its type (in the Typescript sense). I am not sure how to put that information into infra in general...

@iherman
Member

iherman commented Oct 17, 2019

I believe the order is necessary because our context inherits from schema.org, but @iherman can speak better to that. (But a new issue would really help me track what's left to do here :)

I think the order is relevant; there are some schema.org terms for which we define some additional information in the context. Typical case is to make it clear that the order of the authors is significant (in JSON-LD parlance we add a "@container:@list" to the reference). If the schema.org context came second, it would wipe this out.
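A toy model of that ordering concern, assuming (as for JSON-LD term redefinitions) that a later context overrides an earlier one for the same term. The term definitions here are simplified stand-ins, not the real schema.org or publication contexts.

```javascript
// Simplified stand-ins for the two contexts: the publication context
// refines schema.org's author term with "@container": "@list".
const schemaOrgContext = { author: { "@id": "schema:author" } };
const pubContext = { author: { "@id": "schema:author", "@container": "@list" } };

// Toy merge: later contexts override earlier ones per term.
function mergeContexts(...contexts) {
  return Object.assign({}, ...contexts);
}

// schema.org first, publication context second: the @list refinement survives
const good = mergeContexts(schemaOrgContext, pubContext);
// reversed: schema.org's plain definition wipes the refinement out
const bad = mergeContexts(pubContext, schemaOrgContext);
```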

@mattgarrish
Member Author

The only place it came up, afaik, is with url. It does not make any sense to have multiple url-s for a linked resource, but it seems to be o.k. to have several url-s for a publication.

Right, that's why I'm wondering if we should just put in the array step that whether to convert a value is context-dependent but leave the rest of the algorithm alone (i.e., for linkedresource, entity and localizablestring, keep the expectation that they will be arrays).

There are probably other cases where single/array may factor into processing, but it seems like something we can leave for extension steps to cover.

@iherman
Member

iherman commented Oct 17, 2019

You mean we make an exception for href or for the array? But either way, the information whether that step should be applied is context dependent...

I guess I do not really understand what you mean:-(

@mattgarrish
Member Author

You mean we make an exception for href or for the array?

We make an exception for whether url gets converted to an array. If the context is url (the top-level instance), it allows an array. If the context is readingOrder, resources, links or alternate, then there is no conversion to an array (we'd have to similarly check this at the value category step, too).

But after the step to convert the incoming data to an array, we have three steps that are array-dependent (for an "array of linkedresources", etc.). I say we leave these alone and not make them, for example, "a linkedresource or array of linkedresources" and have to handle both cases in the algorithm. We don't have any cases where a property accepts single values of these.
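The exception being proposed can be sketched like this. The list of contexts and the function shape are illustrative assumptions drawn from the comment above, not spec text: the only context-dependent behavior is whether url gets wrapped in an array.

```javascript
// Contexts under which a url is a single value (per the comment above)
const singleUrlContexts = ["readingOrder", "resources", "links", "alternate"];

// Hypothetical sketch of the array-conversion step with a context exception
function maybeArrayify(term, value, context) {
  if (term === "url" && singleUrlContexts.includes(context)) {
    return value; // a linked resource's url stays a single value
  }
  return Array.isArray(value) ? value : [value];
}
```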

@mattgarrish
Member Author

Another sad realization I'm having while looking at adding explanations is that it no longer makes a lot of sense to use javascript examples when we're talking about infra types. They should probably be swapped to the new syntax... :'(

add profile term to internal rep;
merge recursion steps and add intro prose;
improve some function names;
add explanations for most steps;
convert json examples to infra;
add dfn for objects
@mattgarrish
Member Author

The last commit has a couple of major changes:

  • it adds context variables to function calls and also adds a definition at the start of the algorithm so the purpose of the context doesn't repeatedly have to be explained.
  • it adds a profile term to internal rep;

I don't like reusing "context", but can't think of a better name. Having the definition, and linking to it from the function calls, helps a bit, though.

And with the change to note that LocalizableString always becomes an array, I think I'm now up to date with all the comments in this thread. Let me know if there are still issues with the changes I've made, though.

@iherman iherman self-requested a review October 18, 2019 04:26
@iherman
Member

iherman commented Oct 18, 2019

I would say merge this, it is a great equilibrium point.

As I go ahead with my typescript toy implementation I may hit issues, but we can treat them as they come.

B.t.w.:

Another sad realization I'm having while looking at adding explanations is that it no longer makes a lot of sense to use javascript examples when we're talking about infra types. They should probably be swapped to the new syntax... :'(

I see you have made the changes and I am not sure it was necessary... I still find the JS/JSON syntax way more readable than infra so, as a reader, the examples may be less helpful to me now than they were. But maybe this is because I am an old-skool person who has not yet grasped the beauties of infra...

@iherman
Member

iherman commented Oct 18, 2019

Minor: 4.3.1./3 (setting entities) if the value is an Array, it should also be a validation error. Actually, this may be the case for (4) and (5), too

@iherman
Member

iherman commented Oct 18, 2019

Actually, this may be the case for (4) and (5), too

Sorry, it is there for (4) and (5) (the terminology is 'list' not 'array' :-)

@mattgarrish
Member Author

But maybe this is because I am an old-skool person who has not yet grasped the beauties of infra...

The use of guillemets with both lists and maps makes it more confusing than it probably needs to be, but you get used to it.

Try writing it without a french keyboard or an arrow character, though... :(

@mattgarrish
Member Author

(the terminology is 'list' not 'array' :-)

Normally I'd agree with you, but not in the normalization function. We're referring to our value categories when talking about what a term expects, so it's clearer to use "array" than link "list" to our non-infra definition.

But, yes, a list in a list of entities should be an error. I'll add that.
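The check being agreed on here could look something like the sketch below. The function name, entity shape, and error message are illustrative: when a term expects an array of entities, an item that is itself an array is flagged as a validation error and removed rather than kept.

```javascript
// Hypothetical sketch: reject nested arrays inside an entity array.
function validateEntityArray(value) {
  const errors = [];
  const kept = value.filter(item => {
    if (Array.isArray(item)) {
      errors.push("validation error: nested array in entity list");
      return false; // drop the offending item, keep processing
    }
    return true;
  });
  return { kept, errors };
}
```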

@iherman
Member

iherman commented Oct 18, 2019

Point 8 in the normalization function is, I believe, wrong. It says:

if normalized is a list, for each item of normalized that is a map, for each key → keyValue of normalized, set key to the result of running normalize data given key, keyValue, lang, dir, base and context.

I believe it should say

if normalized is a list, for each item of normalized that is a map, for each key → keyValue of item, set key to the result of running normalize data given key, keyValue, lang, dir, base and item.

  • the key/keyValue should refer to the item (in the list) and not to normalized
  • the context set for the recursive normalization should be item (ie the map in the list), and not the context that was used when calling normalization

Also, the return value of the normalization may be a 'failure'. Just as for the top level, this should be spelled out somehow: when an item's normalization fails, that item should be removed from the list.

(When the value is a map, that means that 'normalized' will get the failure value, which will return failure to the upper level caller. Although it may be good to spell this out.)

@mattgarrish
Member Author

  • the key/keyValue should refer to the item (in the list) and not to normalized

Yes, that's wrong.

  • the context set for the recursive normalization should be item (ie the map in the list), and not the context that was used when calling normalization

The context should be key, I believe, not the whole object, as that way you always have the owner term to compare against (e.g., readingOrder would be the context for all the linkedresources in its array, and then alternate would become the context when you begin processing all the linkedresources within that term).
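The recursion as described here can be sketched as follows. The function shape is an illustrative assumption: when a list item is a map, each of its keys is normalized recursively with the owning term passed as the new context, so every url inside readingOrder sees "readingOrder" as its context (and alternate becomes the context for the linked resources nested under it).

```javascript
// Hypothetical sketch: recursive normalization where the owning term
// becomes the context for nested keys. Leaves record the context they
// saw so the behavior is easy to inspect.
function normalize(term, value, context) {
  if (Array.isArray(value)) {
    return value.map(item => normalize(term, item, context));
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, keyValue] of Object.entries(value)) {
      // the recursive call passes the current term as the new context
      out[key] = normalize(key, keyValue, term);
    }
    return out;
  }
  return { term, value, context }; // leaf: record what context it saw
}
```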

fix #117 by removing @context during normalization
@mattgarrish
Member Author

I've made the latest batch of fixes in the last commit and also added a check to remove @context during the normalization step.

I'm going to merge this now, as it's grown beyond its useful life as a PR. Time to get back to more atomic commits.

Successfully merging this pull request may close these issues.

Lack of conformsTo too draconian?