The textDirection and processingLanguage properties are not needed #335

kevinmarks · 2016-08-02T19:27:22Z

These are both simplistic assertions about an external resource that provide no useful information to a user or a user-agent.
An external resource can have multiple text directions, and languages; attempting to boil these down to one is not practical in general. See http://unicode.org/reports/tr9/ for the nuances of text direction

halindrome · 2016-08-02T19:53:29Z

I personally agree with this.

azaroth42 · 2016-08-02T20:21:48Z

Respectfully, these were added as the result of a LOT of discussion with the internationalization group as to their real utility and requirements. Unless you can get them to agree with your position, there's no new information to reconsider their inclusion. They're not mandatory, so if you don't like them, don't use them :)

kevinmarks · 2016-08-02T20:22:55Z

Is this discussion documented anywhere?

tantek · 2016-08-02T20:25:19Z

Are there test cases for these properties to verify implementations of them and ensure interoperability? Are they "at risk" or are they required of all implementations?

aaronpk · 2016-08-02T20:26:27Z

From what I understand, the i18n group was not recommending specifically these properties. They were pointing out the issue of supporting bidirectional text as a need.

My current understanding of unicode and utf-8 is that there are a handful of control characters that can accomplish the text direction issues that were raised.

azaroth42 · 2016-08-02T20:29:07Z

Links:

halindrome · 2016-08-02T20:46:26Z

@tantek there are not yet test cases but yes, they will be tested. We do not consider these particular properties to be "features" as such, but will use the testing to be able to assess whether the various optional properties are in use in the various implementations under test.

@aaronpk yes it is possible to embed text direction in Unicode strings. The basic natural language of an annotation could be useful in that a client could select comments on something only in languages they understand.

r12a · 2016-08-02T23:17:41Z

These are both simplistic assertions about an external resource that provide no useful information to a user or a user-agent.
An external resource can have multiple text directions, and languages; attempting to boil these down to one is not practical in general. See http://unicode.org/reports/tr9/ for the nuances of text direction

Could you clarify what is the alternative you are proposing?

aphillips · 2016-08-02T23:55:38Z

// chair hat off

Note that Section 3.2.1 does not refer to the body of the Annotation itself. It contains a set of attributes that can be applied to an external resource. That is, to a separate file---not the Annotation itself. This is an important distinction, since in many cases the given attributes do not have an effect on the processing or presentation because the resource itself takes care of providing the necessary information.

That said, there exists an important type of resource where this is not true: plain text formats.

@kevinmarks Yes, text can have multiple directions. However, the Unicode Bidirectional Algorithm (TR9, which you mention) requires, among other things, a base direction with which to start. By default this is usually left-to-right (LTR) because most scripts are LTR. However, for documents containing primarily right-to-left (RTL) content, it is useful to have a way to set the base direction to RTL in cases in which the resource cannot itself set the base direction, such as plain text.

@halindrome That's correct, there are Unicode controls that can be used (and are sometimes required) in plain text to control and manage directionality. However, these controls may not be present in a resource if the base direction is supplied externally. Thus, providing a base direction externally may be necessary in order to ensure proper presentation.

@kevinmarks Regarding processingLanguage, this exists precisely because language is fulfilling a different purpose. The language attribute, which is allowed to appear multiple times per resource, is meant to indicate the intended audience(s) for the resource. Because it can appear 0 or more times, there is no way to set the base language for presenting the resource in cases where two or more language tags are provided and this can have an important effect on the intelligibility of the presentation (particularly for font selection in Far East languages). It is a bit of a kludge, since ideally language would take care of this.

It is the case that most document formats contain language and direction information (in which case, any external attributes should be ignored). But the current zoo of attributes exists to serve the subset of resources that cannot help themselves.
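To make the distinction above concrete, here is a minimal Python sketch of how a client might read these attributes for an external plain-text resource. The property names `language`, `processingLanguage`, and `textDirection` come from the spec under discussion; the resource URL, the `self_describing` flag, the helper name `presentation_hints`, and the `"rtl"` value are assumptions made purely for illustration.

```python
from typing import Optional, Tuple

# Hypothetical description of an external plain-text resource used as a body.
# "language" may repeat (intended audiences); the other two are single-valued.
body = {
    "id": "http://example.org/notes/1.txt",  # placeholder URL
    "format": "text/plain",
    "language": ["ar", "en"],
    "processingLanguage": "ar",
    "textDirection": "rtl",
}


def presentation_hints(
    resource: dict, self_describing: bool
) -> Tuple[Optional[str], Optional[str]]:
    """Return (base language, base direction) hints for presenting a resource.

    If the format carries its own language and direction information
    (for example HTML with lang/dir attributes), the external attributes
    are ignored, as suggested in the comment above.
    """
    if self_describing:
        return (None, None)

    languages = resource.get("language", [])
    # processingLanguage disambiguates when several audience languages are listed.
    lang = resource.get("processingLanguage")
    if lang is None and len(languages) == 1:
        lang = languages[0]

    direction = resource.get("textDirection")  # None means "no hint supplied"
    return (lang, direction)


print(presentation_hints(body, self_describing=False))  # ('ar', 'rtl')
```

With two `language` values and no `processingLanguage`, the sketch returns no base language at all, which is the gap the extra property is meant to close.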

gsergiu · 2016-08-03T09:54:35Z

I also had several complaints about processingLanguage, and I still have the feeling that this is not documented well enough in the standard; concrete use cases are needed to understand their meaning.
In the past discussions there were two types of scenarios discussed:

One was related to the "correct" representation of texts with multiple languages (e.g. european, arabic, chinese, hebrew...) Additionally to this scenario, there was the concern of audio readers...
There is the search scenario, where the NLPs need to know which algorithms to use, as they are language specific.

again my feedback on the 2 scenario types:

I doubt that processingLanguage and textDirection are able to solve the (absolutely) correct representation of the text, simply because the exact identification of the text parts written in different languages is needed.
For the indexing/search scenario, processingLanguage might be sufficient; still, I'm not convinced that this should have a single value! It is OK for texts that are written more than 90% in one language, but absolutely not OK for texts which have a near 50-50% distribution!
Furthermore, it is not enough to have a definition of processingLanguage, which is anyway a little bit vague given that it is intended to serve two purposes at this stage (a dangerous approach).
Who should set this property?

This is for sure a property that will not be set by the end users. (they are likely to set the language property)
Is the client application the part of the system in charge of setting this value when the annotation is created? Probably only in some exotic scenarios, as I don't expect that NLP is applied before pressing the submit button.
Is the server in charge of setting the processingLanguage? Well, actually the server is the one that needs this value as input, in order to know how to tokenize, normalize, and stem the text. Should an automatic language detection algorithm be used? Should the server simply advertise the language of the processing algorithms that were applied? If yes, why should the server be constrained to use only one processingLanguage?
I think this is mainly a kind of client-server negotiation mechanism (the client should know which processing languages are supported by the server and choose one or more of them). I think this is the first use case to be addressed in order to provide a clear definition and meaning of the field.

BigBlueHat · 2016-08-03T13:27:14Z

Some more links for those who care to dig and understand:
https://www.w3.org/International/wiki/ContentMetadataJavaScriptDiscussion
http://r12a.github.io/docs/bidi-plain-text/index.html

Given that the default text direction is not always properly set within a text/plain resource or textual property value, having the textDirection property seems useful--and has plenty of history at its back.

processingLanguage seems most valuable to machine generated and consumed annotations. @aphillips summary above seems sufficient.

gsergiu · 2016-08-03T14:50:34Z

@BigBlueHat
1 . The default text direction can be automatically identified from the script part of the tag codes. The default script for each language is defined in the IANA registry. This is all static information.

So there is a differentiation to make between the external resources and textual body, and these should probably be explained in a non-normative note. (this should be always the case when redundant information is included in the annotations. It must be stated clear which is the master source of information in case of conflicts.)

2 . exactly this kind of information is missing in the current draft

... processingLanguage seems most valuable to machine generated and consumed annotations. @aphillips summary above seems sufficient.
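Point 1 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two tables are tiny, hand-written stand-ins (a real implementation would load the IANA Language Subtag Registry, where a default script is recorded in a field such as Suppress-Script for some languages only), and the helper names `script_of` and `direction_from_tag` are hypothetical.

```python
from typing import Optional

# Illustrative, incomplete tables; not a substitute for the IANA registry.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo"}
DEFAULT_SCRIPT = {
    "ar": "Arab", "fa": "Arab", "he": "Hebr",
    "en": "Latn", "fr": "Latn", "ru": "Cyrl",
}


def script_of(tag: str) -> Optional[str]:
    """Return the explicit script subtag of a language tag, else a default."""
    subtags = tag.split("-")
    for sub in subtags[1:]:
        if len(sub) == 1:
            break  # singleton: extension or private-use subtags follow
        # A script subtag is exactly four letters.
        if len(sub) == 4 and sub.isalpha():
            return sub.title()
    return DEFAULT_SCRIPT.get(subtags[0].lower())


def direction_from_tag(tag: str) -> Optional[str]:
    """Map a language tag to 'ltr' or 'rtl'; None if the script is unknown."""
    script = script_of(tag)
    if script is None:
        return None
    return "rtl" if script in RTL_SCRIPTS else "ltr"


print(direction_from_tag("az-Arab"))  # rtl
print(direction_from_tag("en-US"))    # ltr
print(direction_from_tag("az"))       # None (no default in this toy table)
```

The lookup only yields a default direction and only works when a language tag is present, so it does not by itself settle the case where several language values are given.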

azaroth42 · 2016-08-03T14:55:43Z

Unless there's a proposal that hasn't already been discussed in the links referenced above, I propose close wontfix.

jjett · 2016-08-03T15:01:59Z

+1 for close/wontfix

aaronpk · 2016-08-03T16:28:20Z

My point is that there are unicode control characters that can already accomplish setting the base text direction. From https://www.w3.org/International/questions/qa-bidi-unicode-controls

The following example shows how these control characters could be used in plain text.

The example given is setting the base direction of the text in an HTML title attribute.

Having a separate property outside of the string itself for specifying the base text direction is fragile and will likely lead to loss of that information as it is propagated between systems.

My proposal is to drop the textDirection property and add a note recommending including the unicode control characters in the string if it is needed.
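As a rough illustration of what carrying the direction inside the string could look like, here is a minimal Python sketch. The code points are standard Unicode controls; the helper names `mark_prefix` and `embed` are hypothetical, and which of the two forms a given consumer actually honours is an assumption that would need checking against that consumer.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def mark_prefix(text: str, direction: str) -> str:
    """Prefix text with a directional mark so first-strong detection sees it."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    mark = RLM if direction == "rtl" else LRM
    # Avoid stacking marks if the string already starts with the right one.
    return text if text.startswith(mark) else mark + text


def embed(text: str, direction: str) -> str:
    """Wrap text in an explicit embedding, closed by PDF."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    opener = RLE if direction == "rtl" else LRE
    return opener + text + PDF


# A string that starts with Latin letters but is meant to be read RTL.
title = mark_prefix("W3C \u05d1\u05e2\u05d1\u05e8\u05d9\u05ea", "rtl")
print(unicodedata.bidirectional(title[0]))  # bidi class of the first character
```

Either form changes the string itself, so anything that compares, indexes, or measures the text afterwards sees the extra characters.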

gsergiu · 2016-08-03T16:48:21Z

Well ... obviously someone needs this piece of information, but in the current version of the draft the properties are a bit confusing. I consider that a better explanation should be added (at least as non-normative notes) in order to:

provide a better explanation on why and when they are needed
indicate how do they relate to the other properties/property values to which they are redundant

azaroth42 · 2016-08-03T17:03:53Z

Okay, but that's not /this/ issue. Please create a new issue, with a proposal for improved descriptions. Thanks!

tantek · 2016-08-03T17:21:04Z

Are there any other JSON-based specs that have textDirection and processingLanguage properties, and if so (links?), any implementation experience with them (links?), any publishing / consuming sites/code experience with them (links?)?

And if not, (that is, this spec is the first to attempt this solution, which is my guess) then can we at a minimum mark textDirection and processingLanguage as "at risk" since they are more in a state of incubation, rather than having been proven?

Ok for each property to have different answers to the above questions, and thus different conclusions per above reasoning.

gsergiu · 2016-08-04T08:21:10Z

@azaroth42
Well .. there were enough discussions and proposals, but there was no decision.
I don't think it helps to open many different tickets on the same underlying problem.
If we close this ticket with a set of concrete action points, I'm happy to contribute by creating new issues and proposing concrete solutions for them. But as I said, I would first need an agreement/commitment from the editors that the new tickets will not end the same way as the existing ones, for which it is recognised that there is an issue/weakness in the current version of the specification, but which are always closed with wontfix ...
This is from my point of view a contradiction.
It is natural to have long discussions on identifying the root of the problem ... and it is natural that the required action items are different than the original proposal of the ticket. But I don't feel ok to close the tickets without a conclusion. (won't fix is not a proper conclusion for this ticket)

gsergiu · 2016-08-04T08:30:34Z

@aaronpk I don't think that implementations should do (complex) Unicode text processing to extract information like textDirection, which is obvious/available in explicit form in the annotation editors. Yes ... I do agree that there is a level of redundancy in the processingLanguage and textDirection information, and these fields shouldn't be considered the "master" source of information. From my side, this is what has to be documented. Also ... it should be made clear who needs these fields and when (basically the use cases that are missing in the specifications).

However, I agree with one point that it is a very bad idea to mixup the real payload with the presentation metadata (i.e. in html everyone uses css nowadays). But I bet this improvement could only happen in the second version of the standard.

BigBlueHat · 2016-08-04T14:49:19Z

@gsergiu this current issue is "The textDirection and processingLanguage properties are not needed." The editors on this issue--based on past discussions with @r12a and @aphillips as well as their current points feel that this particular issue should be marked as wontfix. If there are concrete actions to be taken with regards to clarification of their purpose, feel free to file them as issues.

There's also a wide ranging assumption in this current discussion that everything on the web is being stored and distributed in Unicode, UTF-8, or at least something that supports the same or similar direction specifying characters.

Given that annotation bodies and targets can both be remote resources, it seems prudent that we give those who want it the ability to state these things--as it was deemed during our face-to-face when discussing I18N issues (with several RTL language authors and speakers present) that these would be useful to people working with these languages.

Please reference this issue from any new issues created that have direct action to be taken that solves for these scenarios in what you feel is a better way.

Thanks.
🎩

gsergiu · 2016-08-04T15:25:23Z

Yes, I expressed a similar opinion: some people need them, and they should remain in the standard as long as no better solution is proposed.

However, the people that didn't participate in the past discussions are quite right to say that the fields are not needed, as there is no real explanation in the standard about who needs these fields, given the existence of the language tags and the implications of redundant information.
(you might remember also discussions on redundant fields for External resource)

Given this ... I claim that even this ticket is more about improper documentation of the 2 fields than about their exclusion from the standard. And I hope that we have an agreement on this point ... otherwise we again spend time on long discussions that end up with "won't fix".

I find it more appropriate from the process point of view to accept that the ticket is partially valid, create a new ticket for that part, and then close the current ticket. (I cannot enforce this work process, but it is a little bit frustrating to invest time in discussions that end up with the conclusion .. you are right but we won't fix)

It would be great if the community members had more time to contribute very concrete/valid solutions, but this is very hard .. when we are not aware of past discussions.

So ... I hope I can create new tickets tomorrow, that it will be fine for me to close this issue ..

azaroth42 · 2016-08-04T17:25:49Z

If some people need it, then as @BigBlueHat said, this issue can be closed -- the claim is that they are /not/ needed. If we were to remove them, that would not be just an editorial change, it would be a normative one (hence @tantek's question about whether they're marked at-risk, given that we're in CR). If the solution is to provide a better explanation, that is just an editorial issue that we can take care of before the PR phase.

My request for a proposal as to a solution is because I want folks who claim the documentation is insufficient to not just complain, but actually come up with something better 😄 "I don't like the way you wrote that" is just not helpful at this stage.

iherman · 2016-08-04T18:12:22Z

+1

tantek · 2016-08-04T18:12:40Z

@azaroth42 agreed that "I don't like the way you wrote that" is just not helpful at this stage.

I believe the larger problem is one of not just "not needed", but rather, as this thread has uncovered: unproven, untested, and likely insufficient. A broken feature is typically worse than none.

Since apparently no other JSON-based spec uses such an approach (sideband properties per text property), textDirection and processingLanguage are a first time "hypothetical" and definitely aspirational proposal themselves.

I'm worried that they will give the appearance of satisfying i18n requirements, when in practice they won't (we don't know, and the burden of proof is on prototyping/implementability/usability, not on the absence thereof), and that will put us in a worse position (broken features, backcompat headaches) than if they were absent.

Aside: In general W3C work (web platform in particular) is frowning on anything aspirational being REC-track at this point. Not completely consistently across W3C yet, but more and more, and this (Annotations) may be an instance worth paying attention to in that regard.

A concrete proposal would be to drop these two aspirational properties, and instead provide a note explaining the limitations (as uncovered by i18n folks) in this version of the spec.

Additional optional details:

instead of textDirection, implementations should use Unicode directional control chars if present, otherwise use the "first strong" etc. algorithms described by the i18n folks.
instead of processingLanguage, implementations should use the HTTP Content-Language returned on the resource as a base
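The two fallbacks listed above could look roughly like this in Python. It is a simplified sketch, not the full Unicode Bidirectional Algorithm: it treats a leading embedding, override, or isolate initiator as a direction signal (an assumption that follows the "control chars if present" wording rather than the algorithm's own paragraph rules), it does not skip text inside isolates, and the helper names `first_strong_direction` and `base_language` are hypothetical.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Return 'ltr' or 'rtl' from the first directional signal in the text."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        # Strong characters; LRM and RLM have classes L and R respectively.
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
        # Explicit embedding/override/isolate initiators.
        if bidi_class in ("RLE", "RLO", "RLI"):
            return "rtl"
        if bidi_class in ("LRE", "LRO", "LRI"):
            return "ltr"
    return None  # no signal found; the caller picks a default


def base_language(content_language: Optional[str]) -> Optional[str]:
    """Take the first tag of an HTTP Content-Language header value, if any."""
    if not content_language:
        return None
    first = content_language.split(",")[0].strip()
    return first or None


print(first_strong_direction("123 \u0627\u0644\u0639\u0631\u0628\u064a\u0629"))  # rtl
print(base_language("ar, en"))  # ar
```

Taking the first Content-Language tag is itself a guess about intent, since the header lists audiences in the same way the `language` property does.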

aaronpk · 2016-08-04T18:14:41Z

Agreed with all @tantek's points above. I've opened a new issue, #336 to discuss the concrete proposal of dropping textDirection in favor of recommending unicode control characters.

azaroth42 · 2016-08-04T18:21:51Z

@tantek: I agree with your points as well, and the somewhat last-minute addition of the properties is unfortunate. As with AS2, we left i18n review until we were happy with the rest of the work rather than engaging early. Hindsight being what it is, we certainly would have done that differently, and hopefully future WGs can learn from it rather than repeat.

That said, the review did reveal needs that aren't solved by unicode. The properties are not only for embedded strings (which in JSON we can expect to be unicode) but for arbitrary resources with URIs. I have no idea how PDFs store text strings (for example) and how well implemented the control characters are in those strings, but I can point you to many instances of older or just badly implemented XML documents in a huge variety of encodings. As these resources can take the role of the body of the Annotation, the unicode proposal isn't sufficient to address the requirements.

gsergiu · 2016-08-05T08:42:14Z

@azaroth42
I think that we have the key to the answer in your statement:

That said, the review did reveal needs that aren't solved by unicode. The properties are not only for embedded strings (which in JSON we can expect to be unicode) but for arbitrary resources with URIs. I have no idea how PDFs store text strings (for example) and how well implemented the control characters are in those strings, but I can point you to many instances of older or just badly implemented XML documents in a huge variety of encodings. As these resources can take the role of the body of the Annotation, the unicode proposal isn't sufficient to address the requirements.

I see it exactly the opposite.

One might need to know the text direction for correct representation of text embedded in the annotations (TextualBody), not for the correct representation of external resources.
The external resources must include inside the "files/bitstreams" all information required for a correct representation. It is not the responsibility of annotations to correct wrong html/pdf/xml.
(There might be a use case for it ... but it is not included in the current version of the standard).
Probably some selectors would need this information; the "textDirection" might be relevant for the text position selection. In that case ... the selector must set the value inside the selector and not inside the target/body.

azaroth42 · 2016-08-05T16:20:05Z

Discussed on the telco of 2016-08-05. The resolution was that there is no new information that wasn't already discussed. The proposal does not address the established need to cover non unicode content, however much we might like to simply require unicode everywhere, retroactively.
[which is the topic of #336]

However, we fully acknowledge that the i18n group are the experts in this matter. If @r12a @fsasaki @aphillips would please weigh in to clarify, we're happy to go with whatever those recommendations are. We've tagged this as wontfix but not closed it, anticipating that the reaction will be that things haven't changed since the previous discussion and we should continue to include them.

If the text is unclear, we continue to seek explicit proposals for how to improve it and would very much welcome i18n review of that text to ensure that we are correctly representing the requirements and usage. [which does not have a separate issue]

For the resolution of "It is not the responsibility to correct wrong html/pdf/xml", we have opened Issue #339 to clarify that annotation-borne descriptive features are hints and not to be considered authoritative information. This covers much much more than just this one feature, and will be prominent at the top of the document.

Reference: http://www.w3.org/2016/08/05-annotation-irc#T15-30-52 (and above)

tantek · 2016-08-05T17:34:39Z

@azaroth42 appreciate the thoughtful consideration. I can understand the desire to at least try something (even if novel/untested) rather than nothing, and yes, defer that preference to WG consensus.

My only requests (to "accept" this resolution to keep these features) is to both 1&2 (optionally also 3):

1. Acknowledge the novel nature of these features with a non-normative "Warning" or "Note" saying something like: these two properties are a novel (unverified and previously untested) way of attempting to handle direction/language information in a JSON syntax, and thus implementation and usability feedback is strongly encouraged, especially with respect to whether implementations are able to satisfy the i18n requirements (link) for users, with these features in particular.
2. Add to the exit criteria: End-user verification that users of RTL / mixed directions and mixed languages are able to satisfy the stated i18n requirements use-cases using implementations that implement these properties. (If this can't be verified, then there's no proof the features have actually "helped" such users, and thus having them there may be worse than not having them, since they would provide a false/superficial sense of i18n support).
3. Optionally, mark these two properties as "At Risk", noting this aspect of the exit criteria, so that if this criterion that the features solve intended use-cases is not satisfied, then the WG has the option of simply dropping the properties in order to exit CR (presuming all other criteria are met of course) more quickly, deferring solving those use-cases to a different approach in a future version.

iherman · 2016-08-06T07:22:01Z

While I would be fine with something like #1 referring to future versions of the spec, from an administrative point of view I am afraid #2 and #3 are not really possible. We are already in CR; setting/changing exit criteria or turning a feature to be 'at risk' is not possible at this point… Introducing this would trigger a new CR round and, beyond the extra time required it would be possible only with the Director's approval.

tantek · 2016-08-07T01:34:40Z

@iherman I don't understand what you mean by "#1 referring to future versions of the spec". The problem is in this version of the spec, and thus the note makes sense inline to refer to this version.

Re: would be possible only with the Director's approval.
My understanding is that per new W3C process (2014?) a group may iterate and produce a new CR without having to go back to the Director, that is, what used to require bouncing between LCWD and CR, now is just a matter of iterating in CR.

Regardless, worse than "extra time required" or "possible only with the Director's approval", if there are features in a spec which are known to either not have test cases, or not have test cases that test the functionality for which the features were added (in this case, the i18n requirements), or not have implementations that pass those test cases in a way that demonstrates interoperable user functionality from the i18n requirements, then those features MUST NOT advance to PR, whether or not explicitly noted in CR exit requirements. My suggestion above was more to be explicit about it in the spec rather than having it be implied.

If untested or unimplemented or uninteroperable features (these properties) were explicitly at-risk, the group may drop them to help transition to PR. Otherwise untested/unimplemented/uninteroperable features (especially a novel approach as documented) must block a CR from transitioning to PR.

halindrome · 2016-08-07T14:28:48Z

@tantek you are not wrong. But let's not put too much importance on this "feature". It isn't a "feature" in the classic sense of the word. These are optional properties in a data model that might be present in content. There are no requirements that they be present, nor that they be interpreted if they are present. It's just advice. There are LOTS of such properties in this data model. The presence or absence of them in annotations generated by clients is something we will evaluate in the test cases. If they are present, we will test the values to ensure they conform to the requirements of the spec. But that doesn't really mean anything. At least, that's my interpretation.

iherman · 2016-08-07T14:35:40Z

@tantek, just to put the admin issue at rest:

6.4.1 Revising a Candidate Recommendation

If there are any substantive changes made to a Candidate Recommendation other than to remove features explicitly identified as "at risk", the Working Group must obtain the Director's approval to publish a revision of a Candidate Recommendation. This is because substantive changes will generally require a new Exclusion Opportunity per section 4 of the W3C Patent Policy [PUB33]. Note that approval is expected to be fairly simple compared to getting approval for a transition from Working Draft to Candidate Recommendation.

In addition the Working Group:

• must show that the revised specification meets all Working Group requirements, or explain why the requirements have changed or been deferred,
• must specify the deadline for further comments, which must be at least four weeks after publication, and should be longer for complex documents,
• must document the changes since the previous Candidate Recommendation,
• must show that the proposed changes have received wide review, and
• may identify features in the document as "at risk". These features may be removed before advancement to Proposed Recommendation without a requirement to publish a new Candidate Recommendation.

The Director MUST announce the publication of a revised Candidate Recommendation to other W3C groups and the Public.

See https://www.w3.org/2015/Process-20150901/#revised-cr

The text does say that the Director's approval is probably quicker than for the first round, but we cannot just re-issue a document without further ado. We need to get approval, republish, and all that jazz.

(I have also checked the 2016 version of the process and, as far as I can see, there is no difference.)

Anyway. We should not get bogged down in admin issues, but we should also avoid overcomplicating our lives.

azaroth42 · 2016-08-07T17:20:43Z

I assume

"The Working Group [...] MAY identify features in the document as "at risk" ... without a requirement to publish a new Candidate Recommendation."

is before the initial CR rather than as a silent change to an existing document in CR. The wording probably should have been:

... MAY remove features previously marked "at risk" without ...

(metaspecifications are always fun! :) )

With that reading, we can't just mark them at risk whenever we want (which would have been useful, in this case). We have the testing process in hand, given that (a) the purpose is to verify that it has been implemented, not to validate the implementations and (b) they're optional parts of a feature, not individual features themselves.

I think all of the concerns have been addressed, and it's okay to close the issue?

iherman · 2016-08-08T07:06:59Z

On 7 Aug 2016, at 19:20, Rob Sanderson notifications@github.com wrote:

I assume

"The Working Group [...] MAY identify features in the document as "at risk" ... without a requirement to publish a new Candidate Recommendation."

is before the initial CR rather than as a silent change to an existing document in CR. The wording probably should have been:

... MAY remove features previously marked "at risk" without ...

(metaspecifications are always fun! :) )

My reading is that the group may identify (possibly new) "at risk" features before re-issuing a CR that this section is dealing with. Which is identical to what the group is allowed to do before issuing the original CR.

Removing 'at risk' features happens when going to PR, and is the natural possible action with or without a new CR publication.

With that reading, we can't just mark them at risk whenever we want (which would have been useful, in this case).

That is my reading, too. We would need a bona fide CR republication with the Director's approval.

We have the testing process in hand, given that (a) the purpose is to verify that it has been implemented, not to validate the implementations and (b) they're optional parts of a feature, not individual features themselves.

I think all of the concerns have been addressed, and it's okay to close the issue?

gsergiu · 2016-08-08T07:43:37Z

@halindrome

@tantek you are not wrong. But let's not put too much importance on this "feature". It isn't a "feature" in the classic sense of the word. These are optional properties in a data model that might be present in content. There are no requirements that they be present, nor that they be interpreted if they are present. It's just advice. There are LOTS of such properties in this data model. The presence or absence of them in annotations generated by clients is something we will evaluate in the test cases. If they are present, we will test the values to ensure they conform to the requirements of the spec. But that doesn't really mean anything. At least, that's my interpretation.

Thank you for the good hint ... We have a feature that is not a feature according to the text above.
And you are right ... we have a feature that is not documented through a use case. If you don't have a use case defined, you cannot have a test case! No use case, no test case -> implies no feature!!!

What speaks against moving these two "non-features" to the annex with the extensions, which is not normative?

I do support @tantek's point of view:

If untested or unimplemented or uninteroperable features (these properties) were explicitly at-risk, the group may drop them to help transition to PR. Otherwise untested/unimplemented/uninteroperable features (especially a novel approach as documented) must block a CR from transitioning to PR.

gsergiu · 2016-08-08T09:00:25Z

@halindrome this is not completely true:

There are no requirements that they be present, nor that they be interpreted if they are present.

Yes, these fields are optional, but their meaning has to be interpreted and used by "text processors". Taking into account that the indexing of annotations is already a text processing step, I expect that most annotation systems will involve text processing, and consequently they should use the "processingLanguage" in order to be fully compliant with the standard!

gsergiu · 2016-08-08T09:13:14Z

@azaroth42

I think all of the concerns have been addressed, and it's okay to close the issue?

As I also documented in the related issue #337 (comment), there is a big discrepancy between what is discussed in the related tickets and the presented use cases, and what we can find in the current version of the draft.
Also, I see a lot of complaints that are rejected as "nothing new", even though I can't find an explicit resolution of the issue (e.g. "the fields will not be removed because use cases X and Y need them; see examples m and n in the draft"). I see only resolutions with wontfix, because it was discussed in ticket abc on GitHub. I thought that the goal is to improve the text of the standard, and not the text of the GitHub issues. Am I wrong in my assumption?

r12a · 2016-08-09T11:02:52Z

However, we fully acknowledge that the i18n group are the experts in this matter. If @r12a @fsasaki @aphillips would please weigh in to clarify, we're happy to go with whatever those recommendations are. We've tagged this as wontfix but not closed it, anticipating that the reaction will be that things haven't changed since the previous discussion and we should continue to include them.

Addison and I are investigating direction questions with the Activity Streams folks at the moment, and on Thursday we are due to meet with them to discuss language. We'd like to work through these topics carefully with them before determining whether there are implications for Web Annotations. So please continue with your ongoing CR work for now and look out for more information from us shortly.

BigBlueHat · 2016-08-24T18:06:35Z

Web App Manifest contains both dir and lang properties which govern its inline strings. Seemed worth noting as prior art and perhaps as additional reference to similar thinking.

kevinmarks · 2016-08-24T18:56:30Z

Activity Streams went the other way on dir, documenting how to include bidi signalling in the text itself, which is more robust: http://w3c.github.io/activitystreams/core/#biditext
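
For illustration, here is a minimal sketch of what "bidi signalling in the text itself" can look like, using the Unicode directional isolate characters defined in UAX #9. This is an assumption about one possible mechanism, not necessarily the exact approach the Activity Streams section describes; the `isolate` helper and its `direction` parameter are hypothetical names.

```python
# Unicode directional isolate characters (defined in UAX #9).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE (direction taken from first strong character)
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE (closes any of the three above)


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text in a directional isolate so the direction travels with the string.

    `direction` is a hypothetical parameter accepting "ltr", "rtl", or "auto".
    """
    opener = {"ltr": LRI, "rtl": RLI, "auto": FSI}[direction]
    return f"{opener}{text}{PDI}"


# Embed a right-to-left title inside a left-to-right sentence.
title = isolate("مرحبا", "rtl")
sentence = f"The title is {title} in Arabic."
```

The direction information survives copying the string between systems, which is the robustness argument made above, but any consumer that trims, truncates, or concatenates the string has to keep the opener and the closing PDI paired.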

r12a · 2016-08-26T19:25:10Z

@kevinmarks there are still issues, however, with signalling in the text itself. It's not an easy problem to solve. See my notes at http://w3c.github.io/i18n-discuss/notes/json-bidi.html

gsergiu · 2016-08-29T08:57:10Z

@kevinmarks @r12a
Well ... we might need a solution that integrates both approaches, because we have very different situations for embedded texts and external resources!

Yes, I do agree that signalling the language for each part of the text is the best solution; therefore this should be the preferred one. The open question is which is better: to split the body into multiple bodies, to use embedded HTML, or to build a new JSON construct that fits our purpose better? (Multiple bodies and embedded HTML are quite verbose or ugly for a JSON-based API.)
But keep in mind that the external resources are not under our control, so we cannot change them; we have to live with their "not internationalized"/legacy representation. So we cannot inline language and text direction, nor can we split the resource (though we can use selectors where needed)!

So:
We can use approach 1 only for the text embedded in the annotations, meaning Textual Body and TextSelectors (where applicable), and for internationalized resources.
@BigBlueHat if I understood correctly, you are also referring to TextualBody (inline strings).

However, the big problem that originated this set of issues is the not-internationalized external resources. And none of the discussed solutions addresses this problem!
I have to reference again the wiki entry that documents these issues:
http://w3c.github.io/i18n-discuss/notes/annotation-language-use-cases

Unfortunately, the definition of the fields was changed too much and no longer reflects their initial purpose (and tries to claim more than these fields can actually do).
According to http://w3c.github.io/i18n-discuss/notes/annotation-language-use-cases
processingLanguage and textDirection only have the meaning of specifying a default language and text direction for external resources (bodies or targets) that don't specify a language, don't specify it appropriately (i.e. text direction can be automatically derived from the script tag), or specify more than one language.
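
To make the "default for external resources" reading concrete, here is a rough sketch of how a consumer might resolve a language and direction for an external body or target. The `processingLanguage`, `textDirection`, and `language` property names come from the draft under discussion; the helper functions, the fallback rules, and the example URL are assumptions for illustration only, not something the spec mandates.

```python
from typing import Optional


def effective_language(resource: dict) -> Optional[str]:
    """Pick one language to use when processing an external resource.

    Assumed rule: prefer processingLanguage; otherwise use `language`
    only when it names exactly one language; otherwise give up.
    """
    processing = resource.get("processingLanguage")
    if processing:
        return processing
    languages = resource.get("language", [])
    if isinstance(languages, str):
        languages = [languages]
    if len(languages) == 1:
        return languages[0]
    return None  # zero or several languages and no default declared


def effective_direction(resource: dict, fallback: str = "auto") -> str:
    """Return the declared base direction, or a fallback (assumed "auto")."""
    return resource.get("textDirection", fallback)


# Hypothetical external body that mixes French and Arabic.
body = {
    "id": "http://example.org/description1",  # placeholder URL
    "language": ["fr", "ar"],
    "processingLanguage": "ar",
    "textDirection": "rtl",
}

lang = effective_language(body)
direction = effective_direction(body)
```

Under this reading the two properties are hints for a resource as a whole; they do not describe the language or direction of individual runs of text inside it.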

azaroth42 · 2016-10-27T15:29:18Z

Closing, as this has been split into other more specific issues.

azaroth42 added the model label Aug 2, 2016

iherman added this to the V1 PR milestone Aug 2, 2016

r12a added the i18n-review label Aug 2, 2016

gsergiu mentioned this issue Aug 5, 2016

Cardinality of the processingLanguage? #337

Closed

azaroth42 added the wontfix label Aug 5, 2016

r12a mentioned this issue Aug 8, 2016

The textDirection and processingLanguage properties are not needed #335 w3c/i18n-activity#198

Closed

This was referenced Aug 9, 2016

Processing language for mono-lingual and internationalized resources #342

Closed

Relationship between dc:language and processing language for multilingual resources #343

Closed

Processing language for multilingual resources #341

Open

iherman mentioned this issue Aug 19, 2016

ed sugg: textDirection #348

Closed

BigBlueHat mentioned this issue Aug 24, 2016

It should be possible to indicate the default base direction for natural language values w3c/activitystreams#336

Closed

azaroth42 closed this as completed Oct 27, 2016

js-choi mentioned this issue Nov 11, 2017

Add text language / direction attributes w3c/web-share#6

Closed

plehegar added the i18n-tracker label and removed the i18n-review label Mar 11, 2021
