Recommend that authors don't use @language in @context
#1264
Comments
As noted in #1252, I think a lot of specifications just say that "only fields But... if this is not an appropriate mechanism to use, better to say that in your I18N Considerations. |
Yes, in general using
The issue is that we're dealing w/ a subset of JSON-LD in Verifiable Credentials because there is a subset of the group that doesn't want to do full JSON-LD processing, but rather a much simpler version of it: https://w3c.github.io/vc-data-model/#json-processing ... thus, we're not at liberty to use JSON-LD features that appear in the One option here is to alias |
First off, thank you so much for putting the effort into #1252 and taking our comments seriously. I really appreciate it. The conversation about Is it possible to include some sort of language like:
(Or, perhaps, something more like "JSON processors will not, in general, associate these values with name and direction (or any other) fields in the document") Or should I be pushing for |
The issue was discussed in a meeting on 2023-09-06
View the transcript
4.2. Recommend that authors don't use
|
It seems to me that I18n's own words — "Specifications SHOULD NOT specify or require the use of language metadata for fields that cannot contain natural language text" — effectively say, "don't set |
@TallTed That's probably reading too much into the best practice, since the point of that one is to not get carried away with defining field-level language and direction metadata for what we call "syntactic content" (non-language strings). I agree that blindly processing |
PR #1271 has been raised to partially address this issue. What PR #1271 doesn't do is provide guidance on what people should do outside of the strong guidance we give here: https://w3c.github.io/vc-data-model/#language-and-base-direction I still think the guidance we have in the spec today is the best guidance to give, but @aphillips is noting that i18n WG might not think so (or be concerned about the number of potential human-readable strings that will have an "undefined" language). So, we should exhaust the solution space to ensure that we've considered all the options. I'll do that in the next comment in the thread. |
+CC: @seabass-labrax @aphillips @iherman @pchampin @gkellogg @dlongley
Ok, let's try to document all of the options available to us in the VC specification regarding expressing i18n information on human-readable text fields. Let's presume the following VC example:
{
"@context": ["https://www.w3.org/ns/credentials/v2", "https://language.example/contexts/v1"],
"type": ["VerifiableCredential", "MyHumanReadableCredential"],
"credentialSubject": {
"myHumanReadableProperty": "This is some human-readable text."
}
}
Option A: Scalpel - Define the field as multilingual
If one were to follow the guidance we provide today in https://w3c.github.io/vc-data-model/#language-and-base-direction , then the second context would define that
{
"@context": ["https://www.w3.org/ns/credentials/v2", "https://language.example/contexts/v1"],
"type": ["VerifiableCredential", "MyHumanReadableCredential"],
"credentialSubject": {
"myHumanReadableProperty": [{
"value": "This is some human-readable text.",
"lang": "en"
}, {
"value": "هذا بعض النص الذي يمكن قراءته بواسطة الإنسان.",
"lang": "ar",
"dir": "rtl"
}]
}
}
It would also allow for the string to be expressed with an undefined language, like so:
{
"@context": ["https://www.w3.org/ns/credentials/v2", "https://language.example/contexts/v1"],
"type": ["VerifiableCredential", "MyHumanReadableCredential"],
"credentialSubject": {
"myHumanReadableProperty": "This is some human-readable text."
}
}
I think it might be the above that is giving @aphillips and the i18n WG some concerns.
Option B: Sledgehammer - Use
|
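As a rough illustration of how a "just JSON" consumer might read an Option A field, here is a minimal Python sketch. It assumes the key names `value`, `lang`, and `dir` from the example above; the function name `read_language_values` and the normalized output shape are hypothetical and not defined by the specification.

```python
from typing import Any, Dict, List, Optional


def read_language_values(prop: Any) -> List[Dict[str, Optional[str]]]:
    """Normalize an Option A style property into a list of
    {"value", "lang", "dir"} dicts (hypothetical helper, not spec text)."""
    # A bare string carries no language or direction metadata.
    if isinstance(prop, str):
        return [{"value": prop, "lang": None, "dir": None}]
    # A single language value object is treated like a one-element list.
    if isinstance(prop, dict):
        prop = [prop]
    if not isinstance(prop, list):
        raise TypeError("expected a string, an object, or a list")

    results = []
    for item in prop:
        if isinstance(item, str):
            results.append({"value": item, "lang": None, "dir": None})
        elif isinstance(item, dict) and isinstance(item.get("value"), str):
            results.append({
                "value": item["value"],
                "lang": item.get("lang"),  # None means "undefined language"
                "dir": item.get("dir"),    # None means "undefined direction"
            })
        else:
            raise ValueError("not a string or a language value object")
    return results
```

A caller can then pick the entry whose `lang` matches the user's locale and treat `None` as the "undefined language" case discussed above.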
If there is the potential for someone to add This is like an alternative to your Options A/C, but to define fields as being non-lingual. As suggested, if you haven't already done so, you can provide global term definitions See, for example Example 70 in JSON-LD 1.1 for setting A last note regarding |
@gkellogg wrote
Which makes option C more palatable. After all, we do not use |
My proposal is to allow all of A/B/C, put some explanation text in there to show what the pros and cons are, and let it be. |
The A/B/C options look great to me as well. However, I don't believe them to be "for JSON-LD processors only." They're equally as findable/traversable using I don't believe option A (scalpel) and C (JSON-LD) to be any more different than the preceding From a JSON-LD processing perspective, the processing would be identical and have the same result (thanks to the aliasing in the context files). Processing as "just JSON" is literally just the difference of looking for I can see any of A/B/C working, and would be happy to help expand the JSON Processing section to describe what's needed at that level. |
I am not a particular fan of the aliases myself, but we should be consistent. We removed the |
Agreed on the need for consistency. I prefer to keep the layers distinct, but that's a personal preference. However, these "language value objects" are unique animals--with or without the alias. We'd need to explain in the JSON Processing section that no other keys should appear in a language value object...or we'll end up with confused people putting new properties along side the However we define it, we need to make sure there's conformance and explanatory for JSON Processors since JSON doesn't have this capability natively as JSON-LD does. The |
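The "no other keys" rule for language value objects could be checked by a plain JSON processor along the following lines. This is only a sketch: it assumes the Option A key names `value`, `lang`, and `dir`, and the function name and the exact set of checks are hypothetical rather than taken from the specification.

```python
from typing import Dict, List

# Assumption: the only keys permitted in a language value object are the
# Option A names used earlier in this thread.
ALLOWED_KEYS = {"value", "lang", "dir"}


def check_language_value_object(obj: Dict[str, object]) -> List[str]:
    """Return a list of problems found; an empty list means the object
    looks like a well-formed language value object under these assumptions."""
    problems = []
    if not isinstance(obj.get("value"), str):
        problems.append('"value" must be present and must be a string')
    extra = sorted(set(obj) - ALLOWED_KEYS)
    if extra:
        problems.append("unexpected keys: " + ", ".join(extra))
    if "dir" in obj and obj["dir"] not in ("ltr", "rtl"):
        problems.append('"dir" must be "ltr" or "rtl"')
    return problems
```

Returning a list of problems rather than raising lets a caller report every issue at once, which may be friendlier for authoring tools.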
I like @BigBlueHat's arguments here for not aliasing the language and related keywords. |
Isn't there a danger of misusing Your arguments are absolutely correct, and these are the reasons why I personally prefer keeping the |
Firstly, @msporny, thank you very much for showing multiple choices in a readable way. Option A/B/C looks good, and I agree with the discussions that the processing of option D is almost equivalent to other options. Also, I'm curious about how multilingual construction can be with option D. |
I added an option E -- using language translation files (to the original post above). |
I want to ensure that what I18N is looking for is clear: Any solution must provide:
Solutions may provide:
If document level defaults are provided:
There should also be a way to supply multiple languages within a document or to supply multiple additional languages (besides the default language). I believe this is already addressed. Option A satisfies these requirements, but at the cost of being verbose. As a callout, Webapp Manifest is addressing their similar use case with some design changes (after our meeting with them this week). One of these looks like a language map and might serve as an "option F". In this design there is a default language/direction (like option D) which applies to all NLS fields. NLS fields can optionally have
"defaultLang": "en-US",
"defaultDir": "ltr",
...
"name": "I am in English",
"description": { "value": "I am in British English", "lang": "en-GB" },
...
"name": { "value": "استخدم في Bahrain مصر Kuwait!", "dir": "rtl" }
Multilingual is also an option that uses a different structure from what you currently have and which looks like this:
"name": {
"en": { "value": "some name", "dir": "ltr", "lang": "en-US" },
"fr": { "value": "some name in french" },
"ar": { "value": "some name in arabic", "dir": "rtl" },
... etc...
},
Note that I haven't had a chance to track down the reference to this but will add it here later. I think it would be good if we can align best practices across multiple (even totally unrelated) formats, but don't insist that you do. |
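The default-plus-override idea in the "option F" design above can be sketched as a small lookup. This Python sketch assumes the `defaultLang` and `defaultDir` names from the example; the function name `resolve_text` is hypothetical, and the choice to inherit each default independently is an assumption rather than something the comment specifies.

```python
from typing import Dict, Optional


def resolve_text(doc: Dict[str, object], field: str) -> Dict[str, Optional[str]]:
    """Resolve one natural-language field against document-level defaults."""
    raw = doc[field]
    # A bare string is shorthand for an object that only has a value.
    if isinstance(raw, str):
        raw = {"value": raw}
    return {
        "value": raw["value"],
        # Field-level metadata wins; otherwise fall back to the defaults.
        "lang": raw.get("lang", doc.get("defaultLang")),
        "dir": raw.get("dir", doc.get("defaultDir")),
    }


doc = {
    "defaultLang": "en-US",
    "defaultDir": "ltr",
    "name": "I am in English",
    "description": {"value": "I am in British English", "lang": "en-GB"},
}
```

Here `resolve_text(doc, "name")` picks up both defaults, while `resolve_text(doc, "description")` keeps its own `lang` and inherits only the direction. Note that under this independent-inheritance assumption, a field that sets only `dir` would still inherit the default language, which may or may not be what such a design intends.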
The issue was discussed in a meeting on 2023-09-14
View the transcript
2.7. Internationalization Review for VCDM 2.0 (issue vc-data-model#1155)
See github issue vc-data-model#1155.
Manu Sporny: Let's clarify that normative statements for use cases document is requirements on VC Data Model.
Sebastian Crane: this is a traffic issue so it's convenient to keep it open.
See github issue vc-data-model#1264.
Manu Sporny: I agree with Sebastian. The other data point is that the conversation started here, but moved to 1264.
Manu Sporny: the way we sorted the issues we have a PR, but it doesn't address all the language options.
Brent Zundel: if we need to talk about this in this phase, we should have it now.
Manu Sporny: thanks. The internationalization group asked us to specify how a default language for a document is specified.
Manu Sporny: They came back and said that he felt uncomfortable because we were not specifying a default language for the VC.
Brent Zundel: I don't know that we can avoid talking about the options.
Sebastian Crane: thank you manu for creating the issue with the clear options.
Manu Sporny: I can go through the options ...
Brent Zundel: since Sebastian thinks C might be a winner, let's start there.
Manu Sporny: Option C uses JSON-LD language features.
Ivan Herman: I have a comment on the tech. but a practical comment first: Addison is around, so let's try to talk to him.
Brent Zundel: That's right. We can take advantage of TPAC.
Ivan Herman: on the technical side, the title is misleading because the JSON-LD language features are more than what is in option C.
Ivan Herman: So we should not cherry pick.
Sebastian Crane: I'd like to agree. Two things: we are providing the option to do language support correctly. We can't force it, but we can enable it.
Manu Sporny: this comes about because some implementers look at the @ sign and freak out.
Manu Sporny: so if no one is complaining about that, we can just adopt it.
Brent Zundel: If we were to proceed as mentioned, if we go with those
Ivan Herman: JSON-LD scares the hell out of people sometimes because they are nervous about RDF. I have to emphasize what is in JSON-LD for the language has nothing to do with RDF.
Dmitri Zagidulin: re: alias. I'd argue we have a better way than flipping a coin. We know there is significant pushback on @s.
Manu Sporny: two things. If we decide to use JSON-LD keywords, we'll have to change the way name & description work.
Manu Sporny: I do agree with dmitriz that there is an allergic reaction to seeing
Ivan Herman: I forgot to react to Manu about setting global language. From the JSON-LD point of view, that's not really a problem. Because we can specify that language doesn't apply for the datatype in the range of that specific property.
Phil Archer: I agree with Manu's comments, it's dangerous. but my participation here is minimal, so I understand.
Phil Archer: using a common word like value to mean something that other people don't use it for.
Brent Zundel: clarification the proposal is for lang_value and lang_direction, not "value".
Phil Archer: Ah... that's much better.
Sebastian Crane: The aliasing of
Manu Sporny: ivan said some stuff I didn't understand. I'd like to.
Manu Sporny: If we alias to lang_string, lang_direction, then that's probably fine.
Manu Sporny: Also, what Phil said: there's a lot of data out there that already uses value and it would trigger a lot of confusion.
Brent Zundel: we can keep going it.
Sebastian Crane: I'm happy to help with PR.
Dmitri Zagidulin: to clarify, if we alias value to something else. It's the other direction. Aliasing from lang_value to @language, but you can still use @language elsewhere.
Manu Sporny: if you compact using the VC context and you have @value throughout your JSON-LD, all those @value will be changed.
Ivan Herman: if you make the alias an embedded context for a property.
Ivan Herman: so for the other properties, you can make it null.
Manu Sporny: we need a deeper conversation and I don't know if we can get through this in 20 minutes.
Brent Zundel: this is a before CR issue. Is that because there will be normative changes to the spec?
Manu Sporny: yes. this has normative impact. Brent Zundel: so we will add time for this during TPAC.
this: https://github.com/w3c/vc-data-model/blob/main/contexts/credentials/v2#L122-L127.
|
Here's a big reason we shouldn't alias Regardless, other aliases could be considered, but more names for things do increase the risk of collisions--and the introduction of various case approaches ( |
Good catch.
As for removing the |
Another argument against aliasing The aliasing around |
The issue was discussed in a meeting on 2023-09-26
View the transcript
2.6. Internationalization Review for VCDM 2.0 (issue vc-data-model#1155)
See github issue vc-data-model#1155.
Kristina Yasuda: This is an internationalization one.
See github issue vc-data-model#1264.
Manu Sporny: Yeah, this is also tied in with issue 1264. They're kinda/sorta duplicates of one another.
Manu Sporny: Let me link to Addison's response in IRC.
Sebastian Crane: A few weeks ago, we had a call and I proposed a resolution, we didn't get to voting on that. The initial reception within this WG was unanimous, so I think the only thing to do is get the i18n people involved.
Kristina Yasuda: Thanks. Quick question -- how is not using
Manu Sporny: They are not aligned.
Sebastian Crane: I would like to expand on that, I'm not a member of i18n WG at the moment. There's a technical reason not to do global language but there's also a philosophical reason that it's not good, "you can enter" is the same meaning no matter what language you say it in.
Dave Longley: +1 to seabass.
Kristina Yasuda: Ok, I will reach out to set up a meeting with i18n. |
The issue was discussed in a meeting on 2023-10-03
View the transcript
1. Internationalization WG review.
Kristina Yasuda: Special meeting due to feedback on Internationalization.
Kristina Yasuda: Please introduce yourself Addison.
See github issue vc-data-model#1155.
See github issue vc-data-model#1264.
See github pull request vc-data-model#1271.
Addison Phillips: I'm the chair of the I18N group at the W3C.
Manu Sporny: The background of this: we've had guidance about supporting internationalization, with a design pattern for people to follow. In the 1.0 and 1.1 work, we haven't seen much adoption of the I18N features.
Manu Sporny: We had a number of options to consider for how to do it in 2.0.
Addison Phillips: I had read through the summary of the discussion and will summarise here to ensure that we have a common understanding.
Addison Phillips: I think it sounds like being somewhat in the Linked Data world as well as a more general specification produces some complications.
Manu Sporny: On the topic of being both in the LD world and not, it seems like a subset of our community are less likely to adopt the specification when LD features are added.
Manu Sporny: As for Option E (using a translation file), I think you mentioned that you were against it, and there seems to be agreement within this WG.
Addison Phillips: I would not object to translation files per se, but I would point out technical complications about multiple requests and resources. I think that doesn't sound like the right pattern for credentials.
Manu Sporny: Option E can be eliminated then!
Addison Phillips: It is common for us to recommend that specifications do this. It would be better if there were generic mechanisms, but specification-specific fields are OK.
Kristina Yasuda: Are there any objections to eliminating option E?
Addison Phillips: I would suggest that you could keep it for a 'backup'.
Andres Uribe: I think that's the default if we can't get consensus of anything else.
Sebastian Crane: I wanted to mention option E with the translation files, sometimes they look really good on paper, but in practice, lots of complications.
Ivan Herman: Option D means having two properties: language and text direction. We need both on the default level.
Addison Phillips: Indeed, I agree we would need to have this.
Manu Sporny: We would express the value of the string, the language, and the text direction. I believe this meets the requirement that addison illustrated.
Manu Sporny: We've mainly been discussing whether to alias the "@x" terms. Sebastian proposed options C on two occasions, and there were no objections raised. I believe it addresses all concerns except for the ability to specify a global default.
Manu Sporny: It's not easy to test this, as multiple languages are optional.
Addison Phillips: I'm concerned that whilst the 'SHOULD' and 'MAY' are good supports for internationalisation, there will still be completely unlabeled strings.
Manu Sporny: Can we ask if the group is OK exploring option C further?
Kristina Yasuda: If anyone is strongly in objection to option C, please speak now.
Sebastian Crane: To repeat for Addison's benefit -- VCs don't inherently have a language... language on field such as name/description is for human holder of VC in a wallet application. When you have the RDF world, link things together based on ontological truth, actual meaning...
Addison Phillips: Yes, important observation, locale-neutral data... when people talk about these things, name/description is how humans interact... can't look at other things and talk meaningfully about them... credential has BS of science -- those names/descriptions are of natural language pieces... want natural language to be associated with those parts, not other data.
David Chadwick: I think MUST is fine, but not sufficient. Let's say you have a degree from a Japanese university and has language metadata, that degree credential is still not readable by a typical English person.
Andres Uribe: I'm definitely supportive of Option C. In addition, I would like to see aliasing. I don't really understand why aliasing will cause problems with JSON-LD, so I would appreciate an explanation here.
Manu Sporny: The short answer is that
Manu Sporny: For that reason, I would be strongly opposed to aliasing if option C can be sufficient. It is only a single character difference.
Andres Uribe: Thank you, that answered my question.
Manu Sporny: We would end up getting our alias appearing in unexpected places.
Kristina Yasuda: I would like to ensure that other options are considered as well and we are running out of time.
Kristina Yasuda: Let's discuss option B and A.
Manu Sporny: The suggestion I'm hearing is to remove option A. It just allows us to use 'prettier' values, but doesn't have any advantage.
Manu Sporny: Option B provides a document-level default. The issue that we would need to flag is non-natural-language fields being classed as a specific language, such as Base64 data being marked as natural language.
Addison Phillips: You can't prevent people from serialising
Sebastian Crane: I'd like to talk about Option C only.
Ivan Herman: We could argue the same thing about HTML, as few people write HTML by hand. What's the proportion of tools that produce linguistically undefined documents? Perhaps addison knows.
Ivan Herman: Maybe putting the language metadata in all fields is a bit naive.
Addison Phillips: To Sebastian's point, if you say that it's a MUST and all the libraries implement it, maybe it would be moot. I'm not sure that you could expect that response without a MUST.
Kristina Yasuda: can everyone put your favorite option on IRC.
Dmitri Zagidulin: can we remind people of the difference between D and B?
Manu Sporny: D creates a new feature, B uses an existing JSON-LD feature.
Ivan Herman: dlongley said that every field will get a language tag with B. However, if we had LD tags for datatypes, that won't be an issue.
Ivan Herman: It's not as bad as it looks considering the existence of those JSON-LD datatypes.
Dave Longley: I agree with you that we should use datatypes, but the JOSE and COSE parts do not have data types defined.
Sebastian Crane: It is a bit involved, I'll write to the mailing list, can we delay a vote for day or two to engage w/ email.
Kristina Yasuda: We appreciate Addison's time. Thank you! |
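The exchange about Option B and datatypes can be illustrated with a deliberately simplified Python model. It is not the JSON-LD expansion algorithm: it only mimics the idea that a document-level default language attaches to every plain string unless the term is exempted, for example because it has a datatype. The function name `tag_with_default`, the `typed_terms` parameter, and the `encodedPhoto` property are hypothetical.

```python
from typing import Any, FrozenSet


def tag_with_default(value: Any, lang: str,
                     typed_terms: FrozenSet[str] = frozenset()) -> Any:
    """Attach a default language to every plain string, except under
    keywords (keys starting with "@") and terms listed in typed_terms."""
    if isinstance(value, str):
        return {"@value": value, "@language": lang}
    if isinstance(value, list):
        return [tag_with_default(v, lang, typed_terms) for v in value]
    if isinstance(value, dict):
        return {
            key: (val if key.startswith("@") or key in typed_terms
                  else tag_with_default(val, lang, typed_terms))
            for key, val in value.items()
        }
    return value  # numbers, booleans, and null are left alone


doc = {
    "type": ["VerifiableCredential"],
    "credentialSubject": {
        "name": "Example",
        "encodedPhoto": "aGVsbG8=",  # hypothetical base-encoded field
    },
}

# Without an exemption, the base-encoded field is labeled as English text.
naive = tag_with_default(doc, "en", frozenset({"type"}))
# With the term exempted (standing in for a datatype), it is left untouched.
typed = tag_with_default(doc, "en", frozenset({"type", "encodedPhoto"}))
```

In the `naive` result the base-encoded string ends up labeled as English, which is the concern raised about Option B; in the `typed` result it stays a bare string, which corresponds to the point that datatypes would avoid the problem where they are defined.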
In order to maximize interoperability when doing non-JSON-LD processing, we should suggest that authors don't set @language in @context (which would set the default language for /all text fields/ in the VC). This would be problematic as some text fields in a VC carry base-encoded information and shouldn't have a language applied to them. Furthermore, processors that don't do JSON-LD Processing would be unaware of the default language being set in @context.