Recommend that authors don't use @language in @context #1264

Closed
msporny opened this issue Aug 29, 2023 · 26 comments
Labels: before-CR, blocked, i18n-needs-resolution (Issue the Internationalization Group has raised and looks for a response on.), pr exists

Comments

@msporny
Member

msporny commented Aug 29, 2023

In order to maximize interoperability when doing non-JSON-LD processing, we should suggest that authors don't set @language in @context (which would set the default language for /all text fields/ in the VC). Doing so would be problematic, as some text fields in a VC carry base-encoded information and shouldn't have a language applied to them. Furthermore, processors that don't do JSON-LD processing would be unaware of the default language being set in @context.

@msporny msporny added post-CR i18n-needs-resolution Issue the Internationalization Group has raised and looks for a response on. before-CR and removed post-CR labels Aug 29, 2023
@msporny msporny self-assigned this Aug 29, 2023
@aphillips
Contributor

As noted in #1252, I think a lot of specifications just say that "only fields name and description contain natural language text and thus @language and @direction only apply to these fields". Specifying @language in the @context saves having to repeat the language metadata for every name and description when the file is in a single language (and makes the file a bit self-documenting regarding the intended language audience).

But... if this is not an appropriate mechanism to use, better to say that in your I18N Considerations.

@msporny msporny added the ready for PR This issue is ready for a Pull Request to be created to resolve it label Aug 29, 2023
@msporny
Member Author

msporny commented Aug 31, 2023

As noted in #1252, I think a lot of specifications just say that "only fields name and description contain natural language text and thus @language and @direction only apply to these fields". Specifying @language in the @context saves having to repeat the language metadata for every name and description when the file is in a single language (and makes the file a bit self-documenting regarding the intended language audience).

Yes, in general using @language to set the default language for a general JSON-LD document is expected to be fine in many use cases. However...

But... if this is not an appropriate mechanism to use, better to say that in your I18N Considerations.

The issue is that we're dealing w/ a subset of JSON-LD in Verifiable Credentials because there is a subset of the group that doesn't want to do full JSON-LD processing, but rather a much simpler version of it:

https://w3c.github.io/vc-data-model/#json-processing

... thus, we're not at liberty to use JSON-LD features that appear in the @context as much as we'd like. In addition to that, VCs tend to carry base-encoded information, and because of that, a blanket @language declaration would end up attaching a language to base-encoded strings that shouldn't have one.

One option here is to alias @value, @language, and @direction globally, but then we might be invoking something we don't want to invoke. I'll have to do more analysis on that to see if it's even feasible (I expect that it's not).

@aphillips
Contributor

aphillips commented Aug 31, 2023

First off, thank you so much for putting the effort into #1252 and taking our comments seriously. I really appreciate it.

The conversation about @language feels very "baby gone with the bathwater"? The @context was supposed to provide a means of getting document-default language and direction information into JSON-LD documents to avoid the need to serialize it on all of the natural language fields in a document. But if we can't use that mechanism, then the natural language fields have no language and direction and thus will incur processing and display problems later. The ability to use compound literals (as an option) still leaves us with a lot of monolingual documents in an unknown language floating about the Web.

Is it possible to include some sort of language like:

@language/@direction in the @context only apply to fields name and description that do not otherwise include language or direction metadata. When processing credentials using a JSON processor, these values do not have to be associated with all of the values in the document, so long as they are not removed from the document as a whole.

(Or, perhaps, something more like "JSON processors will not, in general, associate these values with name and description (or any other) fields in the document")

Or should I be pushing for compoundLiteral on name and description?

@iherman
Member

iherman commented Sep 6, 2023

The issue was discussed in a meeting on 2023-09-06

  • no resolutions were taken

4.2. Recommend that authors don't use @language in @context (issue vc-data-model#1264)

See github issue vc-data-model#1264.

Manu Sporny: in w3c i18n review, github user aphillips who is in this issue has been trying to align our spec with recent w3c i18n guidance.
… specifically around @language annotations in credentials and documents.
… which I don't think many implementers have been following, at great peril for i18nal machine-readability.
… I don't think there's a standard way to use this feature in standard-tooling JSON, and I don't think many implementers are using it in JSON-LD.
… so the concern here is that cross-processing as outlined in the spec (reading JSON-LD as JSON) could be hindered with little tradeoff if no one's using it.
… so my questions are 1.) would the group object to describing this as a recommended way of doing i18n.
… and also 2.) would the group object to that description warning implementers about the base64 footgun this presents to cross-processing.

Orie Steele: thanks for that explanation. do i understand correctly that this is being removed in v2?
… i would prefer we took a strong position that we are discouraging it if no one implemented it after we recommended it in v1.
… i think it's worth mentioning that it was in v1 and not in v2, if not also an explanation of why.
… ideally something that makes the recommendation simple, because too much detail can overwhelm confused implementers.

Sebastian Crane: I would actually argue that i18n is very important here, and I think putting the @language inline might confuse JSON parsers less than annotating from @context.

Orie Steele: +1 to being clear... lets not give many options, lets give a single clear recommendation.

Manu Sporny: that is an option, but Orie's point on a clear and rational default recommendation should be taken seriously here.
… and I wouldn't characterize this as having recommended A in v1 and recommending not-A now, it's more like I think we have insufficient feedback from implementers.
… and I would appreciate feedback.
… since I feel like the v1 text was a compromise solution without a clear enough recommendation or rationale.
… which would take some time to create in v2, i.e. writing up multiple options and considering their pros and cons.

Orie Steele: wrong ~= we recommended something that people are not using.

Ted Thibodeau Jr.: 1. record erratum (really, bug) on v1; 2. explain in v2 why changed from v1, why NOT to do that thing we thought was a good idea before we had experience to date; 3. push I18n WG to produce some kind of whitepaper for all SDOs to refer to (because it seems like non-W3C orgs may not be doing such a great job with I18n, and everybody needs every SDO to do better with I18n).

Orie Steele: i'd be ok outlining many paths forward, and then picking one.

Orie Steele: we should say UTS-46 / WHATWG URL for i18n.

Sebastian Crane: I am more bullish on timeline because I do not think we have that many options to weigh or people defending each.
… would consensus on this call be possible?

Brent Zundel: sebastian has a resolution proposal.
… call for refinements or changes before they go into the minutes?

Joe Andrieu: I would change support to specify or include -- an additional/supplemental context file should be able to include it or apply it.

Sebastian Crane: +1 that is.

Ivan Herman: nit: not just language but language directionality was mentioned in the issue, and not here in the proposal.

Joe Andrieu: +1 to that advice from Orie.

Manu Sporny: yes, +1 to what Orie is saying.

Sebastian Crane: +1.

Orie Steele: I would have expected the recommendation to be bipartite: not in core context, and how/if/whether to use in additional contexts.

Manu Sporny: @language is a sledgehammer that shouldn't be used for VCs.

Manu Sporny: (when expressed in the base context).

Orie Steele: or, as I thought i heard, that the recommendation was not to use it in either.

Sebastian Crane: +1.

Brent Zundel: dlongley and orie have variations in the irc.

Juan Caballero: .. that overlap considerably.

Manu Sporny: https://w3c.github.io/vc-data-model/#language-and-base-direction.

Ted Thibodeau Jr.: this is so dangerous to decide without LOTS of evidence. see https://www.w3.org/International/articles/strings-and-bidi/.

Manu Sporny: I think we need more time, there is already guidance in the link i just shared, and what orie is calling the second part is what's missing.

Orie Steele: I'd be good with dlongley's proposal.

Manu Sporny: and i worry that even that would not fulfill the i18n peoples' request.

Orie Steele: its an improvement over what we have now, its not sufficient.

Manu Sporny: yes ^.

Dave Longley: notes we could also require all encoded values use type multibase to enable language in the context, but that is very unlikely to get consensus.

Ted Thibodeau Jr.: also see https://www.w3.org/TR/string-meta/.

Joe Andrieu: I think it might be reductive to say it should never be used in @contexts; there are places even in VCs where it might make sense.

@TallTed
Member

TallTed commented Sep 6, 2023

It seems to me that I18n's own words — "Specifications SHOULD NOT specify or require the use of language metadata for fields that cannot contain natural language text" — effectively say, "don't set @language for entire @context"!

@aphillips
Contributor

@TallTed That's probably reading too much into the best practice, since the point of that one is to not get carried away with defining field-level language and direction metadata for what we call "syntactic content" (non-language strings).

I agree that blindly processing @language (or @direction) in @context might result in all string data values acquiring language (or direction) metadata, including those that should not have such metadata. And while it's easy for humans to know that name and description are "special" in VC, "blind application" is the only option for most processors.

@msporny
Member Author

msporny commented Sep 10, 2023

PR #1271 has been raised to partially address this issue. What PR #1271 doesn't do is provide guidance on what people should do outside of the strong guidance we give here: https://w3c.github.io/vc-data-model/#language-and-base-direction

I still think the guidance we have in the spec today is the best guidance to give, but @aphillips is noting that i18n WG might not think so (or be concerned about the number of potential human-readable strings that will have an "undefined" language). So, we should exhaust the solution space to ensure that we've considered all the options. I'll do that in the next comment in the thread.

@msporny
Member Author

msporny commented Sep 10, 2023

+CC: @seabass-labrax @aphillips @iherman @pchampin @gkellogg @dlongley

Ok, let's try to document all of the options available to us in the VC specification regarding expressing i18n information on human-readable text fields. Let's presume the following VC example:

{
  "@context": ["https://www.w3.org/ns/credentials/v2", "https://language.example/contexts/v1"],
  "type": ["VerifiableCredential", "MyHumanReadableCredential"],
  "credentialSubject": {
    "myHumanReadableProperty": "This is some human-readable text."
  }
}

Option A: Scalpel - Define the field as multilingual

If one were to follow the guidance we provide today in https://w3c.github.io/vc-data-model/#language-and-base-direction , then the second context would define that myHumanReadableProperty is a multilingual field, which would allow for this sort of mark up:

{
  "@context": ["https://www.w3.org/ns/credentials/v2", "https://language.example/contexts/v1"],
  "type": ["VerifiableCredential", "MyHumanReadableCredential"],
  "credentialSubject": {
    "myHumanReadableProperty": [{
      "value": "This is some human-readable text.",
      "lang": "en"
    }, {
      "value": "هذا بعض النص الذي يمكن قراءته بواسطة الإنسان.",
      "lang": "ar",
      "dir": "rtl"
    }]
  }
}

It would also allow for the string to be expressed with an undefined language, like so:

{
  "@context": ["https://www.w3.org/ns/credentials/v2", "https://language.example/contexts/v1"],
  "type": ["VerifiableCredential", "MyHumanReadableCredential"],
  "credentialSubject": {
    "myHumanReadableProperty": "This is some human-readable text."
  }
}

I think it might be the above that is giving @aphillips and the i18n WG some concerns.
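
For a consumer reading such a credential as plain JSON, the two shapes above (bare string, or value objects with language metadata) can be handled by one small helper. This is a minimal sketch in Python, assuming the second context aliases the keys to `value`, `lang`, and `dir` as in the examples; the helper name `read_text_values` and the `und` fallback for untagged strings are illustrative assumptions, not something the specification defines.

```python
import json

def read_text_values(prop):
    """Normalise a property into a list of (value, lang, dir) tuples.

    Accepts a bare string, a single value object, or a list of either.
    A missing language is reported as "und"; a missing direction as None.
    """
    items = prop if isinstance(prop, list) else [prop]
    result = []
    for item in items:
        if isinstance(item, str):
            # Bare string: no language or direction metadata is available.
            result.append((item, "und", None))
        else:
            result.append((item["value"], item.get("lang", "und"), item.get("dir")))
    return result

credential = json.loads("""
{
  "credentialSubject": {
    "myHumanReadableProperty": [
      {"value": "This is some human-readable text.", "lang": "en"},
      {"value": "هذا بعض النص", "lang": "ar", "dir": "rtl"}
    ]
  }
}
""")

values = read_text_values(credential["credentialSubject"]["myHumanReadableProperty"])
plain = read_text_values("This is some human-readable text.")
```

The bare-string case is the one that leaves the language undefined, which is why the sketch reports it explicitly rather than guessing.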

Option B: Sledgehammer - Use @language in the @context field

If an ecosystem is going to do full JSON-LD processing, then the following is possible:

{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://language.example/contexts/v1",
    {"@language": "en"}
  ],
  "type": ["VerifiableCredential", "MyHumanReadableCredential"],
  "credentialSubject": {
    "myHumanReadableProperty": "This is some human-readable text.",
    "someBinaryData": "data:image/png;base64,IFdvcmsbG8sxkIQSGV=="
  }
}

Note the usage of "@language": "en" in the @context property above. Taking this approach, while setting a base language for all fields in the document, will also set the language for the someBinaryData property (which is not the outcome we'd want). It also has the added downside that processors that don't do full JSON-LD processing won't pick up on the @language tag: https://w3c.github.io/vc-data-model/#json-processing
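
The blanket effect can be illustrated with a deliberately simplified simulation. The Python sketch below is not a JSON-LD processor: it assumes every string is a plain, untyped value, whereas a real processor would also consult term definitions (which may coerce a value to a datatype or reset its language). The function name `apply_default_language` is hypothetical.

```python
def apply_default_language(node, default_lang):
    """Tag every plain string with the default language.

    Simplified illustration only: term definitions, type coercion and
    per-term language resets are ignored.
    """
    if isinstance(node, str):
        return {"@value": node, "@language": default_lang}
    if isinstance(node, list):
        return [apply_default_language(item, default_lang) for item in node]
    if isinstance(node, dict):
        return {key: apply_default_language(val, default_lang) for key, val in node.items()}
    return node

subject = {
    "myHumanReadableProperty": "This is some human-readable text.",
    "someBinaryData": "data:image/png;base64,IFdvcmsbG8sxkIQSGV==",
}
tagged = apply_default_language(subject, "en")
```

Under these assumptions the base-encoded `someBinaryData` string ends up tagged as English alongside the human-readable one, which is the unwanted outcome described above.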

Option C: JSON-LD - Use JSON-LD language features

We could just use the native JSON-LD language features:

{
  "@context": ["https://www.w3.org/ns/credentials/v2", "https://language.example/contexts/v1"],
  "type": ["VerifiableCredential", "MyHumanReadableCredential"],
  "credentialSubject": {
    "myHumanReadableProperty": [{
      "@value": "This is some human-readable text.",
      "@language": "en"
    }, {
      "@value": "هذا بعض النص الذي يمكن قراءته بواسطة الإنسان.",
      "@language": "ar",
      "@direction": "rtl"
    }]
  }
}

This is very close to option A, as it achieves the desired outcome for any human-readable field. However, there are multiple implementers in the Working Group that do not want to do full JSON-LD Processing and will argue that properties that have @ in the front of them don't look like idiomatic JSON to developers and will thus be rejected. While there is disagreement on that assertion, we can't force people to use features they don't want to use. So, our goal to date is to try to ensure that the contexts provided are as close to idiomatic JSON as possible.

Option D: credentialLanguage - Specify a new language property for VCs

{
  "@context": ["https://www.w3.org/ns/credentials/v2", "https://language.example/contexts/v1"],
  "type": ["VerifiableCredential", "MyHumanReadableCredential"],
  "credentialLanguage": "en",
  "credentialSubject": {
    "myHumanReadableProperty": "This is some human-readable text."
  }
}

We could define a new credentialLanguage property, which would only work for VCs. This approach would address being able to set the default language for the credential such that an application could perform an informed guess regarding any text string it believes to be human readable. The downside here is that we can't express text direction (unless we also define a languageDirection property) and that this solution would be specific to VCs.
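
How an application might consume such a property can be sketched as follows. This is a minimal Python illustration, assuming the application keeps its own list of fields it believes to be human-readable; `HUMAN_READABLE_FIELDS` and `language_for` are hypothetical names, and `credentialLanguage` is only the proposal above, not an existing property.

```python
# Fields the application believes carry natural-language text (an assumption
# made by the application, not something the credential itself declares).
HUMAN_READABLE_FIELDS = {"name", "description", "myHumanReadableProperty"}

def language_for(credential, field):
    """Return the language to assume for a field, or None for non-text fields."""
    if field not in HUMAN_READABLE_FIELDS:
        return None
    # Fall back to "und" (undetermined) when no default is declared.
    return credential.get("credentialLanguage", "und")

credential = {
    "credentialLanguage": "en",
    "credentialSubject": {
        "myHumanReadableProperty": "This is some human-readable text.",
        "someBinaryData": "data:image/png;base64,IFdvcmsbG8sxkIQSGV==",
    },
}

text_lang = language_for(credential, "myHumanReadableProperty")
binary_lang = language_for(credential, "someBinaryData")
```

The informed guess lives entirely in the application's field list, which is the VC-specific knowledge this option relies on.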

Option E: Translation File - Point to a language translation file

{
  "@context": ["https://www.w3.org/ns/credentials/v2", "https://language.example/contexts/v1"],
  "type": ["VerifiableCredential", "MyHumanReadableCredential"],
  "languageTranslationFile": {
    "id": "https://language.example/translations/mhrc.po",
    "digestMultibase": "uIG9mIHRoZSBVLlMuIEdvdmVybm"
  },
  "credentialSubject": {
    "myHumanReadableProperty": "This is some human-readable text."
  }
}

We could utilize the language translation file approach that is broadly used/deployed via GNU gettext. The file would contain translations for strings contained in the VC and could be secured using a hash of the external content. This has the benefit of using a system that has been in use for many decades. The downside would be all of the well known downsides of language translation files. For example, one issue is that since the translations are external to the VC, some translation values might drift over time, leading to translation failures.


Those are the alternatives I can think of... can anyone think of any other options we could evaluate?

@gkellogg
Member

If there is the potential for someone to add @language or @direction as a global in the context (I see a case for @language, but not @direction), you should consider adding "@language": null to term definitions which should be base strings. Since JSON-LD was specifically designed to allow things to be said in JSON that have firm interpretations, I do suggest that you lean on the JSON-LD value properties (@language and @direction or term aliases).

This is like an alternative to your Options A/C, but to define fields as being non-lingual. As suggested, if you haven't already done so, you can provide global term definitions "lang": "@language" and "dir": "@direction" that allow value objects to use more "friendly" property names without losing the ability for a JSON-LD processor to handle them properly.

See, for example Example 70 in JSON-LD 1.1 for setting @language to null for a specific term. This works for @direction, as well. In any case, adding these to term definitions, where appropriate, probably makes sense, to the degree that they're not already so defined.
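
The effect of resetting the language on a specific term can be sketched with a simplified simulation in Python. This is not a JSON-LD processor: it handles only flat objects with plain string values, and the function name `tag_strings` and the term IRI under `https://language.example/` are hypothetical. JSON `null` appears as `None`.

```python
def tag_strings(subject, context):
    """Apply a default language to strings, honouring per-term resets."""
    default_lang = context.get("@language")
    tagged = {}
    for term, value in subject.items():
        definition = context.get(term)
        # A term definition may carry its own "@language"; None resets it.
        if isinstance(definition, dict) and "@language" in definition:
            lang = definition["@language"]
        else:
            lang = default_lang
        if isinstance(value, str) and lang is not None:
            tagged[term] = {"@value": value, "@language": lang}
        else:
            tagged[term] = value
    return tagged

context = {
    "@language": "en",
    "someBinaryData": {
        "@id": "https://language.example/vocab#someBinaryData",
        "@language": None,  # corresponds to "@language": null in JSON
    },
}
subject = {
    "myHumanReadableProperty": "This is some human-readable text.",
    "someBinaryData": "data:image/png;base64,IFdvcmsbG8sxkIQSGV==",
}
tagged = tag_strings(subject, context)
```

With the reset in place, the base-encoded value stays a plain string while the human-readable property still picks up the document default.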

A last note regarding @direction: The JSON-LD spec provides two informative mechanisms for turning these into RDF Literals, which is important for RDF Dataset Canonicalization. You should specify which one to use (I suggest the i18n namespace). RDF 1.2 will likely make base direction a core part of a literal.

@iherman
Member

iherman commented Sep 11, 2023

@gkellogg wrote

[…] you can provide global term definitions "lang": "@language" and "dir": "@direction" that allow value objects to use more "friendly" property names without losing the ability for a JSON-LD processor to handle them properly.

Which makes option C more palatable. After all, we do not use @type and @id either, using the same mechanism.

@iherman
Member

iherman commented Sep 11, 2023

  • I am not in favor of adding new keywords to the vocabulary. We can use lang and dir (see previous comment) if necessary.
  • I am not sure whether we have to choose. Yes, language setting is tricky, but that is the reality out there. We can leave all these options open, and add a text to the spec that outlines the pros and cons for all of them. Application developers or authors will get it wrong sometimes, just like numerous Web pages get it wrong. A number of VC-s will be poorly defined, meaning the language will be, in effect, "und"; this isn't different from the Web either.

My proposal is to allow all of A/B/C, put some explanation text in there to show what the pros and cons are, and let it be.

@msporny msporny removed the ready for PR This issue is ready for a Pull Request to be created to resolve it label Sep 11, 2023
@BigBlueHat
Member

The A/B/C options look great to me as well. However, I don't believe them to be "for JSON-LD processors only." They're equally as findable/traversable using JSON.parse() and some basic code to find or use those keys--which would be an identical effort if the group went with option D--which comes with "unicorn cost" as no one else knows what a credentialLanguage might be...nor how to use it.

I don't believe options A (scalpel) and C (JSON-LD) differ by anything more than the preceding @ symbol.

From a JSON-LD processing perspective, the processing would be identical and have the same result (thanks to the aliasing in the context files).

Processing as "just JSON" is literally just the difference of looking for lang or @language in an object and while your JS code might end up with an extra bracket or two, it's hardly a material benefit to force the alias. Additionally, the @ prefix (with or without JSON-LD processing) is a great signal to any developer that "there's something more going on here".
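
The "just JSON" lookup can be sketched in a few lines. The Python below is an illustration only, assuming a consumer that wants to accept both spellings; the helper name `get_language` is hypothetical.

```python
def get_language(value_object):
    """Return the language of a value object, accepting either key spelling."""
    for key in ("@language", "lang"):
        if key in value_object:
            return value_object[key]
    return None  # no language metadata present

aliased = get_language({"value": "This is some human-readable text.", "lang": "en"})
keyword = get_language({"@value": "This is some human-readable text.", "@language": "en"})
missing = get_language({"value": "untagged"})
```

Either way the consumer performs a single key lookup; the choice between the two spellings does not change the amount of code involved.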

I can see any of A/B/C working, and would be happy to help expand the JSON Processing section to describe what's needed at that level.

@iherman
Member

iherman commented Sep 13, 2023

, it's hardly a material benefit to force the alias.

I am not a particular fan of the aliases myself, but we should be consistent. We removed the @ characters for type and id; we should do the same here, imho.

@BigBlueHat
Member

, it's hardly a material benefit to force the alias.

I am not a particular fan of the aliases myself, but we should be consistent. We removed the @ characters for type and id; we should do the same here, imho.

Agreed on the need for consistency. I prefer to keep the layers distinct, but that's a personal preference. However, these "language value objects" are unique animals--with or without the alias.

We'd need to explain in the JSON Processing section that no other keys should appear in a language value object...or we'll end up with confused people putting new properties along side the value, lang, dir trio or using them "randomly" within the document (especially likely with value) in incompatible ways.
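
A check of this kind could look roughly like the following for a JSON processor. This is a minimal Python sketch, assuming the aliased `value`, `lang`, `dir` trio; the function name `validate_language_value_object` and the error messages are illustrative and not drawn from any specification text.

```python
ALLOWED_KEYS = {"value", "lang", "dir"}
ALLOWED_DIRECTIONS = {"ltr", "rtl"}

def validate_language_value_object(obj):
    """Return a list of problems found; an empty list means the object is acceptable."""
    if not isinstance(obj, dict):
        return ["not an object"]
    errors = []
    extra = sorted(set(obj) - ALLOWED_KEYS)
    if extra:
        errors.append("unexpected keys: " + ", ".join(extra))
    if not isinstance(obj.get("value"), str):
        errors.append("value must be a string")
    if "dir" in obj and obj["dir"] not in ALLOWED_DIRECTIONS:
        errors.append("dir must be ltr or rtl")
    return errors

ok = validate_language_value_object({"value": "some text", "lang": "en"})
extra_key = validate_language_value_object({"value": "some text", "note": "oops"})
no_value = validate_language_value_object({"lang": "en"})
```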

However we define it, we need to make sure there's conformance and explanatory text for JSON processors, since JSON doesn't have this capability natively as JSON-LD does. The @ prefixes signal such capabilities really well, but since this is a unique media type with unique JSON processing requirements we can handle either.

@dlongley
Contributor

I like @BigBlueHat's arguments here for not aliasing the language and related keywords.

@iherman
Member

iherman commented Sep 14, 2023

We'd need to explain in the JSON Processing section that no other keys should appear in a language value object...or we'll end up with confused people putting new properties along side the value, lang, dir trio or using them "randomly" within the document (especially likely with value) in incompatible ways.

Isn't there a danger of misusing id or type in the same way? Also, then value should also be changed to @value. Isn't this too late, when we have had a @-less usage in the spec since 1.1 (see Example in the spec)?

Your arguments are absolutely correct, and these are the reasons why I personally prefer keeping the @ everywhere and not use the aliasing trick. But that particular boat has already sailed...

@shigeya
Contributor

shigeya commented Sep 14, 2023

Firstly, @msporny, thank you very much for showing multiple choices in a readable way.

Options A/B/C look good, and I agree with the discussions that the processing of option D is almost equivalent to the other options.

Also, I'm curious about how multilingual construction can be with option D.

@msporny
Member Author

msporny commented Sep 14, 2023

I added an option E -- using language translation files (to the original post above).

@aphillips
Contributor

aphillips commented Sep 14, 2023

I want to ensure that what I18N is looking for is clear:

Any solution must provide:

  • each natural language string value must have a language associated with it
  • each natural language string value must have a direction associated with it
  • the associated values do not have to appear directly on the value: they can be from a document structure somewhere or a default

Solutions may provide:

  • a document default language
  • a document default direction

If document level defaults are provided:

  • there must be a way to override them for specific strings, although any given field is not required to do so.

There should also be a way to supply multiple languages within a document or to supply multiple additional languages (besides the default language). I believe this is already addressed.

Option A satisfies these requirements, but at the cost of being verbose.
Option B satisfies providing a document level default, but in a way that VC might find onerous for some users. With the other commits it would satisfy the requirements.
Option C satisfies the requirements, but possibly shares the shortcomings of Option B.
Option D satisfies the requirements (with the other changes provided previously) by supplying the document level default (don't forget to add a default direction value)
Option E doesn't satisfy the requirements because any name or description fields would still be untagged. It would also require a second file be fetched. Note that some privacy folks think this might be a fingerprinting vector.

As a callout, Webapp Manifest is addressing their similar use case with some design changes (after our meeting with them this week). One of these looks like a language map and might serve as an "option F".

In this design there is a default language/direction (like option D) which applies to all NLS fields. NLS fields can optionally have lang and dir. E.g.:

"defaultLang": "en-US",
"defaultDir": "ltr",
...
"name": "I am in English",
"description": { "value": "I am in British English", "lang": "en-GB" },
...
"name": { "value": "استخدم في Bahrain مصر Kuwait!", "dir": "rtl" }

Multilingual is also an option that uses a different structure from what you currently have and which looks like this:

"name": {
   "en": { "value": "some name", "dir": "ltr", "lang": "en-US" },
   "fr": { "value": "some name in french" },
   "ar": { "value": "some name in arabic", "dir": "rtl" },
   ... etc...
},

Note that dir and lang are both optional in this structure (lang is implied by the key, dir is inherited unless specified).
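
The inheritance rules just described (language implied by the key, direction inherited unless specified) can be sketched as follows. This is an illustrative Python reading of the structure, assuming each entry is an object keyed by language tag; the function name `resolve_language_map` is hypothetical, and the actual design being discussed for Webapp Manifest may differ.

```python
def resolve_language_map(entries, default_dir):
    """Fill in lang and dir for every entry of a language map.

    lang falls back to the map key; dir falls back to the document default.
    """
    resolved = []
    for key, entry in entries.items():
        resolved.append({
            "value": entry["value"],
            "lang": entry.get("lang", key),
            "dir": entry.get("dir", default_dir),
        })
    return resolved

name = {
    "en": {"value": "some name", "dir": "ltr", "lang": "en-US"},
    "fr": {"value": "some name in french"},
    "ar": {"value": "some name in arabic", "dir": "rtl"},
}
resolved = resolve_language_map(name, default_dir="ltr")
```

After resolution every string carries both a language and a direction, which is what the requirements listed earlier in this comment ask for.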

I haven't had a chance to track down the reference to this but will add it here later. I think it would be good if we can align best practices across multiple (even totally unrelated) formats, but don't insist that you do.

@iherman
Member

iherman commented Sep 15, 2023

The issue was discussed in a meeting on 2023-09-14

  • no resolutions were taken

2.7. Internationalization Review for VCDM 2.0 (issue vc-data-model#1155)

See github issue vc-data-model#1155.

Manu Sporny: Let's clarify that normative statements for the use cases document are requirements on the VC Data Model.
… there was a conversation about this issue because other issues track those concerns more directly, then more conversation happened here.
… so what needs to happen to say that we have addressed this issue.

Sebastian Crane: this is a tracking issue so it's convenient to keep it open.

See github issue vc-data-model#1264.

Manu Sporny: i agree with Sebastian. The other data point is that conversation started here, but moved to 1264.
… specifically that issue. We do have an outstanding concern.
… Namely what are we telling people to do about language strings.
… I'll put a link to the options.
… we can close or open. This issue: just one more item we need to resolve before CR.

Shigeya Suzuki: +1 on keeping this issue open.

Manu Sporny: the way we sorted the issues we have a PR, but it doesn't address all the language options.
… so we need to talk about this still.

Brent Zundel: if we need to talk about this in this phase, we should have it now.

Manu Sporny: thanks. The internationalization group asked us to specify how a default language for a document is specified.
… we responded by saying we have name & description in the base context, so we can use that as an example.
… we explain that in the spec today.

Manu Sporny: #1264 (comment).

Manu Sporny: They came back and said that they felt uncomfortable because we were not specifying a default language for the VC.
… That led to different proposals, each with different tradeoffs.
… options A, B, C, D, and E.
… I don't think we have time to go over all of these today.
… I think what we are doing in the spec is the best that we can do.
… but that depends on what the i18n group feels.

Dmitri Zagidulin: what is the difference between options A and C?

Brent Zundel: I don't know that we can avoid talking about the options.

Sebastian Crane: thank you manu for creating the issue with the clear options.
… option C is what I proposed a resolution for.
… I think we are close to consensus on this issue.

Manu Sporny: I can go through the options ...

Brent Zundel: since Sebastian thinks C might be a winner, let's start there.

Manu Sporny: Option C uses JSON-LD language features.
… @value for the value of a string.
… @language and @direction.
… benefit is that it's already in JSON-LD.
… drawback is that people who don't like JSON-LD might not like this option.
… So we need to hear back from people who want to use something else.
… also Option C doesn't set a base language for the document. It's not clear whether the i18n WG will go for that.

Ivan Herman: I have a comment on the tech. but a practical comment first: Addison is around, so let's try to talk to him.

Brent Zundel: That's right. We can take advantage of TPAC.

Ivan Herman: on the technical side, the title is misleading because the JSON-LD language features are more than what is in option C.
… we can use JSON-LD features the way it's used in 1.0 and we can set the direction for the whole file if we want (using JSON-LD).
… But I think we should not spend too much on this. JSON-LD has gone to great efforts to work out language with the i18n group.

Pierre-Antoine Champin: https://www.w3.org/TR/string-meta/.

Ivan Herman: So we should not cherry pick.
… There are not thousands of ways to do that in JSON either. We may have a long conversation (with some beer) whether to include an @ sign or alias that out.
… That's the only question for me that's really relevant (the aliasing).
… We know (from HTML) that there are many actors, and countries, where people will ignore these language features.
… That's life.

Sebastian Crane: I'd like to agree. Two things: we are providing the option to do language support correctly. We can't force it, but we can enable it.
… second, the idea that this is JSON-LD only. There's nothing in JSON-LD that requires "full JSON-LD processing" for these language features.
… So for those here with no interest in RDF, this shouldn't add any complexity.

Manu Sporny: this comes about because some implementers look at the @ sign and freak out.

Dmitri Zagidulin: well so wait, why dont we alias out the @ sign?

Manu Sporny: So if no one is complaining about that, we can just adopt it.
… If we alias out the @ sign and we apply that against all VCs everywhere, then nobody can have a property named "language", which is problematic.
… The other thing: Ivan said we could just depend on the JSON-LD properties.
… There are examples where that would clearly be wrong.
… If we allow @language in the context, e.g., @language="es" at the top, that would apply that language to every text string in the document, including base64 encoded values, etc.
… So we need to provide guidance that doesn't lead to meaningless decoration.
… If people are good with @value, @language, and @direction we're good. Aliasing is ok, but not great.
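
To make the concern about a context-level default concrete, here is a minimal Python sketch. It is not a JSON-LD processor: `tag_plain_strings` is a hypothetical helper that only approximates what a default `@language` does to plain string values, and the property names `name` and `encodedBlob` are invented for illustration.

```python
def tag_plain_strings(node, default_language):
    """Approximate the effect of a context-level default language.

    Simplification (an assumption of this sketch): every plain string under a
    non-keyword property becomes a language-tagged value object.
    """
    if isinstance(node, str):
        return {"@value": node, "@language": default_language}
    if isinstance(node, list):
        return [tag_plain_strings(item, default_language) for item in node]
    if isinstance(node, dict):
        if "@value" in node:
            # Already an explicit value object; leave it untouched.
            return node
        return {
            key: value if key.startswith("@") else tag_plain_strings(value, default_language)
            for key, value in node.items()
        }
    return node


# Hypothetical credential subject: one natural-language field, one base64 field.
subject = {
    "name": "Credencial de ejemplo",
    "encodedBlob": "aGVsbG8gd29ybGQ=",
}

tagged = tag_plain_strings(subject, "es")
print(tagged["encodedBlob"])
```

In this sketch the base64 string ends up labeled as Spanish, which is the kind of meaningless decoration being described.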

Brent Zundel: If we were to proceed as mentioned, if we go with those @values, is there anyone who would be opposed to that?
… I'm not seeing any opposition, so I think this is ready for PR.

Ivan Herman: JSON-LD scares the hell out of people sometimes because they are nervous about RDF. I have to emphasize that what is in JSON-LD for language has nothing to do with RDF.
… The features themselves are generic features that can be used for ANY JSON vocabulary.
… No magic or hidden RDF.
… We can haggle around the @ sign.
… Personally I prefer keeping it, but that's me.
… We have done that for id & type.

Dmitri Zagidulin: re: alias. I'd argue we have a better way than flipping a coin. We know there is significant pushback on @s.
… Can we have a poll?

Manu Sporny: Two things. One: if we decide to use JSON-LD keywords, we'll have to change the way name & description work.
… Two: I'm concerned about aliasing "value". I'd feel better if we had that for a while.

Andres Uribe: Is it possible to alias "lang_value" to "@value" ?

Manu Sporny: I do agree with dmitriz that there is an allergic reaction to seeing @ signs in JSON.
… I don't think it's an easy answer. We should be ready to trigger another CR later.

Ivan Herman: I forgot to react to Manu about setting a global language. From the JSON-LD point of view, that's not really a problem, because we can specify that the language doesn't apply for the datatype in the range of that specific property.
… JSON-only users would ignore it.

Brent Zundel: POLL: we will use keywords @language, @direction, and @value for language and alias them to 'language', 'lang_direction' and 'lang_value'.

Andres Uribe: +1.

Gabe Cohen: +1.

Dmitri Zagidulin: +1 (though would much prefer 'direction' and 'value').

Sebastian Crane: +1 for one option, +1 for the other. I think that counts as abstaining, but I will definitely not oppose either option :).

Ted Thibodeau Jr.: +0.5.

David Chadwick: +1.

Ivan Herman: +1 (like dmitriz).

Joe Andrieu: 0.

NickLansleyGS1: +1.

Manu Sporny: +0.5 (with severe trepidation wrt. stomping on existing data models out there) -- also, language/direction/value (not what was mentioned).

Shigeya Suzuki: +1.

Phil Archer: -1.

Paul Dietrich: +1.

Juan Caballero: +1.

Jay Kishigami: +1.

Dmitri Zagidulin: can Phil and Manu explain why dangerous? and what holes?

Phil Archer: I agree with Manu's comments, it's dangerous. But my participation here is minimal, so I understand.
… This could impact things in unintended ways.

Dmitri Zagidulin: 'id' and 'type' are also very common words.

Phil Archer: using a common word like value to mean something that other people don't use it for.

David Chadwick: I would be -1 if the alias was 'value' and not 'lang_value'.

Brent Zundel: clarification the proposal is for lang_value and lang_direction, not "value".

Andres Uribe: ditto to what DavidC said above.

Phil Archer: Ah... that's much better.

Paul Dietrich: agree DavidC.

Manu Sporny: also, let's not use underscores since none of our other properties have underscores :).

Manu Sporny: langString <-- would be better.

Sebastian Crane: The aliasing of @ to something else simply makes the @ sign implicit.

Manu Sporny: ivan said some stuff I didn't understand. I'd like to.

Dmitri Zagidulin: +1 camel case vs snake case.

Manu Sporny: If we alias to lang_string, lang_direction, then that's probably fine.
… It would be nicer if people didn't get all bent out of shape about @ signs.

Ivan Herman: +1 to camel case, to align with the style used for other properties in VCDM.

Manu Sporny: Also, what Phil said: there's a lot of data out there that already uses value and it would trigger a lot of confusion.
… The problem is @value means something more than just language.
… That means ... it's not a straightforward decision.

Brent Zundel: we can keep going on it.

Sebastian Crane: I'm happy to help with PR.

Pierre-Antoine Champin: manu, people can still use @value when lang_value does not make sense. But you can't always prevent them from shooting themselves in the foot if they want to.

Dmitri Zagidulin: to clarify, if we alias value to something else, it's the other direction: aliasing from lang_value to @language, but you can still use @language elsewhere.

Manu Sporny: if you compact using the VC context and you have @value throughout your JSON-LD, all those @value will be changed.

Ivan Herman: If you make the alias an embedded context for a property, for every property that is potentially language, then you scope the context.
… What we're suggesting is to not make it scoped.
… Option C is global, unscoped.

Pierre-Antoine Champin: oh, yes, compaction will mess it up :-(.

Dmitri Zagidulin: so why not use option A?

Ivan Herman: so for the other properties, you can make it null.

Manu Sporny: we need a deeper conversation and I don't know if we can get through this in 20 minutes.

Manu Sporny: option A /is/ what we're doing in the spec today :).

Brent Zundel: this is a before CR issue. Is that because there will be normative changes to the spec?

Dmitri Zagidulin: ok. and what is the problem with it today? just the @ signs?

Manu Sporny: yes. this has normative impact.

Brent Zundel: so we will add time for this during TPAC.
… Break for lunch! Back in 80 minutes with or without Brent.

Manu Sporny: no, it doesn't provide a language for the entire VC and it requires context authors to do scoped language stuff.

Dmitri Zagidulin: @manu - what do you mean by scoped language stuff? what will it require them to do?

this: https://github.com/w3c/vc-data-model/blob/main/contexts/credentials/v2#L122-L127.

Manu Sporny: (which, btw, I think is fine ^).

Dmitri Zagidulin: interesting.

@BigBlueHat
Member

Here's a big reason we shouldn't alias @value to value: https://schema.org/value

Regardless, other aliases could be considered, but more names for things do increase the risk of collisions--and the introduction of various case approaches (lang_value vs. langValue) isn't really a great improvement over @value especially if they differ from other native VCDM term casing approaches.

@iherman
Member

iherman commented Sep 19, 2023

Here's a big reason we shouldn't alias @value to value: https://schema.org/value

Good catch.

Regardless, other aliases could be considered, but more names for things do increase the risk of collisions--and the introduction of various case approaches (lang_value vs. langValue) isn't really a great improvement over @value especially if they differ from other native VCDM term casing approaches.

langValue sounds o.k. to me.

As for removing the @ character, I believe that ship has sailed a long time ago, when @id and @type were aliased. Consistency requires removing all @ characters.

@dlongley
Contributor

dlongley commented Sep 19, 2023

Another argument against aliasing @value (in particular to langValue) is that there may be a future need to use @value with @type (as a datatype) ... rather than with @language. As long as langValue is a tightly scoped alias, this can be avoided, but it may be better to just use @value and @language (and @direction) directly anywhere language values should go because it could be easier. This would be a good reason to deviate from aliasing in these cases.

The aliasing around @id and @type could be considered specific to foundational ID and type information -- and not specific to "any keywords". There would be no inconsistency when viewed in this light.

@iherman
Member

iherman commented Sep 27, 2023

The issue was discussed in a meeting on 2023-09-26

  • no resolutions were taken

2.6. Internationalization Review for VCDM 2.0 (issue vc-data-model#1155)

See github issue vc-data-model#1155.

Kristina Yasuda: This is an internationalization one.
… We keep adding / removing "ready for PR".
… So we discussed at PR how we would address this. We had a poll.
… We didn't assign anyone and we still don't have ready for PR.

Manu Sporny: See #1264 (comment).

See github issue vc-data-model#1264.

Manu Sporny: Yeah, this is also tied in with issue 1264. They're kinda/sorta duplicates of one another.
… I'm worried about this one ... I think we need the i18n people in one of our meetings and we need to talk with them, back and forth, to avoid doing something they would object to.
… Assigning a language for the whole VC is a problem and we don't want to do that.
… Addison has responded with something where he's basically saying we have a number of options we've proposed that satisfy their requirements, but it's not clear which one they'd like best. We should bring them in to talk with them about it before moving forward.

Manu Sporny: #1264 (comment).

Manu Sporny: Let me link to Addison's response in IRC.
… He's basically saying, this is what the i18n WG is looking for and there's some MUSTs/SHOULDs/MAYs ... and he analyzes each option that is above, noting that there are a couple there ... just about every option except the last one satisfies what they want but it's not clear which they'd want.
… It's not clear how much of a hard line they are taking here on any approach. I'd like to get them on a call so we can just say once and for all what we're doing and then move on without worrying about any objections during transition.

Sebastian Crane: A few weeks ago, we had a call and I proposed a resolution, but we didn't get to voting on that. The initial reception within this WG was unanimous, so I think the only thing to do is get the i18n people involved.
… There isn't much left with that issue then.
… It would just be implementation from then on.

Kristina Yasuda: Thanks. Quick question -- how is not using @language in @context aligned with using @language keyword for i18n?

Manu Sporny: They are not aligned.
… The i18n people are saying they want a document-level default. I don't know how hard a line they have on that; if it is a hard line, then our only option is going to be using @language in @context, and that's got problems.
… JSON-only processing is more difficult and it will tag values that are not supposed to have languages like base64 values with a language tag.
… So, during the F2F we were saying be surgical, use the @language and @value and @direction stuff.
… We also said, maybe we'll alias that, but people came up with reasons we shouldn't alias.
… So I think what seabass said was to just use the @ language features in a targeted way and we just need to find out if i18n people would be ok with that approach.

Sebastian Crane: I would like to expand on that, I'm not a member of i18n WG at the moment. There's a technical reason not to do a global language, but there's also a philosophical reason that it's not good: "you can enter" has the same meaning no matter what language you say it in.
… They are not just simple language documents. When you're using JSON-only processing you may not get to use those advanced RDF features.
… Having the language translation features within the properties themselves is more elegant, you're not translating the credential itself.

Manu Sporny: agree with seabass.

Dave Longley: +1 to seabass.

Phillip Long: +1 to seabass2.

Kristina Yasuda: Ok, I will reach out to set up a meeting with i18n.

@iherman
Member

iherman commented Oct 4, 2023

The issue was discussed in a meeting on 2023-10-03

  • no resolutions were taken

1. Internationalization WG review.

Kristina Yasuda: Special meeting due to feedback on Internationalization.
… Existing options had not been decided on.

Kristina Yasuda: Please introduce yourself Addison.

See github issue vc-data-model#1155.

See github issue vc-data-model#1264.

See github pull request vc-data-model#1271.

Addison Phillips: I'm the chair of the I18N group at the W3C.

Kristina Yasuda: five options? #1264 (comment).

Manu Sporny: The background of this: we've had guidance about supporting internationalization, with a design pattern for people to follow. In the 1.0 and 1.1 work, we haven't seen much adoption of the I18N features.
… For 2.0, we are adding two fields expected to be multilingual.

Manu Sporny: Here are the potential options that we're considering: #1264 (comment).

Manu Sporny: We had a number of options to consider for how to do it in 2.0.
… We're looking to get to consensus on the option that we should choose, and one that will satisfy the I18N group.

Addison Phillips: I had read through the summary of the discussion and will summarise here to ensure that we have a common understanding.
… In general, the I18N Group would like to see that any natural language string field has metadata about at least A: language and B: text direction.
… We aren't prescriptive about how that is performed. We also like to see a default language for documents.

Orie Steele: It would be good to get a stronger opinionated recommendation regarding approaches.

Kristina Yasuda: we kind of have 1.90 foot in LD world.

Dave Longley: I think we have both feet in LD world, we just want to use the simplest on-ramps.

Dave Longley: and adding @value, @language, and @direction in fields locally is the easiest way to do that.

Addison Phillips: I think it sounds like being somewhat in the Linked Data world as well as a more general specification produces some complications.
… We would like to understand more the concerns around global @language directives, because we are wondering whether these concerns apply to the wider LD community.
… From what we've learnt so far, the I18N group is trying to produce best practice recommendations to other groups.
… One of the ones that we've already been working on is quite different from your approach. I would like to share it with you.

Manu Sporny: On the topic of being both in the LD world and not, it seems like a subset of our community are less likely to adopt the specification when LD features are added.
… We've tried to reduce the LD features to a minimum up until now.

Orie Steele: -1 to asserting that "avoiding using LD is a possibility at this point".

Dave Longley: -1 that we can / are avoiding it, +1 that we're choosing the easiest on-ramps.

Orie Steele: -1 to being vague about understanding conforming documents (which are JSON-LD in compact form).

Manu Sporny: As for Option E (using a translation file), I think you mentioned that you were against it, and there seems to be agreement within this WG.

Addison Phillips: I would not object to translation files per se, but I would point out technical complications about multiple requests and resources. I think that doesn't sound like the right pattern for credentials.

Manu Sporny: Option E can be eliminated then!
… For option D, I don't think this option has any advantages over using the LD method of a global @language, which is effectively the same.

Addison Phillips: It is common for us to recommend that specifications do this. It would be better if there were generic mechanisms, but specification-specific fields are OK.

Dave Longley: -1 to option D.

Dave Longley: -1 to option E.

Shigeya Suzuki: For the record: option E is externalization, and it will not be possible unless we define an internal way to express it IMO.

Dave Longley: +1 to eliminate E.

Kristina Yasuda: Are there any objections to eliminating option E?

Andres Uribe: +1 to eliminate E.

Manu Sporny: +1 to eliminate option E.

Shigeya Suzuki: I'm fine with eliminating option E for now..

Phillip Long: -1 to option E & D.

Joe Andrieu: +1 to eliminate E.

Manu Sporny: +1 to eliminate option D (but keep it around as a backup plan).

Addison Phillips: I would suggest that you could keep it for a 'backup'.

Andres Uribe: I think that's the default if we can't get consensus of anything else.

Sebastian Crane: I wanted to mention option E with the translation files, sometimes they look really good on paper, but in practice, lots of complications.
… With networked translations, even when installed on a computer, there are still lots of issues. GNU style translation -- .pot files -- translates based on the literal value of the string, but as linguists say, there are cases where you can have the same words which mean two semantically different things, and language files as used in the GNU world don't have the opportunity to disambiguate those. There are a number of complications here with Option E.

Shigeya Suzuki: I don't want to spend time on this, but the way gettext/po is used is well studied, and in some non-English areas, esp. the CJK area, it's useful.

Sebastian Crane: I agree with addison: they can be used correctly but I think that is unlikely to be the case in the VC world.

Ivan Herman: Option D means having two properties: language and text direction. We need both on the default level.
… Is this the general view as well?

Manu Sporny: We do need to express language AND direction.

Manu Sporny: yes, that's the general agreement, I believe.

Addison Phillips: Indeed, I agree we would need to have this.

Manu Sporny: Here is option C:

    "credentialSubject": {
        "myHumanReadableProperty": [{
            "@value": "This is some human-readable text.",
            "@language": "en"
        }, {
            "@value": "هذا بعض النص الذي يمكن قراءته بواسطة الإنسان.",
            "@language": "ar",
            "@direction": "rtl"
        }]
    }

Dave Longley: +1 to this option (C, I believe), i think it's the simplest and will work generally for any natural language field.

Manu Sporny: We would express the value of the string, the language, and the text direction. I believe this meets the requirement that addison illustrated.
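
As an illustration of how a JSON-only consumer might read an Option C style value, here is a minimal Python sketch. `pick_text` is a hypothetical helper, not part of any specification or library. It assumes exact matching on the language tag (no BCP 47 range matching) and falls back to the first entry when the preferred language is absent; both choices are assumptions of this sketch.

```python
def pick_text(values, preferred_language):
    """Return (text, direction) for the preferred language, else the first entry."""
    chosen = next(
        (v for v in values if v.get("@language") == preferred_language),
        values[0],
    )
    # "@direction" is optional in the fragment above, so default to None.
    return chosen["@value"], chosen.get("@direction")


# The same two entries as in the Option C fragment.
my_human_readable_property = [
    {"@value": "This is some human-readable text.", "@language": "en"},
    {
        "@value": "هذا بعض النص الذي يمكن قراءته بواسطة الإنسان.",
        "@language": "ar",
        "@direction": "rtl",
    },
]

text, direction = pick_text(my_human_readable_property, "ar")
print(direction)
```

No JSON-LD processing is involved here: the consumer only reads three well-known keys from each entry.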

Dmitri Zagidulin: +1 to option C, works well for multi-language credentials in Edu land.

Phillip Long: +1 to Option C, as it does indeed work well in edu-land.

Manu Sporny: We've mainly been discussing whether to alias the "@x" terms. Sebastian proposed option C on two occasions, and there were no objections raised. I believe it addresses all concerns except for the ability to specify a global default.

Sebastian Crane: +1 to Option C obviously :).

Manu Sporny: It's not easy to test this, as multiple languages are optional.

Manu Sporny: +1 to speaking to Option C in the specification.

Addison Phillips: I'm concerned that whilst the 'SHOULD' and 'MAY' are good supports for internationalisation, there will still be completely unlabeled strings.
… I would like it to be possible to know a default, for when people don't want to put all the extra syntax in.

Manu Sporny: Can we ask if the group is OK exploring option C further?

Kristina Yasuda: If anyone is strongly in objection to option C, please speak now.

Sebastian Crane: To repeat for addison's benefit -- VCs don't inherently have a language... language on fields such as name/description is for the human holder of the VC in a wallet application. In the RDF world, you link things together based on ontological truth, actual meaning...
… you don't necessarily want to apply a language to the specific credential, you want to apply language to description of credential... that's why I like option C -- translate those human-readable values, credential itself doesn't have a language.

Orie Steele: Conforming documents are represented in JSON-LD.... the philosophical concept of credentials is not helpful... JSON-LD will have text that is in a human readable language (both the term definitions, the text behind them, and their literal values).

Addison Phillips: Yes, important observation, locale-neutral data... when people talk about these things, name/description is how humans interact... can't look at other things and talk meaningfully about them... credential has BS of science -- those names/descriptions are of natural language pieces... want natural language to be associated with those parts, not other data.
… challenge is that machine generates these things, people writing code may or may not be willing to generate multiple language versions, or they may not wish to obtain and serialize information on per-field basis... if you're willing to say MUST, then we're good.
… I think that's an important distinction: one wants to have language-neutral data if at all possible. A complication is that humans can't talk meaningfully about the pure data, only about the natural language descriptions.

David Chadwick: I think MUST is fine, but not sufficient. Let's say you have a degree from a Japanese university and has language metadata, that degree credential is still not readable by a typical English person.
… I believe that C is necessary, but doesn't completely solve the internationalisation concerns.

Andres Uribe: I'm definitely supportive of Option C. In addition, I would like to see aliasing. I don't really understand why aliasing will cause problems with JSON-LD, so I would appreciate an explanation here.

Dave Longley: aliasing @value will alias it for everything, not just language values.

Manu Sporny: The short answer is that @value is also used for non-natural-language fields. We can't just alias it globally without making other fields have unwanted language features.

Dave Longley: so making it say langValue (or whatever) for non-language values will be weird / confusing.
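
The aliasing concern can be sketched in a few lines of Python. This is not real JSON-LD compaction: `compact_keywords` is a hypothetical helper that only renames keys, the `issued` property is invented for illustration, and the sketch assumes the property has no type coercion that would otherwise collapse the value to a plain string.

```python
def compact_keywords(node, aliases):
    """Rename JSON-LD keywords to their aliases everywhere in the tree."""
    if isinstance(node, list):
        return [compact_keywords(item, aliases) for item in node]
    if isinstance(node, dict):
        return {
            aliases.get(key, key): compact_keywords(value, aliases)
            for key, value in node.items()
        }
    return node


# Hypothetical global (unscoped) alias, in the spirit of the earlier poll.
aliases = {"@value": "langValue"}

document = {
    "description": {"@value": "Example degree", "@language": "en"},
    # A typed value that has nothing to do with natural language.
    "issued": {"@value": "2023-10-03T00:00:00Z", "@type": "xsd:dateTime"},
}

compacted = compact_keywords(document, aliases)
print(compacted["issued"])
```

Because the alias is global, the timestamp is now carried under `langValue` even though it is not language text, which is the confusing outcome being described.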

Orie Steele: The comment about "re-compacting" / "compacting" is critical for the WG to understand.

Manu Sporny: For that reason, I would be strongly opposed to aliasing if option C can be sufficient. It is only a single character difference.

Orie Steele: I'm not sure that there is understanding here... and we should clarify.

Andres Uribe: Thank you, that answered my question.

Manu Sporny: We would end up getting our alias appearing in unexpected places.

Kristina Yasuda: I would like to ensure that other options are considered as well and we are running out of time.

Manu Sporny: I'm afraid that we're not going to be able to get to "MUST always use @value/@language/@direction" when expressing human-readable strings.

Kristina Yasuda: Let's discuss option B and A.

Dmitri Zagidulin: +1 manu.

Manu Sporny: The suggestion I'm hearing is to remove option A. It just allows us to use 'prettier' values, but doesn't have any advantage.

Orie Steele: If folks don't understand "compact vs non compact LD"... they don't understand what a conforming document is.... so we should be cautious requiring "non compact" processing of languages, because the spec does not require people to understand that.

Dave Longley: if option B is putting @language as a default language in the context then -1 to that, it corrupts the data.

Manu Sporny: Option B provides a document-level default. The issue that we would need to flag is non-natural-language fields being classed as a specific language, such as Base64 data being marked as natural language.
… We could make Option B a fallback to Option C, but that has downsides for the JSON-LD context architecture.
… I believe the options here are Option B+C - OR - Option C+D.

Addison Phillips: You can't prevent people from serialising @language globally. You could deprecate that behaviour of course.

Orie Steele: IMO, if you can't stop people from doing something, it's considered best practice to give them guidance... and not be silent.

Sebastian Crane: I'd like to talk about Option C only.
… This is an implementation consideration, authenticate users, use existing libraries -- if those tools made it as easy to set a default in the code and have the serialization of fields be automatic as writing the serialized language feature at the top, then people would use that feature. In contrast to HTML, which people were hand-writing, due to the cryptography involved people aren't hand-writing VCs. Lack of a global language feature could be side-stepped in implementations.

Kristina Yasuda: I do not like the idea of let's rely on the library to implement this correctly..

Shigeya Suzuki: +1 kristina.

Ivan Herman: We could argue the same thing about HTML, as few people write HTML by hand. What's the proportion of tools that produce linguistically undefined documents? Perhaps addison knows.

Shigeya Suzuki: It depends on the complexity of the output. For simple VCs, it's not necessary to depend on a huge library. Not all people have freedom on memory and energy usage.

Ivan Herman: Maybe putting the language metadata in all fields is a bit naive.
… I am not particularly partisan to the technique, but I think it's important to have something for global language.

Ted Thibodeau Jr.: can we globally say "language: undefined" or "language: various" or similar?

Addison Phillips: To Sebastian's point, if you say that it's a MUST and all the libraries implement it, maybe it would be moot. I'm not sure that you could expect that response without a MUST.

Kristina Yasuda: can everyone put your favorite option on IRC.

Kristina Yasuda: C+D.

Ivan Herman: C+B.

Dmitri Zagidulin: C+D for me.

Sebastian Crane: C.

Dave Longley: -1 to C+B because it corrupts all string fields that are not natural language fields (which is a common thing in VCs).

David Chadwick: c+d.

Manu Sporny: C+B.

Phillip Long: C+B or if we're ranking 1, C+B, 2, B+D, 3, C.

Dave Longley: +1 to just C, I don't think there's a significant difference in MUST/SHOULD with C and having a default language with D, the people that don't want to do it won't do it either way -- and only tools will stop them.

Dmitri Zagidulin: wait D is in the VC core context or in the VC itself?

Joe Andrieu: +1 to C.

Manu Sporny: D is in the VC itself.

Kristina Yasuda: D is in VC itself I think.

Shigeya Suzuki: I think slightly C+B better but C+D is also acceptable.

Dave Longley: that's an insufficient description of B and D ...

Dmitri Zagidulin: can we remind people of the difference between D and B?

Manu Sporny: D creates a new feature, B uses an existing JSON-LD feature.

Dave Longley: B will use a JSON-LD feature that will apply a language to EVERY string field, even non-language fields.

Dmitri Zagidulin: in that case, C+B.

Orie Steele: I think I agree with what dimitri is saying though...

Dave Longley: D will invent something new for VCs but only apply it to natural language text fields.

Orie Steele: ^ yeah... that.

Phillip Long: Ori is up.

Ivan Herman: dlongley said that every field will get a language tag with B. However, if we had LD tags for datatypes, that won't be an issue.

Dmitri Zagidulin: agree with ivan.

Ivan Herman: It's not as bad as it looks considering the existence of those JSON-LD datatypes.
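
The datatype argument can be illustrated with a small Python sketch. It is a simplification and not a JSON-LD processor: `expand_values` is a hypothetical helper, and the term `encodedBlob` and its datatype IRI are invented for illustration. The assumption modeled here is that a term declared with a datatype receives `@type` rather than the default language.

```python
def expand_values(subject, default_language, typed_terms):
    """Apply a default language to plain strings, except for type-coerced terms."""
    expanded = {}
    for term, value in subject.items():
        if not isinstance(value, str):
            expanded[term] = value
        elif term in typed_terms:
            # A term with a declared datatype gets "@type" and no language.
            expanded[term] = {"@value": value, "@type": typed_terms[term]}
        else:
            expanded[term] = {"@value": value, "@language": default_language}
    return expanded


# Hypothetical term definitions: "encodedBlob" is declared with a datatype.
typed_terms = {"encodedBlob": "https://example.org/datatypes#base64"}

subject = {"name": "Credencial de ejemplo", "encodedBlob": "aGVsbG8gd29ybGQ="}
expanded = expand_values(subject, "es", typed_terms)
print(expanded["encodedBlob"])
```

Under that assumption only `name` is tagged as Spanish. Any string field whose term has no declared datatype would still pick up the default language.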

Dmitri Zagidulin: an app somehow interpreting language & direction on a base64 string or whatever is /not/ a realistic problem.

Dave Longley: I agree with you that we should use datatypes, but the JOSE and COSE parts do not have data types defined.

Orie Steele: also not necessary... to do... because the data model is COMPACT JSON_LD !!!!

Sebastian Crane: It is a bit involved, I'll write to the mailing list. Can we delay a vote for a day or two to engage w/ email?

Kristina Yasuda: We appreciate Addison's time. Thank you!


@msporny
Member Author

msporny commented Nov 4, 2023

This issue (to recommend that authors don't use @language in @context) has failed to gain consensus. I'm closing this issue in favor of #1335, which will most likely recommend option B -- which is to use @language and @direction in the @context array.
