Best practices for multilingual values #5

Open
ajs6f opened this issue Dec 7, 2018 · 9 comments

@ajs6f
Member

ajs6f commented Dec 7, 2018

As evidenced by reports, there is some confusion about how to use multilingual data values alongside language maps. @pchampin noted that using an alias is a good way to work through this, and @BigBlueHat noted (link to minutes forthcoming) that this is the approach taken by Web Annotations. We should offer some examples of this practice, probably in the context of the (long-promised) Primer.

@gkellogg
Member

gkellogg commented Dec 7, 2018

Riffing off of @pchampin's example in w3c/json-ld-syntax#91 (comment), we might use data indexing to aid access:

{
  "@context": {
    "occupation": { "@id": "ex:occupation", "@type": "rdf:HTML", "@container": "@index" },
    "description": "ex:description"
  },
  "name": "Yagyū Muneyoshi",
  "occupation": {
    "ja": "<span lang=\"en\">Ninja in japanese: <span lang=\"ja\">忍者</span></span>",
    "en": "<span lang=\"en\">Ninja in english: <span lang=\"en\">Ninja</span></span>",
    "cs": "<span lang=\"en\">Ninja in czech: <span lang=\"cs\"> Nindža </span></span>"
  }
}

This allows data indexing and consistent use of HTML values.
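
A minimal sketch of the consumer side, assuming Python and an already-parsed document: with an index map, a client reads the value by key and no language processing is involved. The helper `pick_value`, its fallback order, and the shortened HTML strings are illustrative assumptions, not part of JSON-LD or of any library.

```python
# Illustrative index map, shaped like the "occupation" value above
# (HTML strings shortened for readability).
occupation = {
    "ja": '<span lang="ja">忍者</span>',
    "en": '<span lang="en">Ninja</span>',
    "cs": '<span lang="cs">Nindža</span>',
}


def pick_value(index_map, preferred, fallback="en"):
    """Return the first entry whose key is in `preferred`, else the fallback entry, else None.

    The keys are treated as opaque index strings, not as language tags.
    """
    for key in [*preferred, fallback]:
        if key in index_map:
            return index_map[key]
    return None


print(pick_value(occupation, ["cs", "ja"]))  # first preferred key that exists
print(pick_value(occupation, ["de"]))        # no "de" entry, so the fallback is used
```

Because the keys are plain index strings, nothing stops a publisher from using keys that are not language tags at all; any language semantics has to be agreed between publisher and consumer.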

@iherman
Member

iherman commented Dec 8, 2018

But... what would the generated RDF look like? One cannot add a language tag to a typed literal:-(

@gkellogg
Member

gkellogg commented Dec 8, 2018

It’s not a language tag, it’s a data index which has no RDF representation.

It’s useful for creating structural indexes.
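
To make the difference concrete, here is a rough Python sketch of what each container kind contributes to the resulting RDF literal. It is a hand-rolled simplification for illustration only, not the JSON-LD to-RDF algorithm; the function names `quote` and `literals` and their parameters are assumptions.

```python
RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"


def quote(text):
    """Escape backslashes and double quotes for an N-Triples-style literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def literals(value_map, container, datatype=None):
    """Render each map entry as an N-Triples-style literal string.

    "@language": the key becomes the literal's language tag.
    "@index":    the key is dropped; only the value and optional datatype remain.
    """
    out = []
    for key, value in value_map.items():
        if container == "@language":
            out.append(f"{quote(value)}@{key}")
        elif container == "@index":
            suffix = f"^^<{datatype}>" if datatype else ""
            out.append(f"{quote(value)}{suffix}")
        else:
            raise ValueError(f"unsupported container: {container}")
    return out


print(literals({"ja": "忍者"}, "@language"))
print(literals({"ja": "<b>忍者</b>"}, "@index", RDF_HTML))
```

In the second call the "ja" key appears nowhere in the output, which is why an index map does not survive a round trip through RDF.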

@iherman
Member

iherman commented Dec 8, 2018

Oops... well, this is one of those surprise effects that @ajs6f was talking about yesterday: I missed the "@container": "@index" and thought it was language. I.e., if I am an author not looking into the details of the context, I can be a bit misled.

Yes, it is legal; I do not think it is good practice.

@BigBlueHat
Member

@iherman
Member

iherman commented Apr 6, 2019

This issue was discussed in a meeting.

  • RESOLVED: highlight the need for work is ongoing, but it should present what can be done today via language/data maps and/or using HTML (or other) micro-syntax for expressing multiple language
Transcript: Multilingual Values
Benjamin Young: https://github.com/w3c/json-ld-syntax/issues/105
Benjamin Young: Another easy one ;)
… this one is about how JSON-LD currently works, and our past decisions to use HTML for multi lingual values (strings with multiple languages)
… so use straight up HTML, which is not ideal
… Looking at text level semantics HTML, but that’s for the future.
… so what do we need to propose in the primer to close the issue?
… related - there’s no way to do multi-language language maps
Rob Sanderson: it seems we should split this into a primer issue
… eg how do you use language tags
… and what do you do with multiple languages
… and then have a syntax issue around gkellogg’s issue for the normative specs
Benjamin Young: …about, is it an error to have English and Japanese in a string that is stated to be only one of those
Ivan Herman: What was put there by gregg sounds like a solution, but a bit misleading. The use of language tags gives the wrong impression — should be just indexes
Ivan Herman: Language tags are defined by IETF BCP 47
Rob Sanderson: “<span lang="en">Ninja in japanese: 忍者@ja
Rob Sanderson: I agree ivan. to your question, the RDF would look like that:
Rob Sanderson: "Ninja in japanese: 忍者"@ja^^rdf:langString
Rob Sanderson: this has been my issue for 5+ years
… language tags must be langString
Ivan Herman: an RDF issue that is not ours to solve
… Lots of nice discussions in dbooth’s repo, but it should happen in RDF not here
… same as missing base direction
… we can only set a single language. And this is the same as base direction, shouldn’t touch it
Rob Sanderson: +1 to ivan
Benjamin Young: RDF is woefully broken in this way, but Gregg’s proposal of HTML + language map would be desirable by JSON developers
Rob Sanderson: https://iiif.io/api/presentation/3.0/#44-html-markup-in-property-values
Benjamin Young: If built to contain HTML, they’re not going to take it into RDF, so a little misuse has advantages
Benjamin Young: our audience is interested in JSON, with a side plate of a graph
Rob Sanderson: I put this link in earlier https://iiif.io/api/presentation/3.0/#44-html-markup-in-property-values
… it uses exactly what gkellogg describes
… it is common and exactly what people want to be able to do
Ivan Herman: The funny thing is what you wrote is legal but ugly RDF – a microsyntax for a string, which is outside of RDF or JSON-LD
… it happens to be a subsyntax of HTML
… don’t need anything in the syntax document to do this, it’s a private agreement between parties
Rob Sanderson: +1 to Ivan
Ivan Herman: this is probably the only thing we can do
… so no issue in the syntax document
… it’s ugly, but it’s the best practice given the current technologies
Pierre-Antoine Champin: Going to propose a crazy idea, in the line of what Ivan said. We don’t need to change RDF, we could define a custom datatype. langString is syntactic sugar for what could be a standard datatype with an uglier microsyntax that carries the language inside the value
… we could define a more complex but similar datatype. That’s the crazy idea :) We could instrument it in RDF, with another container type, so that what gregg proposed would generate the appropriate structure
… but it’s quite some work
Ivan Herman: technically … yes … and now I put on the W3C hat, it’s outside of our charter. This would be an RDF datatype.
Pierre-Antoine Champin: What about JSON data type?
Ivan Herman: JSON is closer to our charter. But language isn’t.
… it would be a lot of work … the flood gates would be open. Ruby, direction, etc.
Benjamin Young: https://w3c.github.io/string-meta/
Benjamin Young: worth pausing on the JSON data type. I hear the concerns … is there a way around them? This string-meta document from i18n suggests JSON-LD as a solution for multi-language use
… feel that there’s an opportunity here
… And if we miss it, there’ll be a lot of terrible looking JSON-LD
… I see that it evokes process specters, but it comes up a lot
… The genie won’t go back into the bottle. So any hope of this?
Ivan Herman: Don’t remember the issue, but got into a long discussion with the editors. The examples are mostly wrong.
Benjamin Young: https://github.com/w3c/string-meta/issues/27
Benjamin Young: also w3c/string-meta#13
Ivan Herman: I understand the problem. Would love for the problem to be solved, but outside our influence
Benjamin Young: oh…and w3c/string-meta#23
Ivan Herman: I don’t see any other proper way, other than having it done at the RDF level.
Benjamin Young: …and another w3c/string-meta#11
Rob Sanderson: The bigger risk is to build on shifting sands and have RDF come up with a different syntax that’s incompatible with whatever we come up with
… should instead use it as a way to highlight the need, and potentially a micro-chartered group to solve it for RDF
Benjamin Young: Not ready to recharter, or make a new datatype. Rob proposes to kick it to another group and then an update to JSON-LD. Not a solution, but don’t want to lose the actions
… to close the issue we should state what can be done
… but need to be clear as to what /should/ be done that’s not confusing
Jeff Mixter: +1 to that
Proposed resolution: highlight the need for work is ongoing, but it should present what can be done today via language/data maps and/or using HTML (or other) micro-syntax for expressing multiple language (Benjamin Young)
Rob Sanderson: +1
Benjamin Young: +1
Jeff Mixter: +1
Ivan Herman: +1
Tim Cole: +1
Pierre-Antoine Champin: +1
Simon Steyskal: +1
Adam Soroka: +1
Resolution #5: highlight the need for work is ongoing, but it should present what can be done today via language/data maps and/or using HTML (or other) micro-syntax for expressing multiple language
Ivan Herman: procedural question - if we close the issue, then I think we will lose it for the bp doc. For the time being we don’t have an editor for the document. So don’t want it lost.
… should be raised in the BP repo
Rob Sanderson: +1
Benjamin Young: +1
Ivan Herman: should go through the issues to make sure we don’t lose them
Benjamin Young: Agreed – open editorial issues on BP?
… keep these initial discussions in the syntax doc, to not have the comments scattered
Ivan Herman: Wouldn’t close this one
Simon Steyskal: https://github.com/w3c/json-ld-bp/issues
Benjamin Young: not until there’s another issue to write it up
Ivan Herman: editor will write it up as they see best
Benjamin Young: And it’s the top of the hour
… thanks for all the input

@azaroth42 azaroth42 transferred this issue from w3c/json-ld-syntax Jul 12, 2019
@ajs6f
Member Author

ajs6f commented Feb 7, 2020

Do we need a short new section on multilingual value issues?

@azaroth42

Possible routes:

  • Language maps -- needs the value to be langString
  • Node: i18n Text Direction nodes / anno Textual Body / crm LinguisticObject
  • Embedded HTML values -- not common practice
  • Data indexing -- works, but doesn't survive round tripping through RDF

Other topics to include:

  • Discussion of @none as a not-language.
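
For the @none topic, a small Python sketch of how a consumer might read a language map that also carries an untagged value. In JSON-LD 1.1 the `@none` key of a language map holds strings that have no language tag. The lookup order below (exact tag, then primary subtag, then `@none`), the helper name `localized`, and the sample data are assumptions for illustration only.

```python
# Hypothetical language map; "@none" holds a value with no language tag.
label = {
    "en": "Ninja",
    "ja": "忍者",
    "@none": "NINJA-001",  # made-up untagged value, e.g. an identifier
}


def localized(language_map, tag):
    """Look up `tag`, then its primary subtag, then the untagged "@none" entry."""
    candidates = [tag, tag.split("-")[0], "@none"]
    for key in candidates:
        if key in language_map:
            return language_map[key]
    return None


print(localized(label, "en-GB"))  # falls back from "en-GB" to "en"
print(localized(label, "fr"))     # no French entry, so the "@none" value is used
```

Whether `@none` should act as a last-resort fallback or be treated as a separate, language-neutral value is a design choice the best-practice text would need to spell out.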

@iherman
Member

iherman commented Feb 14, 2020

This issue was discussed in a meeting.

  • No actions or resolutions
Transcript: Multilingual Patterns
Rob Sanderson: #5
Rob Sanderson: adam had noted that there is some confusion about how to use multilingual data values alongside language maps
Ivan Herman: I think two things are intertwined here
… the first is the use of language map, possibly with direction,
… the second is the use of HTML literals.
… I would prefer to separate them in BP.
… gkellogg’s proposal was a hack to use almost the same syntax for two cases,
… which is pretty convoluted. It works, but should this be BP?
Gregg Kellogg: in one case, this is a language map; in the other case, this is data indexing.
Ivan Herman: yes, but using language tags for data-indexing is misleading.
… It misled me.
Gregg Kellogg: language maps are reflected in the RDF abstract syntax; data indexing is lost in the process.
Ivan Herman: the example is convoluted because it uses rdf:HTML,
… which I don’t think is very frequent.
Rob Sanderson: should we also discuss @none in this context?
Ivan Herman: yes
