RTL text examples lose their RTL marker #244
Comments
Also pinging @xfq and @TzviyaSiegman besides @r12a |
I believe the point is that pure JSON does not know the notion of bidi setting. If I copy the string (either the pure Hebrew or a mixed Latin & Hebrew) from the correct HTML rendering in the example, the string is stored in memory in "logical" order. Taking the examples in the text, if I copy the Hebrew text only, the string in memory begins with the character "ה"; if I take the full example then with the character "H". If I then copy these strings into JSON, the editor displaying my JSON displays the text using the directionality of the first character in the string, although it correctly displays the Hebrew part from right to left. Here is what I get in my text editor: {
"pure hebrew" : "היא שפת סימון",
"mixed hebrew": "HTML היא שפת סימון"
} As far as the mixed Hebrew/Latin text is concerned, this display is wrong, because the HTML text should appear on the right, but JSON does not know better. It just displays the characters as they come, using the basic Unicode directionality. This is exactly what the direction setting compensates for in our manifest, and what results in the correct HTML display. My own conclusion is that Example 2 is fine, and the note after the example explains things correctly... |
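The behaviour described in this comment (an editor picking the base direction from the first character of the string) can be sketched in Python. This is a simplified illustration, not the full Unicode Bidirectional Algorithm: the helper name `first_strong_direction` is hypothetical, and the LTR fallback for strings with no strong character is an assumption. The Hebrew is written with `\u` escapes so that the logical order is unambiguous in the source.

```python
import unicodedata

# The two strings from the JSON above, in logical (memory) order.
PURE_HEBREW = "\u05d4\u05d9\u05d0 \u05e9\u05e4\u05ea \u05e1\u05d9\u05de\u05d5\u05df"
MIXED_HEBREW = "HTML " + PURE_HEBREW


def first_strong_direction(text: str) -> str:
    """Guess the base direction from the first strongly directional character.

    Simplified heuristic: bidi classes R and AL count as right-to-left,
    L counts as left-to-right, everything else is skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "ltr"  # assumed fallback when no strong character is present


print(first_strong_direction(PURE_HEBREW))   # first character is U+05D4
print(first_strong_direction(MIXED_HEBREW))  # first character is "H"
```

Under this heuristic the mixed string gets a left-to-right base direction because it begins with "H", which is why an explicit direction setting is needed to get the intended display.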
Not normally dealing with i18n issues, I'm a bit confused about what is going on here. The direction in the Oddly, this no longer matches the stated incorrect rendering below where the So if the issue is the If the issue is that the text in the code example is not rtl, that should be by design since the But if the issue is that the code example should show the text as rtl, then I can always manually re-add the Anyone more knowledgeable about these things want to weigh in on what is intended here? |
To add my non-expert view... In memory the character string is stored as: 'H', 'T', 'M', 'L', 'ה', 'י', 'א', etc. However, if I take a screen editor for JSON, and I copy the string, it will display it as: "HTML היא שפת סימון", just like in #244 (comment) and like in the TR document itself: (this is a screen dump from Firefox on the HTML היא שפת סימון. In other words, what the first image in #244 (comment) shows is the way the strings are laid out in memory and in the JSON source, but not the way anybody would see the very same JSON in, say, a screen editor. Which also means that there is no real "good" or "bad" here, because showing only the first version (i.e., the dump of the memory layout) in the spec would actually be misleading: nobody would ever see the content this way in a JSON editor. If I am right (which is a big 'if') I would suggest leaving the document as is, and the explanation note should include a few words about just that, i.e., that what is shown in the example is not what is stored in memory, but how the JSON would be shown by any user-facing tool. And we can all rest. (If you look at the markdown source of this comment you will see that I had to use a bunch of html |
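One way to inspect the memory order without depending on how an editor lays out bidi text is to serialize the string with non-ASCII characters escaped. A minimal sketch using Python's standard `json` module (the key name is taken from the JSON earlier in the thread; nothing here comes from the spec's tooling):

```python
import json

# Mixed Latin/Hebrew string in logical order, written with escapes.
mixed = "HTML \u05d4\u05d9\u05d0 \u05e9\u05e4\u05ea \u05e1\u05d9\u05de\u05d5\u05df"

# The first seven characters, one per list item, in the order they are stored.
print(list(mixed[:7]))

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so the output is pure ASCII and reads left to right in any editor.
dumped = json.dumps({"mixed hebrew": mixed})
print(dumped)

# Round-tripping gives back the same logical string.
restored = json.loads(dumped)["mixed hebrew"]
```

The escaped form is not something an author would normally type, but it makes the stored order visible regardless of the display direction of the tool.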
I guess what confuses me is why the |
@mattgarrish I think "that's not what we want to begin with" is the unclear part. Do we want, in the example:
If my analysis above is correct, the two are not the same. (And neither is the desired display of the text.) (I do not know and do not really care as a reader about what combination of My proposal is to pick one of the two above and make it clear to the reader that these two exist, and we picked one. All this is dependent on whether I was right or whether I just made a fool of myself... |
This is the thing. It seems like we're going beyond what is relevant to publication manifests to explain character encoding in files vis-a-vis display. In any case, the example isn't "broken", so let's not get hung up on this. |
I agree. I would actually propose not to change the spec now, and, when published, turn this issue into an official erratum, asking for a possible clarification note. Changing the spec at this stage (ie, between PR and REC) is touchy. The maintenance WG can then decide later what to do about it... |
Note: I have some experience with Bidi.
|
Part of the confusion here is the issue misquotes the source. The actual source is:
|
@iherman @mattgarrish welcome to the topsy-turvy world of bidi code examples, where it's difficult to straighten things out meaningfully. First, i don't see any bdo element in the source text, so i'm not sure what that part of the discussion is about. What people expect to see when the value of example 2 is presented to them for reading in a page or application is indeed the 2nd string in the note just below examples 1 & 2, ie. the overall reading direction is RTL, but the characters are read in the direction of the arrows below: Your value in example 2 is currently showing what the text would look like if it were viewed in a LTR context (which it is), and this breaks the sense of the text. You'd only expect to see this if you looked at the source code, and your editor was set up for LTR. If you change nothing, the example would look ok in a Hebrew translation of this page (it would be to the left of the colon). But in your spec, it's not very informative visually as is. It doesn't show the order of the characters in memory, it just shows some messed up ordering of directional runs. Note that this quickly becomes even more broken-looking if you have more than 2 directional runs in the example. This exposes a typical quandary for examples of code containing bidi text: should you display it as the end user would expect to see it, or broken as someone with a LTR editor would see it? I usually do neither. I display the characters in the order they are stored in memory, from left to right, and tell the reader that i'm doing that. To do this, you need to, behind the scenes of your spec example, add a The choice is yours, but hopefully this clarifies the situation a little. |
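The markup that the comment above recommends adding is not preserved in this copy of the thread, so it is not reconstructed here. As a plain-text analogue of "display the characters in memory order, left to right", the following sketch uses the Unicode directional override characters instead, which is a different mechanism from HTML markup. The helper names are hypothetical, and whether a given viewer honours the override is an assumption.

```python
LRO = "\u202d"  # U+202D LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING


def force_memory_order_display(text: str) -> str:
    """Wrap text so a bidi-aware viewer lays out every character left to right."""
    return f"{LRO}{text}{PDF}"


def strip_override(text: str) -> str:
    """Remove the override wrapper added above, restoring the original string."""
    if text.startswith(LRO) and text.endswith(PDF):
        return text[1:-1]
    return text


mixed = "HTML \u05d4\u05d9\u05d0 \u05e9\u05e4\u05ea \u05e1\u05d9\u05de\u05d5\u05df"
wrapped = force_memory_order_display(mixed)

# The wrapper adds exactly two invisible characters around the original text.
print(len(mixed), len(wrapped))
```

The override characters become part of the string, so they should be stripped again (as `strip_override` does) before the value is used as data. That is one reason a spec example might prefer markup around the text rather than control characters inside it.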
@r12a See Lines 434 to 440 in b5c50f7
|
Right, this is what we have (and maybe was contributed by you?). The problem is respec strips it while pretty printing the We should probably open an issue against respec as this seems likely to reoccur. We only have a REC left to publish, but I wouldn't want to be manually fixing But, @iherman, should we try and fix the source and add a note now, or should we leave this for an erratum to explain? |
As a reader my preference would be that in the Rec you manually fix up the generated source, showing each of the variants noted for the code with a note about what it is: common rendering, memory order, "expected" correct rendering - and maybe adding a link to a more thorough explanation of the whole issue of when bidi override is necessary... |
I share your unease about changing the final document (ie, after respec processing) but if we also submit an issue on respec, that can be updated the next time we want to publish. As Richard said in #244 (comment) if we display, in the example, a text that reflects the memory content (his second image), then we also create a source of confusion because that cannot be reproduced in one's own JSON text editor. Which is another source of confusion: we actually show a JSON portion that the user never sees and cannot really produce! We cannot really get it right :-(. To be honest, at this moment, I am undecided. If we decide to do the change, I think that it would be necessary to add a minor note in the example about the ins and outs of all this, i.e., to explain what you really see in the example.
Whether we can do it: I have asked the "Director" whether this type of change can be done at this stage in the first place. |
Coming from the "Director": yes, it is o.k. for us to do this change, if we feel comfortable, and we can leave it as an Erratum. @mattgarrish, do you think you can come up with a PR that includes the changes together with an explanation note? (The problem is of course that the file has to be edited post-respec :-( |
The RTL text in examples, despite markup like
isn't marked as RTL in the final rendered docs. My first guess is that respec (perhaps through the code prettifier) is removing the markup.