Proposal: Better RTL detection #3845
This would be slow, though, and not easy to implement.
Once per message (or even once per keystroke) doesn't sound too slow. It's not something you need to do every frame, and if you choose the more UX-friendly per-keystroke approach, you can keep the current counters for the message on the side and just increment the relevant one. Numbers, emoji, media and symbols count for neither RTL nor LTR, just as they don't today: if the first character is a number, the second character is examined, and so on. Can you point me to the relevant piece of code where the direction is selected? I'm willing to submit a PR for this.
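The per-keystroke bookkeeping described above can be sketched as follows. This is a minimal illustration, not Telegram code: the names strong_direction and DirectionCounter are hypothetical, it assumes Python's unicodedata.bidirectional as the source of bidi classes, and it assumes that a tie falls back to a default direction, which the comment does not specify.

```python
import unicodedata
from typing import Dict, Optional


def strong_direction(ch: str) -> Optional[str]:
    """Map a character to 'ltr' or 'rtl' using its Unicode bidi class.

    Digits, emoji, punctuation and whitespace are weak or neutral, so they
    return None and never touch the counters.
    """
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "ltr"
    if bidi in ("R", "AL"):  # AL = Arabic letters
        return "rtl"
    return None


class DirectionCounter:
    """Hypothetical per-message state kept next to an input field."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {"ltr": 0, "rtl": 0}

    def on_insert(self, ch: str) -> None:
        # Called once per typed character: O(1) work per keystroke.
        d = strong_direction(ch)
        if d is not None:
            self.counts[d] += 1

    def on_delete(self, ch: str) -> None:
        # Called once per removed character.
        d = strong_direction(ch)
        if d is not None:
            self.counts[d] -= 1

    def direction(self, default: str = "ltr") -> str:
        ltr, rtl = self.counts["ltr"], self.counts["rtl"]
        if rtl != ltr:
            return "rtl" if rtl > ltr else "ltr"
        return default  # tie or no strong characters: assumed fallback


counter = DirectionCounter()
for ch in "hi שלום 😀":
    counter.on_insert(ch)
print(counter.counts, counter.direction())
```

Because each keystroke only adjusts one counter, the cost does not grow with message length; the full message never has to be rescanned while typing.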
I can try to alter the way it works for messages in bubbles, but not in the message input field. Is the problem there as well?
The problem is there as well, yes. I do think it would be good to change it there too, but the bubbles are more prominent (a message is written once and read many times, by many people). Here's an example illustrating what I'm after: https://jsfiddle.net/cv45ku2s/. It could be optimized further to only consider one character at a time, etc. If you have proper abstractions for the input/transcript combo, it might be a tiny bit trickier, but likely not by much.
It is standardized that the base direction of a given message is determined by its first strong character. If you do not like the direction in a specific case, you can fix it by adding a U+200E LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT MARK. See the Unicode Standard Annex № 9 or the German Wikipedia article on bidirectional control characters for other possibilities. The maintainers should check whether the Unicode Bidirectional Algorithm is implemented correctly rather than invent their own; after all, the people at Unicode have inevitably planned more shrewdly than any chat application developer ever could, and there are also libraries for displaying BiDi text well. It is quite disappointing that you can run ldd on the Telegram executable and not find any references to HarfBuzz or Pango in it. I am serious here, go check it, for I have been rather unlucky using the bidirectional control characters from the General Punctuation block inside Telegram. Other possibilities would arise if Telegram supported full HTML, at least when explicitly enabled (not only Markdown), because HTML contains tags and attributes for controlling bidirectional layout.
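The first-strong-character rule referred to above can be sketched like this. It is a simplified reading of UAX #9 rules P2 and P3 (skip text inside isolates, then the first L, R or AL character decides), not the full algorithm and not what Telegram or Qt actually run; the function name and the "ltr" default for text with no strong characters are assumptions. It also shows why a leading mark works: LRM has bidi class L and RLM has bidi class R, so they are themselves strong characters.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class L
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Simplified UAX #9 P2/P3: the first strong character outside any
    isolate (LRI/RLI/FSI ... PDI) decides the paragraph direction."""
    isolate_depth = 0
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("LRI", "RLI", "FSI"):
            isolate_depth += 1
        elif bidi == "PDI":
            if isolate_depth > 0:  # an unmatched PDI is just ignored here
                isolate_depth -= 1
        elif isolate_depth == 0:
            if bidi == "L":
                return "ltr"
            if bidi in ("R", "AL"):
                return "rtl"
    # No strong character found: fall back to a default (assumed LTR).
    return default


print(first_strong_direction("123 שלום world"))  # digits are skipped
print(first_strong_direction(RLM + "Hello"))     # the mark wins
```

Note that an isolate such as RLI ... PDI hides its content from this decision, which is one of the "other possibilities" besides LRM and RLM.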
@Socialdarwinist HarfBuzz should be used internally for text shaping in Qt, so it should be used in Telegram as well; perhaps there are no references because everything is linked statically into the executable.
@Socialdarwinist Your links are broken (Connection Refused), which is a bit alarming on unicode.org. Aside from that,
Really? "Enter a character that's not found in any keyboard" is your solution? What about mobile (if/when it reaches there)?
How did you come to that conclusion? Are the people at Unicode omniscient now? I completely disagree that you have to implement a standard just because it exists, even if it's suboptimal. Especially in a chat environment, it doesn't necessarily make sense to take only the first (relevant) character into account.
It is for the same reason that open-source software is supposed to be better: if many people have an interest in something working, many people look at it. For Unicode, a great many proficient people review things at many stages before publication. If there is something wrong in Unicode, the whole world is to blame. I point out that you claim the bidirectionality standard is suboptimal while you have not presented a better one, which is cocky. If you know of an improvement, you can surely initiate the Unicode process to adopt it.
As you might know, keyboards do not contain characters; keyboard layouts do. /usr/share/X11/xkb/symbols/ara can easily get a new layout, especially as the default layout has plenty of free room. I have been toying with the idea of adding bidi marks to it for some weeks. If somebody wants to be faster than me, this is what I have collected so far to add to the symbols/ara file in XKB:
Additionally, direct Unicode input is possible in GTK+ (Ctrl+Shift+U), and even better in input method frameworks such as IBus and Fcitx, which offer search by character name. On macOS it is also possible to add keyboard layouts. On Windows, people are doomed for using that system, as keyboard layouts there are binary. One can find keyboard layouts installable on Windows, but their compatibility does not seem to last. People choose to depend on Microsoft's grace, and that is what they get.
Using the percentage of RTL or LTR characters to determine the direction is unpredictable and very bad UX unless you are working with large paragraphs of text. This can be seen on Twitter, which seems to implement a similar algorithm: you can't really tell (without counting the characters in your head) whether a tweet will end up left-to-right or right-to-left. The first-strong-character algorithm is at least predictable and can be controlled without rewriting the text to change its character count. The accessibility of control characters should not be an issue: the application can easily offer an RTL/LTR button, shortcut or similar that inserts RLM/LRM in front of the text before rendering it.
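The button suggested above could be little more than a string edit. The sketch below is hypothetical (the function name and the "auto" option are assumptions, not an existing Telegram feature): it strips any marks a previous click inserted at the start of the text and then prepends the chosen one, so a renderer using the first-strong-character rule picks it up.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def apply_direction_override(text: str, direction: str) -> str:
    """Hypothetical handler for an RTL/LTR toggle.

    direction: "ltr" or "rtl" to force a base direction, or "auto" to drop
    any forced direction and let the first strong character decide again.
    """
    if direction not in ("ltr", "rtl", "auto"):
        raise ValueError(f"unknown direction: {direction!r}")
    # Remove marks added by earlier clicks so toggling does not pile them up.
    stripped = text.lstrip(LRM + RLM)
    if direction == "auto":
        return stripped
    mark = RLM if direction == "rtl" else LRM
    return mark + stripped


forced = apply_direction_override("Hello, שלום", "rtl")
print(repr(forced))
```

Because only a zero-width mark is added, the visible text and its character counts stay exactly as the user typed them.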
@khaledhosny That's actually a really good option as well. Some button or a control to insert the appropriate Unicode character to control direction, assuming those are supported by all major systems, is something I can definitely get behind. |
I have now written up my scheme for a new default Arabic keyboard layout. Contriving it took my whole day, and I still have to put the real Arabic characters into the comments instead of (or in addition to?) the transcriptions used now. With my experience of creating XKB layouts it worked on the first try, so I have published it this evening; I will keep it for a few days to digest it and to give you all the opportunity to evaluate it. The new version of xkeyboard-config is scheduled for the 31th of September. @khaledhosny @behdad, or whoever else, please ask your polyglot colleagues to have a look at it! I have mapped the bidirectional control characters onto it, except the overriding ones (I don't think LRO and RLO are meant for regular text?). Since there was much unused room on four levels, I have also mapped all the characters additionally used in the Arabic scripts of Pashto, Sindhi, Punjabi, Urdu, Kashmiri, Turkic and other languages, alongside the Arabic and Persian letters that were already in the layout before my involvement (the same way I have mapped virtually all of Cyrillic onto a Russian-based layout). You can comment on that gist with specific remarks about the layout, as those would be off-topic here. As for this issue: once that keyboard layout ships, the issue is solved on Linux. As I noticed while writing this comment, the default Persian layout already includes the embedding and override characters, the default Hebrew one includes the RIGHT-TO-LEFT MARK and the LEFT-TO-RIGHT MARK, and my edition of the default Arabic layout extends these marks to Arabic. I dare say that the suggestion of typing bidirectional characters directly on the keyboard is quite persuasive.
But the OP is from Israel according to his profile, so it becomes even more amusing to hear him complain about propositions to “Enter a character that's not found in any keyboard”, as the Hebrew base layout contains:
What, people can't help themselves because they use Windows or macOS? Then this issue should be closed, because it is an operating system issue.
Hey there! We're automatically closing this issue since there was no activity in this issue since 398 days ago. We therefore assume that the user has lost interest or resolved the problem on their own. Closed issues that remain inactive for a long period may get automatically locked. Don't worry though; if this is in error, let us know with a comment and we'll be happy to reopen the issue. Thanks! (Please note that this is an automated comment.)
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Howdy!
A recurring theme I've noticed in most chat programs today (Telegram included 😢) is that text direction of a given message is determined by the first (relevant) character, which is a shame because:
So I propose a better algorithm, not much more complicated, for detecting the desired direction of a message: count the number of characters in each direction (excluding links), and the direction with more characters wins. More formally:
Examples of this algorithm can be seen in Google Hangouts (the only chat client I know of that actually has a smarter algorithm than "look at the first character").
Of course, this is a proposal and the concrete algorithm is open to change, but I think that it's a very good compromise between code complexity and correctness.
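The majority rule proposed above can be sketched in a few lines. The details here are assumptions rather than part of the proposal: links are stripped with a rough regex (a real client would reuse its own link parser), Arabic letters (bidi class AL) count as RTL, and a tie falls back to the first strong character and then to LTR, since the proposal does not say how ties are broken.

```python
import re
import unicodedata
from typing import Optional

# Assumed, deliberately rough link pattern; not Telegram's link detection.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def _strong(ch: str) -> Optional[str]:
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "ltr"
    if bidi in ("R", "AL"):
        return "rtl"
    return None  # digits, emoji, symbols, spaces do not count


def majority_direction(message: str, default: str = "ltr") -> str:
    """Count strong LTR/RTL characters outside links; the larger count wins."""
    text = URL_RE.sub(" ", message)
    counts = {"ltr": 0, "rtl": 0}
    first: Optional[str] = None
    for ch in text:
        d = _strong(ch)
        if d is None:
            continue
        if first is None:
            first = d
        counts[d] += 1
    if counts["rtl"] != counts["ltr"]:
        return "rtl" if counts["rtl"] > counts["ltr"] else "ltr"
    # Tie or no strong characters: assumed fallback to first-strong, then default.
    return first or default


print(majority_direction("שלום https://example.com/path"))  # link ignored
print(majority_direction("שלום everyone here"))  # majority beats first char
```

The second call shows the behavioural difference from today: the first strong character is Hebrew, but the Latin letters outnumber it, so the message would be laid out left-to-right.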