Cannot visually differentiate between Latin & Cyrillic characters #343
Comments
When taginfo updates its database it also gets the Unicode database of characters. My plan was to "somehow" do something like what your `uniwhat` examples show. But I never could figure out what to do exactly and how to show it. I think the question here is: What would actually be useful to show (as compared to just interesting)? What problem are we trying to solve? Then we can think about how to do that. |
As a person who speaks Bulgarian (which uses Cyrillic letters) I can offer some insight.
That's true, but it would be nice to know when Cyrillic and Latin letters are mixed in the same key value. Usually that's something that should be avoided (there may be edge cases but it's definitely an error when it happens on
It would be useful to show the glyph and character name like uniwhat shows them.
Mixing Latin and Cyrillic letters in the same key (mostly |
Using a different font or colours for entire values would be not only useless but also harmful, since reading mixed-language values would become harder and slower (think numbers). What could be useful is flagging values with words that contain letters from different alphabets. For example, |
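A minimal sketch of the "flag words that mix alphabets" idea, in Python. This is not how taginfo does it. The function names `script_of` and `mixed_script_words` are made up for illustration. Python's standard library does not expose the Unicode Script property, so the sketch assumes the first word of the character name (`LATIN`, `CYRILLIC`, ...) is a good enough stand-in, which is only a heuristic.

```python
import re
import unicodedata


def script_of(ch):
    """Guess the script of a letter from the first word of its Unicode name.

    Heuristic only (assumption): the real Script property lives in the
    Unicode Character Database and is not available in the stdlib.
    """
    if not ch.isalpha():
        return None  # digits, punctuation etc. do not count as a script
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def mixed_script_words(value):
    """Return (word, scripts) pairs for words whose letters span several scripts."""
    flagged = []
    for word in re.findall(r"\w+", value):
        scripts = {s for s in map(script_of, word) if s}
        if len(scripts) > 1:
            flagged.append((word, sorted(scripts)))
    return flagged


if __name__ == "__main__":
    # Three Cyrillic letters followed by a Latin "e" (written with escapes
    # so the difference is visible in the source).
    print(mixed_script_words("\u0420\u0443\u0441e"))
```

Checking per word rather than per value means a value with one Cyrillic word and one Latin word is not flagged, and digits are ignored.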
On Fri, 03 Dec 2021 14:07 +01:00, Jochen Topf ***@***.***> wrote:
What problem are we trying to solve? Then we can think about *how* to do that.
Someone in Bulgaria noticed the 2 different `addr:city` values. They wanted to fix the OSM data. If they pressed the “Overpass Turbo”/“JOSM” link they could open it and fix it. But they didn't know which one was right, and which was wrong. I copy & pasted from taginfo website and ran it through `uniwhat` to figure out which was right. That's a problem. I'm not sure how to solve it...
|
On Fri, 03 Dec 2021 14:07 +01:00, Jochen Topf ***@***.***> wrote:
My plan was to "somehow" do something like what your
`uniwhat` examples shows. But I never could figure out what to do
exactly and how to show it.
What about another tab that shows a detailed breakdown of the Unicode characters in the key and value, with output similar to `uniwhat`? Then you are not deciding “Latin alphabet is right”, you are merely providing a deep dive into the “binary” representation of the tag.
In the case of homoglyphs, a mapper can then deduce which is Latin and which is Cyrillic.
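To illustrate what such a per-character breakdown could look like, here is a rough Python sketch using only the standard `unicodedata` module. The function name `describe` and the column layout are assumptions for the example. They are not the output format of `uniwhat` or of taginfo.

```python
import unicodedata


def describe(text):
    """Return one (code point, character, general category, name) row per character."""
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",
            ch,
            unicodedata.category(ch),            # e.g. "Lu", "Ll"
            unicodedata.name(ch, "<unnamed>"),   # fallback for unnamed code points
        ))
    return rows


if __name__ == "__main__":
    # The Cyrillic value is written with escapes so it cannot be confused
    # with its Latin look-alike in the source.
    for value in ("\u0420\u0443\u0441\u0435", "Pyce"):
        print(value)
        for code_point, ch, category, name in describe(value):
            print(f"  {code_point}  {ch}  {category}  {name}")
```

Run on the two `addr:city` values from this issue, the character names show which value is Cyrillic and which is Latin, even though the glyphs look alike.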
|
This tab shows a table with all unicode characters used in the key/tag/relation. See #343
This was one of those "how hard can it be?" things I thought I could quickly do... Took me two days of fiddling around with strange tables of Unicode scripts, properties, etc. But now there is a new "Characters" tab on key/tag/relation pages which shows a table of all characters used, along with the script, Unicode general category and Unicode name of each code point. |
https://twitter.com/Pinboard/status/761656824202276864 Thanks, I think this solves the root cause of this issue. 🙂 |
Currently there are 970 `addr:city=Русе`, and 125 `addr:city=Pyce`. The first is in Cyrillic, the second is a mistake using Latin characters that look like the Cyrillic ones.
Currently, a user cannot tell which is which because they all look the same in taginfo. I think this is a missing feature, but I'm unsure what the solution is.
`<span title=…`/`<acronym>` tags on all the letters in a key/value (seems a bit heavy) and