Update the street name and sign data processing include language and pronunciations #4268
…onunciations_mb_v2
@@ -59,6 +59,70 @@ uint32_t GetMultiPolyId(const std::multimap<uint32_t, multi_polygon_type>& polys
return index;
the changes in the database are breaking right, meaning old code can't use new database (or maybe it can because we just added a couple columns?) but new code certainly cant use old databases. i think this is ok because it doesnt mean compatibility is broken for routing but for data building. i just wanted to call it out. maybe worth putting in the pr description
@kevinkreiser no you can use an old db but you will be missing languages. I just tested this with PA with an old db and it did not crash, but of course no languages are returned.
…mb_v2' into gk_add_languages_pronunciations_mb_v2
i guess mac builds are now broken project wide... i freaking hate CI. i know they have to change and update and stuff but im completely sick of it
ok 2 more changes, fix the order of the entries in changelog and undo the formatting of the taginfo.json (you switched from 2 spaces to 3)
I remember an email about a deprecated resource class and I kinda thought I/you PR'd that, but maybe I just mentioned it in some chat and forgot about it. Should be an easy fix. I'm more concerned over M1-only from Jan 24 on, but yeah, that pretty much falls in line with
…lhalla/valhalla into gk_add_languages_pronunciations_mb_v2
@kevinkreiser Sorry about that....fixed
do we really need 4 copies of the same tests?
in gtest its easy to do permutations where you just vary some component of the test for each run. in this case the pronunciation tag. i see you also use it for the enums from baldr and the directories but its really very trivial to make these test function generic and then call them with permutations. makes it much much easier to maintain, especially when we are talking about a test which is 3k lines
You could use parametrized fixtures in gtest. Something like this to call the tests with different languages.
@mandeepsandhu yep that was my plan. ty
…lhalla/valhalla into gk_add_languages_pronunciations_mb_v2
@kevinkreiser done
this is waaaaay tooooo much code. there has to be more succinct ways of doing this :)
Issue
Update the street name and sign data processing include language and pronunciations
Data processing updates
The heart of this work involves processing name:<lg> and destination:<lg> tags. This includes all variations of each of those tags (e.g., name:left:<lg>, name:right:<lg>, official_name:<lg>, destination:street:lang:<lg>, destination:ref:lang:<lg>, etc.). Within the tag, lg stands for the language.
Language updates
To determine the languages for a lat/lon (LL), we used the default_language that is defined for a country or province/state. The problem is that some areas of the world are multilingual and do not have a single language (e.g., Belgium supports Dutch, French, and German, and Switzerland has German, French, Romansh, and Italian), yet the value of default_language usually contains only one language. To resolve this, the administrative builder was updated to handle special cases like Brussels by processing relations with boundary=political and political_division=linguistic_community to get areas that are bilingual. However, this logic was still not enough, as other areas did not have these special polygons. Therefore, we added an "override" for the languages: if we later determine that an area is multilingual and we want to support additional language tags there, all we have to do is add them to the supported languages list. Our list currently consists of the following:
Wales = cy
United Kingdom = en
Ireland = ga
Northern Ireland = ga
Japan = ja and en
Canada = en and fr
Belarus = ru and be
Singapore = en, zh, ms, and ta
Saudi Arabia = ar and en
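As a rough illustration of the override idea, the sketch below keeps the list above in a lookup table and merges it with an area's default_language. The names SupportedLanguageOverrides and LanguagesForArea, the use of plain area names as keys, and the "default first, then overrides" merge order are all assumptions made for this example; they are not taken from the PR's code.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical override table. The entries mirror the list above; keying by
// plain area name is an assumption made only for this illustration.
const std::unordered_map<std::string, std::vector<std::string>>& SupportedLanguageOverrides() {
  static const std::unordered_map<std::string, std::vector<std::string>> overrides = {
      {"Wales", {"cy"}},
      {"United Kingdom", {"en"}},
      {"Ireland", {"ga"}},
      {"Northern Ireland", {"ga"}},
      {"Japan", {"ja", "en"}},
      {"Canada", {"en", "fr"}},
      {"Belarus", {"ru", "be"}},
      {"Singapore", {"en", "zh", "ms", "ta"}},
      {"Saudi Arabia", {"ar", "en"}},
  };
  return overrides;
}

// Returns the languages accepted for an area: the default language first
// (when present), followed by any override languages not already listed.
std::vector<std::string> LanguagesForArea(const std::string& area,
                                          const std::string& default_language) {
  std::vector<std::string> languages;
  if (!default_language.empty()) {
    languages.push_back(default_language);
  }
  const auto& overrides = SupportedLanguageOverrides();
  const auto found = overrides.find(area);
  if (found != overrides.end()) {
    for (const auto& lang : found->second) {
      if (std::find(languages.begin(), languages.end(), lang) == languages.end()) {
        languages.push_back(lang);
      }
    }
  }
  return languages;
}
```

With this shape, supporting another language in an area is a one-line change to the table, which matches the intent described above.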
Using the languages for the keys
The processing of the new keys is based on the languages that were determined to be "good" for that country or area. OSM users will typically add languages for names and destinations in all parts of the world even though that language may not be spoken in that country. For example, 5th Avenue in NYC has name:ru=5-я авеню. Obviously, we do not want to process the Russian name here. Therefore, using the default languages we can toss the tags with languages that we don't want to support. Moreover, we can create a hierarchy for our languages. For instance, Canada supports both English and French. However, in Ottawa, English will be first and in the Québec province French will be first.
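The filtering and ordering described above could be sketched roughly as follows. LanguageOfKey and KeepSupportedLanguages are hypothetical helpers written for this example, and treating the last colon-separated component of a key as its language is a simplifying assumption; the actual parsing in this PR may differ.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Returns the trailing language code of a key such as "name:fr" or
// "name:left:nl", or an empty string when the last component is not one of
// the allowed languages (for example "name:ru" when only "en" is allowed).
std::string LanguageOfKey(const std::string& key, const std::vector<std::string>& allowed) {
  const auto pos = key.rfind(':');
  if (pos == std::string::npos) {
    return "";
  }
  const std::string suffix = key.substr(pos + 1);
  const bool supported = std::find(allowed.begin(), allowed.end(), suffix) != allowed.end();
  return supported ? suffix : "";
}

// Keeps only the tags whose language is allowed and returns them ordered by
// the priority of `allowed` (first entry = preferred language for the area).
std::vector<std::pair<std::string, std::string>>
KeepSupportedLanguages(const std::map<std::string, std::string>& tags,
                       const std::vector<std::string>& allowed) {
  std::vector<std::pair<std::string, std::string>> kept;
  for (const auto& lang : allowed) {
    for (const auto& tag : tags) {
      if (LanguageOfKey(tag.first, allowed) == lang) {
        kept.emplace_back(tag.first, tag.second);
      }
    }
  }
  return kept;
}
```

Passing {"fr", "en"} for an edge in Québec and {"en", "fr"} elsewhere in Canada would yield the same tags in a different order, which is one way to express the hierarchy mentioned above.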
Edge Cases
We will now support names that differ depending on which side of the street you are driving on. When combined with multiple languages in some areas, it gets very complex. In this example, the official Dutch name differs depending on the municipality. Basically, the border of the towns runs down the middle of the road, so the Dutch name on the right side of the street differs from the one on the left side. In part, this leads to the bizarre situation that the street on the Molenbeek side is called Steenweg op Gent and on the Koekelberg side Gentsesteenweg. However, the French name of the street does not change at all.
Data Before/After Examples
Chaussée de Gand - Steenweg op Gent/Gentsesteenweg Example
Notice that all dashes are removed and processed correctly.
Driving left to right, the Dutch name should be Steenweg op Gent and the French street name (Chaussée de Gand) does not change. Notice that before, we returned the name tag with dashes and did not split the French and Dutch street names.
Before
After
Driving right to left, the Dutch name should be Gentsesteenweg and the French street name (Chaussée de Gand) does not change.
Before
After
name:forward and name:backward are now processed correctly.
Notice in this example we have name:forward and name:backward tags set; however, before we would just process the name tag.
Before - Waltonville Road and Quarry Road returned regardless of direction
After - Quarry Road correctly returned
Before - Waltonville Road returned regardless of direction
After - Waltonville Road correctly returned.
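The directional behaviour in this example can be pictured with a small sketch. NameForDirection is a hypothetical helper, and the fallback to the plain name tag when no directional tag exists is an assumption for illustration rather than a description of the actual implementation.

```cpp
#include <map>
#include <string>

// Picks the street name for the direction of travel along a way.
// `forward` is true when travelling in the direction the way was drawn.
// Falls back to the plain "name" tag when no directional tag is present.
std::string NameForDirection(const std::map<std::string, std::string>& tags, bool forward) {
  const auto directional = tags.find(forward ? "name:forward" : "name:backward");
  if (directional != tags.end()) {
    return directional->second;
  }
  const auto plain = tags.find("name");
  return plain != tags.end() ? plain->second : std::string();
}
```

Which of the two road names sits on name:forward and which on name:backward is not stated here, so any concrete assignment used with this helper is hypothetical.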
Multilingual names are now processed.
Notice in this example the name tag has both Welsh and English set. Since we are in Wales we allow both of these languages and process them both.
Before - The name and name:en tag are both returned for the street: Stryd y Castell / Castle Street/Castle Street
After - Stryd y Castell and Castle Street both processed correctly and a language of cy is set for Stryd y Castell and en for Castle Street
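One way to picture the split is sketched below: break the combined name on " / " and give each part the language of the name:<lg> value it matches. SplitName and TagLanguages are hypothetical helpers, and the assumption that a matching per-language value exists for each part (including a cy value for Stryd y Castell) is made only for this example; the PR may assign languages differently.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Splits a combined name such as "Stryd y Castell / Castle Street" on a separator.
std::vector<std::string> SplitName(const std::string& name, const std::string& sep = " / ") {
  std::vector<std::string> parts;
  std::size_t start = 0;
  std::size_t pos = name.find(sep, start);
  while (pos != std::string::npos) {
    parts.push_back(name.substr(start, pos - start));
    start = pos + sep.size();
    pos = name.find(sep, start);
  }
  parts.push_back(name.substr(start));
  return parts;
}

// Pairs each part of the combined name with the language whose per-language
// value equals it. The language is left empty when nothing matches.
// `names_by_lang` maps a language code to its name:<lg> value.
std::vector<std::pair<std::string, std::string>>
TagLanguages(const std::string& name, const std::map<std::string, std::string>& names_by_lang) {
  std::vector<std::pair<std::string, std::string>> result;
  for (const auto& part : SplitName(name)) {
    std::string lang;
    for (const auto& entry : names_by_lang) {
      if (entry.second == part) {
        lang = entry.first;
        break;
      }
    }
    result.emplace_back(part, lang);
  }
  return result;
}
```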
Contributors @gknisely @dgearhart