What is unk_v? #2

Closed
hugovk opened this issue Sep 19, 2015 · 6 comments

Comments

@hugovk
Contributor

hugovk commented Sep 19, 2015

For example:

name:unk_v: [
"Finlandía",
"Finljandija",
"Finrando",
"Suomen Tasavalta"
],

http://whosonfirst.mapzen.com/spelunker/id/85633143/

Are they names in an unknown language? Should they be removed from unk_v and moved under the correct language's name property?

For example, "Suomen Tasavalta" is the full, official, Finnish name for the "Republic of Finland", which is already listed under name:fin_v.


As a side issue, name:fin_v has some strange things in it:

"Pohjois-suomen" means "North Finland's", "Suomea" refers to the language, "Suomeen" is "to Finland", "Suomeksi" is "in Finnish", "Suomelta" is "from Finland", etc. "Suomen Tasavalta" is probably the only one wanted.

The short, common name "Suomi" is already in name:fin_p.

Further, please could you define in this README how WOF uses _v and _p and other suffixes, if different from WOE?

Specifically, where should the full, official, Finnish name "Suomen Tasavalta" be defined, and where the common, shorter Finnish name "Suomi"?


Meta question: should these issues generally be in whosonfirst-names or whosonfirst-data?

@thisisaaronland
Member

A few comments:

  • unk_ looks like a Geoplanet-ism and should really be und_ or undefined, per this:

http://www.loc.gov/standards/iso639-2/php/English_list.php

That und_ could have a suffix (preferred, variant, etc) is a bit weird on the face of it since you know... undefined. One thing at a time.

  • Where appropriate, things in unk_ or und_ should be placed in their correct language buckets.
  • Currently there are no deviations from the basic Geoplanet convention but that doesn't mean there aren't errors (or lapses) of judgement in data imported from Geoplanet (Suomen Tasavalta vs. Suomi)
  • Currently we only define ISO-639-2 language codes, but we will shortly be incorporating support for script subtags:

http://www.w3.org/International/questions/qa-choosing-language-tags
http://www.w3.org/International/articles/language-tags/
https://tools.ietf.org/html/rfc5646

The convention, historically, has been to use two-letter codes for country/subtags notation but given the fact that we are already using the Geoplanet syntax we may just stick with three letter codes. It will all be documented though :-)

Finally, this is a good place for syntax/convention specific issues or questions.

@thisisaaronland
Member

Related:

https://github.com/whosonfirst/py-mapzen-whosonfirst-languages#mapzenwhosonfirstlanguagessubtags

I suspect what we will do is follow the lead of the Unicode Consortium and publish a "conformance" and "conversion" document:

http://www.unicode.org/reports/tr35/#BCP_47_Conformance

The only question (I think) at this point is whether to expand the "_p/s/v" suffixes to use "x" (subtag) extensions with a fully qualified label (preferred, colloquial, variant) ...
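To make the proposed expansion concrete, here is a minimal sketch that rewrites a legacy Geoplanet-style key into the `name:{lang}_x_{label}` form described later in this thread. The function name `expand_legacy_key` is hypothetical, and the mapping of `_p`, `_s`, `_v` to preferred, colloquial, variant is an assumption read off the ordering in the comment above, not a documented WOF API.

```python
import re

# Assumed mapping: the thread lists "_p/s/v" alongside
# "(preferred, colloquial, variant)", read here in the same order.
LEGACY_SUFFIXES = {"p": "preferred", "s": "colloquial", "v": "variant"}

# Legacy keys look like "name:fin_p": a three-letter code plus a one-letter suffix.
LEGACY_KEY = re.compile(r"^name:([a-z]{3})_([psv])$")


def expand_legacy_key(key):
    """Return the expanded key, or the key unchanged if it is not a legacy key."""
    match = LEGACY_KEY.match(key)
    if match is None:
        return key
    lang, suffix = match.groups()
    return f"name:{lang}_x_{LEGACY_SUFFIXES[suffix]}"
```

Keys that are already in the expanded form, or that do not look like name keys at all, pass through untouched, so the function is safe to apply to every property key in a record.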

@thisisaaronland
Member

Also this (which is not being implemented anywhere yet):

https://github.com/whosonfirst/py-mapzen-whosonfirst-names

@stepps00
Member

I'm going to close this issue, as this should no longer appear in the Finland record, or any other record in Who's On First.

Name properties should now follow one of the following conventions:

name:{iso lang code}_x_preferred
name:{iso lang code}_x_variant
name:{iso lang code}_x_historical
name:{iso lang code}_x_colloquial
name:{iso lang code}_x_unknown

See also: https://github.com/whosonfirst/whosonfirst-properties/blob/master/properties/name.md
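
As an illustration of the conventions listed above, the following sketch splits a name key into its language code and label. The helper `parse_name_key` is hypothetical, and it assumes three-letter lowercase language codes, as used elsewhere in this thread; the linked name.md remains the authoritative description.

```python
import re

# The five labels listed in the comment above.
LABELS = ("preferred", "variant", "historical", "colloquial", "unknown")

# Assumption: language codes are three lowercase letters (e.g. "fin", "unk").
NAME_KEY = re.compile(
    r"^name:(?P<lang>[a-z]{3})_x_(?P<label>" + "|".join(LABELS) + r")$"
)


def parse_name_key(key):
    """Return (lang, label) for a well-formed name key, or None otherwise."""
    match = NAME_KEY.match(key)
    if match is None:
        return None
    return match.group("lang"), match.group("label")
```

A legacy key such as `name:unk_v` does not match and yields `None`, which makes it easy to spot records that still use the old suffixes.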

@laszbalo

laszbalo commented Sep 1, 2018

Hi, I am dealing with UAE localities at the moment, and the following name key comes up very often:

name:unk_x_variant 

e.g. Abu Dhabi or ضدنة

According to the IANA language subtag registry, the unk language subtag refers to the language spoken by the Enawené-Nawé people.

Given how frequently the name:unk_x_variant key appears, and the fact that the Enawené-Nawé language is spoken only by a handful of indigenous people in the Brazilian rainforest, it seems unlikely that the unk language subtag refers to that language.

Am I safe to assume that unk is the "unknown" value inherited from Geoplanet?

@nvkelso
Copy link
Member

nvkelso commented Sep 1, 2018 via email

5 participants