Add population from GeoNames using concordance cross-walk #212
Comments
We have imported almost no data directly from GeoNames. Quattroshapes is missing that information. If there needs to be a ticket to import GN stuff, would you please create a different ticket?
(Quattroshapes only includes population values when the source provided them, for all others it provides a 0 population. This is the case with London, with a 0 |
hi @nvkelso, I have an external contributor who is interested in having missing population data imported from geonames where concordances exist. is this planned as part of the work which you are doing this quarter around alt-names? also, is it possible to reopen this ticket? I'm confused about why it was closed and what a 'different ticket' would look like, this one seems to describe the issue sufficiently?
@missinglink Contributions towards GeoNames.org population imports are welcome :) I've reopened this issue and retitled it. We are still planning on doing population work this quarter for WOF, but are focused more on alternate names at the moment. Since GeoNames.org changes, and since WOF adds more features / more concordances on a regular basis, the contribution should be twofold:
For the PR, https://github.com/whosonfirst-data/whosonfirst-data/pull/754/files is a good reference:
The linked script is private and only generally applicable to this issue for the
@stepps00 is there anything else we should think about?
I'll have a think about this.. but off the top of my head, we also need to include a |
thanks for re-opening! re: the first bullet point, the simplest way to source the populations from geonames is:
awk -F"\t" '{ print $1 " " $15 }' allCountries.txt | grep -v ' 0$' > populations.txt
this will create a 6MB file with two columns, the geonames id on the left and the population on the right.
$ head -n5 populations.txt
3039154 1052
3039162 9448
3039163 8022
3039604 2363
3039676 3467
that file is small enough to be loaded in to memory and it would be a simple O(1) hashmap lookup to do per record when the imports are run, as you mentioned it would also require one or two IF statements to handle special conditions.
I could write the code if it's helpful, @stepps00 could you point me to the right place? I would like to do it in an existing script which is already run regularly. I was hoping to simply add a |
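For illustration only, the in-memory lookup described above might look roughly like the sketch below. It assumes the two-column `populations.txt` format shown above and that each WOF record carries its GeoNames id under `wof:concordances` / `gn:id`; the function names are hypothetical and are not taken from any existing WOF script.

```python
import io


def load_populations(lines):
    """Build a {geonames_id: population} dict from 'id population' lines."""
    populations = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        gn_id, pop = parts
        populations[int(gn_id)] = int(pop)
    return populations


def lookup_population(record, populations):
    """Return the GeoNames population for a WOF record, or None if unknown.

    Assumes the concordance lives at properties['wof:concordances']['gn:id'].
    """
    concordances = record.get("properties", {}).get("wof:concordances", {})
    gn_id = concordances.get("gn:id")
    if gn_id is None:
        return None  # no GeoNames concordance on this record
    return populations.get(int(gn_id))


# Stand-in for open("populations.txt"), using two rows from the sample above.
sample = io.StringIO("3039154 1052\n3039162 9448\n")
populations = load_populations(sample)

# Hypothetical, heavily trimmed WOF record.
record = {"properties": {"wof:concordances": {"gn:id": 3039162}}}
print(lookup_population(record, populations))
```

Which property the value gets written to, and the special-case IF statements mentioned above, are deliberately left out since they are still being discussed in this thread.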
hey @missinglink - that's great.. the script I've previously used for population imports for #240 can be found here: https://github.com/mapzen/whosonfirst-toolbox/blob/master/scripts/issue-240-wof_population.py That code used existing properties to build new population properties, but could easily be adjusted to import data from a secondary GN file.
This script does the same GN concordance mapping that you suggested (starting at line 75).
Scheduled for week of September 4.
Related to: #240.
We should always add a |
Output file of the |
Per IRL discussion with @nvkelso:
Since we have both Statoids and QS branches outstanding, we should wait to import anything in the test branch (we may be able to harvest more gn concordances for more records once all PRs are merged).
This work is being tracked in: https://github.com/whosonfirst-data/whosonfirst-data/tree/stepps00/qs-point-import |
some geonames imports/concordances are missing the associated population metadata.
eg:
for the record above I'd expect the WOF record to have a qs:pop of 7556900
I'm not sure about the scale of the problem, I found this when running our acceptance test suite and found that it affects searches for: brooklyn, london, portland, paris etc.
we are using the gn population data in pelias to score more populous places higher in the results (eg London UK vs. London ON) so it's important that we can get this fixed ASAP.
thanks!
[edit] maybe I'm getting confused between qs:pop and gn:pop, is there such a thing as gn:pop?