update geometries from geonames? #1

Open
missinglink opened this issue Mar 14, 2021 · 4 comments

@missinglink

The geometries in this repo are currently all 0,0.

I had a quick look this morning, and it looks like we can source point geometries for ~21k of the ~25k postcodes in this repo from GeoNames:

sqlite3 whosonfirst-data-postalcode-pl-latest.db -separator $'\t' $'SELECT id, json_extract(body, \'$.properties."wof:name"\') from geojson' > wof.txt
aria2c http://www.geonames.org/export/zip/PL.zip
unzip PL.zip PL.txt
join -12 -22 <(sort -k2 wof.txt) <(sort -k2 PL.txt) | head
00-001 421385723 PL Warszawa Mazowieckie Warszawa 52.25 21 4
00-002 421412029 PL Warszawa Mazowieckie Warszawa 52.25 21 4
00-003 421395245 PL Warszawa Mazowieckie Warszawa 52.25 21 4
00-004 421395247 PL Warszawa Mazowieckie Warszawa 52.25 21 4
00-005 421395249 PL Warszawa Mazowieckie Warszawa 52.25 21 4
00-006 421395251 PL Warszawa Mazowieckie Warszawa 52.25 21 4
00-007 421395253 PL Warszawa Mazowieckie Warszawa 52.25 21 4
00-008 421395257 PL Warszawa Mazowieckie Warszawa 52.25 21 4
00-009 421395259 PL Warszawa Mazowieckie Warszawa 52.25 21 4
00-010 421395261 PL Warszawa Mazowieckie Warszawa 52.25 21 4

(the second column is the WOF ID)
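
A rough way to reproduce the match count, as a sketch only: it assumes `wof.txt` holds `id<TAB>postcode` (as produced by the `sqlite3` command above) and that the postcode is the second tab-separated column of `PL.txt`. The function name `match_count` is made up for illustration.

```shell
# Count WOF rows whose postcode also appears in the GeoNames file.
# $1: WOF file (id<TAB>postcode), $2: GeoNames file (postcode in column 2).
match_count() {
  awk -F'\t' '
    NR == FNR { gn[$2] = 1; next }   # first file: remember GeoNames postcodes
    ($2 in gn) { n++ }               # second file: count WOF rows with a match
    END { print n + 0 }
  ' "$2" "$1"
}
```

Called as `match_count wof.txt PL.txt`, this counts each WOF record at most once, even if GeoNames lists the same postcode on several rows.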

@missinglink
Author

I've not worked with these GeoNames postcode files before, but my initial impression is that the coordinates are very low precision and heavily duplicated.

I'm not sure whether this is 'normal'; it might indicate an error in the data or in how GeoNames processes it.

@stepps00

The initial postalcode import (6+ years ago) into Who's On First came via GeoPlanet, which mostly contained geometries with 0,0 coordinates.

GeoNames exists as a source in Who's On First already, so assuming the same license applies to GeoNames' postalcode data, we should be able to import.

@missinglink In previous imports of GeoNames' locality data, I've noticed the same low geometric precision. IMO, this is preferred over the 0,0 coordinates though. Would you be interested in crafting a PR with updates? If not, I can try to spend some time on this later in the week.

@missinglink
Author

Yeah, I can do a PR, no problem. I'm just holding off until the OP replies with their local knowledge of the postcodes and their accuracy. I'd prefer not to waste time if the GN postcodes are really bad.

My inclination is more toward "nothing is better than wrong" so I'd prefer to have 0,0 than very low precision positions.

The advantage to us is that we can exclude null island geometries easily but don't have a way of excluding low quality/precision geometries.
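
One possible heuristic for spotting the low-precision rows, purely as a sketch: flag coordinates where both latitude and longitude carry very few decimal places. It assumes the GeoNames postal code layout (tab-separated, postcode in column 2, latitude in column 10, longitude in column 11). The function name `low_precision` and the default threshold of 2 decimals are hypothetical choices, and the check cannot tell a genuinely round coordinate from a truncated one.

```shell
# Print postcode, lat, lon for rows whose coordinates both have at most
# max decimal places (default 2). Heuristic only, not an established method.
# $1: GeoNames postal code file, $2: optional max number of decimals.
low_precision() {
  awk -F'\t' -v max="${2:-2}" '
    # number of characters after the decimal point, 0 if there is none
    function decimals(x,   i) { i = index(x, "."); return i ? length(x) - i : 0 }
    decimals($10) <= max && decimals($11) <= max { print $2 "\t" $10 "\t" $11 }
  ' "$1"
}
```

For example, `low_precision PL.txt | wc -l` would give a rough count of suspicious rows under that assumption.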

The problem with having some really good quality geoms and some really low quality geoms is that people always judge the quality of your dataset/product/service by the worst example they can find, and then usually make assumptions that the rest is of the same quality 🤷‍♂️

@missinglink
Author

For example there are multiple postcodes in the GN dataset with the coords 52.25,21.
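
A quick way to see how widespread the duplication is, sketched under the assumption that `PL.txt` follows the GeoNames postal code layout (tab-separated, latitude in column 10, longitude in column 11). The function name `coord_dupes` is made up for illustration.

```shell
# Count how many rows share each "lat,lon" pair, most frequent first.
# $1: GeoNames postal code file (tab-separated, lat = col 10, lon = col 11).
coord_dupes() {
  awk -F'\t' '
    { n[$10 "," $11]++ }                       # tally rows per coordinate pair
    END { for (c in n) print n[c] "\t" c }     # emit count<TAB>lat,lon
  ' "$1" | sort -k1,1nr -k2,2
}
```

Running `coord_dupes PL.txt | head` lists the most heavily shared coordinates, which should show whether 52.25,21 is an isolated case or part of a pattern.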

Sorry, I really should have done my homework before opening this issue, but the OP claimed the geometries were significantly better in GN.
