Expose wof:population property #240

Closed
4 of 5 tasks
trescube opened this issue Mar 31, 2016 · 22 comments · Fixed by #754

@trescube
Contributor

trescube commented Mar 31, 2016

The Pelias team has been pulling population data from the gn:population property and has just realized that we also need to pull from zs:pop10, presumably for neighborhoods. Exposing a wof:population property would give us one authoritative field and reduce the complexity of our logic.

Note from @nvkelso, with exact logic described in #240 (comment):

  • No PIPing required as this will be a property-only change, for all administrative records in the generic whosonfirst-data repo.
  • No modification of existing properties, only addition of the following common properties.
  • Add new wof:population
    • a positive integer value
    • when not known don't set property
  • Add new src:population
    • a string from one of the registered wof:sources
    • should always be present when wof:population is present (e.g., if not known then default to unknown)
  • Add new src:population:date
    • an EDTF date, with a default of uuuu when unknown (most of our current data has unknown dates).
    • always present if there is wof:population value
  • Add new src:population:method
    • not for this commit
    • string value of: census, estimate, etc
  • Add new wof:population_rank
    • a calculated integer when wof:population is present, but...
    • if wof:population is not present, it is still okay for us to fill this in manually, which is how many features coming from Quattroshapes got their ranks.
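
As a rough illustration of the rules in this list, here is a minimal JavaScript sketch. The helper name setPopulation and its arguments are hypothetical and not part of any WOF tooling; it only mirrors the defaults described above (unknown for the source, uuuu for the date) and leaves src:population:method and wof:population_rank out.

```javascript
// Hypothetical helper: applies the property rules from the list above.
// It returns a copy with the new properties added and leaves the input untouched.
function setPopulation(props, population, source, date) {
  // wof:population must be a positive integer; when not known, set nothing.
  if (!Number.isInteger(population) || population <= 0) {
    return Object.assign({}, props);
  }
  return Object.assign({}, props, {
    'wof:population': population,
    // src:population is always present alongside wof:population.
    'src:population': source || 'unknown',
    // EDTF date; 'uuuu' marks an unknown year.
    'src:population:date': date || 'uuuu'
  });
}

// Illustrative values only; 12345 and 'gn' are placeholders.
const example = setPopulation({ 'wof:name': 'Example' }, 12345, 'gn');
```

With these inputs, example carries wof:population, src:population set to gn, and src:population:date set to uuuu.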

Question:

  • What do we do with historic population time series?
  • I propose punting that for another day, but here's an idea we talked about:

Optional data structure where wof:population_timeseries is an array, like:

'wof:population_timeseries': [
     { population: X1, source: Y1, date: Z1, method: A1, population_rank: B1}, 
     { population: X2, source: Y2, date: Z2, method: A2, population_rank: B2}, 
     { population: X3, source: Y3, date: Z3, method: A3, population_rank: B3}
]
@stepps00
Contributor

stepps00 commented Aug 2, 2016

@trescube - we have a sample for this work in the above commits, it'll take a bit more work to iterate over all other features and set this new property.

Is there a location where you'd like us to start this wof:population field work?

@nvkelso
Contributor

nvkelso commented Aug 2, 2016

I heard rumors of New York and its boroughs?

@stepps00
Contributor

stepps00 commented Aug 2, 2016

@nvkelso that was an example I just came up with... but we could start there.

@trescube
Contributor Author

Sorry, lost track of this, but NYC boroughs would be great!

@stepps00 stepps00 self-assigned this Aug 17, 2016
@stepps00
Contributor

We expect new population data to come via Statoids using the hasc:id concordance.

Related issues: #581 and #380

@nvkelso
Contributor

nvkelso commented Jan 23, 2017

We'd add:

"wof:population" and "src:population".

Might also be nice to include the approximate year the population is from, and whether it came from a census or an estimate?

@missinglink
Contributor

missinglink commented Apr 18, 2017

hey, I'm trying to extract population info from WOF and the function currently looks like this:

function getPopulation( wof ) {
       if( wof['mz:population'] ){ return wof['mz:population']; }
  else if( wof['gn:population'] ){ return wof['gn:population']; }
  else if( wof['zs:pop10'] ){      return wof['zs:pop10']; }
  else if( wof['qs:pop'] ){        return wof['qs:pop']; }
  else if( wof['wk:population'] ){ return wof['wk:population']; }
}

it sounds like I will also need to add src:population. Is there a way we could merge these fields (in order of preference) into the mz:population property on the data end, so we can avoid having to do this as a data consumer?

@missinglink
Contributor

oh wait, is mz:population a thing or is that what wof:population is?

@missinglink
Contributor

we need taginfo for WOF! :)

@missinglink
Contributor

missinglink commented Apr 18, 2017

here's some stats on the availability of population fields across the admin entities in WOF:

      1 qs_pop
      3 wof:population
    116 meso:pop
    261 ne:pop_est
   5946 wk:population
  17422 qs:gn_pop
  40075 gn:pop
  43572 zs:pop10
  72319 qs:pop_sr
  91734 gn:population
 184150 qs:pop

@nvkelso
Contributor

nvkelso commented Apr 18, 2017

Thanks for the nudge, this is on our "next" list for the quarter.

Population property priorities

I rank them (top is more important):

      0 mz:population (a hypothetical override)
      3 wof:population (default)
   5946 wk:population (because Wikipedia has more active editing)
  91734 gn:population (because GeoNames could have later updates over QS)
  40075 gn:pop (this is an alias / alternate procedure for gn:population)
 184150 qs:pop (this was a one time build that went stale, but has more overall coverage)
  17422 qs:gn_pop (we know this to be stale, and it should already be disambiguated into qs:pop)
  43572 zs:pop10 (this just works in USA for neighbourhoods)
    116 meso:pop (this is very rare, I'm hesitant to use it)
      0 statoids:population (this will be imported with Statoids work)
    261 ne:pop_est (old data, and at different resolutions)
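
For a data consumer, the ranking above could be applied with a table-driven lookup instead of a chain of if/else branches. This is only a sketch under assumptions: the field order is copied from the list above, the function reuses the getPopulation name from the earlier comment, and skipping anything that is not a positive integer is an added choice rather than something the ranking specifies.

```javascript
// Priority order taken from the ranking above, highest priority first.
const POPULATION_FIELDS = [
  'mz:population',
  'wof:population',
  'wk:population',
  'gn:population',
  'gn:pop',
  'qs:pop',
  'qs:gn_pop',
  'zs:pop10',
  'meso:pop',
  'statoids:population',
  'ne:pop_est'
];

// Returns the first positive integer found, or undefined when none is known.
function getPopulation(wof) {
  for (const field of POPULATION_FIELDS) {
    const value = wof[field];
    if (Number.isInteger(value) && value > 0) {
      return value;
    }
  }
  return undefined;
}
```

Reordering or adding a source then only means editing the array, not the control flow.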

Population ranks:

qs:pop_sr isn't a population; it's a population rank, and it is useful for knowing the relative population range when there's no known population value.

We should promote this to wof:population_rank, as I recommend sorting first by the rank, then by wof:population.

  72319 qs:pop_sr
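
The recommended ordering (rank first, then population) might look like the comparator below. This is a sketch, not an agreed implementation: the function name is hypothetical, descending order is assumed, and treating missing values as 0 is a choice made here for illustration.

```javascript
// Hypothetical comparator: higher rank first, then higher population.
// Missing values are treated as 0, which is an assumption of this sketch.
function byRankThenPopulation(a, b) {
  const rankDiff = (b['wof:population_rank'] || 0) - (a['wof:population_rank'] || 0);
  if (rankDiff !== 0) {
    return rankDiff;
  }
  return (b['wof:population'] || 0) - (a['wof:population'] || 0);
}

// Placeholder records; names and numbers are illustrative only.
const places = [
  { name: 'A', 'wof:population_rank': 9, 'wof:population': 150000 },
  { name: 'B', 'wof:population_rank': 12 },
  { name: 'C', 'wof:population_rank': 9, 'wof:population': 120000 }
];
const sorted = places.slice().sort(byRankThenPopulation);
```

Here B sorts first even though it has no known population, because its rank is higher; A and C share a rank and fall back to population.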

From very old Natural Earth and Quattroshapes notes...

Where the values represent the following ranges:

  • 14: 10m+ urban areas
  • 13: 5m to 10m urban areas
  • 12: 1m to 5m urban areas
  • 11: 500k to 1m
  • 10: 200k to 500k
  • 9: 100k to 200k
  • 8: 50k to 100k
  • 7: 20k to 50k
  • 6: 10k to 20k
  • 5: 5k to 10k
  • 4: 2k to 5k
  • 3: 1k to 2k
  • 2: 200 to 1k
  • 1: Less than 200
  • 0: locale (no population) / or less than 200 people? << Or places that only are visible at 50k scale or larger (but not hoods)

VisualBasic code for ArcMap Field Calculator advanced area (I know, right!?):

a = [wof:population]

if( a >= 10000000 ) then
x = 14
elseif( a >= 5000000 ) then
x = 13
elseif( a >= 1000000 ) then
x = 12
elseif( a >= 500000 ) then
x = 11
elseif( a >= 200000 ) then
x = 10
elseif( a >= 100000 ) then
x = 9
elseif( a >= 50000 ) then
x = 8
elseif( a >= 20000 ) then
x = 7
elseif( a >= 10000 ) then
x = 6
elseif( a >= 5000 ) then
x = 5
elseif( a >= 2000 ) then
x = 4
elseif( a >= 1000 ) then
x = 3
elseif( a >= 200 ) then
x = 2
elseif( a > 0 ) then
x = 1
else
x = 0
end if
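
For anyone not running ArcMap, the same thresholds can be expressed as a lookup table in JavaScript. This is a straight port of the code above as a sketch; the names RANK_THRESHOLDS and populationRank are made up for this example.

```javascript
// Lower bounds for ranks 14 down to 2, mirroring the Field Calculator code.
const RANK_THRESHOLDS = [
  [10000000, 14], [5000000, 13], [1000000, 12], [500000, 11],
  [200000, 10], [100000, 9], [50000, 8], [20000, 7],
  [10000, 6], [5000, 5], [2000, 4], [1000, 3], [200, 2]
];

// Returns the population rank (0 to 14) for a population value.
function populationRank(population) {
  for (const [min, rank] of RANK_THRESHOLDS) {
    if (population >= min) {
      return rank;
    }
  }
  // Any remaining positive value is rank 1; zero, negative, or missing is 0.
  return population > 0 ? 1 : 0;
}
```

As in the original, a population of exactly 200 gets rank 2 and anything from 1 to 199 gets rank 1.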

Junk:

I think this is just an error and shouldn't be included in logic:

 1 qs_pop

@missinglink
Contributor

👍 very useful, thanks!

@missinglink
Contributor

one more thing to add to this ticket: populations should always be positive integers. I am finding values such as:

"ne:pop_est":-99,

@nvkelso
Contributor

nvkelso commented Apr 20, 2017 via email

@nvkelso
Contributor

nvkelso commented Apr 28, 2017

Canada first, rest of world next ;)

@missinglink
Contributor

missinglink commented May 16, 2017

heya, I also noticed that the population counts are sometimes off/old, eg:

Los Angeles (85923517, population ~4M)
London (101750367, population ~8M)

looking through https://en.wikipedia.org/wiki/Megacity it looks like the correct counts should be more in the region of 18.5M & 13.8M.

@nvkelso
Contributor

nvkelso commented May 23, 2017

The counts will be off because we record the population for the incorporated locality, while the "megacity" population is for the metropolitan area (which only has a few samples in WOF now). We can set up some relationships between the "central" city/cities of the metropolitan area to help with this.

@nvkelso nvkelso changed the title Expose wof:population property BLOCKED: Expose wof:population property Jun 30, 2017
@nvkelso
Contributor

nvkelso commented Jun 30, 2017

Blocked on mesoshape imports here: #581.

@stepps00 stepps00 changed the title BLOCKED: Expose wof:population property Expose wof:population property Sep 7, 2017
@stepps00
Contributor

stepps00 commented Sep 7, 2017

This is no longer "blocked" - much of this work is being done through admin0, admin1, and admin2 imports of Statoids data.

@stepps00
Contributor

The Statoids PRs have now been merged, which included many new population properties.

@nvkelso - would you like to close this issue or keep it open as an ongoing issue?

@nvkelso
Contributor

nvkelso commented Oct 18, 2017

Once this PR is merged, we'll be good to close this issue: #824

@stepps00
Contributor

stepps00 commented Nov 1, 2017

#824 is closed, closing this issue.

@stepps00 stepps00 closed this as completed Nov 1, 2017