Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various: Simpler default geometries #1139

Merged
merged 0 commits into from Sep 5, 2018
Merged

Conversation

stepps00
Copy link
Contributor

@stepps00 stepps00 commented Apr 17, 2018

Fixes #1072. Fixes #834.

Work performed in this PR:

  • Move existing geometries to alt-geometries for any file over 10MB in size.
  • Add pointers/corrected all src:geom and src:geom_alt properties.
  • Update default geometries
  • Include new records (1158869063.geojson, 1158869033.geojson, 1158869045.geojson, 1158869069.geojson) that were recently added via Canada: Update counties #1235.
  • Update .gitattributes file to remove New Zealand default file and add New Zealand alt file.
  • File follow-up issue to move all alt-geometries to a new repository, see Various: Simpler default geometries #1139 (comment).
  • Dissolve new NZ geometry
  • Check validity of all geometries

@stepps00 stepps00 self-assigned this Apr 17, 2018
@stepps00
Copy link
Contributor Author

@nvkelso @thisisaaronland - any thoughts on where to store the existing geometry in the ~120MB New Zealand file? I could store it as an alt geometry, though git-lfs would still be required.

@nvkelso
Copy link
Contributor

nvkelso commented Apr 18, 2018

The original, super detailed New Zealand geometry came from Quattroshapes so someone can always just go find it again there. I'm fine removing it completely from WOF by not storing it as an alt-geom (and not even stashing it on S3 somewhere).

@thisisaaronland
Copy link
Contributor

It is late and I am tired so something something something... but:

Fundamentally the issue is that there will always be geometries (or records) that will bump up against the limits of service X, or at least the potential will always exist so in that regard things like NZ because they force us to think about how to reconcile things.

My preference would be to create an alt or groundtruth sister repository where the expectation is that you may need to use LFS, at least in the short-term and especially if that means we can purge the need for LFS from "core".

But like I said, I am tired...

@nvkelso
Copy link
Contributor

nvkelso commented Apr 18, 2018

Proposal:

  • Add new whosonfirst-data/whosonfirst-data-lfs repo to collect large files from the main data repo and persist them in an LFS managed repo. All other repos would no longe be lfs managed.
  • WOF features with "big geometry" (≥ 50 mb) in the whosonfirst-data/whosonfirst-data repo (but really any other related whosonfirst-data org repo?) would indicate there are additional alt geoms available by listing them in "src:geom_alt" but with a new lfs- prefix.
  • Add documentation to say alt geoms in the src list that have lfs- prefix should be found in that other repo.

@stepps00
Copy link
Contributor Author

I'm fine with that proposal. Right now we have two geometries over 50MB -- one for Australia, and one for New Zealand. If we go this route, I'll double check all records' file sizes before moving anything to a new repo and update this PR accordingly.

@thisisaaronland
Copy link
Contributor

I'm fine with this at first blush. I'd like a couple hours to think through any possible issues with the naming conventions.

@thisisaaronland
Copy link
Contributor

Sorry this fell through the cracks. I had a think about this over the weekend. I think the correct way to deal with this is to create a new set of whosonfirst-data-alt-* repositories that follow the same pattern as the existing repos.

  • whosonfirst-data continues to be the catch-all for core admin places and gets a whosonfirst-data-alt repo
  • Everything else gets a whosonfirst-data-alt-PLACETYPE-COUNTRY-MAYBE-REGION repos, as needed.

Calling stuff -lfs is a bit too specific and the issue is really:

  • There are alternate geometries that you may but probably don't care about
  • They might be really big
  • They will require extra work because computers (and the present)

It also means that its easy to create/check/work with "alt" directories since it's a simple, consistent check on the repo name (if placetype == "alt" ...)

@stepps00
Copy link
Contributor Author

stepps00 commented May 7, 2018

Sounds like a plan.. two questions:

  • Should we consider a file size threshold for alt-geometry files that get added to the alt geometry repositories? Maybe anything over 50MB?

  • How should we call out these alt geometry files in the properties? Right now, we have src:geom and src:geom_alt that point to other geojson files in the relative file path, but nothing to indicate an alt geometry is in a separate repository?

@thisisaaronland
Copy link
Contributor

To the first point: I don't think so. There will always file limits somewhere and ground truth geometries will always get bigger as sensors/measurements get more precise.

To the second point: The plan here is to move all alt geometries in to whosonfirst-data-alt-* repos, so we just have add an addendum to the docs.

@nvkelso
Copy link
Contributor

nvkelso commented May 9, 2018

I like this approach.

In terms of workflow, suggest merging existing PRs that are in-flight first as they have alt-geom changes that will be easier to deal with once than 2ce.

@stepps00 stepps00 changed the title WIP: Simpler default geometries [wip] Various: Simpler default geometries Jun 29, 2018
@stepps00
Copy link
Contributor Author

PR is ready for review. I've opened #1297 to track moving all alt-geometry files to a new repository.. this PR swaps geometries around within existing directories. I've also opened #1296 as a follow-up PIP task.

This PR swaps around geometries for records with large file sizes, cleans up properties (current=1, names, etc), alt-geometry files, and updated the .gitattributes file so git-lfs tracks the NZ alt file.

/cc @nvkelso

@stepps00 stepps00 changed the title [wip] Various: Simpler default geometries Various: Simpler default geometries Aug 31, 2018
@stepps00 stepps00 requested a review from nvkelso August 31, 2018 21:16
@nvkelso
Copy link
Contributor

nvkelso commented Aug 31, 2018

Please fixup New Zealand – it looks like a bad dissolve that left interior holes as stipples along the interior admin1 boundaries.

Please also run the same validity checks as the last PR to make sure these new geoms don't have any problems.

@nvkelso
Copy link
Contributor

nvkelso commented Aug 31, 2018

This includes the dozen+ feature changes I expected to see.

I've spot checked file size and they seem reasonable to me. Spot checked property changes is is reasonable.

The .gitignore file change LGTM.

Will review again once NZ geom is cleaned up and shapes are validated.

@stepps00
Copy link
Contributor Author

The most recent commit validates all geometries (three were caught as invalid).. I also swapped the bunk NZ (whosonfirst-sourced) geometry with the existing Natural Earth alt geometry, updated the src: props and removed the Natural Earth alt file.

@nvkelso
Copy link
Contributor

nvkelso commented Sep 4, 2018

So one last thing...

Please set all the simplified records so they include a revgeo property that points to the revgeo alt geometry that contains the high res original shape.

Copy link
Contributor

@nvkelso nvkelso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this one was a long time in the making!

Add the revgeo property pointer and then we're good to go.

@stepps00
Copy link
Contributor Author

stepps00 commented Sep 5, 2018

Most recent commit adds the reversegeo pointer to the more precise quattroshapes geometry.

@stepps00 stepps00 merged this pull request into master Sep 5, 2018
@stepps00 stepps00 deleted the stepps00/simpler-geoms branch September 5, 2018 00:30
@stepps00 stepps00 mentioned this pull request Sep 5, 2018
@thisisaaronland
Copy link
Contributor

Where is the repo with the large geometries?

@stepps00
Copy link
Contributor Author

stepps00 commented Sep 5, 2018

This PR moves the large geometries to alt files.. I'll track the new repo in #1297. This will be done as soon as I clear out the last of the open PRs.

I think we are all in agreement that alt-geometry files will be moved to a new whosonfirst-data-alt repository, so that'll be the first pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants