
import NAIP data into a postgres database #23

Closed
andrewljohnson opened this issue May 10, 2016 · 2 comments

@andrewljohnson
Contributor

andrewljohnson commented May 10, 2016

Putting the data in Postgres seems like a good mid-game/end-game move. Do this after we put up deeposm.org, once we want to scale and/or want to provide a place for researchers to run arbitrary experiments.

Benefits include:

  • makes rotating tiles easier (issue rotate training images #24); the current pipeline could be modified to do this too, but it would be hackier
  • bounding box queries are probably easier than with data cached from NAIPs to disk in a non-relational way (see the sketch after this list)
  • enables an API that would allow more arbitrary training data, using less disk space
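
A minimal sketch of what such a bounding box query could look like with PostGIS. The naip_tiles table, its columns, and the database name are all hypothetical, since none of this schema exists yet:

```python
# Hypothetical sketch: assumes a naip_tiles table with a PostGIS
# geometry column `geom` and a tile-data column `tile`, neither of
# which exists yet in this project.
import psycopg2

def random_tiles_in_bbox(min_lon, min_lat, max_lon, max_lat, limit=40000):
    """Return up to `limit` random tiles intersecting the lon/lat bbox."""
    conn = psycopg2.connect(dbname="deeposm")
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, tile
            FROM naip_tiles
            WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
            ORDER BY random()
            LIMIT %s
            """,
            (min_lon, min_lat, max_lon, max_lat, limit),
        )
        return cur.fetchall()
```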
@andrewljohnson andrewljohnson changed the title provide training data in arbitrary batches provide training data in arbitrary batches (fix memory bulge in data pipeline) May 13, 2016
@silberman
Contributor

silberman commented May 13, 2016

...we provide a function where you could say "give me 40,000 random tiles from within these long,lat bounding boxes, and label them using these labeller functions I want to try", and that would be fast.

(A "labeller" function being something that takes a long,lat bounding box and returns some numpy array, with simple ones that return 1:1 arrays of one of the RGBI bands, or more complex ones like the has_center_road and its various permutations, or ones that map 64x64 to 4x4 binary has-road, or has-tennis, etc)

We could still cache it if we want (as is being discussed in #30), though I think we can ditch all the NAIP-specific details and just save to NetCDF the arrays that are going straight into tensorflow, plus some metadata about the experiment if we want.
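
A minimal sketch of that NetCDF caching step, assuming the netCDF4 Python library; the dimension, variable, and attribute names here are invented:

```python
from netCDF4 import Dataset

def cache_batch(path, images, labels, experiment=""):
    """Save a batch of uint8 training images (N x H x W x 4 RGBI) and
    per-tile binary labels, plus experiment metadata, to NetCDF."""
    with Dataset(path, "w") as nc:
        n, h, w, bands = images.shape
        for dim, size in (("tile", n), ("y", h), ("x", w), ("band", bands)):
            nc.createDimension(dim, size)
        img_var = nc.createVariable("images", "u1", ("tile", "y", "x", "band"))
        lbl_var = nc.createVariable("labels", "u1", ("tile",))
        img_var[:] = images
        lbl_var[:] = labels
        nc.experiment = experiment  # global attribute for experiment metadata
```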

@andrewljohnson andrewljohnson changed the title provide training data in arbitrary batches (fix memory bulge in data pipeline) import NAIP data into a postgres database May 29, 2016
@andrewljohnson
Contributor Author

merging with other infrastructure issues
