automate/improve infrastructure #59

Open
andrewljohnson opened this issue Jun 22, 2016 · 7 comments

@andrewljohnson
Contributor

andrewljohnson commented Jun 22, 2016

This issue describes how DeepOSM works now, and then the changes needed to improve the infrastructure. Also see notes/scripts on these issues: #8, #23, #30, and #39 (those issues were closed to merge into this one, not because they were completed).

Test Data and Training

  • one app does both the data prep and the neural net training
  • it uses TensorFlow on a GPU, on a Linux box in my office
  • findings are then uploaded to S3
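
A minimal sketch of that last upload step, assuming boto3 and a hypothetical bucket/key layout (DeepOSM's actual client and naming may differ):

```python
# Hypothetical sketch: push a findings file to S3 after a training run finishes.
# The bucket name, key, and file format are placeholders, not DeepOSM's real ones.
import boto3

def upload_findings(local_path, bucket="deeposm-findings", key="findings/latest.json"):
    """Upload a findings file so deeposm.org can pick it up later."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

if __name__ == "__main__":
    upload_findings("naip_analysis.json")
```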

Display on deeposm.org

  • when a deeposm.org page is loaded, it checks S3, grabs findings, and updates the database
  • deeposm.org shows where DeepOSM detects mis-registered roads
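
A rough sketch of that refresh step, written as a standalone function so it could run from a page-load hook today or from a cron job later; the bucket, key, and findings format are assumptions, and the database write is left to the caller since the real site models aren't shown here:

```python
# Hypothetical sketch of "check S3, grab findings, update the database".
import json
import boto3

def fetch_latest_findings(bucket="deeposm-findings", key="findings/latest.json"):
    """Pull the latest findings file from S3 and return it as Python objects.
    The caller would then upsert these rows into the site's database."""
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)
    return json.loads(obj["Body"].read())
```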

Issues with this Setup

  • The scripts to gather data, train, and upload findings should run on a cycle, not manually when I press a button
  • The data prep and training modules should be separate - DeepOSM has one monster Dockerfile.gpu-devel that includes GDAL, TensorFlow, and more. This makes the build fragile and hard to deploy.
  • Actual work includes:
    • move the analysis to AWS, run on a cycle
    • set up a cron job to have deeposm.org check for new findings
    • parallelize the analysis, so we can do deeper nets and more area
    • import NAIPs into Postgres, instead of hacking them up and caching files
    • use Overpass or another approach to getting OSM data, instead of hacking up PBF extracts with Osmium (see the sketch after this list)
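
As a sketch of that last bullet, an Overpass query for road ways inside a NAIP's bounding box might look roughly like this; the endpoint, tag filter, and helper are illustrative, not existing DeepOSM code:

```python
# Hypothetical Overpass sketch: fetch OSM road ways inside a NAIP bounding box
# instead of clipping a PBF extract with Osmium.
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def roads_in_bbox(south, west, north, east):
    query = """
    [out:json][timeout:60];
    way["highway"]({s},{w},{n},{e});
    out geom;
    """.format(s=south, w=west, n=north, e=east)
    response = requests.post(OVERPASS_URL, data={"data": query})
    response.raise_for_status()
    return response.json()["elements"]
```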
@anandthakker

> use Overpass or other approach to getting OSM data, instead of hacking up PBF extracts with Osmium

@andrewljohnson have you considered OSM QA Tiles?

@andrewljohnson
Contributor Author

@anandthakker I'm probably wrong, but I decided it doesn't really help me to use "tiled" data, because my source imagery isn't projected to Mercator or tiled into a TMS pyramid?

When I did my first run at this, I used Mapzen vector tiles, plus TMS imagery tiles (since they align). But when I switched to NAIPs, it seemed to make more sense (less code) to clip some line strings to the bounds of my NAIPs or arbitrary tiles.

So that's also why I think my end solution is a planet DB with an API, and maybe that API is Overpass, or something simple I just cook up for this use case?
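
For illustration only, the clipping approach described above might look something like this with Shapely; the coordinates and the way are made up, and this isn't DeepOSM code:

```python
# Hypothetical illustration: clip an OSM way's geometry to a NAIP's bounding box.
from shapely.geometry import LineString, box

naip_bounds = box(-122.5, 37.7, -122.4, 37.8)            # minx, miny, maxx, maxy
way = LineString([(-122.55, 37.75), (-122.45, 37.75)])   # a road crossing the NAIP edge

clipped = way.intersection(naip_bounds)  # keep only the segment inside the NAIP
print(clipped.wkt)
```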

@anandthakker

Ah -- yeah, I see what you mean. I went the tiled route for skynet-data basically because between existing tiled satellite source (Mapbox Satellite, DG, etc.) and OSM QA tiles, I figured I could skip a lot of the work. Since you've already handled chopping up the NAIP images, I agree that using tiles provides less of a benefit... Although, I guess one upside to going the tiled route would be that maybe it would be easier, in the future, to swap in a different imagery source (without your having to host/maintain/process it as much). Might not be worth it if you don't see yourself going that route, though

@brandongalbraith

Have you considered running the analysis at Digital Ocean and pushing the results into S3 from there? The compute at DO is significantly cheaper than at AWS.

@andrewljohnson
Contributor Author

andrewljohnson commented Jul 22, 2016

@brandongalbraith The analysis runs at home on a Linux box I have.

Also: I might try out some Google ML infrastructure; I just got my invite yesterday.

@brandongalbraith

brandongalbraith commented Jul 22, 2016

@andrewljohnson

Sorry about that! I inferred from above:

> Actual work includes:
> move the analysis to AWS, run on a cycle

that the analysis was planned to run in AWS. Happy to help break everything apart to scale it out; my day gig is devops/infrastructure.

@andrewljohnson
Contributor Author

@brandongalbraith I realized that after I answered too fast :)

I guess the right answer is I hadn't much thought about it.
