automate/improve infrastructure #59

Open
andrewljohnson opened this issue Jun 22, 2016 · 7 comments

@andrewljohnson
Contributor

andrewljohnson commented Jun 22, 2016

This issue describes how DeepOSM works now, and then the changes needed to improve the infrastructure. Also see notes/scripts on these issues: #8, #23, #30, and #39 (those issues were closed to merge into this one, not because they were completed).

Test Data and Training

  • one app does both the data prep and the neural net training
  • it uses TensorFlow on a GPU, on a Linux box in my office
  • findings are then uploaded to S3
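
A minimal sketch of that last upload step, assuming boto3 and a hypothetical bucket/key layout (DeepOSM's actual client and naming may differ):

```python
# Hypothetical sketch: push a findings file to S3 after a training run finishes.
# The bucket name, key, and file format are placeholders, not DeepOSM's real ones.
import boto3

def upload_findings(local_path, bucket="deeposm-findings", key="findings/latest.json"):
    """Upload a findings file so deeposm.org can pick it up later."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

if __name__ == "__main__":
    upload_findings("naip_analysis.json")
```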

Display on deeposm.org

  • when a deeposm.org page is loaded, it checks S3, grabs findings, and updates the database
  • deeposm.org shows where DeepOSM detects mis-registered roads
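
A rough sketch of that refresh step, written as a standalone function so it could run from a page-load hook today or from a cron job later; the bucket, key, and findings format are assumptions, and the database write is left to the caller since the real site models aren't shown here:

```python
# Hypothetical sketch of "check S3, grab findings, update the database".
import json
import boto3

def fetch_latest_findings(bucket="deeposm-findings", key="findings/latest.json"):
    """Pull the latest findings file from S3 and return it as Python objects.
    The caller would then upsert these rows into the site's database."""
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)
    return json.loads(obj["Body"].read())
```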

Issues with this Setup

  • The scripts to gather data, train, and upload findings should run on a cycle, not manually when I press a button
  • The data prep and training modules should be separate - DeepOSM has one monster Dockerfile.gpu-devel that includes GDAL, TensorFlow, and more. This makes the build fragile and hard to deploy.
  • Actual work includes:
    • move the analysis to AWS, run on a cycle
    • set up a cron job to have deeposm.org check for new findings
    • parallelize the analysis, so we can do deeper nets and more area
    • import NAIPs into Postgres, instead of hacking them up and caching files
    • use Overpass or another approach to getting OSM data, instead of hacking up PBF extracts with Osmium (see the sketch after this list)
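
As a sketch of that last bullet, an Overpass query for road ways inside a NAIP's bounding box might look roughly like this; the endpoint, tag filter, and helper are illustrative, not existing DeepOSM code:

```python
# Hypothetical Overpass sketch: fetch OSM road ways inside a NAIP bounding box
# instead of clipping a PBF extract with Osmium.
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def roads_in_bbox(south, west, north, east):
    query = """
    [out:json][timeout:60];
    way["highway"]({s},{w},{n},{e});
    out geom;
    """.format(s=south, w=west, n=north, e=east)
    response = requests.post(OVERPASS_URL, data={"data": query})
    response.raise_for_status()
    return response.json()["elements"]
```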
@anandthakker

> use Overpass or other approach to getting OSM data, instead of hacking up PBF extracts with Osmium

@andrewljohnson have you considered OSM QA Tiles?

@andrewljohnson
Contributor Author

@anandthakker I'm probably wrong, but I decided it doesn't really help me to use "tiled" data, because my source imagery isn't projected to Mercator or tiled into a TMS pyramid?

When I did my first run at this, I used Mapzen vector tiles, plus TMS imagery tiles (since they align). But when I switched to NAIPs, it seemed to make more sense (less code) to clip some line strings to the bounds of my NAIPs or arbitrary tiles.

So that's also why I think my end solution is a planet DB with an API, and maybe that API is Overpass, or something simple I just cook up for this use case?
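
For illustration only, the clipping approach described above might look something like this with Shapely; the coordinates and the way are made up, and this isn't DeepOSM code:

```python
# Hypothetical illustration: clip an OSM way's geometry to a NAIP's bounding box.
from shapely.geometry import LineString, box

naip_bounds = box(-122.5, 37.7, -122.4, 37.8)            # minx, miny, maxx, maxy
way = LineString([(-122.55, 37.75), (-122.45, 37.75)])   # a road crossing the NAIP edge

clipped = way.intersection(naip_bounds)  # keep only the segment inside the NAIP
print(clipped.wkt)
```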

@anandthakker

Ah -- yeah, I see what you mean. I went the tiled route for skynet-data basically because between existing tiled satellite source (Mapbox Satellite, DG, etc.) and OSM QA tiles, I figured I could skip a lot of the work. Since you've already handled chopping up the NAIP images, I agree that using tiles provides less of a benefit... Although, I guess one upside to going the tiled route would be that maybe it would be easier, in the future, to swap in a different imagery source (without your having to host/maintain/process it as much). Might not be worth it if you don't see yourself going that route, though

@brandongalbraith

Have you considered running the analysis at Digital Ocean and pushing the results into S3 from there? The compute at DO is significantly cheaper than at AWS.

@andrewljohnson
Contributor Author

andrewljohnson commented Jul 22, 2016

@brandongalbraith The analysis runs at home on a Linux box I have.

Also: I might try out some Google ML infrastructure; I just got my invite yesterday.

@brandongalbraith

brandongalbraith commented Jul 22, 2016

@andrewljohnson

Sorry about that! I inferred from above:

> Actual work includes:
> move the analysis to AWS, run on a cycle

that the analysis was planned to run in AWS. Happy to help break everything apart to scale it out; my day gig is devops/infrastructure.

@andrewljohnson
Contributor Author

@brandongalbraith I realized that after I answered too fast :)

I guess the right answer is I hadn't much thought about it.
