Refactor RMSE_test calculation using pygmt.grdtrack #149

weiji14 · 2019-06-01T20:45:01Z

Simplifying and hopefully speeding up the code used to get interpolated grid values at certain groundtruth points. Previously used !gmt grdtrack which is a bash hack, but now there's pygmt.grdtrack that is a whole lot nicer! Instead of having temporary .csv and .nc files, we could simply pass in a pandas.DataFrame and xarray.DataArray in memory to pygmt.grdtrack, and get a pandas.DataFrame output we can analyze straightaway!

TODO:

- Hotfix missing-one-row from pandas.csv load, oh the I/O misery! (1f16898)
- Input pandas.DataFrame directly into grdtrack instead of using intermediate .xyz file (e44c7c6)
- Attempt to input xarray.DataArray directly into grdtrack instead of using intermediate .nc file (66b97d2)

Minor mistake in the Root Mean Square Error (RMSE) calculation due to 1 row not being loaded with pandas.read_csv. RMSE between groundtruth xyz tracks and interpolated grids (as reported in e27ac4a) rounded to 4 decimal points should be:

- for groundtruth - 7.3182 instead of 7.3183
- for baseline bicubic - 63.7952 instead of 63.7947
- for deepbedmap3 v0.8.0 - 47.0130 instead of 47.0186

This also affects earlier commits, but as it is only 1 row missing out of 42212, there is only a small decimal difference that should not really affect the previous conclusions/interpretations.
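For reference, a minimal sketch of the RMSE calculation itself and of how a silently dropped row shifts the result. The `rmse` helper and the elevation values are made up for illustration (they are not the real test tracks or the project's code), and the dropped first row is only an assumed stand-in for the loading bug.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("inputs must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical elevations in metres, NOT values from the real groundtruth tracks
groundtruth_z = [100.0, 102.0, 98.0, 101.0]
interpolated_z = [103.0, 98.0, 98.0, 101.0]

# RMSE over every row
rmse_all_rows = rmse(groundtruth_z, interpolated_z)

# Simulate the bug: one row (here, the first) never makes it into the table
rmse_missing_row = rmse(groundtruth_z[1:], interpolated_z[1:])
```

With only four points the two values differ visibly; with one row missing out of tens of thousands the shift shrinks to the later decimal places, which is the effect described above.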

review-notebook-app · 2019-06-01T20:45:13Z

Check out this pull request on ReviewNB: https://app.reviewnb.com/weiji14/deepbedmap/pull/149

Visit www.reviewnb.com to know how we simplify your Jupyter Notebook workflows.

Refactor get_deepbedmap_test_result function to not have intermediate .xyz or .xyzi files anymore, and more importantly, replace `!gmt grdtrack` with `pygmt.grdtrack`! That way, black will finally play nicely when linting srgan_train.py and deepbedmap.py, and there might be some speed improvements. Also removed the redo_testtrack flag. Somewhat unfortunately, that means our I/O time savings might not matter overall, as we've removed caching of the track_test.xyz file in hyperparameter tuning experiments. Who cares, since there's less hideous code to maintain!

Extends xarray with a special accessor for automated projection parsing! Package repository at https://github.com/fmaussion/salem. Also added required dependency joblib.

Patches 863d3be with a relatively easy fix in data_prep.ascii_to_xyz. [pyproj](https://github.com/pyproj4/pyproj) v2.2.0 now defines a CRS from string and does reprojection a bit differently than before in v2.1.3. See also:

- [Release notes](https://github.com/pyproj4/pyproj/releases/tag/v2.2.0rel)
- [Commits](pyproj4/pyproj@v2.1.3rel...v2.2.0rel)

Bumps [rasterio](https://github.com/mapbox/rasterio) from 1.0.13 to 1.0.23. Supersedes #117. Also show (a part of) failing tests in test_ipynb.ipynb that need to be resolved in #150.

- [Release notes](https://github.com/mapbox/rasterio/releases)
- [Changelog](https://github.com/mapbox/rasterio/blob/master/CHANGES.txt)
- [Commits](rasterio/rasterio@1.0.13...1.0.23)

Adjusting the bounding box positions of (some) tiles by half a pixel (roughly 500m). Fixed a poorly written unit test for the data_prep.get_window_bounds function introduced in ab0295e right before release v0.4.0, and refactored the function to make it work correctly with newer libraries. Strangely enough, after recreating the geojson tiles, the coordinate changes are only for the Operation Ice Bridge/CRESIS Basler and TO grids, and we still have exactly 2347 tiles! Specifically, we've switched from using xarray.open_rasterio to xarray.open_dataarray as xarray's rasterio backend fails to read NetCDF georeferencing information properly with rasterio >= 1.0.14 (see #84). The projection information is monkey-patched into the xarray grid using salem. Most important bit is to ensure that we go from xarray's centre-based pixel coordinate system to rasterio's corner-based pixel coordinate system (accomplished with dataset.salem.grid.corner_grid) via a manually set Affine transformation.
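The half-pixel shift between centre-based and corner-based coordinates can be sketched in plain Python. This is only an illustration on a hypothetical 1000 m grid. The function name `centre_coords_to_corner_bounds` is invented here and is not part of data_prep, salem or rasterio.

```python
def centre_coords_to_corner_bounds(x_centres, y_centres):
    """
    Given pixel-centre coordinates of a regular grid, return the
    (left, bottom, right, top) bounds measured at the outer pixel edges.
    """
    # Pixel size along each axis, taken from the first two centres
    x_res = abs(x_centres[1] - x_centres[0])
    y_res = abs(y_centres[1] - y_centres[0])

    # Outer edges sit half a pixel beyond the outermost centres
    left = min(x_centres) - x_res / 2
    right = max(x_centres) + x_res / 2
    bottom = min(y_centres) - y_res / 2
    top = max(y_centres) + y_res / 2
    return left, bottom, right, top


# Hypothetical 3x3 grid with 1000 m pixels (y descending, i.e. north-up)
x_centres = [500.0, 1500.0, 2500.0]
y_centres = [2500.0, 1500.0, 500.0]
bounds = centre_coords_to_corner_bounds(x_centres, y_centres)
```

Treating the centre coordinates as if they were the bounds would shrink the box by 500 m on every side, which is the kind of half-pixel offset corrected here.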

To make data_prep.selective_tile more understandable, I've 'simplified' the doctest with a diagonal array, and actually used more geographically correct (corner-based) pixel bounds. Also updated test_data_prep.py to use the v0.7.0 release grid instead of v0.4.0. Issue with rasterio.open not having proper affine transformation on netcdf files solved via the most ludicrous method of all - importing xarray before rasterio... Will submit a bug report after this, but in the meantime, the code can stay mostly intact, phew! Next step is to possibly remove salem, and refactor the selective_tile function a bit more actually, by using xarray.open_rasterio and the xarray.DataArray.sel by xy slice method.

Fragments appear to be the same as in 63b3ef1?!! Just reuploading to quilt (new hash is 4d3b5a17f63b35212b5a0a210d9b88059a9ff88ad2d5b2de15d9535f48002e13 at https://quiltdata.com/package/weiji14/deepbedmap), not sure if there might be some minor changes. Gave up on refactoring data_prep.get_window_bounds (salem still seems to be needed...) and data_prep.selective_tile (too lazy to figure out resampling of xarray grids). Anyways, time to get on with some neural network model training!

Closes #150 Shift pixel coordinates of tiles using proper affine transformation.

Towards more fine-grained cropping of our image tiles! Basically have data_prep.selective_tile do the crop using exact geographic coordinate slice ranges instead of having to convert (sometimes imprecisely) to image-based coordinates. Uses xarray's subset 'sel'(ection) method which does away with the mess that is rasterio.windows and affine transformations. REMA tiles don't seem to require gapfilling anymore so we've temporarily disabled gapfilling (raise NotImplementedError) until it is needed for getting tiles for the whole of Antarctica again. Also using a nicer interpolation method in data_prep.selective_tile, especially relevant for W2_data aka the MEaSUREs Surface Ice Velocity which is resampled from 450m to 500m (since a8863e4). Still resampling bilinearly, but interpolation at the cropped tile's edges takes into account pixels beyond the border if available. I've actually inspected these new Ice Velocity tiles manually and they look awesome! Might help with the strange high-level checkerboard artifacts. Side effect is that interpolation runs slowly (mitigated somewhat by using dask), until we can vectorize the whole function properly.
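To illustrate the idea of cropping by geographic coordinates rather than by pixel indices, here is a small numpy stand-in for what a label-based selection with slices does. It is a sketch only: `crop_by_coords` and the toy grid are hypothetical, and the real function works on xarray grids rather than bare numpy arrays.

```python
import numpy as np


def crop_by_coords(grid, x_coords, y_coords, xmin, xmax, ymin, ymax):
    """
    Crop a 2D grid to the pixels whose centre coordinates fall inside
    the given geographic range (inclusive on both ends).
    """
    x_mask = (x_coords >= xmin) & (x_coords <= xmax)
    y_mask = (y_coords >= ymin) & (y_coords <= ymax)
    # np.ix_ builds an open mesh so rows and columns are selected together
    return grid[np.ix_(y_mask, x_mask)]


# Hypothetical 4x4 grid with 500 m pixels; y is descending (north-up)
x_coords = np.array([250.0, 750.0, 1250.0, 1750.0])
y_coords = np.array([1750.0, 1250.0, 750.0, 250.0])
grid = np.arange(16).reshape(4, 4)

tile = crop_by_coords(
    grid, x_coords, y_coords, xmin=500.0, xmax=1500.0, ymin=500.0, ymax=1500.0
)
```

Because the crop is expressed in map units, no conversion to row/column windows (and no rounding of those windows) is needed.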

Neural network wasn't training properly, and I tracked it down to the REMA input rasters having low NaN-like values... Found out proper way to get dask DataArray masks using dask.array.ma module, and so we can reintroduce gapfilling in data_prep.selective_tile, this time using dask/xarray to vectorize the operations. The gapfilled raster is also interpolated better along the edges as in 7fd3345 which might help with the neural network training later. Quilt hash updated from 9c8cb530df6340e257e18008b59b9d7b5f701fd9e5cef2c8436984ae49cff237 to b0b090ca35271d41ea1cf5e6afa0e6c6a3da34193c00444963dde7ad20eb7331. Not passing in a gapfill_raster_filepath (when it is needed) now errors out with nicer debugging plots that have EPSG:3031 projected coordinates on the axes!
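A rough sketch of mask-based gapfilling, written with numpy.ma (which dask.array.ma is modelled on) so that it runs without dask. The `gapfill` helper, the -1000 threshold and the tiny arrays are assumptions made for illustration. They are not the values or code used in data_prep.selective_tile.

```python
import numpy as np

# Hypothetical cutoff: anything below this is treated as a NaN-like nodata value
NODATA_THRESHOLD = -1000.0


def gapfill(primary, fallback, threshold=NODATA_THRESHOLD):
    """Replace NaN and very low (nodata-like) pixels in `primary` with `fallback`."""
    # Mask pixels that are NaN or below the nodata threshold
    masked = np.ma.masked_where(np.isnan(primary) | (primary < threshold), primary)
    # Wherever the mask is set, take the value from the fallback raster instead
    return np.where(np.ma.getmaskarray(masked), fallback, primary)


# Toy rasters: one nodata-like value and one NaN in the primary grid
primary = np.array([[10.0, -9999.0], [np.nan, 40.0]])
fallback = np.array([[1.0, 2.0], [3.0, 4.0]])
filled = gapfill(primary, fallback)
```

Every step is an elementwise array operation, which is what lets the same pattern be vectorized over chunked arrays instead of looping over pixels.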

Oh the two weeks of background work just to come to this, pygmt.grdtrack and whatnot! Our srgan_train.get_deepbedmap_test_result function now evaluates the trained neural network model's predicted grid directly in memory instead of on a file! Lots of temporary-file-related boilerplate code removed, woohoo! There's been plenty of background work to get this in-memory grid to be georeferenced correctly (see #150, 7fd3345, 4a074d9), but it all ends up making the evaluation more accurate, and hopefully faster to run and cleaner to scale up.

Note also that the deepbedmap.get_image_and_bounds function has been refactored and renamed to get_image_with_bounds. The previous bounds used xarray's centre-based pixel coordinates when it should have returned rasterio-style corner-based pixel coordinates used by data_prep.selective_tile. This is resolved using good ol' salem, and there's some extra code to handle getting the bounds for multiple inputs (instead of relying on salem's .extent function). The bounds themselves are now stored as an attribute inside the groundtruth xarray grid. As we are returning an xarray.DataArray grid instead of a numpy.array, we can use xarray's .plot() method to plot merged groundtruth grids (that are not on a regular grid) in a less funny way. There is also an 'indexers' parameter introduced to enable manually getting bounding boxes exactly divisible by 4, a quirky requirement of DeepBedMap...

The hardcoded bounding box view of Antarctica used in our deepbedmap.feature integration test has been updated in alignment with all the changes above (pixel offsets, manual crops, etc). New 'weiji14/deebedmap/model/test' tiles have been uploaded to quilt, and the new quilt hash to use is df0d28b24283c642f5dbe1a9baa22b605d8ae02ec1875c2edd067a614e99e5a4.

Also patching 4a074d9 to fix data_prep.selective_tile's masking not handling fp16 NaN conversions as the Synthetic HighRes geotiff was acting up, and remove NaN checks after gapfilling as it trips up the big DeepBedMap tiler. What else? Phew!
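As a hedged illustration of the divisible-by-4 requirement, the sketch below builds index slices that trim each dimension to a multiple of 4. The helper `trim_to_multiple` is invented for this example. How the actual 'indexers' parameter is used inside deepbedmap is not shown here and may differ.

```python
def trim_to_multiple(length, multiple=4):
    """
    Return a slice keeping the largest leading run of pixels whose
    count divides evenly by `multiple`.
    """
    usable = length - (length % multiple)
    return slice(0, usable)


# Hypothetical grid of 10 rows by 13 columns
indexers = {"y": trim_to_multiple(10), "x": trim_to_multiple(13)}

# Number of pixels kept along each dimension
kept_rows = len(range(10)[indexers["y"]])
kept_cols = len(range(13)[indexers["x"]])
```

A dictionary of this shape is the kind of argument that xarray's integer-based selection (`DataArray.isel`) accepts, so the trimmed grid keeps its coordinates while its shape becomes a multiple of 4 along both axes.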

Closes #149 Refactor RMSE_test calculation using pygmt.grdtrack.

weiji14 added the enhancement ✨ New feature or request label Jun 1, 2019

weiji14 added this to the v0.9.0 milestone Jun 1, 2019

weiji14 self-assigned this Jun 1, 2019

weiji14 added 5 commits June 2, 2019 08:25

➕ Add salem

863d3be

Extends xarray with a special accessor for automated projection parsing! Package repository at https://github.com/fmaussion/salem. Also added required dependency joblib.

weiji14 added this to In progress in GeoDataScience Board Jun 4, 2019

weiji14 added 4 commits June 6, 2019 23:22

🔀 Merge branch 'shifting_tiles' into refactor_grdtrack (#150)

c38f74a

Closes #150 Shift pixel coordinates of tiles using proper affine transformation.

weiji14 force-pushed the refactor_grdtrack branch from 51a504f to 7fd3345 Compare June 12, 2019 13:07

weiji14 added 2 commits June 13, 2019 11:51

weiji14 force-pushed the refactor_grdtrack branch from a0ab7c7 to 66b97d2 Compare June 13, 2019 15:25

weiji14 marked this pull request as ready for review June 13, 2019 15:29

weiji14 merged commit 66b97d2 into master Jun 13, 2019

GeoDataScience Board automation moved this from In progress to Done Jun 13, 2019

weiji14 added a commit that referenced this pull request Jun 13, 2019

🔀 Merge branch 'refactor_grdtrack' (#149)

3e2c512

Closes #149 Refactor RMSE_test calculation using pygmt.grdtrack.

weiji14 deleted the refactor_grdtrack branch June 13, 2019 15:32

weiji14 mentioned this pull request Jun 15, 2019

RMSE_test calculation does not sample points along groundtruth grid edges properly #152

Closed

weiji14 added the model 🏗️ Pull requests that update neural network model label Jun 18, 2019
