Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor RMSE_test calculation using pygmt.grdtrack #149

Merged
merged 12 commits into from Jun 13, 2019
Merged

Conversation

weiji14
Copy link
Owner

@weiji14 weiji14 commented Jun 1, 2019

Simplifying and hopefully speeding up the code used to get interpolated grid values at certain groundtruth points. Previously used !gmt grdtrack which is a bash hack, but now there's pygmt.grdtrack that is a whole lot nicer! Instead of having temporary .csv and .nc files, we could simply pass in a pandas.DataFrame and xarray.DataArray in memory to pygmt.grdtrack, and get a pandas.DataFrame output we can analyze straightaway!

TODO:

  • Hotfix missing-one-row from pandas.csv load, oh the I/O misery! (1f16898)
  • Input pandas.DataFrame directly into grdtrack instead of using intermediate .xyz file (e44c7c6)
  • Attempt to input xarray.DataArray directly into grdtrack instead of using intermediate .nc file (66b97d2)

Minor mistake in the Root Mean Square Error (RMSE) calculation due to 1 row not being loaded with pandas.read_csv. RMSE between groundtruth xyz tracks and interpolated grids (as reported in e27ac4a) rounded to 4 decimal points should be:
- for groundtruth - 7.3182 instead of 7.3183
- for baseline bicubic - 63.7952 instead of 63.7947
- for deepbedmap3 v0.8.0 - 47.0130 instead of 47.0186

This also affects earlier commits, but as it is only 1 row missing out of 42212, there is only a small decimal difference that should not really affect the previous conclusions/interpretations.
@weiji14 weiji14 added the enhancement ✨ New feature or request label Jun 1, 2019
@weiji14 weiji14 added this to the v0.9.0 milestone Jun 1, 2019
@weiji14 weiji14 self-assigned this Jun 1, 2019
@review-notebook-app
Copy link

Check out this pull request on ReviewNB: https://app.reviewnb.com/weiji14/deepbedmap/pull/149

Visit www.reviewnb.com to know how we simplify your Jupyter Notebook workflows.

Refactor get_deepbedmap_test_result function to not have intermediate .xyz or .xyzi files anymore, and more importantly, replace `!gmt grdtrack` with `pygmt.grdtrack`! That way, black will finally play nicely when linting srgan_train.py and deepbedmap.py, and there might be some speed improvements.

Also removed the redo_testtrack flag. Somewhat unfortunately, that means our I/O time savings might not matter overall, as we've removed caching of the track_test.xyz file in hyperparameter tuning experiments. Who cares, since there's less hideous code to maintain!
Extends xarray with a special accessor for automated projection parsing! Package repository at https://github.com/fmaussion/salem. Also added required dependency joblib.
Patches 863d3be with a relatively easy fix in data_prep.ascii_to_xyz. [pyproj](https://github.com/pyproj4/pyproj) v2.2.0 now defines a CRS from string and does reprojection a bit differently than before in v2.1.3. See also:
- [Release notes](https://github.com/pyproj4/pyproj/releases/tag/v2.2.0rel)
- [Commits]pyproj4/pyproj@v2.1.3rel...v2.2.0rel)
Bumps [rasterio](https://github.com/mapbox/rasterio) from 1.0.13 to 1.0.23. Supersedes #117. Also show (a part of) failing tests in test_ipynb.ipynb that needs to be resolved in #150.
- [Release notes](https://github.com/mapbox/rasterio/releases)
- [Changelog](https://github.com/mapbox/rasterio/blob/master/CHANGES.txt)
- [Commits](rasterio/rasterio@1.0.13...1.0.23)
Adjusting the bounding box positions of (some) tiles by half a pixel (roughly 500m). Fixed a poorly written unit test for the data_prep.get_window_bounds function introduced in ab0295e right before release v0.4.0, and refactored the function to make it work correctly with newer libraries. Strangely enough, after recreating the geojson tiles, the coordinate changes are only for the Operation Ice Bridge/CRESIS Basler and TO grids, and we still have exactly 2347 tiles!

Specifically, we've switched from using xarray.open_rasterio to xarray.open_dataarray as xarray's rasterio backend fails to read NetCDF georeferencing information properly with rasterio >= 1.0.14 (see #84). The projection information is monkey-patched into the xarray grid using salem. Most important bit is to ensure that we go from xarray's centre-based pixel coordinate system to rasterio's corner-based pixel coordinate system (accomplished with dataset.salem.grid.corner_grid) via a manually set Affine transformation.
@weiji14 weiji14 added this to In progress in GeoDataScience Board Jun 4, 2019
To make the data_prep.selective_tile more understandable, I've 'simplified' the doctest with a diagonal array, and actually used more geographically correct (corner-based) pixel bounds. Also updated test_data_prep.py to use the v0.7.0 release grid instead of v0.4.0.

Issue with rasterio.open not having proper affine transformation on netcdf files solved via the most ludricrous method of all - importing xarray before rasterio... Will submit a bug report after this, but in the meantime, the code can stay mostly intact, phew! Next step is to possibly remove salem, and refactor the selective_tile function a bit more actually, by using xarray.open_rasterio and the xarray.DataArray.sel by xy slice method.
Fragments appear to be the same with 63b3ef1?!! Just reuploading to quilt (new hash is 4d3b5a17f63b35212b5a0a210d9b88059a9ff88ad2d5b2de15d9535f48002e13 at https://quiltdata.com/package/weiji14/deepbedmap), not sure if there might be some minor changes. Gave up on refactoring data_prep.get_window_bounds (salem still seems to be needed...) and data_prep.selective_tile (lazy to figure out resampling of xarray grids). Anyways, time to get on with some neural network model training!
Closes #150 Shift pixel coordinates of tiles using proper affine transformation.
Towards more fine-grained cropping of our image tiles! Basically have data_prep.selective tile do the crop using exact geographic coordinate slice ranges instead of having to convert (sometimes imprecisely) to image-based coordinates. Uses xarray's subset 'sel'(ection) method which does away with the mess that is rasterio.windows and affine transformations. REMA tiles doesn't seem to require gapfilling anymore so we've temporarily disabled gapfilling (raise NotImplementedError) until it is needed for getting tiles for the whole of Antarctica again.

Also using a nicer interpolation method in data_prep.selective_tile, especially relevant for W2_data aka the MEASURES Surface Ice Velocity which is resampled from 450m to 500m (since a8863e4). Still resampling billinearly, but interpolation at the cropped tile's edges take into account pixels beyond the border if available. I've actually inspected these new Ice Velocity tiles manually and they look awesome! Might help with the strange high-level checkerboard artifacts. Side effect is that interpolation runs slowly (mitigated somewhat by using dask), until we can vectorize the whole function properly.
Neural network wasn't training properly, and I tracked it down to the REMA input rasters having low NaN-like values... Found out proper way to get dask DataArray masks using dask.array.ma module, and so we can reintroduce gapfilling in data_prep.selective_tile, this time using dask/xarray to vectorize the operations. The gapfilled raster is also interpolated better along the edges as in 7fd3345 which might help with the neural network training later. Quilt hash updated from 9c8cb530df6340e257e18008b59b9d7b5f701fd9e5cef2c8436984ae49cff237 to b0b090ca35271d41ea1cf5e6afa0e6c6a3da34193c00444963dde7ad20eb7331. Not passing in a gapfill_raster_filepath (when it is needed) now errors out with nicer debugging plots that have EPSG:3031 projected coordinates on the axes!
Oh the two weeks of background work just to come to this, pygmt.grdtrack and whatnot! Our srgan_train.get_deepbedmap_test_result function now evaluates the trained neural network model's predicted grid directly in memory instead of on a file! Lots of temporary file related boilerplate code removed, woohoo! There's been plenty of background work to get this in-memory grid to be georeferenced correctly (see #150, 7fd3345, 4a074d9), but it all ends up making the evaluation more accurate, and hopefully faster to run and cleaner to scale up.

Note also that the deepbedmap.get_image_and_bounds function has been refactored and renamed to get_image_with_bounds. The previous bounds was using xarray's centre-based pixel coordinates when it should have returned rasterio-style corner-based pixel coordinates used by data_prep.selective_tile. This is resolved using good ol' salem, and there's some extra code to handle getting the bounds for multiple inputs (instead of relying on salem's .extent function). The bounds themselves are now stored as an attribute inside the groundtruth xarray grid. As we are returning an xarray.DataArray grid instead of a numpy.array, we can use xarray's .plot() method to plot merged groundtruth grids (that are not on a regular grid) in a less funny way. There is also an 'indexers' parameter introduced to enable manually getting bounding boxes exactly divisible by 4, a quirky requirement of DeepBedMap...

The hardcoded bounding box view of Antarctica used in our deepbedmap.feature integration test has been updated in alignment with all the changes above (pixel offsets, manual crops, etc). New 'weiji14/deebedmap/model/test' tiles have been uploaded to quilt, and the new quilt hash to use is df0d28b24283c642f5dbe1a9baa22b605d8ae02ec1875c2edd067a614e99e5a4. Also patching 4a074d9 to fix data_prep.selective_tile's masking not handling fp16 NaN conversions as the Synthetic HighRes geotiff was acting up, and remove NaN checks after gapfilling as it trips up the big DeepBedMap tiller. What else? Phew!
@weiji14 weiji14 marked this pull request as ready for review June 13, 2019 15:29
@weiji14 weiji14 merged commit 66b97d2 into master Jun 13, 2019
GeoDataScience Board automation moved this from In progress to Done Jun 13, 2019
weiji14 added a commit that referenced this pull request Jun 13, 2019
Closes #149 Refactor RMSE_test calculation using pygmt.grdtrack.
@weiji14 weiji14 deleted the refactor_grdtrack branch June 13, 2019 15:32
@weiji14 weiji14 added the model 🏗️ Pull requests that update neural network model label Jun 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement ✨ New feature or request model 🏗️ Pull requests that update neural network model
Projects
Development

Successfully merging this pull request may close these issues.

None yet

1 participant