Refactor RMSE_test calculation using pygmt.grdtrack #149
Merged
Conversation
Minor mistake in the Root Mean Square Error (RMSE) calculation due to 1 row not being loaded with pandas.read_csv. RMSE between groundtruth xyz tracks and interpolated grids (as reported in e27ac4a), rounded to 4 decimal places, should be:

- for groundtruth: 7.3182 instead of 7.3183
- for baseline bicubic: 63.7952 instead of 63.7947
- for deepbedmap3 v0.8.0: 47.0130 instead of 47.0186

This also affects earlier commits, but as it is only 1 row missing out of 42212, there is only a small decimal difference that should not really affect the previous conclusions/interpretations.
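A minimal sketch of the kind of off-by-one-row bug described above, using made-up track values (the column names and numbers are illustrative, not the actual survey data):

```python
import io
import numpy as np
import pandas as pd

# Hypothetical 4-point xyz track. If this file were headerless and header=None
# were forgotten, pandas.read_csv would silently consume the first data row as
# column names -- dropping 1 row from the RMSE calculation.
csv_text = "x,y,z\n0,0,1.0\n1,0,2.0\n2,0,3.5\n3,0,4.0\n"
df = pd.read_csv(io.StringIO(csv_text))  # this file has a header, so all 4 rows load

z_true = df["z"].to_numpy()
z_interp = np.array([1.1, 1.9, 3.6, 3.8])  # made-up values sampled from a grid

# Root Mean Square Error between groundtruth points and interpolated grid values
rmse = np.sqrt(np.mean((z_interp - z_true) ** 2))
print(round(rmse, 4))  # -> 0.1323
```

With 42212 rows, dropping a single one shifts the statistic only in the later decimal places, which is why the earlier conclusions still hold.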
Check out this pull request on ReviewNB: https://app.reviewnb.com/weiji14/deepbedmap/pull/149
Refactor get_deepbedmap_test_result function to not have intermediate .xyz or .xyzi files anymore, and more importantly, replace `!gmt grdtrack` with `pygmt.grdtrack`! That way, black will finally play nicely when linting srgan_train.py and deepbedmap.py, and there might be some speed improvements. Also removed the redo_testtrack flag. Somewhat unfortunately, that means our I/O time savings might not matter overall, as we've removed caching of the track_test.xyz file in hyperparameter tuning experiments. Who cares, since there's less hideous code to maintain!
Extends xarray with a special accessor for automated projection parsing! Package repository at https://github.com/fmaussion/salem. Also added required dependency joblib.
Patches 863d3be with a relatively easy fix in data_prep.ascii_to_xyz. [pyproj](https://github.com/pyproj4/pyproj) v2.2.0 now defines a CRS from a string and does reprojection a bit differently than before in v2.1.3. See also:

- [Release notes](https://github.com/pyproj4/pyproj/releases/tag/v2.2.0rel)
- [Commits](pyproj4/pyproj@v2.1.3rel...v2.2.0rel)
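A minimal sketch of the newer pyproj (>= 2.2) CRS/Transformer style that the fix adapts to; the EPSG codes and test point are illustrative, chosen because EPSG:3031 (Antarctic Polar Stereographic) is the projection used throughout this project:

```python
from pyproj import CRS, Transformer

# pyproj >= 2.2 builds CRS objects from EPSG codes, WKT, or proj strings...
crs_lonlat = CRS.from_epsg(4326)   # geographic lon/lat
crs_polar = CRS.from_epsg(3031)    # Antarctic Polar Stereographic

# ...and replaces the older pyproj.transform(proj1, proj2, x, y) call pattern
# with a reusable Transformer. always_xy=True keeps lon/lat axis order.
transformer = Transformer.from_crs(crs_lonlat, crs_polar, always_xy=True)
x, y = transformer.transform(0.0, -90.0)  # the South Pole
print(x, y)  # projects to (approximately) the origin in EPSG:3031
```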
Bumps [rasterio](https://github.com/mapbox/rasterio) from 1.0.13 to 1.0.23. Supersedes #117. Also show (a part of) failing tests in test_ipynb.ipynb that needs to be resolved in #150.

- [Release notes](https://github.com/mapbox/rasterio/releases)
- [Changelog](https://github.com/mapbox/rasterio/blob/master/CHANGES.txt)
- [Commits](rasterio/rasterio@1.0.13...1.0.23)
Adjusting the bounding box positions of (some) tiles by half a pixel (roughly 500m). Fixed a poorly written unit test for the data_prep.get_window_bounds function introduced in ab0295e right before release v0.4.0, and refactored the function to make it work correctly with newer libraries. Strangely enough, after recreating the geojson tiles, the coordinate changes are only for the Operation IceBridge/CRESIS Basler and TO grids, and we still have exactly 2347 tiles! Specifically, we've switched from using xarray.open_rasterio to xarray.open_dataarray as xarray's rasterio backend fails to read NetCDF georeferencing information properly with rasterio >= 1.0.14 (see #84). The projection information is monkey-patched into the xarray grid using salem. The most important bit is to ensure that we go from xarray's centre-based pixel coordinate system to rasterio's corner-based pixel coordinate system (accomplished with dataset.salem.grid.corner_grid) via a manually set Affine transformation.
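The centre-versus-corner distinction can be sketched without salem or rasterio at all; the grid spacing and coordinates below are made up for illustration:

```python
import numpy as np

# Hypothetical 1D coordinate vector as stored by xarray: pixel *centre* positions.
res = 1000.0  # grid spacing in metres (illustrative)
x_centres = np.array([500.0, 1500.0, 2500.0])

# rasterio-style bounds refer to pixel *corners*, so the outer edges sit
# half a pixel beyond the first and last centre coordinates.
left = x_centres[0] - res / 2    # western corner edge
right = x_centres[-1] + res / 2  # eastern corner edge
print(left, right)  # -> 0.0 3000.0
```

Forgetting this half-pixel shift is exactly the sort of thing that nudges tile bounding boxes by ~500m, as described above.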
To make data_prep.selective_tile more understandable, I've 'simplified' the doctest with a diagonal array, and actually used more geographically correct (corner-based) pixel bounds. Also updated test_data_prep.py to use the v0.7.0 release grid instead of v0.4.0. Issue with rasterio.open not having a proper affine transformation on netcdf files solved via the most ludicrous method of all - importing xarray before rasterio... Will submit a bug report after this, but in the meantime, the code can stay mostly intact, phew! Next step is to possibly remove salem, and refactor the selective_tile function a bit more actually, by using xarray.open_rasterio and the xarray.DataArray.sel by xy slice method.
Fragments appear to be the same as in 63b3ef1?!! Just reuploading to quilt (new hash is 4d3b5a17f63b35212b5a0a210d9b88059a9ff88ad2d5b2de15d9535f48002e13 at https://quiltdata.com/package/weiji14/deepbedmap), not sure if there might be some minor changes. Gave up on refactoring data_prep.get_window_bounds (salem still seems to be needed...) and data_prep.selective_tile (too lazy to figure out resampling of xarray grids). Anyways, time to get on with some neural network model training!
Closes #150 Shift pixel coordinates of tiles using proper affine transformation.
Towards more fine-grained cropping of our image tiles! Basically have data_prep.selective_tile do the crop using exact geographic coordinate slice ranges instead of having to convert (sometimes imprecisely) to image-based coordinates. Uses xarray's subset 'sel'(ection) method, which does away with the mess that is rasterio.windows and affine transformations. REMA tiles don't seem to require gapfilling anymore, so we've temporarily disabled gapfilling (raise NotImplementedError) until it is needed for getting tiles for the whole of Antarctica again. Also using a nicer interpolation method in data_prep.selective_tile, especially relevant for W2_data aka the MEaSUREs Surface Ice Velocity which is resampled from 450m to 500m (since a8863e4). Still resampling bilinearly, but interpolation at the cropped tile's edges takes into account pixels beyond the border if available. I've actually inspected these new Ice Velocity tiles manually and they look awesome! Might help with the strange high-level checkerboard artifacts. Side effect is that interpolation runs slowly (mitigated somewhat by using dask), until we can vectorize the whole function properly.
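A toy sketch of the coordinate-slice cropping style described above (the grid values, spacing, and slice ranges are all invented for illustration):

```python
import numpy as np
import xarray as xr

# Illustrative 4x4 grid with centre-based coordinates at 500 m spacing;
# y decreases downward, as is typical for raster grids.
da = xr.DataArray(
    np.arange(16.0).reshape(4, 4),
    coords={"y": [1750.0, 1250.0, 750.0, 250.0], "x": [250.0, 750.0, 1250.0, 1750.0]},
    dims=("y", "x"),
)

# Crop by geographic coordinate ranges instead of image-based row/column
# windows -- no rasterio.windows, no affine transformation needed.
tile = da.sel(x=slice(500.0, 1500.0), y=slice(1500.0, 500.0))
print(tile.shape)  # -> (2, 2)
```

Because the slice bounds are in projected metres, there is no imprecise conversion to pixel indices along the way.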
weiji14 force-pushed the refactor_grdtrack branch from 51a504f to 7fd3345 on June 12, 2019 13:07
Neural network wasn't training properly, and I tracked it down to the REMA input rasters having low NaN-like values... Found out the proper way to get dask DataArray masks using the dask.array.ma module, and so we can reintroduce gapfilling in data_prep.selective_tile, this time using dask/xarray to vectorize the operations. The gapfilled raster is also interpolated better along the edges as in 7fd3345, which might help with the neural network training later. Quilt hash updated from 9c8cb530df6340e257e18008b59b9d7b5f701fd9e5cef2c8436984ae49cff237 to b0b090ca35271d41ea1cf5e6afa0e6c6a3da34193c00444963dde7ad20eb7331. Not passing in a gapfill_raster_filepath (when it is needed) now errors out with nicer debugging plots that have EPSG:3031 projected coordinates on the axes!
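A minimal sketch of the lazy masking-then-gapfilling pattern using dask.array.ma; the raster values and the constant gapfill array are stand-ins for the real REMA and gapfill grids:

```python
import numpy as np
import dask.array as da
import dask.array.ma as dma

# Hypothetical raster chunk with NaN nodata pixels, wrapped as a lazy dask array.
raster = da.from_array(np.array([[1.0, np.nan], [np.nan, 4.0]]), chunks=2)

# Build a lazy masked array and extract its mask -- this is the dask.array.ma
# route to DataArray masks mentioned above, and it stays vectorized/lazy.
masked = dma.masked_invalid(raster)
mask = dma.getmaskarray(masked)

# Fill the masked (nodata) pixels from a made-up gapfill raster of the same shape.
gapfill = da.from_array(np.full((2, 2), -9.9), chunks=2)
filled = da.where(mask, gapfill, raster)
print(filled.compute())  # NaNs replaced by the gapfill values
```

Nothing is computed until `.compute()`, so the gapfill composes with the rest of the dask/xarray pipeline.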
Oh the two weeks of background work just to come to this, pygmt.grdtrack and whatnot! Our srgan_train.get_deepbedmap_test_result function now evaluates the trained neural network model's predicted grid directly in memory instead of on a file! Lots of temporary file related boilerplate code removed, woohoo! There's been plenty of background work to get this in-memory grid to be georeferenced correctly (see #150, 7fd3345, 4a074d9), but it all ends up making the evaluation more accurate, and hopefully faster to run and cleaner to scale up. Note also that the deepbedmap.get_image_and_bounds function has been refactored and renamed to get_image_with_bounds. The previous bounds were using xarray's centre-based pixel coordinates when it should have returned rasterio-style corner-based pixel coordinates used by data_prep.selective_tile. This is resolved using good ol' salem, and there's some extra code to handle getting the bounds for multiple inputs (instead of relying on salem's .extent function). The bounds themselves are now stored as an attribute inside the groundtruth xarray grid. As we are returning an xarray.DataArray grid instead of a numpy.array, we can use xarray's .plot() method to plot merged groundtruth grids (that are not on a regular grid) in a less funny way. There is also an 'indexers' parameter introduced to enable manually getting bounding boxes exactly divisible by 4, a quirky requirement of DeepBedMap... The hardcoded bounding box view of Antarctica used in our deepbedmap.feature integration test has been updated in alignment with all the changes above (pixel offsets, manual crops, etc). New 'weiji14/deebedmap/model/test' tiles have been uploaded to quilt, and the new quilt hash to use is df0d28b24283c642f5dbe1a9baa22b605d8ae02ec1875c2edd067a614e99e5a4.
Also patching 4a074d9 to fix data_prep.selective_tile's masking not handling fp16 NaN conversions as the Synthetic HighRes geotiff was acting up, and remove NaN checks after gapfilling as it trips up the big DeepBedMap tiler. What else? Phew!
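The divisible-by-4 bounding box quirk mentioned above can be sketched as a tiny rounding helper (the function name is made up for illustration, not from the codebase):

```python
# Hypothetical helper: pad a pixel width/height up to the next multiple of 4,
# mirroring the 'indexers' requirement of DeepBedMap's network architecture.
def pad_to_multiple_of_4(n_pixels: int) -> int:
    remainder = n_pixels % 4
    return n_pixels if remainder == 0 else n_pixels + (4 - remainder)

print(pad_to_multiple_of_4(1022))  # -> 1024
print(pad_to_multiple_of_4(1024))  # -> 1024 (already divisible)
```

Applying such a rounding to both width and height of a requested bounding box guarantees the cropped grid feeds cleanly through the model.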
weiji14 force-pushed the refactor_grdtrack branch from a0ab7c7 to 66b97d2 on June 13, 2019 15:25
weiji14 added a commit that referenced this pull request on Jun 13, 2019
Closes #149 Refactor RMSE_test calculation using pygmt.grdtrack.
Simplifying and hopefully speeding up the code used to get interpolated grid values at certain groundtruth points. Previously used `!gmt grdtrack` which is a bash hack, but now there's `pygmt.grdtrack` that is a whole lot nicer! Instead of having temporary `.csv` and `.nc` files, we could simply pass in a `pandas.DataFrame` and `xarray.DataArray` in memory to `pygmt.grdtrack`, and get a `pandas.DataFrame` output we can analyze straightaway!

TODO:

- `pandas.DataFrame` directly into `grdtrack` instead of using intermediate `.xyz` file (e44c7c6)
- `xarray.DataArray` directly into `grdtrack` instead of using intermediate `.nc` file (66b97d2)