
RMSE_test calculation does not sample points along groundtruth grid edges properly #152

Closed
weiji14 opened this issue Jun 15, 2019 · 0 comments · Fixed by #155


weiji14 commented Jun 15, 2019

Commit 054e295 in #151 highlighted a bug introduced in #149. Basically, pygmt.grdtrack samples points along the grid edges differently depending on whether the grid input is an xarray.DataArray or a NetCDF file. See the image below showing points sampling the 2007tx.nc grid, specifically the 2007t1.txt area.

[Figure: Difference in pygmt.grdtrack sampled datapoints when run on a raw xarray.DataArray versus a NetCDF file]

Yes, we do crop the 2007tx.nc grid by one pixel on the left, bottom, right and top (to make the image shape divisible by 4), but there are still some serious discrepancies.

Number of points:

  • Actual total from 2007t1.txt + 2007tr.txt = 42995 points
  • pygmt.grdtrack sample on NetCDF file = 38112 points
  • pygmt.grdtrack sample on xr.DataArray = 37829 points

Strangely enough, running it on an xr.DataArray captures more points on the top and bottom (y-direction) whereas running it on a NetCDF file captures more points on the left and right (x-direction).
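The edge behaviour above can be mimicked with plain numpy (a synthetic illustration, not the actual pygmt.grdtrack internals): points sitting exactly on the outermost grid nodes are kept or dropped depending on whether the sampler treats the grid bounds as gridline-registered (bounds end at the nodes) or pixel-registered (bounds extend half a pixel past the nodes).

```python
import numpy as np

# Hypothetical 250 m grid whose nodes span [0, 1000] in x and y
# (gridline registration). The numbers are made up for illustration.
spacing = 250
xmin, xmax, ymin, ymax = 0, 1000, 0, 1000

# Four test points: two on the outer edge, one just beyond it, one inside.
points = np.array([[0, 500], [1000, 500], [500, 1125], [500, 500]])

def count_inside(pts, pad):
    """Count points falling within the grid bounds extended by `pad`."""
    x, y = pts[:, 0], pts[:, 1]
    return int(np.sum((x >= xmin - pad) & (x <= xmax + pad) &
                      (y >= ymin - pad) & (y <= ymax + pad)))

n_gridline = count_inside(points, pad=0)           # strict node bounds
n_pixel = count_inside(points, pad=spacing / 2)    # half-pixel extension
```

Depending on which interpretation the grid reader uses, the point at y=1125 (half a pixel beyond the last node row) is either dropped or sampled, which is the kind of off-by-an-edge behaviour producing the 38112 vs 37829 point counts.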

How to fix

Adjust data_prep.xyz_to_grid to not use the tight bounds from data_prep.get_region. Maybe buffer the input bounds by 250m * 3 pixels (the mask we set when running pygmt.surface) before running blockmedian and surface. This should mean we get closer to the actual total of 42995 points regardless of whether we run it on an xarray.DataArray or a NetCDF file.
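The buffering step could look something like this minimal sketch (the function name and signature are hypothetical, not the actual data_prep code):

```python
def buffer_region(region, pixel_size=250, n_pixels=3):
    """Expand (xmin, xmax, ymin, ymax) bounds outward by n_pixels grid
    pixels, so that blockmedian/surface cover a margin past the data
    extent and edge points are not lost to tight bounds."""
    xmin, xmax, ymin, ymax = region
    pad = pixel_size * n_pixels  # e.g. 250 m * 3 = 750 m
    return (xmin - pad, xmax + pad, ymin - pad, ymax + pad)

buffered = buffer_region((0, 1000, 0, 1000))
```

The buffered tuple would then be passed as the `region` argument to the blockmedian and surface calls in data_prep.xyz_to_grid.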

If not, then it might be a good idea to report this upstream to whoever wrote that wrapper for pygmt.grdtrack 😅

@weiji14 weiji14 added bug 🪲 Something isn't working data 🗃️ Pull requests that update input datasets labels Jun 15, 2019
@weiji14 weiji14 self-assigned this Jun 15, 2019
@weiji14 weiji14 added this to the v0.9.2 milestone Jun 15, 2019
weiji14 added a commit that referenced this issue Jun 15, 2019
Patch 054e295 so that the RMSE_test calculated in deepbedmap.ipynb matches that of srgan_train.get_deepbedmap_test_result. Basically run pygmt.grdtrack on an xarray.DataArray grid only, rather than on an xr.DataArray grid in srgan_train.ipynb and a NetCDF file grid in deepbedmap.ipynb, which produces slightly different results! The main issue with this is that the grdtrack algorithm samples fewer points than before, down from 38112 to 37829 now. This is because the edges of the grid are not properly sampled. The issue is documented in #152.
weiji14 added a commit that referenced this issue Jun 17, 2019
Update 2D, 3D and histogram plots in deepbedmap.ipynb for the 2007tx.nc test area using our newly trained Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) model from https://www.comet.ml/weiji14/deepbedmap/0b9b232394da42e394998b112f628696. Had to change our residual_scaling hyperparameter default setting from 0.2 to 0.15 in a few places following the last commit in 83e956d. Like really, we need to find a way to set the correct residual_scaling and num_residual_blocks settings when loading from a trained .npz model file.

Showcasing the best RMSE_test result of 43.57 achieved in the 2nd hyperparameter tuning frenzy in 83e956d. Note that the result is actually about 45.59 if we account for the borders properly (see issue #152), making the result not too different from the 45.35 reported in e27ac4a. However, the peak of the elevation error histogram is actually closer to that of the groundtruth with a mean of -25.37 instead of -94.75 (i.e. nearer to 0)! There's some checkerboard artifacts sure, and the errors at the 4 corners are off the chart for some reason, but I think we're definitely getting somewhere!!
weiji14 added a commit that referenced this issue Jun 21, 2019
Creating new groundtruth NetCDF grids using GMT surface, replacing the ones last created in b90bd74 in #112. Besides having updated to the GMT 6.0.0rc1 tagged release, the main change here is with using nicely rounded bounds (to 250 units in EPSG:3031) instead of arbitrary decimal points. This will really help resolve some of the problems with points not being included in our RMSE_test calculations near the grid's edges (see #152), and integer coordinates are just nicer to debug, won't you say?

Specifically, data_prep.get_region was refactored to use `gmt info -I xxx.csv` instead of pure pandas, returning an `xmin/xmax/ymin/ymax` string that has an extended region optimized for `gmt surface`. There is a "surface [WARNING]: Your grid dimensions are mutually prime.  Convergence is very unlikely" which I'm just gonna ignore for now. Note that data_prep.ascii_to_xyz was one-line patched to drop NaNs, as there were some points in the WISE_ISODYN_RadarByFlight.XYZ file with missing elevation (z) values (since #112...) which were messing up gmt.info in the refactored data_prep.get_region. Unit tests have been modified accordingly, and the grids in the integration tests are now downloaded/created in folder /tmp to avoid messing with the actual files in highres. Matplotlib plots of the grids in data_prep.ipynb have been updated, and the new grids will be released in v0.9.2.
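For reference, the outward rounding that `gmt info -I250` performs on the data bounds can be sketched in pure Python (a hypothetical reimplementation for illustration, not the actual refactored data_prep.get_region):

```python
import math

def get_region(x, y, increment=250):
    """Round the data bounds outward to multiples of `increment`,
    mimicking `gmt info -I250`, and return a GMT-style region string."""
    xmin = math.floor(min(x) / increment) * increment
    xmax = math.ceil(max(x) / increment) * increment
    ymin = math.floor(min(y) / increment) * increment
    ymax = math.ceil(max(y) / increment) * increment
    return f"{xmin}/{xmax}/{ymin}/{ymax}"

region = get_region(x=[-123.4, 987.6], y=[10.0, 490.0])
```

Because every bound lands on a multiple of 250, the groundtruth grids line up on the same EPSG:3031 coordinates regardless of the raw data extent, which is what makes the edge sampling consistent.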