
RMSE_test calculation does not sample points along groundtruth grid edges properly #152

Closed
weiji14 opened this issue Jun 15, 2019 · 0 comments · Fixed by #155


weiji14 commented Jun 15, 2019

Commit 054e295 in #151 highlighted a bug introduced in #149. Basically, pygmt.grdtrack samples points along the grid edges differently depending on whether the grid input is an xarray.DataArray or a NetCDF file. See the image below showing points sampling the 2007tx.nc grid, specifically the 2007t1.txt area.

[Figure: Difference in pygmt.grdtrack sampled datapoints when run on a raw xarray.DataArray versus a NetCDF file]

Yes, we do crop the 2007tx.nc grid by one pixel on the left, bottom, right and top (to make the image shape divisible by 4), but there are still some serious discrepancies.

Number of points:

  • Actual total from 2007t1.txt + 2007tr.txt = 42995 points
  • pygmt.grdtrack sample on NetCDF file = 38112 points
  • pygmt.grdtrack sample on xr.DataArray = 37829 points

Strangely enough, running it on an xr.DataArray captures more points on the top and bottom (y-direction) whereas running it on a NetCDF file captures more points on the left and right (x-direction).
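The edge behaviour above can be mimicked with plain numpy (a synthetic illustration, not the actual pygmt.grdtrack internals): points sitting exactly on the outermost grid nodes are kept or dropped depending on whether the sampler treats the grid bounds as gridline-registered (bounds end at the nodes) or pixel-registered (bounds extend half a pixel past the nodes).

```python
import numpy as np

# Hypothetical 250 m grid whose nodes span [0, 1000] in x and y
# (gridline registration). The numbers are made up for illustration.
spacing = 250
xmin, xmax, ymin, ymax = 0, 1000, 0, 1000

# Four test points: two on the outer edge, one just beyond it, one inside.
points = np.array([[0, 500], [1000, 500], [500, 1125], [500, 500]])

def count_inside(pts, pad):
    """Count points falling within the grid bounds extended by `pad`."""
    x, y = pts[:, 0], pts[:, 1]
    return int(np.sum((x >= xmin - pad) & (x <= xmax + pad) &
                      (y >= ymin - pad) & (y <= ymax + pad)))

n_gridline = count_inside(points, pad=0)           # strict node bounds
n_pixel = count_inside(points, pad=spacing / 2)    # half-pixel extension
```

Depending on which interpretation the grid reader uses, the point at y=1125 (half a pixel beyond the last node row) is either dropped or sampled, which is the kind of off-by-an-edge behaviour producing the 38112 vs 37829 point counts.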

How to fix

Adjust data_prep.xyz_to_grid to not use the tight bounds from data_prep.get_region. Maybe buffer the input bounds by 250m * 3 pixels (the mask we set when running pygmt.surface) before running blockmedian and surface. This should mean we get closer to the actual total of 42995 points regardless of whether we run it on an xarray.DataArray or a NetCDF file.
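The buffering step could look something like this minimal sketch (the function name and signature are hypothetical, not the actual data_prep code):

```python
def buffer_region(region, pixel_size=250, n_pixels=3):
    """Expand (xmin, xmax, ymin, ymax) bounds outward by n_pixels grid
    pixels, so that blockmedian/surface cover a margin past the data
    extent and edge points are not lost to tight bounds."""
    xmin, xmax, ymin, ymax = region
    pad = pixel_size * n_pixels  # e.g. 250 m * 3 = 750 m
    return (xmin - pad, xmax + pad, ymin - pad, ymax + pad)

buffered = buffer_region((0, 1000, 0, 1000))
```

The buffered tuple would then be passed as the `region` argument to the blockmedian and surface calls in data_prep.xyz_to_grid.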

If not, then it might be a good idea to report this upstream to whoever wrote that wrapper for pygmt.grdtrack 😅

@weiji14 weiji14 added bug 🪲 Something isn't working data 🗃️ Pull requests that update input datasets labels Jun 15, 2019
@weiji14 weiji14 self-assigned this Jun 15, 2019
@weiji14 weiji14 added this to the v0.9.2 milestone Jun 15, 2019
weiji14 added a commit that referenced this issue Jun 15, 2019
Patch 054e295 so that the RMSE_test calculated in deepbedmap.ipynb matches that of srgan_train.get_deepbedmap_test_result. Basically run pygmt.grdtrack on an xarray.DataArray grid only, rather than on an xr.DataArray grid in srgan_train.ipynb and a NetCDF file grid in deepbedmap.ipynb, which produces slightly different results! The main issue with this is that the grdtrack algorithm samples fewer points than before, down from 38112 to 37829 now. This is because the edges of the grid are not properly sampled. The issue is documented in #152.
weiji14 added a commit that referenced this issue Jun 17, 2019
Update 2D, 3D and histogram plots in deepbedmap.ipynb for the 2007tx.nc test area using our newly trained Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) model from https://www.comet.ml/weiji14/deepbedmap/0b9b232394da42e394998b112f628696. Had to change our residual_scaling hyperparameter default setting from 0.2 to 0.15 in a few places following the last commit in 83e956d. Like really, we need to find a way to set the correct residual_scaling and num_residual_blocks settings when loading from a trained .npz model file.

Showcasing the best RMSE_test result of 43.57 achieved in the 2nd hyperparameter tuning frenzy in 83e956d. Note that the result is actually about 45.59 if we account for the borders properly (see issue #152), making the result not too different from the 45.35 reported in e27ac4a. However, the peak of the elevation error histogram is actually closer to that of the groundtruth with a mean of -25.37 instead of -94.75 (i.e. nearer to 0)! There's some checkerboard artifacts sure, and the errors at the 4 corners are off the chart for some reason, but I think we're definitely getting somewhere!!
weiji14 added a commit that referenced this issue Jun 21, 2019
Creating new groundtruth NetCDF grids using GMT surface, replacing the ones last created in b90bd74 in #112. Besides having updated to the GMT 6.0.0rc1 tagged release, the main change here is with using nicely rounded bounds (to 250 units in EPSG:3031) instead of arbitrary decimal points. This will really help resolve some of the problems with points not being included in our RMSE_test calculations near the grid's edges (see #152), and integer coordinates are just nicer to debug, won't you say?

Specifically, data_prep.get_region was refactored to use `gmt info -I xxx.csv` instead of pure pandas, returning an `xmin/xmax/ymin/ymax` string that has an extended region optimized for `gmt surface`. There is a "surface [WARNING]: Your grid dimensions are mutually prime.  Convergence is very unlikely" which I'm just gonna ignore for now. Note that data_prep.ascii_to_xyz was one-line patched to drop NaNs, as there were some points in the WISE_ISODYN_RadarByFlight.XYZ file with missing elevation (z) values (since #112...) which were messing up gmt.info in the refactored data_prep.get_region. Unit tests have been modified accordingly, and the grids in the integration tests are now downloaded/created in folder /tmp to avoid messing with the actual files in highres. Matplotlib plots of the grids in data_prep.ipynb have been updated, and the new grids will be released in v0.9.2.
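For reference, the outward rounding that `gmt info -I250` performs on the data bounds can be sketched in pure Python (a hypothetical reimplementation for illustration, not the actual refactored data_prep.get_region):

```python
import math

def get_region(x, y, increment=250):
    """Round the data bounds outward to multiples of `increment`,
    mimicking `gmt info -I250`, and return a GMT-style region string."""
    xmin = math.floor(min(x) / increment) * increment
    xmax = math.ceil(max(x) / increment) * increment
    ymin = math.floor(min(y) / increment) * increment
    ymax = math.ceil(max(y) / increment) * increment
    return f"{xmin}/{xmax}/{ymin}/{ymax}"

region = get_region(x=[-123.4, 987.6], y=[10.0, 490.0])
```

Because every bound lands on a multiple of 250, the groundtruth grids line up on the same EPSG:3031 coordinates regardless of the raw data extent, which is what makes the edge sampling consistent.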