Support negative values #35

Open
wants to merge 7 commits into
base: master
from

Conversation


@brandomr brandomr commented Nov 4, 2019

What's New

This PR allows unrasterize to process TIFFs that may contain negative values. It does this by updating the _reassign_pixel_values function to take the threshold set by the Unrasterizer class and use it as the cutoff for avoiding underflow.

This replaces the more basic approach of just assuming a threshold of 0 for this function.

This is useful for maps which may be measuring rates, for example population change. Some areas gain population and some lose it, but without this fix unrasterize will only select positive change pixels.

@@ -126,7 +126,7 @@ def _sort_pixels(band):
         )

     @staticmethod
-    def _reassign_pixel_values(band, pixels, raw_pixel_values=[]):
+    def _reassign_pixel_values(threshold, band, pixels, raw_pixel_values=[]):
Owner

Nit: I have a slight preference for the argument order band, pixels, threshold, ...

@@ -126,7 +126,7 @@ def _sort_pixels(band):
         )

     @staticmethod
-    def _reassign_pixel_values(band, pixels, raw_pixel_values=[]):
+    def _reassign_pixel_values(threshold, band, pixels, raw_pixel_values=[]):
         """Adjust values of selected pixels so that their sum is preserved.

         Parameters
Owner

Please also update the docstring with a description of the new function argument.

         total_selected = np.sum(raw_pixel_values, dtype=np.float32)
-        return [
+        out = [
Owner

Nit: This can be returned directly.

-        # Avoid underflow by ignoring negative values.
-        total = np.sum(band[band > 0.0], dtype=np.float32)
+        # Avoid underflow by ignoring values below the threshold.
+        total = np.sum(band[band > threshold], dtype=np.float32)
Owner

I'm not convinced that passing threshold=self.threshold works.

The example notebook uses an Unrasterizer(mask_width=10, threshold=0.2) object. This function would then ignore all pixels with a population density < 0.2, so that they would not contribute to the total that is distributed among the selected pixels. In other words, the total sum of the selected pixels' values after applying this function would not equal the total sum across all pixels.
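The concern above can be illustrated with a toy array. This is a minimal sketch with made-up values, not the example notebook's data; it only shows that masking with `band > threshold` leaves part of the band's total out of the sum.

```python
import numpy as np

# Toy band of population densities; the values are invented for illustration.
band = np.array([[0.05, 0.10, 0.90],
                 [0.15, 1.50, 0.05],
                 [0.10, 0.05, 2.00]], dtype=np.float32)
threshold = 0.2

# Total across every pixel in the band.
total_all = float(np.sum(band, dtype=np.float32))
# Total that the thresholded mask would hand to the selected pixels.
total_above = float(np.sum(band[band > threshold], dtype=np.float32))
# Mass carried by sub-threshold pixels that is silently dropped.
dropped = total_all - total_above

print(total_all, total_above, dropped)
```

Here the six sub-threshold pixels together carry about 0.5 of the total of about 4.9, so the selected pixels' sum would no longer match the sum across all pixels.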

@tetraptych
Owner

I'm not convinced that setting the new threshold parameter equal to self.threshold works -- see the inline comment.

That said, it's been a long time since I looked at this code in any depth. I assume from the code comments that I was running into mathematical underflow when the 0.0 value pixels were included in the summations.

So perhaps it would make sense to include a parameter akin to minimum_valid_pixel_value and then use the mask total = np.sum(band[(band > minimum_valid_pixel_value) | (band < -minimum_valid_pixel_value)], dtype=np.float32) (or something clever using np.abs).
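A rough sketch of the np.abs variant, assuming the intent is to keep every pixel whose magnitude exceeds the cutoff regardless of sign. The helper name `magnitude_mask` is hypothetical and is not part of unrasterize.

```python
import numpy as np

def magnitude_mask(band, minimum_valid_pixel_value):
    """Return True where a pixel's absolute value exceeds the cutoff.

    Hypothetical helper for illustration only.
    """
    return np.abs(band) > minimum_valid_pixel_value

band = np.array([-3.0, -0.1, 0.0, 0.05, 2.0], dtype=np.float32)

mask = magnitude_mask(band, 0.2)
# Equivalent two-sided comparison written out explicitly.
explicit = (band > 0.2) | (band < -0.2)

# Signed total over the pixels that pass the magnitude cutoff.
total = np.sum(band[mask], dtype=np.float32)
print(mask, float(total))
```

With mixed signs the signed total can be small or negative (here it is -1.0), which is worth keeping in mind if that total is later used to rescale the selected pixels.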

Also note that the _sort_pixels method selects pixels in descending order of pixel value, meaning that pixels with large negative values will be selected last. This might also require some changes to support negative values (such as selecting pixels in descending order of their absolute magnitude).
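To make the ordering point concrete, here is a small standalone sketch comparing a sort by signed value with a sort by absolute magnitude. It uses plain NumPy on a toy array and does not reproduce what _sort_pixels actually does internally.

```python
import numpy as np

band = np.array([[1.0, -5.0],
                 [0.5,  3.0]], dtype=np.float32)

# Flat indices in descending order of signed value:
# the large negative pixel (-5.0) ends up last.
by_value = np.argsort(band, axis=None)[::-1]

# Flat indices in descending order of absolute magnitude:
# the large negative pixel now comes first.
by_magnitude = np.argsort(np.abs(band), axis=None)[::-1]

# Convert the magnitude ordering back to (row, col) coordinates.
rows, cols = np.unravel_index(by_magnitude, band.shape)
print(by_value, by_magnitude, list(zip(rows.tolist(), cols.tolist())))
```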

@brandomr
Author

brandomr commented Nov 7, 2019

@tetraptych thanks for taking a look! I am tracking your comments and I think you're right: my solution is too simple and would not account for the actual total value above threshold. However, while experimenting with the code on the master branch I found this related issue, which likely needs to be addressed before worrying about negative values:

import rasterio
import numpy as np
from unrasterize import WindowedUnrasterizer

# read in example raster
raster_path = 'data/Belize/BLZ_ppp_v2b_2015_UNadj.tif'
raster_data = rasterio.open(raster_path)

# read the first band
band = raster_data.read(1)

# run WindowedUnrasterizer
unrasterizer = WindowedUnrasterizer(mask_width=5, threshold=0.2)
unrasterizer.select_representative_pixels(raster_data)

# compare values from original band with the output of WindowedUnrasterizer
np.sum(band[band > 0.2], dtype=np.float32)
# out: 268649.16

np.sum(unrasterizer.selected_values)
# out: 300592.34

Note that the actual sum of the band's values above threshold is 268649.16, but unrasterizer returns a total value of 300592.34.

This is because pixels are not selected when below threshold, but if they happen to be in a window-band with pixels that are selected, they end up in the total value for the band and thus pixels from that window are upscaled to include the pixel values below threshold. (case 1)

However, for windows where the band has no pixels that meet threshold, no pixels are selected and therefore no values counted from that window-band at all. (case 2)

So, pixels below threshold are actually sometimes counted (case 1) but sometimes not (case 2).
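The two cases can be reproduced with a toy model. This is a simplified sketch of the behaviour described above, assuming each window's selected pixels are rescaled to carry that window's full positive total; `toy_windowed_total` is a hypothetical function and not unrasterize's code.

```python
import numpy as np

def toy_windowed_total(band, window, threshold):
    """Toy model of the per-window behaviour; not unrasterize's implementation."""
    total = 0.0
    for start in range(0, band.size, window):
        chunk = band[start:start + window]
        selected = chunk[chunk > threshold]
        if selected.size == 0:
            # Case 2: nothing is selected, so the window contributes nothing.
            continue
        # Case 1: selected pixels are rescaled to carry the window's whole
        # positive total, including pixels below the threshold.
        total += float(np.sum(chunk[chunk > 0.0]))
    return total

band = np.array([1.0, 0.1, 0.1, 0.1,   # window containing a selected pixel
                 0.1, 0.1, 0.1, 0.1])  # window containing none
threshold = 0.2

above = float(np.sum(band[band > threshold]))   # only pixels above threshold
windowed = toy_windowed_total(band, window=4, threshold=threshold)
everything = float(np.sum(band))                # every pixel in the band

print(above, windowed, everything)
```

The windowed total lands between the above-threshold sum and the full sum, which is the same pattern as the 268649.16 versus 300592.34 discrepancy.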

One solution to this is to add a line here that sets the value for any pixel below threshold to 0:

band[band <= self.threshold] = 0

This ensures that pixels whose values are at or below threshold are never counted. Then the line to prevent underflow could be adjusted to:

total = np.sum(band[band != 0.0], dtype=np.float32)
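Put together on a toy array, the two lines behave as follows. This is a standalone sketch with invented values; it works on a copy so the original band is left untouched, which is an illustrative choice rather than something the proposal requires.

```python
import numpy as np

threshold = 0.2
band = np.array([1.0, 0.1, 0.5, 0.0, -2.0], dtype=np.float32)

cleaned = band.copy()               # keep the original band unchanged
cleaned[cleaned <= threshold] = 0   # pixels at or below threshold never count
total = np.sum(cleaned[cleaned != 0.0], dtype=np.float32)

print(cleaned, float(total))
```

The negative pixel (-2.0) is zeroed along with the small positive ones, so this step on its own still discards negative values.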

This doesn't solve my issue around negatives but I think is a step in the right direction. I do now see the issue with sorting and agree using absolute values for the sort would work well.

:sigh: this turned out to be far more nuanced than I expected!

Thoughts?

…s (see new Jupyter Notebook using CHIRPS rainfall anomaly data). Additionally, this adds an option where the user can specify whether they wish to sum or average across pixels.
@brandomr
Author

@tetraptych I've pushed some additional updates to this PR.

Negative values are now allowed using np.abs in the sorting process, as you suggested. Additionally, I've made an update that lets the user optionally specify agg, which can be either sum (default) or average. This is helpful for working with things like rainfall data (see my new example notebook), where summation yields meaningless information.
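As a rough illustration of why the choice matters for intensity-like data, here is a standalone sketch. The `aggregate` helper and its behaviour are assumptions made for this example; the PR's actual handling of agg may differ.

```python
import numpy as np

def aggregate(values, agg="sum"):
    """Hypothetical helper; the PR's actual `agg` handling may differ."""
    if agg == "sum":
        return float(np.sum(values))
    if agg == "average":
        return float(np.mean(values))
    raise ValueError(f"unknown agg: {agg!r}")

# Two regions with the same rainfall anomaly per pixel (invented values),
# covered by different numbers of pixels.
small_region = np.full(2, 10.0)
large_region = np.full(8, 10.0)

sums = (aggregate(small_region, "sum"), aggregate(large_region, "sum"))
averages = (aggregate(small_region, "average"), aggregate(large_region, "average"))
print(sums, averages)
```

The sums differ only because the regions contain different numbers of pixels, while the averages report the same anomaly for both.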

Let me know what you think!
