TypeError: Cannot cast array data from dtype('O') to dtype('int64') when applying histogram to image time series #9
Thanks for the bug report, @rbavery. This package is very new, so issues like this are very important for ironing out the API and documentation. Your case is most similar to the tutorial example "histogram over a single axis". The only difference is that you have one extra dimension.

```python
from xhistogram.xarray import histogram
import numpy as np
import xarray as xr

ny, nx = 100, 100
nt = 44
data_arr = xr.DataArray(np.random.randn(nt, ny, nx),
                        dims=['time', 'y', 'x'],
                        name='blue reflectance')

rmin, rmax, nbins = -4, 4, 50
bin_arr = np.linspace(rmin, rmax, nbins)

# One bin array per input DataArray; reduce over both spatial dimensions
histogram(data_arr, bins=[bin_arr], dim=['x', 'y'])
```

This works.
In words, this calculates the histogram over the x,y dimensions for each timestep. I assume that is what you wanted to do. There were a few things wrong with your original example:
In summary, the usage of xhistogram in this scenario is simpler than you were assuming. (I hope I have not misunderstood your goals.) Perhaps it would be worth adding this example to the tutorial, as it seems like a very common use case (histogram over an image time series).
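To make "the histogram over x, y for each timestep" concrete, here is a plain-NumPy loop that computes the same kind of result on synthetic data. It is only a sketch for intuition, not how xhistogram is implemented, and the helper name `histogram_per_timestep` is hypothetical; treatment of values exactly on the outermost bin edge may not match xhistogram.

```python
import numpy as np

rng = np.random.default_rng(0)
nt, ny, nx = 44, 100, 100
data = rng.standard_normal((nt, ny, nx))  # synthetic (time, y, x) stack
bin_edges = np.linspace(-4, 4, 50)        # 50 edges -> 49 bins

def histogram_per_timestep(stack, edges):
    """Histogram each 2D image of a (time, y, x) stack; returns shape (time, len(edges) - 1)."""
    counts = np.empty((stack.shape[0], len(edges) - 1), dtype=np.int64)
    for t, image in enumerate(stack):
        # np.histogram flattens the (y, x) image, i.e. it reduces over both spatial axes
        counts[t], _ = np.histogram(image, bins=edges)
    return counts

counts = histogram_per_timestep(data, bin_edges)
print(counts.shape)  # (44, 49)
```

The time dimension survives and the two spatial dimensions are replaced by a single bin dimension, which is the shape of result the xhistogram call above is meant to give.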
Thanks a bunch for the clarification; I misunderstood the docstring. For the bins argument, this is the comment about using Dask:
I took that to mean a list of arrays is needed, with each array corresponding to a DataArray object, since the docstring also states that it accepts multiple DataArray objects and I was working with Dask arrays.
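The "one bin array per input array" convention is the same one NumPy's `histogram2d` uses: each input variable gets its own edges, and the result is a joint histogram over all of them. So a single input needs only a single bin array, as in the call above. The NumPy-only sketch below (not xhistogram itself) illustrates this with two hypothetical bands; as I understand it, passing two DataArrays plus two bin arrays to xhistogram likewise gives a 2D histogram.

```python
import numpy as np

rng = np.random.default_rng(1)
blue = rng.standard_normal((100, 100))             # one synthetic "blue" image
red = 0.5 * blue + rng.standard_normal((100, 100))  # hypothetical correlated "red" image

blue_edges = np.linspace(-4, 4, 50)  # bin edges for the first variable (49 bins)
red_edges = np.linspace(-6, 6, 30)   # bin edges for the second variable (29 bins)

# histogram2d wants 1D samples, so flatten both images; one edge array per variable
joint_counts, _, _ = np.histogram2d(blue.ravel(), red.ravel(),
                                    bins=[blue_edges, red_edges])
print(joint_counts.shape)  # (49, 29)
```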
I think it would help to clarify when multiple DataArray objects and bin arrays should be supplied as inputs, with documentation examples of each. A time series example would also be helpful to include; I can make a PR if you'd like.

My goal is actually not to get the counts of reflectance values that fall within each histogram bin, but to return a series of 2D arrays where each pixel is labeled numerically by the index of the bin that contains the pixel value. I use this series of labels to calculate the mean of each bin of each blue reflectance image, and also the corresponding red reflectance means for each bin of each image. This is why I need the labeled array and can't just calculate the means following the last example in the xhistogram tutorial. scipy.stats.binned_statistic returns the binnumber (the bin label for each pixel), which is what my original function in this Stack Overflow post relies on. I realized a bit late that xhistogram doesn't fit my use case, but I still appreciate the help and am glad to discover a new useful tool.
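Once every pixel has a bin label, the per-bin means described here reduce to two `np.bincount` calls (a weighted sum and a count). The sketch below is a NumPy-only assumption about how that step could look, using synthetic "blue" and "red" stacks; the labels are produced with `np.digitize` purely for illustration, and `per_bin_means` is a hypothetical helper name.

```python
import numpy as np

def per_bin_means(labels, values, n_labels):
    """Mean of `values` for each label 0..n_labels-1; NaN where a label has no pixels."""
    sums = np.bincount(labels.ravel(), weights=values.ravel(), minlength=n_labels)
    counts = np.bincount(labels.ravel(), minlength=n_labels)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # 0/0 -> NaN for empty bins

rng = np.random.default_rng(0)
nt, ny, nx = 3, 20, 20
blue = rng.standard_normal((nt, ny, nx))              # synthetic blue reflectance
red = 0.5 * blue + rng.standard_normal((nt, ny, nx))  # hypothetical correlated red band

edges = np.linspace(-4, 4, 50)
n_labels = len(edges) + 1  # np.digitize labels run from 0 to len(edges)

blue_means = np.empty((nt, n_labels))
red_means = np.empty((nt, n_labels))
for t in range(nt):
    labels = np.digitize(blue[t], edges)  # per-pixel bin label, based on blue only
    blue_means[t] = per_bin_means(labels, blue[t], n_labels)
    red_means[t] = per_bin_means(labels, red[t], n_labels)
```

The red means are grouped by the blue labels, so `red_means[t, k]` is the mean red reflectance of the pixels whose blue value fell in bin `k` at timestep `t`.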
Thanks a lot for the feedback! We will try to update the docs with this in mind.
It sounds like you want numpy.digitize.
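A short sketch of what `np.digitize` returns. With the default `right=False`, label `i` means `edges[i-1] <= x < edges[i]`, 0 means below the first edge, and `len(edges)` means at or above the last edge. Because it works elementwise, a whole synthetic `(time, y, x)` stack can be labeled in one call, with no loop over timesteps. As far as I know, `scipy.stats.binned_statistic` (like `np.histogram`) puts a value exactly equal to the last edge into the last bin, whereas `np.digitize` does not, so labels may differ at that one edge.

```python
import numpy as np

# Labelling convention on a tiny example
edges = np.array([0.0, 1.0, 2.0])
small = np.array([-0.5, 0.0, 0.5, 1.0, 2.0, 2.5])
print(np.digitize(small, edges))              # [0 1 1 2 3 3]
print(np.digitize(small, edges, right=True))  # [0 0 1 1 2 3]

# np.digitize is elementwise, so the full (time, y, x) stack is labelled at once
rng = np.random.default_rng(42)
stack = rng.standard_normal((44, 100, 100))  # synthetic stand-in for the image time series
bin_edges = np.linspace(-4, 4, 50)
labels = np.digitize(stack, bin_edges)       # same shape as stack: (44, 100, 100)
```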
Thanks for the suggestion; this looks like exactly what I need! I've updated my Stack Overflow post accordingly, but I'm still having trouble applying numpy.digitize with apply_ufunc: https://stackoverflow.com/questions/57419541/how-to-use-apply-ufunc-with-numpy-digitize-for-each-image-along-time-dimension-o
I've tried to create a minimal reproducible example similar to the tutorial (which I'm able to run), but working with a 3D DataArray. However, I get an error:
I'm not sure what is causing this error.
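A minimal sketch of one way to wrap `np.digitize` with `xr.apply_ufunc` for a 3D DataArray, assuming the goal is one label per pixel per timestep. Since `np.digitize` is elementwise, no `input_core_dims` are needed and the output keeps the `(time, y, x)` dimensions; the bin edges go in through `kwargs`. The helper name `digitize_da` is hypothetical, and this is not necessarily the cause of the error above.

```python
import numpy as np
import xarray as xr

ny, nx, nt = 100, 100, 44
rng = np.random.default_rng(0)
data_arr = xr.DataArray(rng.standard_normal((nt, ny, nx)),
                        dims=["time", "y", "x"],
                        name="blue reflectance")
bin_arr = np.linspace(-4, 4, 50)

def digitize_da(da, edges):
    """Label every element of `da` with its np.digitize bin index, keeping dims and coords."""
    return xr.apply_ufunc(
        np.digitize,
        da,
        kwargs={"bins": edges},    # passed straight through to np.digitize
        dask="parallelized",       # only takes effect for Dask-backed inputs
        output_dtypes=[np.int64],  # needed by the Dask path
    )

labels = digitize_da(data_arr, bin_arr)
print(labels.dims)  # ('time', 'y', 'x')
```

With Dask-backed data (for example `data_arr.chunk({"time": 1})`, which requires Dask to be installed), the same call should run chunk by chunk because `dask="parallelized"` is set.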