In Python I have a 3D NumPy array which is a stack of image data, and I would like to calculate its nanmean. I tried two different ways using xtensor-python:
template<typename T>
inline xt::pytensor<T, 2> nanmeanImages(const xt::pytensor<T, 3>& arr) {
    auto shape = arr.shape();
    auto mean = xt::pytensor<T, 2>({shape[1], shape[2]});
    // Single pass over the stack: accumulate sum and count of non-NaN values.
    for (std::size_t j = 0; j < shape[1]; ++j) {
        for (std::size_t k = 0; k < shape[2]; ++k) {
            T count = 0;
            T sum = 0;
            for (std::size_t i = 0; i < shape[0]; ++i) {
                auto v = arr(i, j, k);
                if (!std::isnan(v)) {
                    count += T(1);
                    sum += v;
                }
            }
            mean(j, k) = sum / count;
        }
    }
    return mean;
}
template<typename T>
inline xt::pytensor<T, 2> nanmeanImagesOld(const xt::pytensor<T, 3>& arr) {
    return xt::nanmean<T>(arr, {0}, xt::evaluation_strategy::immediate);
}
and benchmarked with the following Python code:
data = np.ones((64, 1024, 1024), dtype=np.float32)
data[::2, ::2, ::2] = np.nan
t0 = time.perf_counter()
ret_cpp = xt_nanmean_images(data)
dt_cpp = time.perf_counter() - t0
t0 = time.perf_counter()
ret_cpp = xt_nanmean_images_old(data)
dt_cpp_old = time.perf_counter() - t0
The result is:
nanmean_images with <class 'numpy.float32'> - dt (cpp): 0.1258, dt (cpp) old: 0.1573
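For reference, the two-pass structure described below (a NaN-ignoring sum plus a separate non-NaN count) can be reproduced in NumPy and checked against `np.nanmean`. This is only an illustrative sketch on a small array; the shape is scaled down from the benchmark's `(64, 1024, 1024)`:

```python
import numpy as np

# Small stand-in for the benchmark data, with the same NaN pattern.
data = np.ones((8, 16, 16), dtype=np.float32)
data[::2, ::2, ::2] = np.nan

# Two-pass formulation: one pass for the sum, one for the count.
nan_sum = np.nansum(data, axis=0)
count = np.count_nonzero(~np.isnan(data), axis=0)
two_pass = nan_sum / count

# np.nanmean serves as the ground truth.
assert np.allclose(two_pass, np.nanmean(data, axis=0))
```

The single-pass C++ loop above fuses these two traversals into one, which is consistent with it coming out faster in the benchmark.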
I guess the first one is faster because xt::nanmean uses xt::nansum and xt::count_nonnan, which means looping over the big array twice. Also, xt::count_nonnan is twice as expensive as xt::nansum for whatever reason. I compiled xtensor with xsimd and did not see any improvement, but I am quite new to xsimd and am not sure whether I did everything correctly.
I would like to further improve the performance by using tbb. I am not sure whether that is the best way to go and would like to ask your opinion. Thanks a lot!
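The parallelization I have in mind (a tbb::parallel_for over the output rows, each worker reducing its own slab of the j axis) can be sketched in Python with a thread pool; NumPy releases the GIL inside its reductions, so the threads can genuinely overlap. The worker count and slab split here are illustrative assumptions, not tuned values, and this is only a model of the intended C++/TBB structure, not the implementation itself:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_nanmean(arr, n_workers=4):
    """Split the output rows (axis 1) into slabs and reduce each slab in a worker."""
    out = np.empty(arr.shape[1:], dtype=arr.dtype)
    slabs = np.array_split(np.arange(arr.shape[1]), n_workers)

    def reduce_slab(rows):
        # Each worker writes a disjoint block of rows, so no locking is needed.
        out[rows] = np.nanmean(arr[:, rows, :], axis=0)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(reduce_slab, slabs))
    return out

data = np.ones((8, 16, 16), dtype=np.float32)
data[::2, ::2, ::2] = np.nan
assert np.allclose(parallel_nanmean(data), np.nanmean(data, axis=0))
```

A tbb::parallel_for over j with the single-pass inner loop from nanmeanImages would follow the same pattern: disjoint output rows per task, no shared mutable state.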