In Python I have a 3D NumPy array which is a stack of image data, and I would like to calculate its nanmean. I tried two different ways using xtensor-python:
template<typename T>
inline xt::pytensor<T, 2> nanmeanImages(const xt::pytensor<T, 3>& arr) {
    auto shape = arr.shape();
    auto mean = xt::pytensor<T, 2>({shape[1], shape[2]});
    // Single pass over the stack: accumulate sum and count of non-NaN values.
    for (std::size_t j = 0; j < shape[1]; ++j) {
        for (std::size_t k = 0; k < shape[2]; ++k) {
            T count = 0;
            T sum = 0;
            for (std::size_t i = 0; i < shape[0]; ++i) {
                auto v = arr(i, j, k);
                if (!std::isnan(v)) {
                    count += T(1);
                    sum += v;
                }
            }
            mean(j, k) = sum / count;
        }
    }
    return mean;
}
template<typename T>
inline xt::pytensor<T, 2> nanmeanImagesOld(const xt::pytensor<T, 3>& arr) {
    return xt::nanmean<T>(arr, {0}, xt::evaluation_strategy::immediate);
}
and benchmarked with the following Python code:
data = np.ones((64, 1024, 1024), dtype=np.float32)
data[::2, ::2, ::2] = np.nan
t0 = time.perf_counter()
ret_cpp = xt_nanmean_images(data)
dt_cpp = time.perf_counter() - t0
t0 = time.perf_counter()
ret_cpp = xt_nanmean_images_old(data)
dt_cpp_old = time.perf_counter() - t0
The result is:
nanmean_images with <class 'numpy.float32'> - dt (cpp): 0.1258, dt (cpp) old: 0.1573
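For reference, the two-pass structure described below (a NaN-ignoring sum plus a separate non-NaN count) can be reproduced in NumPy and checked against `np.nanmean`. This is only an illustrative sketch on a small array; the shape is scaled down from the benchmark's `(64, 1024, 1024)`:

```python
import numpy as np

# Small stand-in for the benchmark data, with the same NaN pattern.
data = np.ones((8, 16, 16), dtype=np.float32)
data[::2, ::2, ::2] = np.nan

# Two-pass formulation: one pass for the sum, one for the count.
nan_sum = np.nansum(data, axis=0)
count = np.count_nonzero(~np.isnan(data), axis=0)
two_pass = nan_sum / count

# np.nanmean serves as the ground truth.
assert np.allclose(two_pass, np.nanmean(data, axis=0))
```

The single-pass C++ loop above fuses these two traversals into one, which is consistent with it coming out faster in the benchmark.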
I guess the first one is faster because xt::nanmean uses xt::nansum and xt::count_nonnan, which means looping over the big array twice. Also, xt::count_nonnan is twice as expensive as xt::nansum for whatever reason. I compiled xtensor with xsimd and did not see any improvement, but I am quite new to xsimd and am not sure whether I did everything correctly.
I would like to further improve the performance by using tbb. I am not sure whether that is the best way to go and would like to ask your opinion. Thanks a lot!
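The parallelization I have in mind (a tbb::parallel_for over the output rows, each worker reducing its own slab of the j axis) can be sketched in Python with a thread pool; NumPy releases the GIL inside its reductions, so the threads can genuinely overlap. The worker count and slab split here are illustrative assumptions, not tuned values, and this is only a model of the intended C++/TBB structure, not the implementation itself:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_nanmean(arr, n_workers=4):
    """Split the output rows (axis 1) into slabs and reduce each slab in a worker."""
    out = np.empty(arr.shape[1:], dtype=arr.dtype)
    slabs = np.array_split(np.arange(arr.shape[1]), n_workers)

    def reduce_slab(rows):
        # Each worker writes a disjoint block of rows, so no locking is needed.
        out[rows] = np.nanmean(arr[:, rows, :], axis=0)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(reduce_slab, slabs))
    return out

data = np.ones((8, 16, 16), dtype=np.float32)
data[::2, ::2, ::2] = np.nan
assert np.allclose(parallel_nanmean(data), np.nanmean(data, axis=0))
```

A tbb::parallel_for over j with the single-pass inner loop from nanmeanImages would follow the same pattern: disjoint output rows per task, no shared mutable state.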