## Performance of `xesmf` vs `xarray-regrid`

Compare the conservative regridding methods of the two libraries on a moderately sized synthetic dask dataset of about 4 GB.

In [18]:
import dask.array as da
import xarray as xr
import xesmf

import xarray_regrid

bounds = dict(south=-90, north=90, west=-180, east=180)

source = xarray_regrid.Grid(
    resolution_lat=0.25,
    resolution_lon=0.25,
    **bounds,
).create_regridding_dataset()

target = xarray_regrid.Grid(
    resolution_lat=1,
    resolution_lon=1,
    **bounds,
).create_regridding_dataset()


def source_data(source, chunks, n_times=1000):
    data = da.random.random(
        size=(n_times, source.latitude.size, source.longitude.size),
        chunks=chunks,
    ).astype("float32")

    data = xr.DataArray(
        data,
        dims=["time", "latitude", "longitude"],
        coords={
            "time": xr.date_range("2000-01-01", periods=n_times, freq="D"),
            "latitude": source.latitude,
            "longitude": source.longitude,
        }
    )

    return data


## Chunking

Test "pancake" chunks (split along time, each spanning the full spatial grid) and "churro" chunks (split in space, each spanning the full time axis) at two sizes. The "small" versions are about 4 MB per chunk, and the "large" ones are about 100 MB.
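The chunk sizes quoted above can be checked with plain arithmetic. The sketch below assumes that the 0.25-degree global grid includes both endpoints (721 latitudes by 1441 longitudes), which is not confirmed by the notebook itself, and that values are stored as `float32`. The helper `chunk_megabytes` is a hypothetical name used only for illustration.

```python
# Assumption: the 0.25-degree grid includes both endpoints, giving
# 721 latitudes x 1441 longitudes; values are float32 (4 bytes each).
N_TIMES, N_LAT, N_LON = 1000, 721, 1441
BYTES_PER_VALUE = 4

chunk_schemes = {
    "pancake_small": (1, -1, -1),
    "pancake_large": (25, -1, -1),
    "churro_small": (-1, 32, 32),
    "churro_large": (-1, 160, 160),
}


def chunk_megabytes(chunks, shape=(N_TIMES, N_LAT, N_LON)):
    """Size in MB of one full chunk; -1 means the whole dimension."""
    n_values = 1
    for chunk_len, dim_len in zip(chunks, shape):
        n_values *= dim_len if chunk_len == -1 else min(chunk_len, dim_len)
    return n_values * BYTES_PER_VALUE / 1e6


for name, chunks in chunk_schemes.items():
    print(f"{name}: {chunk_megabytes(chunks):.1f} MB")

# Size of the whole array in GB under the same assumptions.
total_gb = N_TIMES * N_LAT * N_LON * BYTES_PER_VALUE / 1e9
print(f"full dataset: {total_gb:.2f} GB")
```

Chunks at the edge of the spatial grid are smaller for the churro schemes, so these figures describe the largest chunk in each scheme.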

In [19]:
chunk_schemes = {
    "pancake_small": (1, -1, -1),
    "pancake_large": (25, -1, -1),
    "churro_small": (-1, 32, 32),
    "churro_large": (-1, 160, 160),
}

In [20]:
# Generating weights is expensive for large grids, so build the
# regridder once here, outside the timed code
xesmf_regridder = xesmf.Regridder(source, target, "conservative")



## Timings

Run timings for each chunking scheme, with NaN skipping disabled and enabled, for both libraries. The ratio `xesmf / xarray-regrid` of execution times gives the speedup factor of this library: values above 1 mean `xarray-regrid` is faster.

In [21]:
import time

import pandas as pd

pd.options.display.precision = 1


def do_regrid(data, target, skipna):
    data.regrid.conservative(target, skipna=skipna).compute()


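def do_xesmf(data, target, skipna):
    # `target` is unused here: the target grid is already part of the
    # precomputed regridder. It is kept so both functions share a signature.
    xesmf_regridder(data, skipna=skipna).compute()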


def timing_grid(func, repeats=2):
    times = pd.DataFrame(
        index=chunk_schemes.keys(),
        columns=["skipna=False", "skipna=True"],
    )
    for name, chunks in chunk_schemes.items():
        data = source_data(source, chunks)
        for skipna in [False, True]:
            execution_times = []
            for _ in range(repeats):
                start = time.perf_counter()
                func(data, target, skipna)
                end = time.perf_counter()
                execution_times.append(end - start)
            # Keep the fastest run; the first execution is sometimes a little slower
            times.loc[name, f"skipna={skipna}"] = min(execution_times)

    return times


regrid_times = timing_grid(do_regrid)
xesmf_times = timing_grid(do_xesmf)
ratio = xesmf_times / regrid_times




## Results

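With the current implementations, `xesmf` is faster for large pancake-style chunks when `skipna=False` (ratio 0.6) and roughly on par when `skipna=True` (ratio 1.1). `xarray-regrid` is faster in every other case, most of all for small chunks: 3.7 to 7.2 times faster for small pancakes and 14.2 to 16.9 times faster for small churros.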

These tests were run on an 8-core Intel i7 Ubuntu desktop:

In [22]:
ratio

| Chunk scheme  | skipna=False | skipna=True |
|---------------|--------------|-------------|
| pancake_small | 3.7          | 7.2         |
| pancake_large | 0.6          | 1.1         |
| churro_small  | 14.2         | 16.9        |
| churro_large  | 1.8          | 2.4         |
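
To make the direction of the ratio explicit, the following minimal sketch restates the values above from the point of view of whichever library was faster. The numbers are copied from the results in this notebook; the helper `faster_library` is a hypothetical name introduced only for this illustration.

```python
# Ratios (xesmf time / xarray-regrid time) copied from the results above.
ratios = {
    "pancake_small": {"skipna=False": 3.7, "skipna=True": 7.2},
    "pancake_large": {"skipna=False": 0.6, "skipna=True": 1.1},
    "churro_small": {"skipna=False": 14.2, "skipna=True": 16.9},
    "churro_large": {"skipna=False": 1.8, "skipna=True": 2.4},
}


def faster_library(ratio):
    """Return which library was faster for a given time ratio.

    A ratio above 1 means xesmf took longer, so xarray-regrid was faster.
    """
    return "xarray-regrid" if ratio > 1 else "xesmf"


for scheme, row in ratios.items():
    for setting, ratio in row.items():
        # Express the speedup from the faster library's point of view.
        speedup = ratio if ratio > 1 else 1 / ratio
        print(f"{scheme:14s} {setting:13s} {faster_library(ratio)} is {speedup:.1f}x faster")
```

Only one of the eight combinations, large pancakes with `skipna=False`, comes out in favor of `xesmf`.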
