In [1]:
# mypy: disable-error-code="no-untyped-def, no-untyped-call"
# pyright: reportUnusedExpression=false, reportDuplicateImport=false, reportUnusedImport=false
# pylint: disable=wrong-import-position,pointless-statement,wrong-import-order,reimported

import os
from pathlib import Path

os.environ["NUMBA_CACHE_DIR"] = next(
    str(p / ".numba_cache") for p in Path.cwd().parents if p.name == "cmomy"
)

In [2]:
%%html
<style>
    div.output_stderr {
    display: none;
    }
</style>


In [3]:
import numpy as np

np.set_printoptions(precision=4)

# Usage

The basic usage of cmomy is to manipulate central moments.  
Suppose we measure two pretend quantities, the 'energy' and the 'position' of a thing, using, say, 100 weighted samples. We'll construct the weighted average (along with the higher central moments) of each quantity over these samples.

In [4]:
import numpy as np
import xarray as xr

import cmomy

rng = cmomy.default_rng(seed=0)

nsamp = 100
energy = xr.DataArray(rng.random(nsamp), dims=["samp"])

position = xr.DataArray(rng.random((nsamp, 3)), dims=["samp", "dim"])

# weight associated with each sample
w = xr.DataArray(rng.random(nsamp), dims=["samp"])

# reduce over the samples: weighted mean and central moments up to third order
ce = cmomy.wrap_reduce_vals(energy, weight=w, dim="samp", mom=3)
cp = cmomy.wrap_reduce_vals(position, weight=w, dim="samp", mom=3)



In [5]:
print("energy")
ce

energy


In [6]:
print("position")
cp

position


Here, we've used the constructor {func}`cmomy.wrap_reduce_vals`.  There are a host of other constructors to create {class}`cmomy.CentralMomentsData` and {class}`cmomy.CentralMomentsArray` objects; see the API documentation for further info.

## Dataset

We can also wrap {class}`~xarray.Dataset` objects.  This has some limitations, as the structure is more complicated (the variables in a dataset can have different shapes).  In particular, methods and attributes that depend on a fixed array shape, like {attr}`~cmomy.CentralMomentsData.shape`, are not available for a wrapped {class}`~xarray.Dataset`.

In [7]:
ds = xr.Dataset({"energy": energy, "position": position})
ctot = cmomy.wrap_reduce_vals(ds, weight=w, dim="samp", mom=3)
ctot

In [8]:
# verify that this is the same result as the DataArray version above...
xr.testing.assert_equal(ctot["energy"].obj, ce.obj)
xr.testing.assert_equal(ctot["position"].obj, cp.obj)

## Chunked data

You can also use ``dask`` chunked data (assuming [``dask``](https://docs.dask.org/en/stable/) is installed).  For example:

In [9]:
ctot_chunk = cmomy.wrap_reduce_vals(ds.chunk({"dim": -1}), weight=w, dim="samp", mom=3)
ctot_chunk

`energy`:

|            | Array                     | Chunk                     |
|------------|---------------------------|---------------------------|
| Bytes      | 32 B                      | 32 B                      |
| Shape      | (4,)                      | (4,)                      |
| Dask graph | 1 chunk in 5 graph layers | 1 chunk in 5 graph layers |
| Data type  | float64 numpy.ndarray     | float64 numpy.ndarray     |

`position`:

|            | Array                     | Chunk                     |
|------------|---------------------------|---------------------------|
| Bytes      | 96 B                      | 96 B                      |
| Shape      | (3, 4)                    | (3, 4)                    |
| Dask graph | 1 chunk in 6 graph layers | 1 chunk in 6 graph layers |
| Data type  | float64 numpy.ndarray     | float64 numpy.ndarray     |


To compute the final result, use {meth}`~cmomy.CentralMomentsData.compute`:

In [10]:
ctot_chunk.compute()

In [11]:
xr.testing.assert_equal(ctot.obj, ctot_chunk.obj)

## Basic attributes

Notice that there are three shape attributes associated with a {class}`~cmomy.CentralMomentsData` object:

* {attr}`~cmomy.CentralMomentsData.mom_shape`: shape of the moments.  For a single variable, this is `(mom + 1,)`; for comoments, `(mom_0 + 1, mom_1 + 1)`.
* {attr}`~cmomy.CentralMomentsData.val_shape`: shape of the 'values' part of the data.  For scalar data, `val_shape = ()`.  For vector data, this is the shape of the observations with the reduced (sample) dimension removed, e.g. `(3,)` for `position`.
* {attr}`~cmomy.CentralMomentsData.shape`: total shape of the wrapped moments, `shape = val_shape + mom_shape`.

In [12]:
for name, c in {"energy": ce, "position": cp}.items():
    print(
        f"""
{name} shapes:
    mom_shape: {c.mom_shape}
    val_shape: {c.val_shape}
    shape    : {c.shape}
"""
    )


energy shapes:
    mom_shape: (4,)
    val_shape: ()
    shape    : (4,)


position shapes:
    mom_shape: (4,)
    val_shape: (3,)
    shape    : (3, 4)



To access the underlying data, use the {meth}`cmomy.CentralMomentsData.to_numpy` method.  The structure is:

* `values[..., 0]`: total weight
* `values[..., 1]`: average/mean value
* `values[..., k]` for `k > 1`: the `k`-th central moment
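As a rough illustration of this layout, the NumPy-only sketch below computes the same kind of array directly from weighted samples. It assumes the central moments are normalized by the total weight, i.e. `sum(w * (x - mean)**k) / sum(w)`; the function name `weighted_moments` is hypothetical and not part of cmomy.

```python
import numpy as np


def weighted_moments(x, w, mom):
    """Sketch: [total weight, mean, c2, ..., c_mom], reducing over axis 0 of `x`.

    `w` holds one weight per sample (shape (nsamp,)). The moment index is placed
    last, so the result has shape x.shape[1:] + (mom + 1,), i.e.
    val_shape + mom_shape.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    # broadcast the per-sample weights against any trailing value dimensions
    w = w.reshape(w.shape + (1,) * (x.ndim - w.ndim))
    out = np.empty(x.shape[1:] + (mom + 1,))
    out[..., 0] = w.sum(axis=0)
    out[..., 1] = (w * x).sum(axis=0) / out[..., 0]
    dx = x - out[..., 1]
    for k in range(2, mom + 1):
        # central moments normalized by the total weight (assumed convention)
        out[..., k] = (w * dx**k).sum(axis=0) / out[..., 0]
    return out
```

For an input of shape `(100, 3)` like the first `position` array, the result has shape `(3, 4)`, matching `cp.shape`. Under the normalization assumption above, we'd expect it to agree with `cp.to_numpy()`, though that comparison is not run here.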

In [13]:
ce.to_numpy()

array([ 5.3105e+01,  5.5314e-01,  8.6378e-02, -5.7464e-03])

To access all the strict central moments, including the zeroth (equal to 1) and the first (equal to 0), use the {meth}`~cmomy.CentralMomentsData.cmom` method.

In [14]:
ce.cmom()

Likewise, the central moments can be converted to raw moments using the {meth}`~cmomy.CentralMomentsData.rmom` method.  This uses the {func}`~cmomy.convert.moments_type` function behind the scenes.  
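The conversion itself amounts to a binomial expansion of `x**n = ((x - mean) + mean)**n`. The NumPy-only sketch below (hypothetical helper `central_to_raw`, not cmomy's implementation) takes a scalar array in the `[weight, mean, c2, ..., cN]` layout and returns `<x**n>` for `n = 0, ..., N`, with `<x**0> = 1` by construction.

```python
from math import comb

import numpy as np


def central_to_raw(values):
    """Sketch: map [weight, mean, c2, ..., cN] to raw moments [1, <x>, ..., <x**N>]."""
    values = np.asarray(values, dtype=float)
    mean = values[1]
    # strict central moments <(x - mean)**k>: the k = 0 term is 1, the k = 1 term is 0
    cmom = values.copy()
    cmom[0], cmom[1] = 1.0, 0.0
    nmom = len(values) - 1
    raw = np.empty(nmom + 1)
    for n in range(nmom + 1):
        # <x**n> = sum_k C(n, k) <(x - mean)**k> mean**(n - k)
        raw[n] = sum(comb(n, k) * cmom[k] * mean ** (n - k) for k in range(n + 1))
    return raw
```

For the values `[1, 2, 3]` with equal weights, the central-moment array is `[3, 2, 2/3, 0]`, and the sketch returns `[1, 2, 14/3, 12]`, which are the plain averages of `x**0` through `x**3`.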

In [15]:
# <x**k>
ce.rmom()

Additionally, there are {class}`xarray.DataArray`-like attributes:

In [16]:
ce.coords

Coordinates:
    *empty*

In [17]:
ce.attrs

{}

In [18]:
ce.sizes

Frozen({'mom_0': 4})

In [19]:
ce.dims

('mom_0',)

To access the underlying data, use the {attr}`~cmomy.CentralMomentsData.obj` attribute to access the {class}`~xarray.DataArray`-style data, or the {meth}`~cmomy.CentralMomentsData.to_numpy` method to access the underlying {class}`~numpy.ndarray`.

In [20]:
ce.obj

In [21]:
ce.to_numpy()

array([ 5.3105e+01,  5.5314e-01,  8.6378e-02, -5.7464e-03])

## Manipulating (co)moments

So we have our averages.  Cool, but not very special.  But what if we repeat our experiment instead?  Let's say we did the experiment 10 times, each time giving a single record of 100 samples.  Then our data would look like:

In [22]:
nsamp = 100
nrec = 10
energy = xr.DataArray(rng.random((nrec, nsamp)), dims=["rec", "samp"])
position = xr.DataArray(rng.random((nrec, nsamp, 3)), dims=["rec", "samp", "dim"])

# weight associated with each sample and each record
w = xr.DataArray(rng.random((nrec, nsamp)), dims=["rec", "samp"])

# average over the samples
ce = cmomy.wrap_reduce_vals(energy, weight=w, dim="samp", mom=3)
cp = cmomy.wrap_reduce_vals(position, weight=w, dim="samp", mom=3)

Consider just the energy.  We suspect that there is some correlation between the experiments (they were done in rapid succession), so we'd like to treat each pair of adjacent experiments as a single experiment.  For this, we can use the {meth}`~cmomy.CentralMomentsData.reduce` method with the `block` parameter.
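Regrouping already-averaged records works because central moments from separate records can be combined exactly, without the original samples. The NumPy-only sketch below shows one standard way to do that: shift each record's moments to the combined mean via a binomial expansion. cmomy's internal algorithm may differ. The names `combine_moments` and `block_reduce` are hypothetical, and each row uses the `[weight, mean, c2, ..., cN]` layout with central moments normalized by that row's weight.

```python
from math import comb

import numpy as np


def combine_moments(data):
    """Sketch: combine the rows of `data` (shape (nrec, N + 1)) into one moment array.

    Each row is [weight, mean, c2, ..., cN] with c_k = <(x - mean)**k>,
    normalized by that row's weight.
    """
    data = np.asarray(data, dtype=float)
    weight, mean = data[:, 0], data[:, 1]
    nmom = data.shape[1] - 1
    # strict central moments of each row: c0 = 1, c1 = 0
    cmom = data.copy()
    cmom[:, 0], cmom[:, 1] = 1.0, 0.0

    wtot = weight.sum()
    mtot = (weight * mean).sum() / wtot
    delta = mean - mtot  # offset of each row's mean from the combined mean

    out = np.empty(nmom + 1)
    out[0], out[1] = wtot, mtot
    for p in range(2, nmom + 1):
        # expand (x - mtot)**p = ((x - mean_j) + delta_j)**p around each row's mean
        terms = sum(comb(p, k) * cmom[:, k] * delta ** (p - k) for k in range(p + 1))
        out[p] = (weight * terms).sum() / wtot
    return out


def block_reduce(data, block):
    """Sketch: combine consecutive groups of `block` rows into one row each."""
    data = np.asarray(data, dtype=float)
    if data.shape[0] % block:
        raise ValueError("this sketch requires the number of rows to be a multiple of block")
    groups = data.reshape(-1, block, data.shape[1])
    return np.stack([combine_moments(g) for g in groups])
```

With `block=2`, ten records become five combined records, mirroring the grouping requested by `ce.reduce(block=2, dim="rec")`. Calling `combine_moments` on all rows corresponds to reducing across every record.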

In [23]:
ce

In [24]:
ce.reduce(block=2, dim="rec")

Alternatively, we can resample the already-averaged data using {meth}`cmomy.CentralMomentsData.resample_and_reduce`.  Here we produce 20 new samples (replicates), each built by resampling the records of the original (averaged) data.
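Conceptually, resampling averaged data means drawing records with replacement and combining the drawn records' moment arrays. The NumPy-only sketch below does this with a frequency table: `freq[r, j]` counts how many times record `j` is drawn in replicate `r`. Combining `f` identical copies of a record is equivalent to multiplying its weight by `f`. The function names and the plain with-replacement bootstrap are assumptions for illustration; this is not a description of how cmomy's `sampler` works internally.

```python
from math import comb

import numpy as np


def combine_weighted(data, freq):
    """Sketch: combine rows [weight, mean, c2, ..., cN], scaling row weights by `freq`."""
    weight = data[:, 0] * freq
    mean = data[:, 1]
    cmom = data.copy()
    cmom[:, 0], cmom[:, 1] = 1.0, 0.0  # strict central moments: c0 = 1, c1 = 0
    wtot = weight.sum()
    mtot = (weight * mean).sum() / wtot
    delta = mean - mtot
    out = np.empty(data.shape[1])
    out[0], out[1] = wtot, mtot
    for p in range(2, data.shape[1]):
        terms = sum(comb(p, k) * cmom[:, k] * delta ** (p - k) for k in range(p + 1))
        out[p] = (weight * terms).sum() / wtot
    return out


def bootstrap_moments(data, nrep, rng):
    """Sketch: `nrep` bootstrap replicates of the combined moments of the rows of `data`."""
    data = np.asarray(data, dtype=float)
    nrec = data.shape[0]
    # draw nrec record indices with replacement for each replicate
    idx = rng.integers(0, nrec, size=(nrep, nrec))
    # freq[r, j] = number of times record j is drawn in replicate r
    freq = np.stack([np.bincount(row, minlength=nrec) for row in idx])
    return np.stack([combine_weighted(data, f) for f in freq])
```

The spread of `reps[:, 1:]` across replicates, for `reps = bootstrap_moments(data, 20, np.random.default_rng())`, plays the same role as the standard deviation over `rep` computed further down.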

In [25]:
ce_resamp = ce.resample_and_reduce(dim="rec", sampler={"nrep": 20})
ce_resamp

This is different from the usual approach of resampling the individual values.  That approach is also available, via {func}`cmomy.wrap_resample_vals`, when the original data is at hand.
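For comparison, a plain bootstrap of the individual values draws samples (not records) with replacement and recomputes the moments for each replicate. The NumPy-only sketch below uses a hypothetical function, `resample_vals_moments`. It assumes equal-probability draws with replacement and weight-normalized central moments, and it is not cmomy's implementation.

```python
import numpy as np


def resample_vals_moments(x, w, nrep, rng, mom=3):
    """Sketch: bootstrap the values, returning [weight, mean, c2, ..., c_mom] per replicate."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    nsamp = x.shape[0]
    out = np.empty((nrep, mom + 1))
    for r in range(nrep):
        # draw nsamp sample indices with replacement
        idx = rng.integers(0, nsamp, size=nsamp)
        xs, ws = x[idx], w[idx]
        out[r, 0] = ws.sum()
        out[r, 1] = np.average(xs, weights=ws)
        for k in range(2, mom + 1):
            out[r, k] = np.average((xs - out[r, 1]) ** k, weights=ws)
    return out
```

The next cell stacks `rec` and `samp` into a single dimension first, so every individual value is a candidate in each draw.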

In [26]:
# stack 'rec' and 'samp' into a single dimension 'c' so all the values are resampled together
energy_stack = energy.stack(c=["rec", "samp"])
weight_stack = w.stack(c=["rec", "samp"])

out = cmomy.wrap_resample_vals(
    energy_stack,
    weight=weight_stack,
    dim="c",
    sampler={"nrep": 20},
    mom=3,
)
out

We see that the standard deviation of the moments across the replicates is similar for the two resampling methods:

In [27]:
out.obj.sel(mom_0=slice(1, None)).std("rep")

In [28]:
ce_resamp.obj.sel(mom_0=slice(1, None)).std("rep")

We can also reduce the averaged data across all the records using {meth}`~cmomy.CentralMomentsData.reduce`:

In [29]:
ce.reduce(dim="rec")