In [None]:
%load_ext autoreload
%autoreload 2
%config Completer.use_jedi = False

In [None]:
from pprint import pprint

from tiled.client import from_uri
import matplotlib.pyplot as plt
import matplotlib as mpl

In [None]:
mpl.rcParams['mathtext.fontset'] = 'stix'
mpl.rcParams['font.family'] = 'STIXGeneral'
mpl.rcParams['text.usetex'] = True
plt.rc('xtick', labelsize=12)
plt.rc('ytick', labelsize=12)
plt.rc('axes', labelsize=12)
mpl.rcParams['figure.dpi'] = 300

# Basic Tutorial

The [AIMM post-processing pipeline](https://github.com/AI-multimodal/aimm-post-processing) is built around the `Operator` object. An `Operator` takes a `client`-like object and executes a post-processing operation on it; the specific operation is defined by the operator itself. All metadata and provenance are tracked along the way.
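To make the pattern concrete, here is a minimal, self-contained sketch of what an `Operator`-style class could look like. It is not the `aimm_post_processing` implementation: `FakeClient`, `ToyOperator`, and `ScaleY` are hypothetical names, and a plain dict of lists stands in for a `pandas.DataFrame`. The only ideas taken from this tutorial are that an operator is instantiated with its parameters, is called on a `client`-like object exposing `read()` and `metadata`, and returns a dictionary with `"data"` and `"metadata"` keys.

```python
from dataclasses import dataclass, field


@dataclass
class FakeClient:
    """Hypothetical stand-in for a tiled DataFrameClient."""
    data: dict                      # column name -> list of values
    metadata: dict = field(default_factory=dict)

    def read(self):
        # Return fresh lists so an operator cannot mutate the "stored" data.
        return {k: list(v) for k, v in self.data.items()}


class ToyOperator:
    """Hypothetical base class: subclasses implement _process_data."""

    def __call__(self, client):
        data = self._process_data(client.read())
        # Minimal placeholder; real provenance metadata is much richer.
        metadata = {"operator": type(self).__name__}
        return {"data": data, "metadata": metadata}

    def _process_data(self, data):
        raise NotImplementedError


class ScaleY(ToyOperator):
    """Hypothetical unary operator: multiply one column by a constant."""

    def __init__(self, column, factor):
        self.column = column
        self.factor = factor

    def _process_data(self, data):
        data[self.column] = [y * self.factor for y in data[self.column]]
        return data


client = FakeClient({"energy": [1.0, 2.0], "itrans": [0.5, 0.25]}, {"uid": "abc"})
result = ScaleY("itrans", 2.0)(client)
print(result["data"]["itrans"])  # [1.0, 0.5]
```

Separating parameters (given at instantiation) from the data (given at call time) is what lets one configured operator be reused on many inputs.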

In [None]:
from aimm_post_processing import operations

Connect to the database with the `tiled` client. This one is the [aimmdb](https://github.com/AI-multimodal/aimmdb) instance hosted at [aimm.lbl.gov](https://aimm.lbl.gov/api). Note that my API key is stored in an environment variable, `TILED_API_KEY`.

In [None]:
CLIENT = from_uri("https://aimm.lbl.gov/api")

In [None]:
CLIENT["uid"]

## Unary operators

A [unary operator](https://en.wikipedia.org/wiki/Unary_operation) takes a single input. Here, that means the operator acts on one data point (i.e., one `DataFrameClient`) at a time. We'll provide some examples below.

First, let's get a single `DataFrameClient` object:

In [None]:
df_client = CLIENT["dataset"]["newville"]["edge"]["K"]["element"]["Co"]["uid"]["Bt5hUbgkfzR"]
type(df_client)

### The identity

The simplest operation we can perform is to do nothing! Before running it, feel free to inspect `df_client` to see what it contains. The `read()` method gives access to the actual data, and the `metadata` property gives access to the metadata:

In [None]:
display(df_client.read())  # a pandas.DataFrame; display() shows it even though it is not the last line
pprint(df_client.metadata)  # a Python dictionary

The identity operator is instantiated and then run on the `df_client`.

In [None]:
op = operations.Identity()
result = op(df_client)

The result of every operator is a dictionary with two keys, `"data"` and `"metadata"`, which correspond to the outputs of `read()` and `metadata` above. The data is the correspondingly modified `pandas.DataFrame` (which, for the identity, is of course identical to what we started with). The metadata is newly created for the derived, post-processed object.

First, let's check that the original and "post-processed" data are the same.

In [None]:
assert (df_client.read() == result["data"]).all().all()

Next, the metadata:

In [None]:
result["metadata"]

First, a new unique id is assigned. Second, since this is a derived quantity, the original metadata is replaced by a `post_processing` key. This key contains everything needed for provenance: the parents (just one, for a unary operator), the operator details (including the code version), any keyword arguments used during instantiation, and the datetime at which the operation was run. We use the [`MSONable`](https://pythonhosted.org/monty/_modules/monty/json.html) class from the `monty` library to take care of most of this for us.
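The bookkeeping described above can be approximated with the standard library alone. The sketch below is an assumption about the general shape of such a record, not the package's actual schema: apart from `post_processing` and the ideas of parents, operator details, keyword arguments, and a timestamp, every key name (`uid`, `parents`, `operator`, `kwargs`, `datetime`) and the helper `make_derived_metadata` are hypothetical. It uses `uuid.uuid4` for the new id, which the real package may not do, and `"0.0.0"` is only a placeholder version string.

```python
import uuid
from datetime import datetime, timezone


def make_derived_metadata(parent_uids, operator_name, operator_version, kwargs):
    """Hypothetical helper: build provenance metadata for a derived object.

    The parents' original metadata is deliberately not copied; only their
    ids are recorded, mirroring the behaviour described above.
    """
    return {
        "uid": uuid.uuid4().hex,  # fresh 32-character id for the derived object
        "post_processing": {
            "parents": list(parent_uids),
            "operator": {"name": operator_name, "version": operator_version},
            "kwargs": dict(kwargs),
            "datetime": datetime.now(timezone.utc).isoformat(),
        },
    }


meta = make_derived_metadata(["Bt5hUbgkfzR"], "StandardizeGrid", "0.0.0", {"nx": 100})
print(meta["post_processing"]["parents"])  # ['Bt5hUbgkfzR']
```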

We can compare against the original metadata to see the differences.

In [None]:
df_client.metadata

### Standardizing the grids

Often (especially for machine learning applications) we need to interpolate spectral data onto a common grid. The `StandardizeGrid` unary operator does this.
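Conceptually, standardizing a grid means building an evenly spaced array of `nx` points from `x0` to `xf` and interpolating each `y` column onto it. Below is a stdlib-only sketch that assumes linear interpolation and a grid including both endpoints (like `numpy.linspace`); `StandardizeGrid` may use a different interpolation scheme or grid convention, and `standardize_grid`, `make_grid`, and `interp` are hypothetical names.

```python
from bisect import bisect_left


def make_grid(x0, xf, nx):
    """Evenly spaced grid of nx points (nx >= 2) including both endpoints."""
    step = (xf - x0) / (nx - 1)
    return [x0 + i * step for i in range(nx)]


def interp(x_new, xs, ys):
    """Linear interpolation of (xs, ys) at x_new; xs must be ascending.

    Values outside [xs[0], xs[-1]] are clamped to the end values,
    the same convention as numpy.interp's default.
    """
    if x_new <= xs[0]:
        return ys[0]
    if x_new >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x_new)  # xs[i-1] < x_new <= xs[i]
    x_lo, x_hi = xs[i - 1], xs[i]
    t = (x_new - x_lo) / (x_hi - x_lo)
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def standardize_grid(data, x0, xf, nx, x_column, y_columns):
    """Hypothetical sketch: resample y_columns of a dict of lists onto a grid."""
    grid = make_grid(x0, xf, nx)
    out = {x_column: grid}
    for col in y_columns:
        out[col] = [interp(x, data[x_column], data[col]) for x in grid]
    return out


data = {"energy": [0.0, 1.0, 2.0], "itrans": [0.0, 10.0, 30.0]}
res = standardize_grid(data, 0.0, 2.0, 5, "energy", ["itrans"])
print(res["itrans"])  # [0.0, 5.0, 10.0, 20.0, 30.0]
```

Clamping outside the measured range is one choice among several (others are extrapolating or filling with NaN); a real pipeline should make that choice explicit.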

In [None]:
op = operations.StandardizeGrid(x0=7550.0, xf=8900.0, nx=100, x_column="energy", y_columns=["itrans"])
result = op(df_client)

Here is a comparison of the original data (black) and the standardized data (red):

In [None]:
d0 = df_client.read()

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(3, 2))
ax.plot(d0["energy"], d0["itrans"], 'k-')
ax.plot(result["data"]["energy"], result["data"]["itrans"], 'r-')
plt.show()

### Batch processing

While a unary operator acts on a single input, there are cases where we might want to apply the same operator to many `client`-like objects at once, such as every entry in a node. The operator's `__call__` method handles this. For example, consider the following:

In [None]:
node_client = CLIENT["edge"]["L3"]["uid"]
node_client

At the time of writing, there are 23 entries with `L3` edge keys in the entire database. Let's apply the identity to this `Node`, which applies the operator to each entry individually.

In [None]:
op = operations.Identity()
result = op(node_client)

The first of these results corresponds to the first entry above, the second to the second, and so on.
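The batch behaviour can be sketched as a small dispatch function: apply the operator directly to a single data point, or map it over the entries of a node in order. This is a hypothetical illustration, not the package's `__call__`: `apply_operator` and `Leaf` are made-up names, a data point is assumed to be anything with a `read()` method, and a node is assumed to behave like a Python mapping. The same check also rejects a node whose children are not data points, which is one simple way to detect being handed the wrong kind of node.

```python
from collections.abc import Mapping


class Leaf:
    """Hypothetical single data point exposing read()."""

    def __init__(self, data):
        self._data = data

    def read(self):
        return self._data


def apply_operator(func, obj):
    """Apply func to one leaf, or to every leaf of a mapping node.

    For a node, results come back as a list in the node's iteration
    order, so results[0] corresponds to the first entry, and so on.
    """
    if hasattr(obj, "read"):  # a single data point
        return func(obj)
    if isinstance(obj, Mapping):
        # Refuse nodes whose children are not data points, e.g. a node
        # one level too high in the hierarchy.
        bad = [key for key, child in obj.items() if not hasattr(child, "read")]
        if bad:
            raise TypeError(f"children {bad} are not data points")
        return [func(child) for child in obj.values()]
    raise TypeError(f"cannot apply operator to {type(obj).__name__}")


node = {"a": Leaf([1, 2]), "b": Leaf([3])}
lengths = apply_operator(lambda leaf: len(leaf.read()), node)
print(lengths)  # [2, 1]
```

Checking every child before doing any work means a mistake fails immediately, rather than partway through a long batch.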

In [None]:
result[0]["metadata"]

Note also that `__call__` attempts to detect whether it has been given the wrong type of node. For example:

In [None]:
node_client = CLIENT["edge"]["L3"]
node_client

In [None]:
op = operations.Identity()
op(node_client)