# *grama* Demo

---

*grama* is a *grammar of model analysis*---a language for describing and analyzing mathematical models. Heavily inspired by [ggplot](https://ggplot2.tidyverse.org/index.html), `py_grama` is a Python package that implements *grama* by providing tools for defining and exploring models. This notebook illustrates how one can use *grama*.

In [None]:
### Setup
from dfply import *
import grama as gr
import numpy as np
import pandas as pd
import seaborn as sns

# Quick Tour: Analyzing a model

---

*grama* separates the model *definition* from model *analysis*; once the model is fully defined, only minimal information is necessary for further analysis.

As a quick demonstration, we import a fully-defined model provided with *grama*, and carry out a few analyses.

In [None]:
from grama.models import make_cantilever_beam

model_beam = make_cantilever_beam()
model_beam.printpretty()

The method `printpretty()` gives us a quick summary of the model; we can see this model has two deterministic variables `w,t` and four random variables `H,V,E,Y`, all of which affect the outputs `c_area, g_stress, g_displacement`. Since there are random variables, there is a source of *uncertainty* which we must consider when studying this model.

## Studying model behavior with uncertainty

Since the model has sources of randomness (`var_rand`), we must account for this when studying its behavior. We can do so through a Monte Carlo analysis. We make decisions about the deterministic inputs by specifying `df_det`, and the `py_grama` function `gr.ev_monte_carlo` automatically handles the random inputs. Below we fix a nominal value `w = 0.5 * (2 + 4)`, sweep over values for `t`, and account for the randomness via Monte Carlo.

In [None]:
## Generate data for deterministic variables
df_beam_det = pd.DataFrame(
    data={
        "w": [0.5 * (2 + 4)] * 10,
        "t": np.linspace(2.5, 3, num=10)
    }
)

## Carry out a Monte Carlo analysis of the random variables
df_beam_mc = \
    model_beam >> \
    gr.ev_monte_carlo(n=1e2, df_det=df_beam_det)
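Conceptually, a Monte Carlo evaluation over a deterministic sweep repeats the random draws for every row of `df_det`. The sketch below spells that out in plain NumPy/pandas for a hypothetical two-input toy model with assumed normal marginals. `toy_model` and `monte_carlo` are illustrative names, not `py_grama` functions, and `gr.ev_monte_carlo` may organize its sampling differently.

```python
import numpy as np
import pandas as pd

def toy_model(w, t, H, V):
    """Hypothetical stand-in model (NOT grama's cantilever beam)."""
    n = len(H)
    return pd.DataFrame({
        "c_area": np.full(n, w * t),          # depends only on deterministic inputs
        "g_toy": 1.0 - (H + V) / (w * t**2),  # depends on the random inputs too
    })

def monte_carlo(df_det, n, seed=None):
    """For every row of df_det, draw n samples of the random inputs and evaluate."""
    rng = np.random.default_rng(seed)
    frames = []
    for _, row in df_det.iterrows():
        H = rng.normal(loc=1.0, scale=0.1, size=n)  # assumed marginal for H
        V = rng.normal(loc=2.0, scale=0.2, size=n)  # assumed marginal for V
        out = toy_model(row["w"], row["t"], H, V)
        # Record the inputs alongside the outputs, as a tidy table
        out["w"] = row["w"]
        out["t"] = row["t"]
        out["H"] = H
        out["V"] = V
        frames.append(out)
    return pd.concat(frames, ignore_index=True)

df_det = pd.DataFrame({"w": [3.0] * 10, "t": np.linspace(2.5, 3, num=10)})
df_mc = monte_carlo(df_det, n=100, seed=101)
```

Each of the 10 rows of `df_det` yields 100 output rows, so `df_mc` has 1,000 rows. Within each value of `t`, `c_area` stays fixed because no random input enters it.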

To help plot the data, we use `dfply` to wrangle the data, and `seaborn` to quickly visualize results.

In [None]:
df_beam_wrangled = \
    df_beam_mc >> \
    gather("output", "y", ["c_area", "g_stress", "g_displacement"])

g = sns.FacetGrid(df_beam_wrangled, col="output", sharey=False)
g.map(sns.lineplot, "t", "y")

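The mean behavior of the model is shown as a solid line. The shaded band is seaborn's default 95% confidence interval for that mean, not the standard deviation. Its width still grows with the spread of the output, so it indicates how strongly the random inputs act on each output. From this plot, we can see: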

- The random variables have no effect on `c_area` (there is no band)
- Comparing `g_stress` and `g_displacement`, the former is more strongly affected by the random inputs, as illustrated by its wider uncertainty band.
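The plot shows the spread only indirectly. To get the numbers behind it, one could summarize each `t` with a pandas `groupby`. On the real results, that would be `df_beam_mc.groupby("t")[["c_area", "g_stress", "g_displacement"]].agg(["mean", "std"])`. The sketch below uses synthetic stand-in data, with a hypothetical noisy output `g_noisy`, so it runs on its own.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_beam_mc: 10 values of t, 100 random draws at each
rng = np.random.default_rng(0)
t = np.repeat(np.linspace(2.5, 3.0, num=10), 100)
df_mc = pd.DataFrame({
    "t": t,
    "c_area": 3.0 * t,                                         # no random input enters
    "g_noisy": 1.0 - 0.1 * t + rng.normal(0.0, 0.2, t.size),  # hypothetical noisy output
})

# Per-t mean and sample standard deviation of each output
summary = df_mc.groupby("t")[["c_area", "g_noisy"]].agg(["mean", "std"])
print(summary)
```

A (numerically) zero standard deviation corresponds to a missing band, as with `c_area` above.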

While this provides a visual description of how uncertainty affects our outputs, we might be interested in *how* the different random variables affect our outputs.

## Probing random variable effects

One way to quantify the effects of random variables is through *Sobol' indices*, which quantify variable importance by the fraction of output variance "explained" by each random variable. Since distribution information is included in the model, we can carry out a *hybrid-point Monte Carlo* and analyze the results with two calls to `py_grama`.

In [None]:
df_sobol = \
    model_beam >> \
    gr.ev_hybrid(n_samples=1e3, df_det="nom", seed=101) >> \
    gr.tf_sobol()

df_sobol >> \
    select(X.g_stress, X.g_displacement, X.ind) >> \
    mask(str_detect(X.ind, "S_"))

These results suggest that `g_stress` is largely insensitive to `E`, while `g_displacement` is insensitive to `Y`. For `g_displacement`, the input `V` contributes roughly three times as much variance as either `H` or `E`.
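For intuition about the quantity `gr.tf_sobol()` reports, the first-order index of input $X_i$ is $S_i = \mathrm{Var}(\mathbb{E}[Y \mid X_i]) / \mathrm{Var}(Y)$. A classic way to estimate it is "pick-freeze" sampling. Two independent sample sets share only $X_i$, so the covariance of their outputs estimates the numerator. The sketch below applies this to a hypothetical linear model with independent standard-normal inputs, where the exact answer is known ($S_1 = 1/5$, $S_2 = 4/5$). It illustrates the idea only. It is not necessarily the estimator `py_grama` uses internally.

```python
import numpy as np

def first_order_sobol(f, d, n=100_000, seed=101):
    """Pick-freeze estimate of first-order Sobol' indices S_i = Var(E[Y | X_i]) / Var(Y).

    Assumes d independent standard-normal inputs (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    B = rng.standard_normal((n, d))
    y_a = f(A)
    var_y = y_a.var()
    S = np.empty(d)
    for i in range(d):
        C = B.copy()
        C[:, i] = A[:, i]  # "freeze" X_i at the values used in A; resample the rest
        y_c = f(C)
        # Y_A and Y_C share only X_i, so their covariance estimates Var(E[Y | X_i])
        S[i] = np.mean((y_a - y_a.mean()) * (y_c - y_c.mean())) / var_y
    return S

def linear_model(X):
    # Hypothetical model Y = X_1 + 2 X_2: Var(Y) = 5, so S_1 = 1/5 and S_2 = 4/5
    return X[:, 0] + 2.0 * X[:, 1]

S = first_order_sobol(linear_model, d=2)
print(S)  # close to [0.2, 0.8]
```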

To get a *qualitative* sense of how the random variables affect our model, we can perform a set of sweeps over random variable space with a *sinew* design. We use `py_grama` to generate the design, then use `dfply` to wrangle the data for plotting.

In [None]:
df_beam_sweeps = \
    model_beam >> \
    gr.ev_sinews(n_density=50, n_sweeps=10, df_det="nom")

First, we visualize the design in the four-dimensional random variable space of `[H,V,E,Y]`.

In [None]:
sns.pairplot(
    data=df_beam_sweeps,
    vars=model_beam.var_rand,
    hue="sweep_ind"
)

Here we can see the sweeps cross the domain in straight lines at random starting locations. Each of these sweeps gives us a "straight shot" within a single variable. Visualizing the outputs for these sweeps will give us a sense of a single variable's influence, contextualized by the effects of the other random variables.
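The structure of such a design can be sketched in a few lines. Each sweep picks a random starting point, then moves along one variable at a time while holding the others fixed. The sketch works in the unit hypercube $[0, 1]^d$. The function `sinew_design` is hypothetical, and the column names `sweep_ind` and `sweep_var` were chosen to mirror the ones used in the plots. The actual `gr.ev_sinews` design may differ in detail, for example by sweeping in quantile space and mapping the points through each variable's marginal distribution.

```python
import numpy as np
import pandas as pd

def sinew_design(var_names, n_density=50, n_sweeps=10, seed=None):
    """Hypothetical sinew-style design in [0, 1]^d: one-at-a-time sweeps from random starts."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_density)
    frames = []
    for sweep_ind in range(n_sweeps):
        start = rng.random(len(var_names))  # random starting point in [0, 1]^d
        for j, var in enumerate(var_names):
            pts = np.tile(start, (n_density, 1))  # hold every variable at the start...
            pts[:, j] = grid                      # ...except the one being swept
            df = pd.DataFrame(pts, columns=var_names)
            df["sweep_ind"] = sweep_ind
            df["sweep_var"] = var
            frames.append(df)
    return pd.concat(frames, ignore_index=True)

df_sweeps = sinew_design(["H", "V", "E", "Y"], n_density=5, n_sweeps=3, seed=0)
```

With 3 starting points, 4 variables, and 5 points per sweep, the design has 60 rows. In each `(sweep_ind, sweep_var)` group, only the swept variable changes.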

In [None]:
sns.relplot(
    data=df_beam_sweeps >> \
        gather("input", "x", model_beam.var_rand) >> \
        gather("output", "y", model_beam.outputs) >> \
        mask(X.sweep_var == X.input),
    x="x",
    y="y",
    hue="sweep_ind",
    col="input",
    row="output",
    kind="line",
    facet_kws=dict(sharex=False, sharey=False)
)

Based on this plot, we can see:

- The output `c_area` is insensitive to all the random variables
- As the Sobol' analysis above suggested, `g_stress` is insensitive to `E`, and `g_displacement` is insensitive to `Y`
- Visualizing the results shows that inputs `H,E` tend to 'saturate' in their effects on `g_displacement`, while `V` is linear over its domain. This may explain the difference in contributed variance

## The *grama* language

---

As a language, *grama* has both *objects* and *verbs*. 

### Objects

---

*grama* as a language considers two categories of objects:

- **data**: observations on various quantities, implemented by the Python package `Pandas`
- **models**: a function and complete description of its inputs, implemented by `py_grama`

Since data is already well-handled by Pandas, `py_grama` focuses on providing tools to handle models. A `py_grama` model has **functions** and **inputs**. The method `printpretty()` gives a quick summary of the model's inputs and function outputs. Model inputs are organized into:

|            | Deterministic                            | Random     |
| ---------- | ---------------------------------------- | ---------- |
| Variables  | `model.var_det`                          | `model.var_rand` |
| Parameters | `model.density.marginals[i].d_param`     | (Future*)  |

- **Variables** are inputs to the model's functions
  + **Deterministic** variables are chosen by the user; the model above has `w, t`
  + **Random** variables are not controlled; the model above has `H, V, E, Y`
- **Parameters** define random variables
  + **Deterministic** parameters are currently implemented; these are listed under `var_rand` with their associated random variable
  + **Random** parameters* are not yet implemented

The `outputs` section lists the various model outputs. The model above has `c_area, g_stress, g_displacement`.
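Here, "parameters define random variables" means that a fixed set of numbers pins down each variable's distribution. The sketch below uses `scipy.stats` frozen distributions as an illustrative stand-in. It is not `py_grama`'s own density/marginal classes, and the parameter values are hypothetical, not those of the beam model.

```python
import numpy as np
from scipy import stats

# Hypothetical marginals: each random variable is pinned down by deterministic
# parameters (here loc and scale); the numbers are illustrative only.
marginals = {
    "H": stats.norm(loc=1.0, scale=0.1),
    "V": stats.norm(loc=2.0, scale=0.2),
}

rng = np.random.default_rng(0)
# Draw realizations of each random variable
samples = {name: dist.rvs(size=1000, random_state=rng) for name, dist in marginals.items()}
# Map a quantile in [0, 1] back to physical units via the inverse CDF
h_median = marginals["H"].ppf(0.5)  # equals the loc parameter for a normal marginal
```

Changing `loc` or `scale` changes the random variable itself. For this reason, parameters are listed alongside their associated random variable.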

### Verbs

---

*grama* as a language has four categories of verbs, which are sorted based on the objects they take and return:

| Category  | Stem (Short) | In    | Out   |
| --------- | ------------ | ----- | ----- |
| Evaluate  | eval_ (ev_)  | Model | Data  |
| Fit       | fit_  (ft_)  | Data  | Model |
| Transform | tran_ (tf_)  | Data  | Data  |
| Compose   | comp_ (cp_)  | Model | Model |

`py_grama` function names start with a stem, then continue with the specific function name. Both long and short forms exist to distinguish between vanilla functions and *pipe-enabled* versions.

### Functional programming (Pipes)

---

`py_grama` provides tools to use functional programming patterns. Short-stem versions of `py_grama` functions are *pipe-enabled*, meaning they can be used in functional programming form with the pipe operator `>>`. These pipe-enabled functions are simply aliases for the base functions, as demonstrated below:

In [None]:
df_base = gr.eval_nominal(model_beam, df_det="nom")
df_functional = model_beam >> gr.ev_nominal(df_det="nom")

df_base.equals(df_functional)
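One way such aliases can work in Python relies on the reflected operator `__rrshift__`. Suppose the left operand does not implement `>>` (or its `__rshift__` returns `NotImplemented`). Python then calls `__rrshift__` on the right operand, which can forward the left operand as the first argument of a function. The sketch below is a hypothetical minimal version of this idea. `pipe_enabled`, `tran_scale`, and `tran_shift` are illustrative names, and this is not necessarily how `py_grama` or `dfply` implement their pipes.

```python
class pipe_enabled:
    """Hypothetical wrapper: makes `lhs >> f(*args, **kwargs)` mean `f(lhs, *args, **kwargs)`."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        # Defer the call until the left operand arrives via `>>`
        return _PendingCall(self.func, args, kwargs)


class _PendingCall:
    def __init__(self, func, args, kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __rrshift__(self, lhs):
        # Python calls this for `lhs >> self` when lhs's own `>>` is undefined
        return self.func(lhs, *self.args, **self.kwargs)


def tran_scale(xs, factor):
    return [x * factor for x in xs]

def tran_shift(xs, offset):
    return [x + offset for x in xs]

tf_scale = pipe_enabled(tran_scale)
tf_shift = pipe_enabled(tran_shift)

print(tran_shift(tran_scale([1, 2, 3], factor=2), offset=1))  # nested form: [3, 5, 7]
print([1, 2, 3] >> tf_scale(factor=2) >> tf_shift(offset=1))   # piped form:  [3, 5, 7]
```

Because `>>` is left-associative, the piped line is evaluated as `([1, 2, 3] >> tf_scale(factor=2)) >> tf_shift(offset=1)`. The steps therefore run in the order they are written.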

Functional patterns enable chaining multiple commands, as demonstrated in the Sobol' index code above. In nested form using base functions, this would be:

```python
df_sobol = gr.tran_sobol(gr.eval_hybrid(model_beam, n_samples=1e3, df_det="nom", seed=101))
```

From the code above, it is difficult to see that we first consider `model_beam`, perform a hybrid-point evaluation, then use those data to estimate Sobol' indices. With more chained functions, this only becomes more difficult. In functional form, the order of operations is given by the code order:

```python
df_sobol = \
    model_beam >> \
    gr.ev_hybrid(n_samples=1e3, df_det="nom", seed=101) >> \
    gr.tf_sobol()
```

The other advantage of using functional patterns with the `>>` pipe is that `py_grama` functions can then be chained with functions from `dfply`, which provides pipe-enabled calls to `Pandas` functions. This allows us to combine *data science* tools with *model analysis* tools.

### Data Science Integration

---

To demonstrate the data science integration of `py_grama`, let's take apart the *sinew plot* analysis carried out above. The sinew design was generated by the call

```python
df_beam_sweeps = \
    model_beam >> \
    gr.ev_sinews(n_density=50, n_sweeps=10, df_det="nom")
```

which produced the result:

In [None]:
df_beam_sweeps

The data were then manipulated with a sequence of calls to `dfply`:

```python
df_beam_sweeps >> \
    gather("input", "x", model_beam.var_rand) >> \
    gather("output", "y", model_beam.outputs) >> \
    mask(X.sweep_var == X.input)
```

Step by step, we first *gather* the variables `model_beam.var_rand = [H,V,E,Y]` into a key column `input` and a value column `x`. This reshapes the data from a wide table into a taller form, shown below.

In [None]:
df_beam_sweeps >> \
    gather("input", "x", model_beam.var_rand)

Note that the columns `[H,V,E,Y]` are now gone. They are replaced by the key column `input`, which holds the variable names, and the value column `x`, which holds their values.
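If you prefer plain pandas, `dfply`'s `gather` behaves much like `DataFrame.melt` with every non-gathered column kept as an identifier. The frame below is a tiny hypothetical stand-in for `df_beam_sweeps`, and its values are made up for illustration.

```python
import pandas as pd

# Tiny hypothetical stand-in for df_beam_sweeps
df_wide = pd.DataFrame({
    "H": [1.0, 2.0], "V": [3.0, 4.0], "E": [5.0, 6.0], "Y": [7.0, 8.0],
    "g_stress": [0.1, 0.2], "sweep_ind": [0, 0],
})
var_rand = ["H", "V", "E", "Y"]

# Keep every other column as an identifier; stack H, V, E, Y into (input, x) pairs
id_cols = [c for c in df_wide.columns if c not in var_rand]
df_tall = df_wide.melt(id_vars=id_cols, value_vars=var_rand, var_name="input", value_name="x")
print(df_tall)
```

Each original row now appears once per gathered variable, so the 2-row frame becomes 8 rows.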

We carry out a similar operation on `model_beam.outputs = [c_area, g_stress, g_displacement]`, which yields:

In [None]:
df_beam_sweeps >> \
    gather("input", "x", model_beam.var_rand) >> \
    gather("output", "y", model_beam.outputs)

This gives a gathered set of inputs `x` and outputs `y`, which we could plot all together:

In [None]:
sns.relplot(
    data=df_beam_sweeps >> \
        gather("input", "x", model_beam.var_rand) >> \
        gather("output", "y", model_beam.outputs),
    x="x",
    y="y",
    hue="sweep_ind",
    kind="line"
)


However, this is an utterly confusing plot; we're showing multiple quantities with different units on the same axes. To solve this issue but still show as much of the data as possible, we can use *facets* (sometimes called [small multiples](https://en.wikipedia.org/wiki/Small_multiple)). This will break the plot into a grid of smaller plots, one for each input/output pair.

In [None]:
sns.relplot(
    data=df_beam_sweeps >> \
        gather("input", "x", model_beam.var_rand) >> \
        gather("output", "y", model_beam.outputs),
    x="x",
    y="y",
    hue="sweep_ind",
    col="input",
    row="output",
    kind="line",
    facet_kws=dict(sharex=False, sharey=False)
)

Note that we have used the `facet_kws` optional argument to allow each subplot to scale its own axis to fit its subset of the data.

This is close to the plot we saw above, but with a number of "spikes". These do not indicate a lack of smoothness in the response. They are a consequence of visualizing "out-of-plane" samples in each facet: rows from sweeps along other variables hold this facet's input fixed while the output changes. To solve this, we add one last `mask` call to the chain of data commands, "masking out" the out-of-plane cases. The following cell produces the final version of the plot, shown above.

In [None]:
sns.relplot(
    data=df_beam_sweeps >> \
        gather("input", "x", model_beam.var_rand) >> \
        gather("output", "y", model_beam.outputs) >> \
        mask(X.sweep_var == X.input),
    x="x",
    y="y",
    hue="sweep_ind",
    col="input",
    row="output",
    kind="line",
    facet_kws=dict(sharex=False, sharey=False)
)
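The full chain of two gathers plus the mask can also be sketched in plain pandas. The frame below is a hypothetical, minimal sinew-style table with one sweep along `H` and one along `V`. It shows where the spikes come from. Without the mask, the `H` facet also receives the rows of the `V` sweep, which sit at a single `H` value but carry changing outputs.

```python
import pandas as pd

# Hypothetical sinew-style frame: one sweep along H, one along V
df = pd.DataFrame({
    "H": [0.0, 0.5, 1.0, 0.3, 0.3, 0.3],
    "V": [0.7, 0.7, 0.7, 0.0, 0.5, 1.0],
    "g_stress": [1.0, 0.9, 0.8, 1.1, 0.6, 0.1],
    "sweep_ind": [0, 0, 0, 0, 0, 0],
    "sweep_var": ["H", "H", "H", "V", "V", "V"],
})
var_rand, outputs = ["H", "V"], ["g_stress"]

# gather inputs, then outputs (wide -> tall), like the two dfply gather calls
tall = df.melt(id_vars=[c for c in df.columns if c not in var_rand],
               value_vars=var_rand, var_name="input", value_name="x")
tall = tall.melt(id_vars=[c for c in tall.columns if c not in outputs],
                 value_vars=outputs, var_name="output", value_name="y")

# mask: keep only rows where the plotted input is the one being swept
in_plane = tall[tall["sweep_var"] == tall["input"]]
```

Before masking, the `H` facet would contain three points at `x = 0.3` with different `y` values: a vertical spike. After masking, only the genuine sweep along `H` remains.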

With a combination of `py_grama` and existing data science tools, one can easily carry out *exploratory model analysis*, like the analysis shown above.

Note: It appears that the original [dfply](https://github.com/kieferk/dfply) may no longer be updated; I have forked [my own version](https://github.com/zdelrosario/dfply) which I will maintain to support `py_grama`.