# Grama: Exploratory Model Analysis

*Purpose*: Once we've built a grama model, we can use a variety of tools to *make sense* of the model. However, as we saw with exploratory data analysis (EDA), we need both tools and a *mindset* to make sense of data. This exercise is about an analogous approach to sense-making with models: *exploratory model analysis* (EMA).


## Core Idea: Curiosity and Skepticism

Remember the core principles of EDA:

1. Curiosity: Generate lots of ideas and hypotheses about your data.
2. Skepticism: Remain unconvinced of those ideas, unless you can find credible patterns to support them.

We can apply these same principles when studying a model, a process called *exploratory model analysis*. However, when studying a model, we can test our hypotheses more immediately: we can *evaluate* the model to generate new data as we carry out our exploration!
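
As a quick sketch of this hypothesize-then-evaluate loop, the following uses a toy stand-in rather than a grama model. The function `toy_model` and the hypothesis being tested are hypothetical, chosen only for illustration.

```python
# Toy stand-in for a model; hypothetical, not part of grama
def toy_model(x):
    return x * (1.0 - x)

# Curiosity: "the output always increases as x increases"
# Skepticism: evaluate the model at new inputs to check the idea
xs = [i / 10 for i in range(11)]  # inputs from 0.0 to 1.0
ys = [toy_model(x) for x in xs]

# Compare each output with the next one along the sweep
always_increasing = all(b > a for a, b in zip(ys, ys[1:]))
print("Hypothesis holds:", always_increasing)
```

Because we can evaluate the model wherever we like, a hypothesis of this kind can be checked in a few lines rather than by collecting new data.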

These ideas can be a little abstract, so let's illustrate them with a concrete example.


## Setup


In [None]:
import grama as gr
import pandas as pd
DF = gr.Intention()
%matplotlib inline

## Running Example: Circuit model

The following code initializes a model for a [parallel RLC circuit](https://en.wikipedia.org/wiki/RLC_circuit#Parallel_circuit). I'm not expecting you to know any circuit theory; in fact, I chose this model because you're unlikely to know much about this system. We will use exploratory model analysis techniques to learn something about this model.


In [None]:
from grama.models import make_prlc_rand
md_circuit = make_prlc_rand()

# Basic Facts

Before we start exploring a model, we should first understand the basic facts about that model. This is most easily done in grama by printing the model's summary:


### __qX__ Model summary

Print the model summary for `md_circuit`. Answer the questions under *observations* below.


*Observations*

- Compare the variability (measured by standard deviation) of the three random variables. Which is most variable?
  - (Your response here)


## Model context

The symbols above don't tell us anything about what the model quantities *mean*. Here are some basic facts about the quantities in the model:

| Variable | I/O | Description |
|---|---|---|
| `L` | Input | Nominal inductance |
| `R` | Input | Nominal resistance |
| `C` | Input | Nominal capacitance |
| `dL` | Input | Percent error on inductance |
| `dR` | Input | Percent error on resistance |
| `dC` | Input | Percent error on capacitance |
| `omega0` | Output | Natural frequency |
| `Q` | Output | Quality factor |

The deterministic variables `L, R, C` are the *designed* component values; these are selected by a circuit designer to achieve a desired performance. 

The random variables `dL, dR, dC` are *perturbations* to the designed component values. These random perturbations model the variability we would see in production, as manufactured components exhibit real variability.

The output `omega0` is the natural frequency; depending on the use-case, a designer would want to achieve a particular `omega0`. Thus having `omega0` lie close to some target value is desirable.

The output `Q` is the quality factor; a larger `Q` corresponds to a more "selective" circuit. Put differently, a higher `Q` helps the circuit reject unwanted signals.


# Inputs

First, let's get a sense for how the inputs of the model vary.


### __qX__ Overview of inputs

Generate a random sample from the model at its nominal deterministic values, and visualize all of the random **inputs**. Answer the questions under *observations* below.

*Hint*: You should be using `gr.ev_sample()` for this task. A particular keyword argument with `gr.ev_sample()` will allow you to generate the appropriate plot using `gr.pt_auto()`.


In [None]:
(
    md_circuit

)

*Observations*

*Note*: If you can't reliably make out the shapes of the distributions, try increasing `n`.

- What shape of distribution does each random input have?
  - (Your response here)
- What kind of dependency do the random inputs exhibit?
  - (Your response here)
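
The note about increasing `n` can be illustrated without grama. The following is a minimal sketch using only the Python standard library. The triangular distribution and the helper `bin_fractions` are assumptions chosen for illustration and are unrelated to the circuit model.

```python
import random

def bin_fractions(sample, lo, hi, n_bins):
    """Fraction of the sample falling in each of n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for value in sample:
        # Clamp so that value == hi lands in the last bin
        idx = min(int((value - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(sample) for c in counts]

rng = random.Random(101)  # fixed seed for repeatability
for n in (50, 5000):
    sample = [rng.triangular(0.0, 1.0, 0.5) for _ in range(n)]
    fractions = bin_fractions(sample, 0.0, 1.0, 5)
    print(n, [round(f, 2) for f in fractions])
```

With the small sample the bin fractions are noisy, so the shape is hard to read. With the large sample they settle near their long-run values, and the peaked shape becomes clear.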


## Hypothesis: Same Variability

> Since the random perturbations `dL, dR, dC` are mutually independent, we should see the same variability regardless of the circuit component values.

You'll assess this hypothesis in the next task.


### __qX__ Compare designed and realized values

The following plot shows the designed and realized capacitance values. Answer the questions under *observations* below.

*Hint*: Remember that you can add `gr.scale_x_log10()` and `gr.scale_y_log10()` to a ggplot to change to a log-log scale.
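
To see why a log scale suits this plot, note that the capacitance values below are log-spaced. The sketch uses only the standard library. `logspace` here is a hypothetical helper written for illustration, and it assumes the NumPy-style base-10 convention in which the endpoints are given as exponents.

```python
import math

def logspace(start_exp, stop_exp, num):
    """Return num values from 10**start_exp to 10**stop_exp, evenly spaced in log10."""
    step = (stop_exp - start_exp) / (num - 1)
    return [10 ** (start_exp + i * step) for i in range(num)]

values = logspace(-3, 2, 6)
gaps_linear = [b - a for a, b in zip(values, values[1:])]
gaps_log = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

print(values)       # spans five orders of magnitude
print(gaps_linear)  # each gap is about ten times the previous one
print(gaps_log)     # each gap is approximately 1
```

On linear axes most of these points would crowd near zero. On log axes they are spread evenly, which makes patterns across the whole range easier to see.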


In [None]:
# TASK: Inspect the following plot
(
    # NOTE: No need to edit this part of the code
    md_circuit
    >> gr.ev_sample(
        n=20,
        df_det=gr.df_make(
            R=1e-1,
            L=1e-5,
            C=gr.logspace(-3, 2, 20)
        )
    )
    # Visualize
    >> gr.ggplot(gr.aes("C", "Cr"))
    + gr.geom_point()

)

*Observations*

- Consider the hypothesis *we should see the same variability regardless of the circuit component values*. Is this hypothesis true?
  - (Your response here)


# Outputs

Next, we'll study the outputs of the model "on their own"; that is, we're not yet going to look at the input-to-output relationships.


### __qX__ Nominal outputs

Evaluate the model at its nominal input values.


In [None]:
## TODO: Evaluate the model at its nominal input values
df_nominal = (
    md_circuit

)

# NOTE: Use this to check your work
assert \
    isinstance(df_nominal, pd.DataFrame), \
    "df_nominal is not a DataFrame; did you evaluate the model?"

df_nominal

### __qX__ Overview of outputs

Generate a random sample from the model at its nominal deterministic values, and visualize the distribution of **outputs**. Answer the questions under *observations* below.

*Hint*: This is a lot like __qX__ Overview of inputs. You should only need to use `gr.ev_sample()` and `gr.pt_auto()`.


In [None]:
(
    md_circuit
    ## TODO: Generate a random sample from the model, and visualize the
    ## **output** distributions.

)

*Observations*

*Note*: If you can't reliably make out the shapes of the distributions, try increasing `n`.

- What distribution shapes do the realized component values `Lr, Rr, Cr` have?
  - (Your response here)
- What distribution shape does the output `Q` have?
  - (Your response here)
- What distribution shape does the output `omega0` have?
  - (Your response here)


## Hypothesis: Nominal Outputs are Likely Outputs

Here's something we might intuitively expect:

> The nominal model outputs should be more likely output values.

You'll assess this hypothesis in several different ways in the following tasks.


### __qX__ Compare distribution and nominal values for `omega0`

Create a plot that visualizes both a distribution of `omega0` and the nominal output value of `omega0`. Answer the questions under *observations* below.

*Hint*: One good way to do this is with a histogram and a vertical line.


In [None]:
(
    md_circuit
    >> gr.ev_sample(n=1e4, df_det="nom")

)

*Observations*

- How does the hypothesis *The nominal model outputs should be more likely output values* hold up for `omega0`?
  - (Your response here)


### __qX__ Density of outputs

Visualize a random sample of `Q` and `omega0` with a 2d bin plot. Increase the sample size `n` to get a "full" plot. Answer the questions under *observations* below.


In [None]:
(
    md_circuit
#     >> gr.ev_sample(n=???, df_det="nom")

)

*Observations*

- Briefly describe the distribution of realized performance (values of `Q` and `omega0`).
  - (Your response here)


### __qX__ Compare nominal with distribution

Add a point for the nominal output values to the following plot. Answer the questions under *observations* below.


In [None]:
## NOTE: You don't need to touch most of this code; 
## just the line indicated below
(
    md_circuit
    >> gr.ev_sample(n=1e4, df_det="nom")
    
    >> gr.ggplot(gr.aes("Q", "omega0"))
    + gr.geom_bin2d()
    + gr.geom_point(
        # Use the `data` argument to add a geometry for
        # the output value associated with nominal input values

        color="salmon",
        size=4,
    )
)

*Observations*

The nominal design (salmon-colored point) represents the predicted performance if we assume the nominal circuit component values.

- How does the distribution of real circuit performance (values of `Q`, `omega0`) compare with the nominal performance?
  - (Your response here)
- Is the most likely performance (values of `Q`, `omega0`) the same as the nominal performance? ("Most likely" is where `count` is the largest.)
  - (Your response here)
- Assume that another system depends on the particular values of `Q` and `omega0` provided by this system. Would it be safe to assume the performance is within 1% of the nominal performance?
  - (Your response here)


# Input-to-output Relationships

Now that we have a sense of both the inputs and outputs of the model, let's do some exploration to see how the two are related.


### __qX__ Correlation tile plot

Use the routine `tf_iocorr()` to transform the data from `ev_sample()` into input-to-output correlations. The `pt_auto()` routine will automatically plot those correlations as colors on a tile plot. Study the correlation tile plot, and answer the questions under *observations* below.

*Note*: You're going to see a *ton* of warnings when you run `tf_iocorr()` on the data; you'll think about what those mean in the questions below.


In [None]:
(
    ## NOTE: No need to edit the call to ev_sample()
    md_circuit
    >> gr.ev_sample(
        n=1e3, 
        df_det="nom",
    )
    ## TODO: Use gr.tf_iocorr() to compute 
    ## input-to-output correlations

    ## NOTE: No need to edit; pt_auto will automatically
    ## adjust the plot to use the correlation data
    >> gr.pt_auto()
)

*Observations*

- The verb `tf_iocorr()` should throw a ton of warnings. What are these messages warning you about? Based on the colors in the plot, which correlations are not defined?
  - (Your response here)
- Do we have any information on how the outputs `omega0` and `Q` depend on the deterministic inputs `L, R, C`? Why do you think that is?
  - (Your response here)
- Based on the available information: Which inputs does `omega0` depend on? How do those inputs affect `omega0`?
  - (Your response here)
- Based on the available information: Which inputs does `Q` depend on? How do those inputs affect `Q`?
  - (Your response here)


## Hypothesis: `dR` has no effect on `omega0`

Based on the correlation tile plot above, we can formulate the following hypothesis:

> The variability in the resistance `dR` has no effect on the natural frequency `omega0`.

While this may seem obvious, it's important to keep in mind that correlation is a *crude* measure of dependency. For instance, the following code generates `x, y` pairs that have a deterministic dependency in both cases, but very different correlation values:


In [None]:
## NOTE: No need to edit; run and inspect
# Generate data
df_example = (
    gr.df_make(x=gr.linspace(-1, +1, 100))
    >> gr.tf_mutate(
        y_linear=0.5 * DF.x - 1,
        y_quad=1.0 * DF.x**2 - 0.5,
    )
)

# Compute correlation coefficients
print(
    df_example
    >> gr.tf_summarize(
        corr_linear=gr.corr(DF.x, DF.y_linear),
        corr_quad=gr.corr(DF.x, DF.y_quad),
    )
)

# Visualize
(
    df_example
    >> gr.tf_pivot_longer(
        columns=["y_linear", "y_quad"],
        names_to=[".value", "case"],
        names_sep="_",
    )
    >> gr.ggplot(gr.aes("x", "y", color="case"))
    + gr.geom_point()
)

Note that `y` most certainly depends on `x` in the `quad` case, but the correlation is zero; this is because correlation can only detect *linear* trends. Thus, we need to exercise caution when interpreting a correlation tile plot.

```{admonition} Correlation is a rough measure of dependency
Correlation is a rough measure of dependency; for instance, a zero correlation can hide nonlinear relationships.
```
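
The same point can be checked by hand. The sketch below computes the Pearson correlation with only the standard library. The helper `pearson` and the restricted input range are assumptions made for illustration. It also shows that correlation depends on the range of inputs sampled: the same quadratic function gives a correlation near zero over a symmetric range, but a strong positive correlation over only the nonnegative half.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Symmetric grid on [-1, +1]
xs_full = [-1 + 2 * i / 100 for i in range(101)]
ys_full = [x ** 2 - 0.5 for x in xs_full]

# Same function, restricted to x >= 0
xs_half = [x for x in xs_full if x >= 0]
ys_half = [x ** 2 - 0.5 for x in xs_half]

corr_full = pearson(xs_full, ys_full)
corr_half = pearson(xs_half, ys_half)
print(corr_full, corr_half)
```

A near-zero correlation therefore does not rule out a dependency; it only tells us there is no *linear* trend over the inputs we happened to sample.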

We can further investigate the hypothesis formulated from the correlation tile plot by constructing some sweeps.


### __qX__ Sweeps

Construct a sinew plot to investigate how each input affects the outputs of the model. Answer the questions under *observations* below.

*Note*: You only need to sweep the random variables; you can use the nominal levels for the deterministic inputs.


In [None]:
(
    md_circuit
    ## TODO: Make a sinew plot

)

*Observations*

- How does the hypothesis *The variability in the resistance `dR` has no effect on the natural frequency `omega0`* hold up?
  - (Your response here)
