# Exploratory Model Analysis

*Purpose*: **TODO**.


## Core Idea: Curiosity and Skepticism

Remember the core principles of EDA:

1. Curiosity: Generate lots of ideas and hypotheses about your data.
2. Skepticism: Remain unconvinced of those ideas, unless you can find credible patterns to support them.

We can apply these same principles when studying a model, a process called *exploratory model analysis*. When studying a model, however, we have a more immediate way to test our hypotheses: we can *evaluate* the model to generate new data as we carry out our exploration!
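As a minimal sketch of this loop, consider a hypothetical toy model (`toy_model` is invented for illustration and is not the circuit model used later). A first look suggests a hypothesis, and a few more model evaluations are enough to test it.

```python
# Hypothetical toy model, for illustration only (not the circuit model)
def toy_model(x):
    return (x - 1.0) ** 2

# Curiosity: a first look at x = 2, 3, 4 suggests the hypothesis
# "the output always increases with x"
first_look = [toy_model(x) for x in [2.0, 3.0, 4.0]]

# Skepticism: evaluate the model at new inputs to test the hypothesis
xs_new = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys_new = [toy_model(x) for x in xs_new]

# True only if every output is larger than the one before it
is_increasing = all(y0 < y1 for y0, y1 in zip(ys_new, ys_new[1:]))

print("first look:", first_look)
print("new evaluations:", ys_new)
print("hypothesis holds on new inputs:", is_increasing)
```

With real data we would need to run a new experiment to get the second batch of observations. With a model, generating it takes one line.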

These ideas can be a little abstract, so let's illustrate them with a concrete example.


## Setup


In [None]:
import grama as gr
DF = gr.Intention()
%matplotlib inline

## Running Example: Circuit model


In [None]:
from grama.models import make_prlc_rand
md_circuit = make_prlc_rand()

# Basic Facts


### __qX__ Model summary


*Observations*

- Compare the variability (measured by standard deviation) of the three random variables. Which is most variable?
  - (Your response here)


## Model context

The deterministic variables `L, R, C` are the *designed* component values.

The random variables `dL, dR, dC` are *perturbations* to the designed component values.


# Inputs

## Assess input variability


### __qX__ Overview of inputs


In [None]:
(
    md_circuit

)

*Observations*

- ...
  - (Your response here)


Here's a hypothesis:

> Since the random perturbations `dL, dR, dC` are mutually independent, we should see the same variability regardless of the circuit component values.

You'll assess this hypothesis in the next task.
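Before looking at the circuit model, it can help to see what the hypothesis would and would not predict. The sketch below compares two hypothetical ways a perturbation `d` could enter a realized value: added to the designed value, or multiplied in as a relative change. The function `spreads`, the perturbation range, and the designed values are all assumptions made for illustration. Which case (if either) describes the circuit model is what the next task asks you to check.

```python
import random
import statistics

random.seed(101)

# Hypothetical perturbations, uniform on [-0.1, +0.1]; drawn once,
# independently of any designed value
perturbations = [random.uniform(-0.1, 0.1) for _ in range(1000)]

def spreads(designed):
    """Standard deviation of realized values under two hypothetical rules."""
    additive = [designed + d for d in perturbations]
    multiplicative = [designed * (1 + d) for d in perturbations]
    return statistics.pstdev(additive), statistics.pstdev(multiplicative)

for designed in [1.0, 10.0, 100.0]:
    sd_add, sd_mult = spreads(designed)
    print(f"designed={designed:6.1f}  additive sd={sd_add:.4f}  "
          f"multiplicative sd={sd_mult:.4f}")
```

Under the additive rule the spread is the same at every designed value. Under the multiplicative rule the spread grows in proportion to the designed value, even though the perturbations themselves are identical.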


### __qX__ Compare designed and realized values

The following plot shows the designed and realized capacitance values. Answer the questions under *observations* below.

*Hint*: Remember that you can add `gr.scale_x_log10()` and `gr.scale_y_log10()` to a ggplot to change to a log-log scale.


In [None]:
# TASK: Inspect the following plot
(
    # NOTE: No need to edit this part of the code
    md_circuit
    >> gr.ev_sample(
        n=20,
        df_det=gr.df_make(
            R=1e-1,
            L=1e-5,
            C=gr.logspace(-3, 2, 20)
        )
    )
    # Visualize
    >> gr.ggplot(gr.aes("C", "Cr"))
    + gr.geom_point()

)

*Observations*

- Consider the hypothesis that "we should see the same variability regardless of the circuit component values." Based on the plot, does this hypothesis hold?
  - (Your response here)


# Outputs


### __qX__ Overview of outputs


In [None]:
(
    md_circuit

)

*Observations*

*Note*: If you can't reliably make out the shapes of the distributions, try increasing `n`.

- What distribution shapes do the realized component values `Lr, Rr, Cr` have?
  - (Your response here)
- What distribution shape does the output `Q` have?
  - (Your response here)
- What distribution shape does the output `omega0` have?
  - (Your response here)


### __qX__ Density of outputs

Visualize `Q` and `omega0` with a 2d bin plot. Increase the sample size `n` to get a "full" plot.


In [None]:
(
    md_circuit

)

*Observations*

- Briefly describe the distribution of realized performance (values of `Q` and `omega0`).
  - (Your response here)


### __qX__ Compare nominal with distribution


In [None]:
(
    md_circuit
    >> gr.ev_sample(n=1e4, df_det="nom")
    
    >> gr.ggplot(gr.aes("Q", "omega0"))
    + gr.geom_bin2d()
    + gr.geom_point(
        # Use the `data` argument to add a geometry for
        # the output value associated with nominal input values

        color="salmon",
        size=4,
    )
)

*Observations*

The nominal design (salmon point) represents the performance predicted when every circuit component takes its nominal value.

- How does the distribution of real circuit performance (values of `Q`, `omega0`) compare with the nominal performance?
  - (Your response here)
- Is the most likely performance (values of `Q`, `omega0`) the same as the nominal performance? ("Most likely" is where `count` is the largest.)
  - (Your response here)
- Assume that another system depends on the particular values of `Q` and `omega0` provided by this system. Would it be safe to assume the performance is within 1% of the nominal performance?
  - (Your response here)
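The "most likely" comparison above can also be made numerically. The following is a minimal sketch using a hypothetical two-output toy model (`toy_outputs`, the bin width, and the perturbation range are all assumptions, not the circuit model). It counts samples in fixed-width 2d bins, as a 2d bin plot does, then compares the most populated bin with the bin that contains the nominal output.

```python
import math
import random
from collections import Counter

random.seed(101)

# Hypothetical toy model with two outputs (not the circuit model)
def toy_outputs(d1, d2):
    a = 1.0 + d1
    b = 1.0 + d2
    return a * b, a / b

def to_bin(value, width=0.05):
    # Index of the fixed-width bin whose center is nearest to `value`
    return math.floor(value / width + 0.5)

# Sample random perturbations and count the observations in each 2d bin
n = 10000
counts = Counter()
for _ in range(n):
    d1 = random.uniform(-0.2, 0.2)
    d2 = random.uniform(-0.2, 0.2)
    u, v = toy_outputs(d1, d2)
    counts[(to_bin(u), to_bin(v))] += 1

# "Most likely" bin: the one with the largest count
mode_bin, mode_count = counts.most_common(1)[0]

# Bin containing the output at nominal (zero-perturbation) inputs
u_nom, v_nom = toy_outputs(0.0, 0.0)
nominal_bin = (to_bin(u_nom), to_bin(v_nom))

print("most populated bin:", mode_bin, "count:", mode_count)
print("nominal bin:", nominal_bin, "count:", counts[nominal_bin])
```

Keep in mind that the location of the most populated bin depends on the bin width and the sample size, so treat it as an estimate rather than an exact answer.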


# Input-to-output Relationships

TODO


### __qX__ Correlation tile plot

Use the routine `tf_iocorr()` to transform the data from `ev_sample()` into input-to-output correlations. The `pt_auto()` routine will automatically plot those correlations as colors on a tile plot. Study the correlation tile plot, and answer the questions under *observations* below.
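To build intuition for what an input-to-output correlation measures, here is a minimal sketch on a hypothetical toy model. It assumes the Pearson correlation coefficient is the quantity of interest. Whether `tf_iocorr()` uses exactly this definition is an assumption here, and the helper `pearson` and the inputs `a, b, c` are invented for illustration.

```python
import math
import random

random.seed(101)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy model (not the circuit model):
# the output rises with `a`, falls with `b`, and ignores `c`
n = 2000
a = [random.uniform(0, 1) for _ in range(n)]
b = [random.uniform(0, 1) for _ in range(n)]
c = [random.uniform(0, 1) for _ in range(n)]
y = [2 * ai - bi for ai, bi in zip(a, b)]

# One correlation per input: the sign gives the direction of the trend,
# the magnitude gives how strongly the output tracks that input
iocorr = {name: pearson(xs, y) for name, xs in [("a", a), ("b", b), ("c", c)]}
for name, rho in iocorr.items():
    print(f"corr({name}, y) = {rho:+.2f}")
```

A correlation near zero, as for `c` here, suggests the output does not vary linearly with that input over the sampled range. It does not rule out a nonlinear dependence, which is one reason to follow up with more detailed plots.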


In [None]:
(
    ## NOTE: No need to edit the call to ev_sample()
    md_circuit
    >> gr.ev_sample(
        n=1e3, 
        df_det="nom",
    )
    ## TODO: Use gr.tf_iocorr() to compute 
    ## input-to-output correlations

    ## NOTE: No need to edit; pt_auto will automatically
    ## adjust the plot to use the correlation data
    >> gr.pt_auto()
)

*Observations*

- Do we have any information on how the outputs `omega0` and `Q` depend on the deterministic inputs `L, R, C`? Why do you think that is?
  - (Your response here)
- Based on the available information: Which inputs does `omega0` depend on? How do those inputs affect `omega0`?
  - (Your response here)
- Based on the available information: Which inputs does `Q` depend on? How do those inputs affect `Q`?
  - (Your response here)


### __qX__ Targeted scatterplots

(Use this to get a more detailed view.)


### __qX__ Sinew plot


In [None]:
(
    md_circuit

)

*Observations*

- ...
  - (Your response here)
