# Grama: Fitting Multivariate Distributions

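*Purpose*: Engineering quantities are frequently related to one another. In this exercise we'll learn to model such dependency by fitting a multivariate distribution: a marginal for each uncertain quantity, plus a copula to relate them.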


## Setup


In [None]:
import grama as gr
DF = gr.Intention()
%matplotlib inline

For this exercise, we'll study a dataset of observations on die-cast aluminum parts.


In [None]:
from grama.data import df_shewhart


# Dependency in the wild


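Quantities measured on the same part are frequently related. Plot `density` against `hardness` to check for dependency between the two.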

In [None]:
(
    df_shewhart
    >> gr.ggplot(gr.aes("density", "hardness"))
    + gr.geom_point()
)

Taken one at a time, a normal distribution does a *reasonable* job representing both `density` and `hardness`; the QQ plots below compare each variable against a fitted normal.
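
For intuition, the comparison behind a QQ plot can be sketched in plain Python. This is a minimal illustration on synthetic stand-in data, not the Shewhart dataset; the helper `normal_qq` is hypothetical, and the plotting positions `(i - 0.5) / n` are one common convention assumed here, so `gr.qqvals` may compute its reference quantiles differently.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_qq(values):
    """Return (reference, observed) quantile lists for a normal QQ plot."""
    n = len(values)
    # Fit a normal by matching the sample mean and standard deviation
    fitted = NormalDist(mean(values), stdev(values))
    observed = sorted(values)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n
    reference = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return reference, observed


rng = random.Random(101)
sample = [rng.gauss(2.7, 0.1) for _ in range(200)]  # synthetic stand-in data

ref, obs = normal_qq(sample)
# Points close to the line y = x suggest a reasonable normal fit
max_gap = max(abs(r - o) for r, o in zip(ref, obs))
print(f"largest reference-vs-observed gap: {max_gap:.3f}")
```

Plotting `ref` against `obs` with a dashed identity line gives the same kind of picture as the grama cell that follows.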


In [None]:
## NOTE: No need to edit
(
    df_shewhart
    >> gr.tf_mutate(
        q_density=gr.qqvals(DF.density, "norm"),
        q_hardness=gr.qqvals(DF.hardness, "norm"),
    )
    >> gr.tf_rename(
        v_density="density",
        v_hardness="hardness",
    )
    >> gr.tf_pivot_longer(
        columns=["q_density", "q_hardness", "v_density", "v_hardness"],
        names_to=[".value", "var"],
        names_sep="_",
    )
    
    >> gr.ggplot(gr.aes("q", "v"))
    + gr.geom_abline(intercept=0, slope=1, linetype="dashed")
    + gr.geom_point()
    + gr.facet_wrap("var", scales="free")
    + gr.labs(x="Reference Quantile", y="Observed Quantile")
)

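However, well-fit marginals are not enough to describe the data. The following model combines the two fitted marginals with an independence copula, which assumes `density` and `hardness` are unrelated.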


In [None]:
## NOTE: No need to edit
# Build a model 
md_independence = (
    gr.Model("Independent Properties")
    >> gr.cp_marginals(
        density=gr.marg_fit("norm", df_shewhart.density),
        hardness=gr.marg_fit("norm", df_shewhart.hardness),
    )
    >> gr.cp_copula_independence()
)

# Draw simulated observations
(
    md_independence
    >> gr.ev_sample(n=1e3, df_det="nom", skip=True)
    >> gr.tf_mutate(source="Simulated")
    >> gr.tf_bind_rows(
        df_shewhart
        >> gr.tf_mutate(source="Experimental")
    )
    
    >> gr.ggplot(gr.aes("density", "hardness"))
    + gr.geom_point(gr.aes(color="source"))
    + gr.theme_minimal()
)

This model does not respect the correlation we observe in the data.


# Marginal-Copula Approach


In [None]:
## NOTE: No need to edit
# Build a model 
md_copula = (
    gr.Model("Dependent Properties")
    >> gr.cp_marginals(
        density=gr.marg_fit("norm", df_shewhart.density),
        hardness=gr.marg_fit("norm", df_shewhart.hardness),
    )
    ## KEY DIFFERENCE: Fit a Gaussian copula
    >> gr.cp_copula_gaussian(df_data=df_shewhart)
)

# Draw simulated observations
(
    md_copula
    >> gr.ev_sample(n=1e3, df_det="nom", skip=True)
    >> gr.tf_mutate(source="Simulated")
    >> gr.tf_bind_rows(
        df_shewhart
        >> gr.tf_mutate(source="Experimental")
    )
    
    >> gr.ggplot(gr.aes("density", "hardness"))
    + gr.geom_point(gr.aes(color="source"))
    + gr.theme_minimal()
)

## Steps

1. Fit a marginal for each uncertain quantity
    - Follow the process from `e-grama06-fit-univar`; this should include checking for statistical control!
2. Fit a copula to relate the uncertain quantities
3. Assess the model


# Case Study: Circuit Performance


In [None]:
from grama.models import make_prlc_rand
md_circuit = make_prlc_rand()
md_circuit

In [None]:
df_circuit = (
    md_circuit
    >> gr.ev_sample(n=1e4, df_det="nom", seed=101)
)

(
    df_circuit
    >> gr.ggplot(gr.aes("Q", "omega0"))
    + gr.geom_bin2d()
)

## Marginals


### __qX__ Check for statistical control

Check for statistical control of the output `omega0`. Use a control chart to make this determination.

*Hint*: You *do not* need to write all this code from scratch! Try copying code from the previous exercise `e-grama06-fit-univar`.
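
As a reminder of the idea behind a control chart, here is a simplified plain-Python sketch on synthetic, in-control data. The helper `xbar_chart` is hypothetical, and it estimates the spread from all values pooled together, which is an assumption made for brevity; the approach in the previous exercise may use a different estimator.

```python
import math
import random
from statistics import mean, stdev


def xbar_chart(values, n_batch):
    """Return batch means and (lower, upper) 3-sigma limits for the batch mean."""
    n_full = len(values) // n_batch  # drop any incomplete final batch
    batches = [values[i * n_batch:(i + 1) * n_batch] for i in range(n_full)]
    means = [mean(b) for b in batches]
    center = mean(means)
    # Simplification (assumption): pooled standard deviation of all values
    sigma = stdev(values[:n_full * n_batch])
    half_width = 3 * sigma / math.sqrt(n_batch)
    return means, (center - half_width, center + half_width)


rng = random.Random(101)
values = [rng.gauss(0, 1) for _ in range(2000)]  # synthetic, in-control data

means, (lcl, ucl) = xbar_chart(values, n_batch=10)
n_out = sum(m < lcl or m > ucl for m in means)
frac_out = n_out / len(means)
print(f"{n_out} of {len(means)} batch means fall outside the limits")
```

For normally distributed batch means under statistical control, roughly 0.27% are expected to fall outside 3-sigma limits, which gives a baseline for judging how many exceedances are "too many".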


In [None]:
## TASK: Make a control chart with batch size n_batch = 10


*Observations*

- How many batches *total* are there?
  - (Your response here)
- How many batches fall outside the control limits? Is that significantly more than we would expect?
  - (Your response here)
- Does this process seem to be under statistical control?
  - (Your response here)


### __qX__ Fit a marginal for `omega0`


In [None]:
## TASK: Fit a marginal for `omega0`
mg_omega0 = None


## NOTE: Use the following to help check your work
(
    df_circuit
    >> gr.tf_mutate(q=gr.qqvals(DF.omega0, marg=mg_omega0))
    >> gr.ggplot(gr.aes("q", "omega0"))
    + gr.geom_abline(intercept=0, slope=1, linetype="dashed")
    + gr.geom_point()
)

*Observations*

- How well does your distribution fit the data?
  - (Your response here)


### __qX__ Fit a marginal for `Q`


In [None]:
## TASK: Fit a marginal for `Q`
mg_Q = None


## HINT: Use the assessment techniques discussed in
## the previous exercise


*Observations*

- How well does your distribution fit the data?
  - (Your response here)


## Copula


In [None]:
## NOTE: No need to edit
md_out_independence = (
    gr.Model("Circuit Output: Independence")
    >> gr.cp_marginals(
        omega0=mg_omega0,
        Q=mg_Q,
    )
    >> gr.cp_copula_independence()
)

md_out_independence 


### __qX__ Fit a Gaussian copula

*Hint*: The `md_copula` example in the Marginal-Copula Approach section demonstrates how to add a Gaussian copula to a grama model.


In [None]:
## TASK: Fit a Gaussian copula model
md_out_copula = (
    gr.Model("Circuit Output: Copula")
    >> gr.cp_marginals(
        omega0=mg_omega0,
        Q=mg_Q,
    )

)

## NOTE: Do not edit; use this to check your work
assert \
    isinstance(md_out_copula.density.copula, gr.CopulaGaussian), \
    "md_out_copula must have a gaussian copula"

md_out_copula 


### __qX__ Compare the models

Use the following code to compare the multivariate models, then answer the questions under *Observations* below.


In [None]:
## NOTE: No need to edit
(
    df_circuit
    >> gr.tf_mutate(source="True")
    >> gr.tf_bind_rows(
        md_out_independence
        >> gr.ev_sample(n=1e4, df_det="nom", skip=True)
        >> gr.tf_mutate(source="Independence")
    )
    >> gr.tf_bind_rows(
        md_out_copula
        >> gr.ev_sample(n=1e4, df_det="nom", skip=True)
        >> gr.tf_mutate(source="Copula")
    )
    
    >> gr.ggplot(gr.aes("Q", "omega0"))
    + gr.geom_bin2d()
    + gr.facet_wrap("source")
)

*Observations*

- How well does the independence model represent the true data?
  - (Your response here)
- What aspects does the copula model get *correct*?
  - (Your response here)
- What aspects does the copula model get *incorrect*?
  - (Your response here)
