# c05-plates

*Purpose*: Variability in engineering systems exposes us to risk. To rigorously design for variability, we need to analyze data, model the variability, and study its effects on system performance. To practice this entire process, you will study the structural safety of a plate subject to buckling loads.


## Informed Consent

As a reminder, this course is part of a study of engineers' behavior. While not all parts of the course include data collection, we will analyze your responses to this homework as part of the research.

We may also quote this work as part of published research.

You can ask to have your responses excluded from the study after the interview by sending us an email. Before starting this assignment, do you consent to sharing your work with the study?


I agree to share my responses with the study

- (Please type your name here)


In [None]:
import grama as gr
import pandas as pd
DF = gr.Intention()
%matplotlib inline

# For assertion
from pandas.api.types import is_numeric_dtype


## Background

This challenge continues the `c03-stang` challenge; we will start from the same dataset but will apply ideas of modeling and design under uncertainty.


# Assess Statistical Control

Previously, we studied the Stang et al. dataset using generic EDA techniques; let's revisit this dataset armed with the knowledge of statistical process monitoring.


In [None]:
from grama.data import df_stang


### __q1__ Assess statistical control of `E` across thicknesses

Construct a control chart with groupings according to plate thickness. Assess the state of statistical control of the elasticity. Answer the questions under *observations* below.


In [None]:
## TODO: Construct a control chart
(
    df_stang
# solution-begin
    >> gr.pt_xbs(group="thick", var="E")
# solution-end
)

*Observations*

<!-- task-begin -->
- Is the variability of `E` under statistical control across plate thicknesses? How do you know?
  - (Your response here)
- Is the mean of `E` under statistical control across plate thicknesses? How do you know?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- Is the variability of `E` under statistical control across plate thicknesses? How do you know?
  - Likely yes; the points are all within the control limits and there are no patterns in the data.
- Is the mean of `E` under statistical control across plate thicknesses? How do you know?
  - No; there are several violations of the control limits. In particular, the thickest plates have a much lower elasticity than the other specimens.
<!-- solution-end -->
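An Xbar and S chart flags a group when its mean or its standard deviation falls outside the control limits. As a rough sketch of how textbook 3-sigma limits are computed, assuming equal subgroup sizes and roughly normal data (grama's `pt_xbs` may use a different confidence level or handle unequal group sizes differently), in plain NumPy:

```python
import math

import numpy as np


def c4(n):
    """Bias-correction constant: for normal data, E[S] = c4(n) * sigma."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def xbar_s_limits(groups):
    """Textbook 3-sigma Xbar and S chart limits for equal-size subgroups.

    groups: list of 1-D arrays, all with the same length n >= 2.
    """
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal subgroup sizes n >= 2")
    xbars = np.array([np.mean(g) for g in groups])     # subgroup means
    s = np.array([np.std(g, ddof=1) for g in groups])  # subgroup sample SDs
    xbarbar, sbar = xbars.mean(), s.mean()             # center lines
    cn = c4(n)
    a3 = 3.0 / (cn * math.sqrt(n))                     # Xbar chart factor
    b = 3.0 * math.sqrt(1.0 - cn**2) / cn              # S chart half-width factor
    return {
        "xbars": xbars,
        "s": s,
        "xbar_center": xbarbar,
        "xbar_lcl": xbarbar - a3 * sbar,
        "xbar_ucl": xbarbar + a3 * sbar,
        "s_center": sbar,
        "s_lcl": max(0.0, 1.0 - b) * sbar,             # clipped at zero
        "s_ucl": (1.0 + b) * sbar,
    }


# Usage with hypothetical data: 8 subgroups of 5 observations each
rng = np.random.default_rng(0)
groups = [rng.normal(10_000, 300, size=5) for _ in range(8)]
lim = xbar_s_limits(groups)

# Flag subgroups whose mean or SD falls outside the limits
flag_mean = (lim["xbars"] < lim["xbar_lcl"]) | (lim["xbars"] > lim["xbar_ucl"])
flag_sd = (lim["s"] < lim["s_lcl"]) | (lim["s"] > lim["s_ucl"])
```

The `c4` correction is there because the sample standard deviation underestimates sigma for small subgroups; for `n = 5` the lower S limit is clipped to zero.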


## Follow-up experiment

We've seen that the Stang et al. data are certainly *not* under statistical control (and we have some hints as to what went wrong!). Let's imagine that manufacturing of these plates has ramped up, and we have access to a much larger dataset from this production line.

*Note*: The following data were simulated; they do not correspond to physical experiments.


In [None]:
df_data = pd.read_csv("./data/c05-data.csv")
df_data

### Data Dictionary

Here is a data dictionary for the new `df_data`.

| Symbol | Variable | Meaning |
|---|---|---|
| `E` | Elasticity (ksi) | Mechanical property |
| `mu` | Poisson's ratio (-) | Mechanical property |
| `t` | Thickness (in) | Geometric property |
| `id_machine` | Machine identifier | Manufacturing variable |
| `id_specimen` | Specimen identifier | Manufacturing variable |
| `id_measurement` | Measurement (operator) identifier | Manufacturing variable |


### __q2__ Explore the experimental design

Answer the following questions to better understand the experimental design. Note that the same questions are posed within each cell and under *observations* below.

*Hint*: The verbs `tf_count()` and `tf_distinct()` will be very useful for answering some of these questions!


In [None]:
## Q: What thicknesses were tested?
# solution-begin
(
    df_data
    >> gr.tf_count(DF.t)
)
# solution-end

In [None]:
## Q: How many unique specimens were manufactured?
# solution-begin
(
    df_data
    ## NOTE: Assumes id_specimen is an integer index starting at 0
    >> gr.tf_summarize(spec_max=gr.max(DF.id_specimen) + 1)
)
# solution-end

In [None]:
## Q: How many specimens were made on each machine?
# solution-begin
(
    df_data
    >> gr.tf_distinct(DF.id_specimen, DF.id_machine)
    >> gr.tf_count(DF.id_machine)
)
# solution-end

In [None]:
## Q: How many times did each operator measure each specimen?
# solution-begin
(
    df_data
    >> gr.tf_count(DF.id_specimen, DF.id_measurement)
    >> gr.tf_summarize(n_max=gr.max(DF.n))
)
# solution-end

*Observations*

<!-- task-begin -->
- What thicknesses were tested?
  - (Your response here)
- How many unique specimens were manufactured?
  - (Your response here)
- How many specimens were made on each machine?
  - (Your response here)
- How many times did each operator measure each specimen?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- What thicknesses were tested?
  - 0.125 and 0.250 inches
- How many unique specimens were manufactured?
  - 120
- How many specimens were made on each machine?
  - 20 specimens each
- How many times did each operator measure each specimen?
  - Just once
<!-- solution-end -->
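The grama verbs used above have close analogues in plain pandas. Below is a minimal sketch on a made-up toy frame with the same column names (all values are hypothetical). Note that `nunique()` counts distinct specimens without assuming the IDs run from 0, unlike the `max + 1` trick.

```python
import pandas as pd

# Hypothetical toy data with the same columns as df_data (values are made up)
df_toy = pd.DataFrame({
    "t":              [0.125, 0.125, 0.125, 0.125, 0.250, 0.250, 0.250, 0.250],
    "id_machine":     ["A", "A", "B", "B", "A", "A", "B", "B"],
    "id_specimen":    [0, 0, 1, 1, 2, 2, 3, 3],
    "id_measurement": ["a", "b", "a", "b", "a", "b", "a", "b"],
})

# What thicknesses were tested? (analogue of tf_count(DF.t))
thick_counts = df_toy.groupby("t").size()

# How many unique specimens? nunique() does not depend on how IDs are numbered
n_specimens = df_toy["id_specimen"].nunique()

# Specimens per machine (analogue of tf_distinct followed by tf_count)
spec_per_machine = (
    df_toy[["id_specimen", "id_machine"]]
    .drop_duplicates()
    .groupby("id_machine")
    .size()
)

# Largest number of times any operator measured a given specimen
max_repeats = df_toy.groupby(["id_specimen", "id_measurement"]).size().max()
```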


### __q3__ Compare across thicknesses

Compare the elasticity across plate thicknesses; does elasticity seem to be consistent across thickness?


In [None]:
## TODO: Compare elasticity across thicknesses
## NOTE: There are many ways to do this!
(
    df_data
# solution-begin
    >> gr.ggplot(gr.aes("t", "E"))
    + gr.geom_boxplot(gr.aes(group="t"), notch=True)
# solution-end
)

*Observations*

<!-- task-begin -->
- Is elasticity consistent across plate thickness?
  - (Your response here)
- Will it be reasonable to group together plates of different thicknesses when assessing statistical control? Why or why not?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- Is elasticity consistent across plate thickness?
  - No; from the plot above, we can see a significant difference in median plate elasticity across thickness.
- Will it be reasonable to group together plates of different thicknesses when assessing statistical control? Why or why not?
  - No; when assessing statistical control we should use groups within which the behavior is consistent. Since we have identified a strong inconsistency across thickness, each thickness should be treated separately.
<!-- solution-end -->
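The notches in the boxplot carry the "significant difference" judgment: if two notches do not overlap, that is rough evidence that the medians differ. ggplot2, and plotnine which follows it, draw notches at approximately median ± 1.58 × IQR / √n. A sketch of that rule on hypothetical data (not `df_data`):

```python
import numpy as np


def notch_interval(x, factor=1.58):
    """Approximate notch (a rough CI for the median) as drawn by ggplot2-style boxplots."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    half_width = factor * (q3 - q1) / np.sqrt(len(x))
    return med - half_width, med + half_width


def notches_overlap(x, y):
    """True if the two notch intervals overlap (medians not clearly different)."""
    lo_x, hi_x = notch_interval(x)
    lo_y, hi_y = notch_interval(y)
    return not (hi_x < lo_y or hi_y < lo_x)


# Hypothetical samples: two groups with shifted centers
rng = np.random.default_rng(3)
E_thin = rng.normal(10_500, 300, size=60)
E_thick = rng.normal(10_000, 300, size=60)
different = not notches_overlap(E_thin, E_thick)
```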


### __q4__ Assess statistical control of Poisson's ratio

Consider only the `t == 0.250` plates. Assess the state of statistical control of Poisson's ratio. Answer the questions under *observations* below.


In [None]:
## TODO: Assess the state of statistical control
# solution-begin
(
    df_data
    >> gr.tf_filter(DF.t == 0.250)
    >> gr.pt_xbs(group="id_measurement", var="mu")
)
# solution-end

*Observations*

<!-- task-begin -->
- Is `mu` under statistical control? Why or why not?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- Is `mu` under statistical control? Why or why not?
  - Likely yes; there are no patterns in the variability or mean of `mu`.
<!-- solution-end -->


### __q5__ Assess statistical control of elasticity

Consider only the `t == 0.250` plates. Assess the state of statistical control of the elasticity. Answer the questions under *observations* below.


In [None]:
## TODO: Assess the state of statistical control
# solution-begin
(
    df_data
    >> gr.tf_filter(DF.t == 0.250)
    >> gr.pt_xbs(group="id_measurement", var="E")
)
# solution-end

In [None]:
# solution-begin
(
    df_data
    >> gr.tf_filter(DF.t == 0.250)
    >> gr.pt_xbs(group="id_machine", var="E")
)
# solution-end

*Observations*

<!-- task-begin -->
- Why is it important that we limit this analysis to `t == 0.25` plates?
  - (Your response here)
- Is `E` under statistical control? Why or why not?
  - (Your response here)
- Based on the group variable(s) you chose, what follow-up investigations should be done?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- Why is it important that we limit this analysis to `t == 0.25` plates?
  - We saw above that the thinner (`t == 0.125`) and thicker (`t == 0.25`) plates have significantly different mean elasticities. Therefore, it is not valid to lump thin and thick plates into the same group. Limiting to `t == 0.25` plates is one way to avoid this erroneous grouping.
- Is `E` under statistical control? Why or why not?
  - No; the mean elasticity violates the control limits for Machines C and D, and the variability of elasticity violates the UCL for operator h.
- Based on the group variable(s) you chose, what follow-up investigations should be done?
  - We should investigate Machines C and D, and operator h.
<!-- solution-end -->


### __q6__ Assess statistical control of elasticity (Pt. 2)

Repeat your investigation of statistical control of `E`, but use the filters given below. Answer the questions under *observations* below.


In [None]:
## TASK: Assess statistical control of `E` for the filtered data
(
    df_data
    >> gr.tf_filter(
        DF.t == 0.250,
        DF.id_machine != "C",
        DF.id_measurement != "h",
    )
# solution-begin
    >> gr.pt_xbs(group="id_machine", var="E")
# solution-end
)

In [None]:
## TASK: Assess statistical control of `E` for the filtered data
(
    df_data
    >> gr.tf_filter(
        DF.t == 0.250,
        DF.id_machine != "C",
        DF.id_measurement != "h",
    )
# solution-begin
    >> gr.pt_xbs(group="id_measurement", var="E")
# solution-end
)

*Observations*

<!-- task-begin -->
- For the set of machines and operators considered, is `E` under statistical control? Why or why not?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- For the set of machines and operators considered, is `E` under statistical control? Why or why not?
  - Likely yes; we see no patterns across machines or operators.
<!-- solution-end -->


# Consider Sources of Variability

Now that we've found a more stable subset of the data, we can move towards modeling the variability with distributions. However, before we go much further, we should investigate the *sources* of variability in the data. Note that each specimen was measured multiple times (each measurement is labeled by `id_measurement`); thus we can use the dataset to approximate both the real variability in the material properties and the erroneous variability introduced by measurement.

For the rest of the exercise, we will consider the following subset of the data.


In [None]:
## NOTE: No need to edit
df_sub = (
    df_data
    >> gr.tf_filter(
        DF.t == 0.250,
        DF.id_machine != "C",
        DF.id_measurement != "h",
    )
)

### __q7__ Estimate the real variability

Identify the column in `df_sub` that groups together multiple measurements of the same quantity.

The code below applies the *mean heuristic* ([e-stat02-source](https://zdelrosario.github.io/evc-course/exercises_solution/d08-e-stat02-source-solution.html#heuristics)) to produce a more stable measurement of `E`, then computes the variance across these more stable measurements.


In [None]:
## TASK: Apply the Mean Heuristic to group by the appropriate variable
df_var_mfg = (
    df_sub
# task-begin
    ## TODO: Group by the appropriate variable
# task-end
# solution-begin
    >> gr.tf_group_by(DF.id_specimen)
# solution-end
    >> gr.tf_summarize(E=gr.mean(DF.E))
    >> gr.tf_ungroup()
    >> gr.tf_summarize(E_var_mfg=gr.var(DF.E))
)

## NOTE: No need to edit; use this to check your work
assert \
    abs(df_var_mfg.E_var_mfg[0] - 109670.635716) < 1e-6, \
    "Incorrect variance; make sure you grouped by the correct variable"

df_var_mfg

### __q8__ Estimate the measurement variability

Estimate the measurement variability in `E` by taking the variance within each specimen. Average these variances to produce a more stable estimate `E_var_meas`. Answer the questions under *observations* below.


In [None]:
## TASK: Estimate the measurement variability
df_var_meas = (
    df_sub
# solution-begin
    >> gr.tf_group_by(DF.id_specimen)
    >> gr.tf_summarize(E_var=gr.var(DF.E))
    >> gr.tf_ungroup()
    >> gr.tf_summarize(E_var_meas=gr.mean(DF.E_var))
# solution-end
)

## NOTE: No need to edit; use this to check your work
assert \
    abs(df_var_meas.iloc[0, 0] - 168382.91387) < 1e-6, \
    "Incorrect variance; make sure you grouped by the correct variable"

df_var_meas

*Observations*

<!-- task-begin -->
- How do `E_var_mfg` (previous task) and `E_var_meas` compare?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- How do `E_var_mfg` (previous task) and `E_var_meas` compare?
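  - `E_var_mfg ~= 109670` is smaller than `E_var_meas ~= 168383` (roughly two-thirds as large), so measurement error contributes a substantial share of the observed variability in `E`.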
<!-- solution-end -->
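Why is `E_var_mfg` not a pure measure of real variability? Under a simple additive model (an assumption: each measurement equals the specimen's real value plus independent noise), averaging `k` measurements shrinks the noise variance by a factor of `k` but does not remove it. The variance of the specimen means is then about $\sigma^2_{\text{real}} + \sigma^2_{\text{meas}}/k$. Here is a simulation sketch with hypothetical standard deviations:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical settings (not estimated from df_sub)
n_spec, k = 2000, 4               # specimens, measurements per specimen
sd_real, sd_meas = 300.0, 400.0   # real and measurement SDs, ksi

# Additive model: each measurement = specimen's real value + independent noise
E_real = rng.normal(10_000, sd_real, size=n_spec)
E_meas = E_real[:, None] + rng.normal(0, sd_meas, size=(n_spec, k))

# Analogue of E_var_mfg: variance of the per-specimen means (mean heuristic)
var_of_means = E_meas.mean(axis=1).var(ddof=1)
# Analogue of E_var_meas: average within-specimen variance
mean_within_var = E_meas.var(axis=1, ddof=1).mean()

# Under the additive model, var_of_means ~ sd_real**2 + sd_meas**2 / k,
# so subtracting mean_within_var / k estimates the real variance alone
var_real_est = var_of_means - mean_within_var / k
```

If the additive model is believed, the same correction could be applied to `E_var_mfg` and `E_var_meas`, with `k` the number of measurements per specimen in `df_sub`.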


The following code applies the mean heuristic to get more stable measurements for `E` and `mu`.


In [None]:
## NOTE: No need to edit
df_real = (
    df_sub
    >> gr.tf_group_by(DF.id_specimen)
    >> gr.tf_summarize(
        mu=gr.mean(DF.mu),
        E=gr.mean(DF.E),
    )
    >> gr.tf_ungroup()
)

# Model the Variability

Now you will construct a model for the variability in `E` and `mu`. You'll use this to assess the structural safety of a plate subject to compressive (buckling) loads.


### __q9__ Assess dependency of `E` and `mu`

Assess the dependency between `E` and `mu`. Answer the questions under *observations* below.

*Hint*: There are many ways to do this!


In [None]:
## TASK: Assess the dependency between E and mu
(
    df_sub
# solution-begin
    >> gr.ggplot(gr.aes("E", "mu"))
    + gr.geom_point()
# solution-end
)

*Observations*

<!-- task-begin -->
- What---if any---dependency do `E` and `mu` exhibit?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- What---if any---dependency do `E` and `mu` exhibit?
  - The scatterplot shows no visible trend or structure; `E` and `mu` appear to be independent.
<!-- solution-end -->
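A scatterplot is the main tool here; a rank correlation gives a quick numerical companion. Since grama data frames are pandas DataFrames, `df_sub[["E", "mu"]].corr(method="spearman")` would give the sample value. Below is a sketch on hypothetical data that contrasts an independent pair with a deliberately dependent one:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 5000

# Hypothetical independent inputs (not the notebook's data)
df_indep = pd.DataFrame({
    "E": rng.lognormal(np.log(10_000), 0.03, n),
    "mu": rng.lognormal(np.log(0.32), 0.03, n),
})
# Hypothetical dependent case for contrast: mu rises with E, plus noise
df_dep = df_indep.assign(
    mu=0.32 + 1e-5 * (df_indep["E"] - 10_000) + rng.normal(0, 0.002, n)
)

rho_indep = df_indep.corr(method="spearman").loc["E", "mu"]
rho_dep = df_dep.corr(method="spearman").loc["E", "mu"]
```

A near-zero rank correlation is consistent with independence but does not prove it, because nonmonotone dependence can hide from it. That is one reason to look at the scatterplot as well.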

The following code implements the plate buckling model; you wrote similar code in `c03-stang`.


In [None]:
## NOTE: No need to edit
md_plate = (
    gr.Model("Plate critical buckling stress")
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(
            k_cr=(df.m * df.b / df.a + df.a / df.m / df.b)**2
        ),
        var=["a", "b", "m"],
        out=["k_cr"],
        name="Shape factor",
    )
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(
            sigma_cr=df.k_cr * (3.14**3/12) * df.E*1e3 / (1 - df.mu**2)
                    *(df.t / df.b)**2
        ),
        var=["k_cr", "E", "mu", "t", "b"],
        out=["sigma_cr"],
        name="Buckling stress",
    )
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(
            g_buckle=df.sigma_cr - 2e5 / df.b / df.t,
        ),
        var=["sigma_cr", "b", "t"],
        out=["g_buckle"],
        name="Limit state: Buckling",
    )
)

md_plate
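To see what `md_plate` computes for a single input, here is a plain-Python transcription of its three stages with the constants copied as written. The classical plate-buckling formula is usually written with $\pi^2/12$; this sketch keeps the notebook's `3.14**3 / 12` so its numbers agree with `md_plate`. Because `g_buckle` is linear in `E`, we can also solve for the critical elasticity directly. The value `mu = 0.32` below is hypothetical and only for illustration.

```python
def plate_limit_state(E, mu, t, a, b, m):
    """Plain-Python transcription of md_plate (constants copied as written)."""
    # Shape factor
    k_cr = (m * b / a + a / m / b) ** 2
    # Buckling stress; E is in ksi per the data dictionary, so E * 1e3 presumably gives psi
    sigma_cr = k_cr * (3.14**3 / 12) * E * 1e3 / (1 - mu**2) * (t / b) ** 2
    # Limit state: buckling stress minus the applied stress term 2e5 / (b * t)
    g_buckle = sigma_cr - 2e5 / b / t
    return k_cr, sigma_cr, g_buckle


def critical_E(mu, t, a, b, m):
    """Elasticity at which g_buckle = 0 (g_buckle is linear in E)."""
    k_cr = (m * b / a + a / m / b) ** 2
    stress_per_unit_E = k_cr * (3.14**3 / 12) * 1e3 / (1 - mu**2) * (t / b) ** 2
    return (2e5 / b / t) / stress_per_unit_E


# Baseline geometry from this notebook; mu = 0.32 is a hypothetical illustrative value
E_crit = critical_E(mu=0.32, t=0.25, a=12.0, b=9.0, m=1)  # roughly 9.2e3 ksi
k_cr, sigma_cr, g = plate_limit_state(E=E_crit, mu=0.32, t=0.25, a=12.0, b=9.0, m=1)
```

Any sample with `E` below `E_crit` (at that `mu`) fails, so the probability of failure is essentially the lower-tail probability of the `E` distribution near that threshold.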

### __q10__ Fit a distribution with all observations

Fit distributions for the inputs `E` and `mu` of `md_plate` using the filtered dataset `df_sub`. Make sure to follow the proper modeling process. Add as many code cells as you need for this task. Answer the questions under *Observations* below.


In [None]:
# solution-begin
mg_E = gr.marg_fit("lognorm", df_sub.E, floc=0)

(
    df_sub
    >> gr.tf_mutate(
        q_E=gr.qqvals(DF.E, marg=mg_E),
    )
    
    >> gr.ggplot(gr.aes("q_E", "E"))
    + gr.geom_abline(intercept=0, slope=1, linetype="dashed")
    + gr.geom_point()
)
# solution-end

In [None]:
# solution-begin
mg_mu = gr.marg_fit("lognorm", df_sub.mu, floc=0)

(
    df_sub
    >> gr.tf_mutate(
        q_mu=gr.qqvals(DF.mu, marg=mg_mu),
    )
    
    >> gr.ggplot(gr.aes("q_mu", "mu"))
    + gr.geom_abline(intercept=0, slope=1, linetype="dashed")
    + gr.geom_point()
)
# solution-end

In [None]:
## TASK: Add your distribution model to `md_total`
md_total = (
    md_plate
# solution-begin
    >> gr.cp_marginals(
        E=gr.marg_fit("lognorm", df_sub.E, floc=0),
        mu=gr.marg_fit("lognorm", df_sub.mu, floc=0),
    )
    >> gr.cp_copula_independence()
# solution-end
)
md_total

*Observations*

<!-- task-begin -->
- What assumptions / choices did you make in your model?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- What assumptions / choices did you make in your model?
  - I assumed a lognormal (`"lognorm"`) distribution with location fixed at zero for both `E` and `mu`, and assumed independence between the two variables (consistent with the lack of dependency found in q9).
<!-- solution-end -->
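`gr.marg_fit("lognorm", ..., floc=0)` fits a lognormal with its location pinned at zero. As a rough SciPy-only sketch of the same idea, assuming that `marg_fit` forwards `floc` to `scipy.stats` (not verified here) and using hypothetical data, we can fit the model and build a QQ comparison. The plotting positions `(i - 0.5) / n` are an assumption; `gr.qqvals` may use different ones.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical stand-in for df_sub.E (ksi); not the notebook's data
x = rng.lognormal(mean=np.log(10_000), sigma=0.03, size=300)

# Fit a lognormal with location fixed at zero (same role as floc=0 in gr.marg_fit)
s, loc, scale = stats.lognorm.fit(x, floc=0)

# With loc fixed at 0, the MLE has a closed form on log(x)
s_closed = np.std(np.log(x))              # ddof=0 matches the MLE
scale_closed = np.exp(np.mean(np.log(x)))

# QQ check: theoretical quantiles at plotting positions (i - 0.5) / n
n = len(x)
p = (np.arange(1, n + 1) - 0.5) / n
q_theory = stats.lognorm.ppf(p, s, loc=0, scale=scale)
q_data = np.sort(x)
qq_corr = np.corrcoef(q_theory, q_data)[0, 1]
```

Here `s` is the standard deviation on the log scale and `scale` is the median. Points near the 45-degree line (and `qq_corr` near 1) suggest the lognormal shape is adequate.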

### __q11__ Fit a distribution with averaged observations

Re-fit the same model you defined above, but use the averaged observations `df_real` instead.


In [None]:
## TASK: Add your distribution model to `md_real`, using `df_real` in the fit
md_real = (
    md_plate
# solution-begin
    >> gr.cp_marginals(
        E=gr.marg_fit("lognorm", df_real.E, floc=0),
        mu=gr.marg_fit("lognorm", df_real.mu, floc=0),
    )
    >> gr.cp_copula_independence()
# solution-end
)
md_real

# Design Under Uncertainty

Now that we've built a couple models, we can use them to assess the structural safety of plate designs. We'll start by assessing the safety of a baseline design.


In [None]:
## NOTE: No need to edit this
df_baseline = gr.df_make(t=0.25, a=12.0, b=9.0, m=1)


### __q12__ Assess a baseline design

Assess the probability of failure $\text{pof} = \mathbb{P}[g \leq 0]$ according to both `md_total` and `md_real`. Answer the questions under *observations* below.
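To make $\text{pof} = \mathbb{P}[g \leq 0]$ concrete, here is a minimal Monte Carlo sketch of what sampling-based estimation does: draw inputs, evaluate the limit state, and count failures. The lognormal parameters are hypothetical (not fitted to `df_sub` or `df_real`), and the limit state is transcribed from `md_plate` with its constants copied as written.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000

# Hypothetical lognormal inputs (NOT fitted to the notebook's data)
E = rng.lognormal(mean=np.log(10_000), sigma=0.03, size=n)  # ksi
mu = rng.lognormal(mean=np.log(0.32), sigma=0.03, size=n)   # dimensionless


def g_buckle(E, mu, t, a=12.0, b=9.0, m=1):
    """Limit state transcribed from md_plate (same constants); g <= 0 means failure."""
    k_cr = (m * b / a + a / m / b) ** 2
    sigma_cr = k_cr * (3.14**3 / 12) * E * 1e3 / (1 - mu**2) * (t / b) ** 2
    return sigma_cr - 2e5 / b / t


# Monte Carlo estimate of pof: fraction of sampled inputs that fail
pof_base = np.mean(g_buckle(E, mu, t=0.25) <= 0)
# Reusing the same draws (common random numbers) makes designs directly comparable
pof_thick = np.mean(g_buckle(E, mu, t=0.26) <= 0)
```

Because `g_buckle` increases with `t` for every sample, reusing the same draws guarantees the thicker design's estimate is no larger, which makes design comparisons less noisy.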


In [None]:
## TASK: Assess the probability of failure
df_baseline_total = (
    md_total
# solution-begin
    >> gr.ev_sample(n=1e3, df_det=df_baseline)
    >> gr.tf_summarize(
        pof_lo=gr.pr_lo(DF.g_buckle <= 0),
        pof=gr.pr(DF.g_buckle <= 0),
        pof_up=gr.pr_up(DF.g_buckle <= 0),
    )
# solution-end
)

## NOTE: Use this to check your work
assert \
    isinstance(df_baseline_total, pd.DataFrame), \
    "df_baseline_total is not a DataFrame; make sure to evaluate the model"
assert \
    "pof_lo" in df_baseline_total.columns, \
    "df_baseline_total does not have a pof_lo column; make sure to include a lower CI end"
assert \
    "pof_up" in df_baseline_total.columns, \
    "df_baseline_total does not have a pof_up column; make sure to include an upper CI end"


In [None]:
## TASK: Assess the probability of failure
df_baseline_real = (
    md_real
# solution-begin
    >> gr.ev_sample(n=1e3, df_det=df_baseline)
    >> gr.tf_summarize(
        pof_lo=gr.pr_lo(DF.g_buckle <= 0),
        pof=gr.pr(DF.g_buckle <= 0),
        pof_up=gr.pr_up(DF.g_buckle <= 0),
    )
# solution-end
)

## NOTE: Use this to check your work
assert \
    isinstance(df_baseline_real, pd.DataFrame), \
    "df_baseline_real is not a DataFrame; make sure to evaluate the model"
assert \
    "pof_lo" in df_baseline_real.columns, \
    "df_baseline_real does not have a pof_lo column; make sure to include a lower CI end"
assert \
    "pof_up" in df_baseline_real.columns, \
    "df_baseline_real does not have a pof_up column; make sure to include an upper CI end"


In [None]:
## NOTE: No need to edit; use this to check your work
(
    df_baseline_total
    >> gr.tf_mutate(model="Total")
    >> gr.tf_bind_rows(
        df_baseline_real
        >> gr.tf_mutate(model="Real")
    )
    
    >> gr.ggplot(gr.aes("model", "pof"))
    + gr.geom_hline(yintercept=0.01, linetype="dashed")
    + gr.geom_errorbar(gr.aes(ymin="pof_lo", ymax="pof_up"))
    + gr.geom_point()
)

*Observations*

<!-- task-begin -->
- According to the `Total` model, does the baseline design meet the desired criterion of `pof < 0.01` (dashed line)?
  - (Your response here)
- According to the `Real` model, does the baseline design meet the desired criterion of `pof < 0.01` (dashed line)?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- According to the `Total` model, does the baseline design meet the desired criterion of `pof < 0.01` (dashed line)?
  - No; the `Total` model gives a failure rate somewhere between `0.04` and `0.08`.
- According to the `Real` model, does the baseline design meet the desired criterion of `pof < 0.01` (dashed line)?
  - Not confidently; at a sample size of `n=1e3` the confidence interval for the `Real` model includes the target of `0.01`, so it remains plausible that `pof > 0.01`.
<!-- solution-end -->
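Whether a design *confidently* meets the target depends on the width of the interval, which shrinks roughly like $1/\sqrt{n}$. As a sketch, the Wilson score interval below stands in for `gr.pr_lo` / `gr.pr_up` (grama may use a different interval method and confidence level); the failure counts are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(n_fail, n, conf=0.95):
    """Wilson score interval for a binomial proportion; returns (lo, estimate, up)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p = n_fail / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), p, min(1.0, center + half)


# Hypothetical counts: the same 1.5% failure fraction at two sample sizes
ci_small = wilson_interval(15, 1_000)
ci_large = wilson_interval(150, 10_000)
```

With 15 failures in 1,000 samples the 95% interval straddles `0.01`. With the same 1.5% rate over 10,000 samples, the interval lies entirely above `0.01`.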


### __q13__ Adjust the design

Adjust the thickness of the plate to *confidently* achieve `pof < 0.01`. Repeat this process for both the `Total` and `Real` models. Answer the questions under *observations* below.


In [None]:
## TASK: Adjust the design
df_revised_total = gr.df_make(
    ## TODO: Adjust the thickness to modify the design
# task-begin
    # t=0.25,
# task-end
# solution-begin
    t=0.255,
# solution-end
    ## NOTE: Do not edit the following values
    a=12.0, b=9.0, m=1
)

## NOTE: No need to edit
df_revised_total = (
    md_total
    >> gr.ev_sample(n=1e4, df_det=df_revised_total)
    >> gr.tf_summarize(
        pof_lo=gr.pr_lo(DF.g_buckle <= 0),
        pof=gr.pr(DF.g_buckle <= 0),
        pof_up=gr.pr_up(DF.g_buckle <= 0),
    )
)

df_revised_total


In [None]:
## TASK: Adjust the design
df_revised_real = gr.df_make(
    ## TODO: Adjust the thickness to modify the design
# task-begin
    # t=0.25,
# task-end
# solution-begin
    t=0.251,
# solution-end
    ## NOTE: Do not edit the following values
    a=12.0, b=9.0, m=1
)

## NOTE: No need to edit
df_revised_real = (
    md_real
    >> gr.ev_sample(n=1e4, df_det=df_revised_real)
    >> gr.tf_summarize(
        pof_lo=gr.pr_lo(DF.g_buckle <= 0),
        pof=gr.pr(DF.g_buckle <= 0),
        pof_up=gr.pr_up(DF.g_buckle <= 0),
    )
)

df_revised_real


In [None]:
## NOTE: No need to edit; the following visual will help you assess the results
(
    df_revised_total
    >> gr.tf_mutate(model="Total")
    >> gr.tf_bind_rows(
        df_revised_real
        >> gr.tf_mutate(model="Real")
    )
    
    >> gr.ggplot(gr.aes("model", "pof"))
    + gr.geom_hline(yintercept=0.01, linetype="dashed")
    + gr.geom_errorbar(gr.aes(ymin="pof_lo", ymax="pof_up"))
    + gr.geom_point()
)

*Observations*

<!-- task-begin -->
- What needs to be the case with your results to *confidently* conclude that `pof < 0.01`?
  - (Your response here)
- What thickness is necessary to confidently achieve `pof < 0.01` with the `Total` model?
  - (Your response here)
- What thickness is necessary to confidently achieve `pof < 0.01` with the `Real` model?
  - (Your response here)
- Suppose the plate manufacturing process can only achieve tolerances to within `0.01` in of thickness. Does distinguishing between real and total variability have a large consequence in this case?
  - (Your response here)
- Now suppose the plate manufacturing process can achieve tolerances to within `0.001` in of thickness, and weight is a major concern. Does distinguishing between real and total variability have a large consequence in this case?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- What needs to be the case with your results to *confidently* conclude that `pof < 0.01`?
  - The confidence interval for the POF needs to be entirely below `0.01`.
- What thickness is necessary to confidently achieve `pof < 0.01` with the `Total` model?
  - I find `t=0.255`.
- What thickness is necessary to confidently achieve `pof < 0.01` with the `Real` model?
  - I find `t=0.251`.
- Suppose the plate manufacturing process can only achieve tolerances to within `0.01` in of thickness. Does distinguishing between the `Total` and `Real` model assumptions have a large consequence in this case?
  - No; rounding either required thickness up to the nearest achievable `0.01` in gives `t=0.26`, so both model assumptions lead to the same design for confidently achieving `pof < 0.01`.
- Now suppose the plate manufacturing process can achieve tolerances to within `0.001` in of thickness, and weight is a major concern. Does distinguishing between the `Total` and `Real` model assumptions have a large consequence in this case?
  - Yes; the `Real` model permits a thinner plate (`t=0.251` rather than `t=0.255`), which shaves off weight and yields a higher-performance design that still achieves the `pof < 0.01` constraint.
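The tolerance argument above amounts to rounding up to the next achievable thickness. Below is a small sketch using Python's `decimal` module, where exact decimal arithmetic avoids floating-point surprises at boundaries such as `0.251 / 0.001`. It assumes plate mass is proportional to `t` for fixed `a`, `b`, and material.

```python
from decimal import Decimal, ROUND_CEILING


def round_up_to_tolerance(t, tol):
    """Round thickness t up to the next multiple of tol.

    Both arguments are decimal strings; tol must be a power of ten (e.g. "0.01"),
    since quantize() rounds to a fixed number of decimal places.
    """
    return Decimal(t).quantize(Decimal(tol), rounding=ROUND_CEILING)


# Thicknesses found above for the Total and Real models
t_total, t_real = "0.255", "0.251"

coarse = (round_up_to_tolerance(t_total, "0.01"), round_up_to_tolerance(t_real, "0.01"))
fine = (round_up_to_tolerance(t_total, "0.001"), round_up_to_tolerance(t_real, "0.001"))

# With a, b, and material fixed, plate mass is proportional to t
mass_saving = 1 - fine[1] / fine[0]
```

At a `0.01` in tolerance, both models land on `t=0.26`. At `0.001` in, the `Real` model's design is about 1.6% lighter.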
<!-- solution-end -->
