# c02-michelson

*Purpose*: When studying physical problems, there is an important distinction between *error* and *uncertainty*. The primary purpose of this challenge is to introduce these two concepts by analyzing a real dataset.

*Reading*: [Experimental Determination of the Velocity of Light](https://play.google.com/books/reader?id=343nAAAAMAAJ&hl=en&pg=GBS.PA115) (Optional)


## Setup

In [None]:
import grama as gr
import pandas as pd
DF = gr.Intention()
%matplotlib inline

# For downloading data
import os
import requests

The following code downloads the data from a public Google sheet.

*Note*: Feel free to adapt this code for your own projects!


In [None]:
# Filename for local data
filename_data = "data/michelson.csv"

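# The following code downloads the data or, once downloaded,
# loads the data from a cached CSV on your machine
if not os.path.exists(filename_data):
    # Make sure the target directory exists before writing
    os.makedirs(os.path.dirname(filename_data), exist_ok=True)
    # Make request for data
    url_data = "https://docs.google.com/spreadsheets/d/1av_SXn4j0-4Rk0mQFik3LLr-uf0YdA06i3ugE6n-Zdo/export?format=csv"
    r = requests.get(url_data, allow_redirects=True)
    # Stop here if the download failed, rather than caching an error page
    r.raise_for_status()
    # Write the response to disk; the `with` block closes the file
    with open(filename_data, "wb") as f:
        f.write(r.content)
    print("   Data downloaded from public Google sheet")
else:
    # Note data already exists
    print("   Data loaded locally")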
    
# Read the data into memory
df_michelson = (
    pd.read_csv(filename_data)
    >> gr.tf_select("Date", "Distinctness", "Temp", "Velocity")
)
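
If you do adapt the download code for your own projects, one option is to wrap the download-or-load pattern in a function. The following is a minimal sketch, not part of the challenge: the name `load_cached_csv` and its arguments are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
import os

import pandas as pd
import requests


def load_cached_csv(url, filename):
    """Load a CSV from `filename`, downloading it from `url` first if needed.

    Hypothetical helper for illustration; not part of grama or pandas.
    """
    if not os.path.exists(filename):
        # Create the parent directory, if the filename has one
        dirname = os.path.dirname(filename)
        if dirname:
            os.makedirs(dirname, exist_ok=True)
        # Download, and fail loudly on an HTTP error
        r = requests.get(url, allow_redirects=True, timeout=30)
        r.raise_for_status()
        with open(filename, "wb") as f:
            f.write(r.content)
    # Either way, read the local copy into memory
    return pd.read_csv(filename)
```

Calling the function with the same URL and filename a second time reads the cached file and makes no network request.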

## Background

In 1879 Albert Michelson led an experimental campaign to measure the speed of light [1]. His approach was a development upon the method of Foucault [3], and resulted in a new estimate of $v_0 = 299944 \pm 51$ kilometers per second (in a vacuum). This is very close to the modern *exact* value of $299792.458$ kilometers per second (stored below as `LIGHTSPEED_VACUUM`). In this challenge, you will analyze Michelson's original data, and explore some of the factors associated with his experiment.

I've already copied Michelson's data from his 1880 publication; the code in the Setup section above loads these data from a public Google Sheet.

*Aside*: The speed of light is *exact* (there is **zero error** in the value `LIGHTSPEED_VACUUM`) because the meter is actually [*defined*](https://en.wikipedia.org/wiki/Metre#Speed_of_light_definition) in terms of the speed of light!


In [None]:
## NOTE: Don't edit; these are constants used in the challenge
LIGHTSPEED_VACUUM    = 299792.458 # Exact speed of light in a vacuum (km / s)
LIGHTSPEED_MICHELSON = 299944.00  # Michelson's speed estimate (km / s)
LIGHTSPEED_PM        = 51         # Michelson error estimate (km / s)


## Data dictionary

- `Date`: Date of measurement
- `Distinctness`: Distinctness of measured images: 3 = good, 2 = fair, 1 = poor
- `Temp`: Ambient temperature (Fahrenheit)
- `Velocity`: Measured speed of light (km / s)


# First Look


### __q1__ Recreate this table

Re-create the following table (from Michelson (1880), pg. 139) using `df_michelson` and `grama`. Note that your values *will not* match those of Michelson *exactly*; why might this be?

| Distinctness | n  | MeanVelocity |
|--------------|----|----------|
|            3 | 46 |   299860 |
|            2 | 39 |   299860 |
|            1 | 15 |   299810 |

*Hint*: The helper `gr.n()` may be helpful in this task.


In [None]:
## TASK: Re-create the table above using df_michelson
df_q1 = (
    df_michelson
# solution-begin
    >> gr.tf_group_by(DF.Distinctness)
    >> gr.tf_summarize(
        n=gr.n(),
        MeanVelocity=gr.mean(DF.Velocity),
    )
    >> gr.tf_ungroup()
# solution-end
)

## NOTE: No need to edit below here
(
    df_q1
    >> gr.tf_arrange(gr.desc(DF.Distinctness))
)

*Observations*

<!-- task-begin -->
- Write your observations here!
  - (Your response here)
- Why might your table differ from Michelson's?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- Write your observations here!
  - My values for `Distinctness == 1` are a bit lower than the others
- Why might your table differ from Michelson's?
  - Michelson's table seems to be rounded to the nearest 10 km/s
<!-- solution-end -->


The `Velocity` values in the dataset are the speed of light *in air*; Michelson introduced a couple of adjustments to estimate the speed of light in a vacuum. In total, he added $+92$ km/s to his mean in-air estimate to obtain his vacuum value (from Michelson (1880), pg. 141). While the following isn't fully rigorous ($+92$ km/s is based on the mean temperature), we'll simply apply this correction to all the observations in the dataset.


### __q2__ Adjust the velocity values

Create a new variable `VelocityVacuum` with the $+92$ km/s adjustment to `Velocity`. Assign this new DataFrame to `df_q2`.


In [None]:
## TASK: Adjust the data, assign to df_q2
# task-begin
df_q2 = None
# task-end
# solution-begin
df_q2 = (
    df_michelson
    >> gr.tf_mutate(VelocityVacuum=DF.Velocity + 92)
)
# solution-end

df_q2

# Deeper Look

## True values vs estimates

As part of his study, Michelson assessed the various potential sources of error, and provided his best-guess for the error in his speed-of-light estimate. These values are provided in `LIGHTSPEED_MICHELSON`---his nominal estimate---and `LIGHTSPEED_PM`---plus/minus bounds on his estimate. Put differently, Michelson believed the true value of the speed-of-light probably lay between `LIGHTSPEED_MICHELSON - LIGHTSPEED_PM` and `LIGHTSPEED_MICHELSON + LIGHTSPEED_PM`.

Let's introduce some terminology:

- **True Error** refers to the difference between a true value and an estimate of that value; for instance `LIGHTSPEED_VACUUM - LIGHTSPEED_MICHELSON`.
- **Estimated Error** is an analyst's *assessment* of the error.

Note that, in order to compute the true error **we have to know the true value**. Since a "true" value is often not known in practice, an analyst generally does not know the true error. The best the analyst can do is quantify their degree of uncertainty. We will learn some means of quantifying uncertainty in this class, but for many real problems uncertainty includes some amount of human judgment [2].

However, this scenario is special: We have an *accepted* speed-of-light value that we can treat as the true value. Thus, we have the means to compare Michelson's estimated error against the true error.
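
To make the terminology concrete, here is a small sketch with made-up numbers (deliberately *not* Michelson's values). The helper `compare_errors` and its argument names are hypothetical. It uses the same sign convention as the definition above, true value minus estimate.

```python
def compare_errors(true_value, estimate, plus_minus):
    """Compare an analyst's estimated error against the true error.

    Hypothetical helper, for illustration only.
    """
    # True error: requires knowing the true value
    err_true = true_value - estimate
    # Does the interval [estimate - pm, estimate + pm] contain the truth?
    covered = abs(err_true) <= plus_minus
    return {
        "err_true": err_true,
        "err_estimated": plus_minus,
        "covered": covered,
        "ratio": abs(err_true) / plus_minus,
    }


# Made-up numbers (NOT from Michelson's experiment)
result = compare_errors(true_value=100.0, estimate=103.0, plus_minus=2.0)
print(result)
```

In this made-up case the true value falls outside the stated bounds, so the estimated error understates the true error. An analyst in that position was overconfident.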


### __q3__ Compare true error and estimated error

Compare Michelson's speed of light estimate `LIGHTSPEED_MICHELSON` against the modern speed of light value `LIGHTSPEED_VACUUM`. Is Michelson's estimated error (`LIGHTSPEED_PM`) greater or less than the true error?


In [None]:
## TASK: Compare Michelson's estimate and error against the true value
# task-begin
# Write your code here
# task-end
# solution-begin
## Compare the raw values
print(gr.df_make(
    lower=LIGHTSPEED_MICHELSON - LIGHTSPEED_PM,
    estimate=LIGHTSPEED_MICHELSON,
    true=LIGHTSPEED_VACUUM,
    upper=LIGHTSPEED_MICHELSON + LIGHTSPEED_PM,
))
## Compare estimated and true error
print(gr.df_make(
    err_estimated=LIGHTSPEED_PM,
    err_true=LIGHTSPEED_VACUUM - LIGHTSPEED_MICHELSON,
    err_abs=abs(LIGHTSPEED_VACUUM - LIGHTSPEED_MICHELSON),
))
# solution-end

*Observations*

<!-- task-begin -->
- Is Michelson's estimate of the error greater or less than the true error? How would you describe Michelson's *confidence* in his estimate? (Was he overconfident? Underconfident?)
  - (Your response here)
- Make a quantitative comparison between Michelson's uncertainty and his error.
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- Is Michelson's estimate of the error greater or less than the true error? How would you describe Michelson's *confidence* in his estimate? (Was he overconfident? Underconfident?)
  - The true value does **not** lie between the proposed bounds that Michelson provided. Therefore his estimate for the error is overconfident.
- Make a quantitative comparison between Michelson's uncertainty and his error.
  - The true error level is about three times that of Michelson's assessed error.
<!-- solution-end -->

# Searching for patterns

When there are errors in our conclusions from data, we should look for an *explanation* for those errors. Michelson's estimate for the speed-of-light is in error. In this last part of the challenge, you'll search for explanations.


### __q4__ Search for patterns

You have access to a few other variables in `df_q2`. Construct *at least three (3) visualizations* of `VelocityVacuum` against these other factors. Are there other patterns in the data that might help explain the difference between Michelson's estimate and `LIGHTSPEED_VACUUM`?


In [None]:
## TASK: Your visual (1)
(
    df_q2
# task-begin
## Construct your visual here
# task-end
# solution-begin
    >> gr.ggplot(gr.aes("Temp", "VelocityVacuum"))
    + gr.geom_count()
    + gr.geom_hline(yintercept=LIGHTSPEED_VACUUM)
# solution-end
)

*Observations*

<!-- task-begin -->
- (Your observations from your 1st plot here)
<!-- task-end -->
<!-- solution-begin -->
- There's no clear pattern in this presentation of the data.
<!-- solution-end -->

In [None]:
## TASK: Your visual (2)
(
    df_q2
# task-begin
## Construct your visual here
# task-end
# solution-begin
    >> gr.ggplot(gr.aes("Distinctness", "VelocityVacuum"))
    + gr.geom_boxplot(gr.aes(group="Distinctness"))
    + gr.geom_hline(yintercept=LIGHTSPEED_VACUUM)
# solution-end
)

*Observations*

<!-- task-begin -->
- (Your observations from your 2nd plot here)
<!-- task-end -->
<!-- solution-begin -->
- There's a weak pattern in `Distinctness`, but it does not seem to be significant
<!-- solution-end -->

In [None]:
## TASK: Your visual (3)
(
    df_q2
# task-begin
## Construct your visual here
# task-end
# solution-begin
    >> gr.ggplot(gr.aes("Date", "VelocityVacuum"))
    + gr.geom_hline(yintercept=LIGHTSPEED_VACUUM)
    + gr.geom_count()
    + gr.theme(axis_text_x=gr.element_text(angle=270))
# solution-end
)

*Observations*

<!-- task-begin -->
- (Your observations from your 3rd plot here)
<!-- task-end -->
<!-- solution-begin -->
- There's no clear pattern in these data
<!-- solution-end -->

## A First Look at Control Charts

In a future exercise (`e-stat06-spc`) we'll learn about *control charts*: A control chart is a data analysis tool that helps us find *suspicious patterns* in a dataset. A control chart doesn't tell us *why* something happened; rather, it tells us which observations might be important to investigate further.

Control charts look for patterns in *groups* of observations (not single observations); this helps us to pick out meaningful patterns while protecting ourselves against randomness (spurious patterns).
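
The grouping idea can be sketched with plain pandas. This is not how `gr.pt_xbs()` is implemented, and it draws no control limits. It only computes the per-group mean and standard deviation that a chart of this kind compares across groups. The data are synthetic, and the names `df_demo` and `group_size` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT Michelson's measurements)
rng = np.random.default_rng(seed=0)
df_demo = pd.DataFrame({"y": rng.normal(loc=100.0, scale=5.0, size=40)})

# Assign consecutive observations to groups of 8, using the row index
group_size = 8
df_demo["idx"] = df_demo.index // group_size

# Summarize each group: size, mean, and standard deviation
df_groups = (
    df_demo
    .groupby("idx")["y"]
    .agg(n="count", mean="mean", sd="std")
    .reset_index()
)
print(df_groups)
```

Group means vary less than single observations do, which is why looking at groups helps separate meaningful shifts from random scatter.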


### __q5__ Inspect this control chart

Inspect the following control chart: It considers the speed-of-light measurements in groups of `8`. Answer the questions under *observations* below.

*Note*: You'll probably see a red-highlighted `Warning` when you run the code below; that's OK, so long as the plot shows!


In [None]:
## NOTE: No need to edit, run and inspect
(
    df_q2
    >> gr.tf_mutate(idx=DF.index // 8)
    >> gr.pt_xbs(group="idx", var="VelocityVacuum")
)

*Observations*

<!-- task-begin -->
- What patterns do you see in the control chart? Where do these patterns occur? (By group number `idx`)
  - (Your response here)
- In which group of observations would you want to get more information?
  - (Your response here)
<!-- task-end -->
<!-- solution-begin -->
- What patterns do you see in the control chart? Where do these patterns occur? (By group number `idx`)
  - The variability is `Above Limit` at `idx == 2` and the mean is `Above Limit` at `idx == 3`.
- In which group of observations would you want to get more information?
  - Groups `idx == 2` and `idx == 3`.
<!-- solution-end -->

# References

- [1] Michelson, [Experimental Determination of the Velocity of Light](https://play.google.com/books/reader?id=343nAAAAMAAJ&hl=en&pg=GBS.PA115) (1880)
- [2] Henrion and Fischhoff, [Assessing Uncertainty in Physical Constants](https://www.cmu.edu/epp/people/faculty/research/Fischoff-Henrion-Assessing%20uncertainty%20in%20physical%20constants.pdf) (1986)
- [3] BYU video about a [Fizeau-Foucault apparatus](https://www.youtube.com/watch?v=Ik5ORaaeaME), similar to what Michelson used.
