# Environment Validation

This is a sample assignment notebook for validating that Otter works in your environment.

To test your environment, run

```console
otter assign env-validation.ipynb dist -v
```

You should see a `dist` directory created with `autograder` and `student` subdirectories, each containing a copy of `data.csv`. The `autograder` directory should also contain an autograder zip file. Finally, the output of `otter assign` should show that the tests were run on the notebook and that they succeeded.
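
You can also check the output programmatically. The script below is a minimal sketch that assumes exactly the layout just described. The helper `missing_outputs` is hypothetical and not part of Otter, and it does not assume any particular filename for the zip.

```python
from pathlib import Path


def missing_outputs(dist_dir="dist"):
    """Return the expected `otter assign` outputs that are missing under dist_dir.

    Assumes the layout described above: autograder/ and student/ subdirectories
    that each hold data.csv, plus a zip file directly inside autograder/.
    """
    root = Path(dist_dir)
    missing = []
    for sub in ("autograder", "student"):
        data_file = root / sub / "data.csv"
        if not data_file.is_file():
            missing.append(str(data_file))
    autograder_dir = root / "autograder"
    # The zip's exact filename is not assumed, so accept any *.zip in the directory
    if not (autograder_dir.is_dir() and any(autograder_dir.glob("*.zip"))):
        missing.append(str(autograder_dir / "*.zip"))
    return missing


if __name__ == "__main__":
    problems = missing_outputs()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All expected files were found.")
```

Run it from the directory where you ran `otter assign`; an empty result means every expected file is present.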

```python
import random
import pandas as pd
```

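**Question 1:** Fill in the `hailstone` function below so that it returns the hailstone sequence of `n` as a list: starting from `n`, repeatedly halve the current number if it is even or replace it with $3n + 1$ if it is odd, stopping once you reach 1.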

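```python
def hailstone(n):
    """Return the hailstone sequence starting at n and ending at 1 (n must be a positive integer)."""
    # BEGIN SOLUTION
    seq = [n]
    while n != 1:
        # Halve even numbers; send odd numbers to 3n + 1
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        seq.append(n)
    return seq
    # END SOLUTION

hailstone(7)
```

```
[7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```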
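
To make the definition concrete, the sketch below checks whether a list really is a hailstone sequence: it must end at its first 1, and each element must follow from the previous one by the halve-or-$3n + 1$ rule. The names `next_hailstone` and `is_hailstone` are hypothetical illustrations and are not used by the autograder.

```python
def next_hailstone(n):
    """Apply one step of the hailstone (Collatz) rule to a positive integer."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def is_hailstone(seq):
    """Return True if seq follows the hailstone rule step by step and ends at its first 1."""
    if not seq or seq[-1] != 1:
        return False
    # 1 may only appear as the final element, since the sequence stops there
    if 1 in seq[:-1]:
        return False
    return all(next_hailstone(a) == b for a, b in zip(seq, seq[1:]))


print(is_hailstone([7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]))  # True
print(is_hailstone([3, 10, 5, 16, 8, 4, 1]))  # False: the step 4 -> 1 skips 2
```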

```python
assert hailstone(1) == [1]
```

```python
assert hailstone(2) == [2, 1]
```

```python
assert hailstone(3) == [3, 10, 5, 16, 8, 4, 2, 1]
```

```python
# HIDDEN
assert hailstone(7) == [7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```

**Question 2:** Evaluate the integral:

$$
\int x e^x \, \mathrm dx
$$

$$
\begin{aligned}
\int x e^x \, \mathrm dx &= \texttt{YOUR MATH HERE} \\
\end{aligned}
$$

$$
\begin{aligned}
\int x e^x \, \mathrm dx &= x e^x - \int e^x \, \mathrm dx && \text{(integration by parts with } u = x,\ \mathrm dv = e^x \, \mathrm dx\text{)} \\
&= x e^x - e^x + C \\
&= e^x (x - 1) + C \\
\end{aligned}
$$
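
One way to sanity-check the answer is to differentiate it: by the product rule, $\frac{\mathrm d}{\mathrm dx}\left[e^x (x - 1)\right] = e^x (x - 1) + e^x = x e^x$, which is the integrand. The short sketch below performs the same check numerically with a central finite difference; the sample points, step size, and tolerances are arbitrary choices.

```python
import math


def antiderivative(x):
    # Proposed answer with C = 0: e^x (x - 1)
    return math.exp(x) * (x - 1)


def integrand(x):
    return x * math.exp(x)


def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-2.0, 0.0, 0.5, 3.0):
    approx = central_difference(antiderivative, x)
    # The derivative of the antiderivative should match the integrand
    assert math.isclose(approx, integrand(x), rel_tol=1e-6, abs_tol=1e-8), x
print("d/dx [e^x (x - 1)] matches x e^x at all sample points")
```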

**Question 3:** Load the data from `data.csv` and perform an A/B test to determine whether the mean of `val` differs significantly between the population in condition `A` and the population in condition `B`.

```python
random.seed(42) # IGNORE
df = pd.read_csv("data.csv")

def test_stat(df, cond_col):
    """
    Calculate the test statistic on a dataframe: the absolute difference between
    the mean of ``val`` in group "A" and in group "B". Assumes that ``df`` has a
    ``val`` column and that ``cond_col`` labels each row "A" or "B".
    """
    mean_a = df[df[cond_col] == "A"]["val"].mean()
    mean_b = df[df[cond_col] == "B"]["val"].mean()
    return abs(mean_a - mean_b)

observed = test_stat(df, "cond") # SOLUTION

# Null distribution: recompute the statistic after shuffling the condition labels
stats = []
for _ in range(1000):
    df["shuffled_cond"] = random.sample(list(df["cond"]), df.shape[0]) # SOLUTION
    stats.append(test_stat(df, "shuffled_cond")) # SOLUTION

p_value = sum(s >= observed for s in stats) / len(stats) # SOLUTION
print(f"A/B test p-value: {p_value}")
```

```
A/B test p-value: 0.003
```
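
The cell above is a permutation test: shuffling the condition labels simulates the null hypothesis that `cond` has no effect on `val`, and the p-value is the fraction of shuffled statistics at least as large as the observed one. The sketch below restates the same procedure as a standalone function on plain Python lists, so it can be tried without `data.csv`. The function name, its parameters, and the toy data are hypothetical; because it draws random numbers from its own `random.Random` instance, its results should not be expected to reproduce the 0.003 above.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, n_resamples=1000, seed=None):
    """Permutation test for a difference in means between groups "A" and "B".

    The statistic is |mean(A) - mean(B)|, as in test_stat above. Both labels
    must appear in `labels`. Returns the fraction of label shuffles whose
    statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)  # private generator, independent of random.seed

    def stat(labs):
        a = [v for v, lab in zip(values, labs) if lab == "A"]
        b = [v for v, lab in zip(values, labs) if lab == "B"]
        return abs(mean(a) - mean(b))

    observed = stat(labels)
    shuffled = list(labels)
    count = 0
    for _ in range(n_resamples):
        # Shuffling the labels simulates the null hypothesis of no group effect
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            count += 1
    return count / n_resamples


# Hypothetical toy data: group B is shifted well above group A
vals = [1.0, 1.2, 0.9, 1.1, 1.0, 3.0, 3.1, 2.9, 3.2, 3.0]
labs = ["A"] * 5 + ["B"] * 5
print(permutation_p_value(vals, labs, seed=0))
```

Using a dedicated `random.Random` instance keeps the sketch from disturbing the global generator that `random.seed(42)` in the assignment relies on. For the toy data the p-value should come out small, since only 2 of the 252 possible ways to split the labels separate the groups as strongly as the observed split.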


```python
assert len(stats) == 1000
```

```python
assert 0 <= p_value <= 1
```

```python
# HIDDEN
import math
assert math.isclose(observed, 0.45037490465339625)
```

```python
# HIDDEN
import math
assert math.isclose(p_value, .003)
```
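
The hidden tests compare floats with `math.isclose` instead of `==` because floating-point results can differ in their last bits depending on the order of operations. By default `math.isclose` uses `rel_tol=1e-09` and `abs_tol=0.0`, which the short sketch below illustrates:

```python
import math

# Floating-point addition is not exact, so == can fail on values that are "equal"
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True (default rel_tol=1e-09, abs_tol=0.0)

# With the default abs_tol of 0.0, comparisons against exactly 0 need an explicit abs_tol
print(math.isclose(1e-12, 0.0))                # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```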