### 1. Let $W_1$ be your net winnings on a single spin of a roulette wheel when you bet $1 on a single number. This bet pays 35 to 1, meaning that for each dollar you bet, you win $35 if the ball lands on that number and lose $1 otherwise. We calculated the p.m.f., expected value, and variance of $W_1$ in Examples 22.1 and 28.1.

### Let $W_1, W_2, \ldots, W_{10}$ be independent random variables with the same distribution as $W_1$.

### Consider the random variables $X = 10 W_1$ and $Y = W_1 + W_2 + \ldots + W_{10}$. Which one represents...

### - ...your net winnings if you bet $1 on that number on each of 10 spins of the roulette wheel?
### - ...your net winnings if you bet $10 on that number on a single spin of the roulette wheel?

### Calculate E[X], E[Y], Var[X], and Var[Y]. How do they compare?

- From Example 22.1, $E[W_1] = -0.053$
- From Example 28.1, $Var[W_1] = 33.21$

- X is your net winnings when you bet $10 on a single spin; Y is your net winnings when you bet $1 on each of 10 spins

- X
    - $\begin{align}
        E[X] &= 10 E[W_1] \\
        &= 10 \times (-0.053) \\
        &= -0.53
        \end{align}$

    - $\begin{align}
        Var[X] &= Var[10 \times W_1] \\
        &= 100 Var[W_1] \\
        &= 100 \times 33.21 \\
        &= 3321
        \end{align}$

- Y
    - $\begin{align}
        E[Y] &= E[W_1] + E[W_2] + \ldots + E[W_{10}] & \text{linearity of expectation} \\
        &= 10 \times (-0.053) \\
        &= -0.53
        \end{align}$

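    - $\begin{align}
        Var[Y] &= Var[W_1 + W_2 + \ldots + W_{10}] \\
        &= Cov[W_1 + W_2 + \ldots + W_{10}, W_1 + W_2 + \ldots + W_{10}] \\
        &= \sum_{i=1}^{10} \sum_{j=1}^{10} Cov[W_i, W_j] \\
        &= \sum_{i=1}^{10} Var[W_i] & \text{independence: } Cov[W_i, W_j] = 0 \text{ for } i \neq j \\
        &= 10 \times 33.21 \\
        &= 332.1
        \end{align}$

- Comparison: $E[X] = E[Y]$, but $Var[X] = 10 \, Var[Y]$. Both strategies lose the same amount on average, but putting the whole stake on one spin has ten times the variance of spreading it over 10 independent spins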

- Check by simulation (the estimates are noisy, so they only roughly match the exact values)

In [18]:
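import numpy as np

population = [-1, 35]   # net winnings on one spin per $1 bet: lose 1 or win 35
probs = [37/38, 1/38]   # 1 winning pocket out of 38
X = []
Y = []

for _ in range(10_000):
    # X: a single spin with a $10 bet, i.e. 10 times the single-spin winnings
    X.append(10 * np.random.choice(population, p=probs))
    # Y: ten independent spins with a $1 bet on each
    Y.append(np.sum(np.random.choice(population, size=10, replace=True, p=probs)))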

print(np.mean(X))
print(np.var(X))

print(np.mean(Y))
print(np.var(Y))

0.584
3698.2189439999997
-0.4456
335.61584064
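
The simulated moments above are noisy (the sample mean of X even came out positive), so an exact check can be useful. The sketch below recomputes the four quantities from the p.m.f. of $W_1$ using exact fractions. The helper name `moments` is only illustrative and is not part of the original solution.

```python
from fractions import Fraction

# p.m.f. of W_1: win $35 with probability 1/38, lose $1 with probability 37/38
pmf = {35: Fraction(1, 38), -1: Fraction(37, 38)}

def moments(pmf):
    """Return (mean, variance) of a discrete random variable given its p.m.f."""
    mean = sum(w * p for w, p in pmf.items())
    var = sum((w - mean) ** 2 * p for w, p in pmf.items())
    return mean, var

e_w, var_w = moments(pmf)

# X = 10 * W_1: scaling multiplies the mean by 10 and the variance by 10**2
e_x, var_x = 10 * e_w, 100 * var_w
# Y = W_1 + ... + W_10 with independent W_i: means add and variances add
e_y, var_y = 10 * e_w, 10 * var_w

print(f"E[W_1] = {e_w}, Var[W_1] = {var_w} (about {float(var_w):.2f})")
print(f"E[X] = {float(e_x):.4f}, Var[X] = {float(var_x):.1f}")
print(f"E[Y] = {float(e_y):.4f}, Var[Y] = {float(var_y):.1f}")
```

Under these assumptions the two means agree exactly and Var[X] is exactly ten times Var[Y], which matches the derivation.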


### 2. Consider the following three scenarios:

### - A fair coin is tossed 3 times. X is the number of heads and Y is the number of tails.
### - A fair coin is tossed 4 times. X is the number of heads in the first 3 tosses, Y is the number of heads in the last 3 tosses.
### - A fair coin is tossed 6 times. X is the number of heads in the first 3 tosses, Y is the number of heads in the last 3 tosses.

### Use properties of covariance to calculate Cov[X,Y] for each of these three scenarios. You should not need to use LOTUS or the shortcut formula for covariance.

### Hint 1: For the first scenario, write Y as a function of X
### Hint 2: For the second scenario, write X = A+B and Y=B+C, where A,B,C are independent random variables.

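- Scenario 1: every toss is either a head or a tail, so $Y = 3 - X$
    - $\begin{align}
        Cov[X,Y] &= Cov[X, 3-X] \\
        &= Cov[X, 3] + Cov[X, -X] \\
        &= -Cov[X, X] & \text{covariance with a constant is 0} \\
        &= -Var[X] \\
        &= - (3 \cdot 0.5 \cdot 0.5) & X \sim \text{Binomial}(3, 0.5) \\
        &= - 0.75
        \end{align}$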

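- Scenario 2:
    - Let A be the number of heads on the first toss, B the number of heads on the middle 2 tosses, and C the number of heads on the last toss, so that $X = A + B$ and $Y = B + C$
    - $\begin{align}
        Cov[X,Y] &= Cov[A+B, B+C] \\
        &= Cov[A,B] + Cov[A,C] + Cov[B,B] + Cov[B,C] \\
        &= Cov[B,B] & \text{A, B, C are independent} \\
        &= Var[B] \\
        &= 2 \times 0.5 \times 0.5 & B \sim \text{Binomial}(2, 0.5) \\
        &= 0.5
        \end{align}$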

- Scenario 3:
    - $\begin{align}
        Cov[X,Y] &= 0 & \text{because X and Y depend on disjoint sets of tosses, so they are independent}
        \end{align}$
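
As a cross-check on the three covariances, the sketch below enumerates every equally likely sequence of tosses and applies the definition of covariance directly. This is a brute-force verification rather than the property-based argument the problem asks for, and the helper name `exact_cov` is only illustrative.

```python
from fractions import Fraction
from itertools import product

def exact_cov(n_tosses, x_of, y_of):
    """Exact Cov[X, Y] over all equally likely sequences of fair-coin tosses."""
    outcomes = list(product([0, 1], repeat=n_tosses))  # 1 = heads, 0 = tails
    p = Fraction(1, len(outcomes))
    e_x = sum(x_of(o) * p for o in outcomes)
    e_y = sum(y_of(o) * p for o in outcomes)
    return sum((x_of(o) - e_x) * (y_of(o) - e_y) * p for o in outcomes)

# Scenario 1: 3 tosses, X = number of heads, Y = number of tails
cov1 = exact_cov(3, lambda o: sum(o), lambda o: 3 - sum(o))
# Scenario 2: 4 tosses, X = heads in the first 3, Y = heads in the last 3
cov2 = exact_cov(4, lambda o: sum(o[:3]), lambda o: sum(o[1:]))
# Scenario 3: 6 tosses, X = heads in the first 3, Y = heads in the last 3
cov3 = exact_cov(6, lambda o: sum(o[:3]), lambda o: sum(o[3:]))

print(cov1, cov2, cov3)  # -3/4 1/2 0
```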


### 3. A poker hand (5 cards) is dealt off the top of a well-shuffled deck of 52 cards. Let X be the number of diamonds in the hand. Let Y be the number of hearts in the hand.

### a. Do you think Cov[X,Y] is positive, negative, or zero? Explain.
### b. Let $D_i$ ($i=1,\ldots,5$) be a random variable that is 1 if the i-th card is a diamond and 0 otherwise. What is $E[D_i]$?
### c. Let $H_i$ ($i=1,\ldots,5$) be a random variable that is 1 if the i-th card is a heart and 0 otherwise. Of course, $E[H_i]$ is the same as $E[D_i]$, since there are the same number of hearts as diamonds in a 52-card deck. What is $Cov[D_i, H_i]$? What is $Cov[D_i, H_j]$ when $i \neq j$? (Keep in mind that $D_i$ and $H_i$ are indicator random variables that only take on the values 0 or 1.) Hint: Make a table for the joint p.m.f. There are only 4 possible outcomes.
### d. Use your answers to parts b and c (and the properties of covariance, of course) to calculate Cov[X,Y]

- a.
    - Cov[X,Y] should be negative. A hand has only 5 cards, so the more diamonds it contains, the less room there is for hearts
    - For example, if you have 5 diamonds, you must have 0 hearts

- b. 
    - $\begin{align}
        E[D_i] &= \sum_{d} d \cdot f_D(d) \\
        &= 0 \times \frac{39}{52} + 1 \times \frac{13}{52} \\
        &= \frac{13}{52} = \frac{1}{4}
        \end{align}$

- c.
    - Since $D_i$ and $H_i$ cannot both be 1 (a card cannot be both a diamond and a heart), $E[D_i H_i] = \sum_{d, h} d \cdot h \cdot f(d,h) = 0$
        - $\begin{align}
            Cov[D_i, H_i] &= E[D_i H_i] - E[D_i]E[H_i] \\
            &= - E[D_i]E[H_i] \\
            &= - \frac{13^2}{52^2} \\
            &= -\frac{1}{16}
            \end{align}$

    - For $i \neq j$, both $D_i$ and $H_j$ can be 1. The cards are dealt without replacement, so once card $i$ is a diamond, 13 of the remaining 51 cards are hearts: $E[D_i H_j] = P(D_i = 1, H_j = 1) = \frac{13}{52} \cdot \frac{13}{51} = \frac{13}{204}$
        - $\begin{align}
            Cov[D_i, H_j] &= E[D_i H_j] - E[D_i]E[H_j] \\
            &= \frac{13}{204} - \frac{1}{16} \\
            &= \frac{1}{816}
            \end{align}$

- d.
    - $\begin{align}
        Cov[X, Y] &= Cov[D_1 + D_2 + \ldots + D_5, H_1 + H_2 + \ldots + H_5] \\
        &= \sum_{i=1}^{5} Cov[D_i, H_i] + \sum_{i \neq j} Cov[D_i, H_j] \\
        &= 5 \times \left(-\frac{1}{16}\right) + 20 \times \frac{1}{816} & \text{there are } 5 \times 4 = 20 \text{ ordered pairs with } i \neq j \\
        &= -\frac{235}{816} \\
        &\approx -0.288
        \end{align}$

In [277]:
import numpy as np
population = ['D']*13 + ['C']*13 + ['H']*13 + ['S']*13
X = []
Y = []
d1 = []
h1 = []
h2 = []
d1h1 = []
d1h2 = []
for _ in range(10_000):
    sample=np.random.choice(population, 5, replace=False)
    X.append(len([x for x in sample if x=='D']))
    Y.append(len([x for x in sample if x=='H']))

    d1.append(sample[0] == 'D')
    h1.append(sample[0] == 'H')
    h2.append(sample[1] == 'H')

    d1h1.append((sample[0] == 'D') * (sample[0] == 'H'))
    d1h2.append((sample[0] == 'D') * (sample[1] == 'H'))

## Expectation of X and Y when X and Y are hypergeometric
print('='*50)
print('Expectation of X and Y when Hypergeometric(5, 13/52)')
print(f'X: {np.mean(X)} | Y: {np.mean(Y)} | n*p: {5 * 13/52}')

## Expectation of D_i * H_i 
print('='*50)
print('Expectation of E[D_i H_i]')
print(f'E[D_i . H_i]: {np.mean(d1h1)} --> 0 because position i cannot be both D and H')

## Expectation of D_i * H_j
print('='*50)
print('Expectation of E[D_i H_j]')
print(f'E[D_i . H_j]: {np.mean(d1h2)} | 13/204 = 0.0637')

## Expectation of D_i and H_i
print('='*50)
print('Expectation of D_i, H_i')
print(f'E[D_i]: {np.mean(d1)} | E[H_i]: {np.mean(h1)} | 13/52=0.25')

## Cov[D_i, H_i]
print('='*50)
print('Cov[D_i, H_i] = E[D_i . H_i] - E[D_i]E[H_i]')
print(f'E[D_i . H_i] - E[D_i]E[H_i]: {np.mean(d1h1) - np.mean(d1)*np.mean(h1)} | -1/16 = -0.0625')

## Cov[D_i, H_j]
print('='*50)
print('Cov[D_i, H_j] = E[D_i . H_j] - E[D_i]E[H_j]')
print(f'E[D_i . H_j] - E[D_i]E[H_j]: {np.mean(d1h2) - np.mean(d1)*np.mean(h2)} | 13/204 - 1/16 = 1/816 = 0.0012')

## Cov[X,Y]
print('='*50)
print('Cov[X,Y] = Cov[(D1..D5), (H1...H5)] | -235/816 = -0.2880')
print(f'np.cov(X,Y): {np.cov(X,Y, ddof=1)[0,1]}')
EX = np.mean(X)
EY = np.mean(Y)
cov_manual=(X - EX) * (Y-EY)
print(f'np.mean(cov_manual)=np.mean((X - EX) * (Y-EY)): {np.mean(cov_manual)}')

Expectation of X and Y when Hypergeometric(5, 13/52)
X: 1.2527 | Y: 1.2637 | n*p: 1.25
Expectation of E[D_i H_i]
E[D_i . H_i]: 0.0 --> 0 because position i cannot be both D and H
Expectation of E[D_i H_j]
E[D_i . H_j]: 0.0664 | 13/204 = 0.0637
Expectation of D_i, H_i
E[D_i]: 0.2502 | E[H_i]: 0.2497 | 13/52=0.25
Cov[D_i, H_i] = E[D_i . H_i] - E[D_i]E[H_i]
E[D_i . H_i] - E[D_i]E[H_i]: -0.06247493999999999 | -1/16 = -0.0625
Cov[D_i, H_j] = E[D_i . H_j] - E[D_i]E[H_j]
E[D_i . H_j] - E[D_i]E[H_j]: 0.0022487199999999957 | 13/204 - 1/16 = 1/816 = 0.0012
Cov[X,Y] = Cov[(D1..D5), (H1...H5)] | -235/816 = -0.2880
np.cov(X,Y): -0.28666565656565673
np.mean(cov_manual)=np.mean((X - EX) * (Y-EY)): -0.28663699
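
The simulated Cov[X,Y] of about -0.287 can be compared with an exact value. The sketch below assumes only that the 5 cards are dealt without replacement from a standard 52-card deck, and computes the covariance two ways: from the indicator covariances, and from the joint p.m.f. of (X, Y). The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Route 1: indicator covariances (13 cards per suit, dealt without replacement)
p = Fraction(13, 52)
cov_same = 0 - p * p                           # Cov[D_i, H_i]: one card cannot be both suits
cov_diff = p * Fraction(13, 51) - p * p        # Cov[D_i, H_j] for i != j
cov_indicators = 5 * cov_same + 20 * cov_diff  # 5 pairs with i == j, 20 ordered pairs with i != j

# Route 2: joint p.m.f., P(X=x, Y=y) = C(13,x) C(13,y) C(26,5-x-y) / C(52,5)
total = comb(52, 5)
joint = {
    (x, y): Fraction(comb(13, x) * comb(13, y) * comb(26, 5 - x - y), total)
    for x in range(6) for y in range(6 - x)
}
e_x = sum(x * pr for (x, y), pr in joint.items())
e_y = sum(y * pr for (x, y), pr in joint.items())
e_xy = sum(x * y * pr for (x, y), pr in joint.items())
cov_joint = e_xy - e_x * e_y

print(cov_indicators, cov_joint, float(cov_joint))
```

Both routes give -235/816, roughly -0.288, which is close to the simulated value.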


### 4. Recall the coupon collector problem from Lesson 26: McDonald’s decides to give a Pokemon toy with every Happy Meal. Each time you buy a Happy Meal, you are equally likely to get any one of the 6 types of Pokemon. Let X be the number of Happy Meals you have to buy until you “catch ’em all”.

### In that lesson, you calculated E[X] using linearity of expectation. Now, use properties of covariance to calculate Var[X]

- Recall that X is a sum of waiting times, $X = P_1 + P_2 + \ldots + P_6$, where $P_i$ is the number of Happy Meals you buy to get your i-th new type once you already have $i-1$ types
    - Each $P_i$ is a geometric random variable
    - With every new type you collect, the probability of success (getting a type you don't have yet) goes down
    - e.g. $P_1$ is Geometric(p=6/6), because your first Happy Meal always gives you a new type
    - e.g. $P_2$ is Geometric(p=5/6), because 5 of the 6 types are ones you don't already have
    - ...


- For the i-th new type, the success probability is $p_i = \frac{6-i+1}{6}$
- $E[P_i] = \frac{1}{p_i} = \frac{6}{6-i+1}$
- $Var[P_i] = Cov[P_i, P_i] = \frac{1 - p_i}{p_i^2} = \frac{1 - \frac{6-i+1}{6}}{\frac{(6-i+1)^2}{6^2}}$

- $Cov[P_i, P_j] = 0$ for $i \neq j$, because the toys in different Happy Meals are independent, so the waiting times for successive new types are independent geometric random variables. The simulation below draws the $P_i$ independently and compares the variance of their sum with the formula

- We know that
    - $\begin{align}
        Var[X] &= Cov[X,X] \\
        &= Cov[P_1 + ... + P_6, P_1 + ... + P_6] \\
        &= \sum_{i=1}^{6} Cov[P_i, P_i] & \text{cross terms } Cov[P_i, P_j] \text{ vanish for } i \neq j \\
        &= \sum_{i=1}^{6} \frac{1 - \frac{6-i+1}{6}}{\frac{(6-i+1)^2}{6^2}} \\
        &= 0 + \frac{1/6}{(5/6)^2} + \frac{2/6}{(4/6)^2} + \frac{3/6}{(3/6)^2} + \frac{4/6}{(2/6)^2} + \frac{5/6}{(1/6)^2} \\
        &\approx 38.99
        \end{align}$

In [345]:
import numpy as np
import scipy.stats

n = 100_000
p1 = scipy.stats.geom.rvs(p=6/6, size=n)
p2 = scipy.stats.geom.rvs(p=5/6, size=n)
p3 = scipy.stats.geom.rvs(p=4/6, size=n)
p4 = scipy.stats.geom.rvs(p=3/6, size=n)
p5 = scipy.stats.geom.rvs(p=2/6, size=n)
p6 = scipy.stats.geom.rvs(p=1/6, size=n)
X = p1 + p2 + p3 + p4 + p5 + p6

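def compute_cov(arr1: np.ndarray, arr2: np.ndarray) -> float:
    """Sample covariance: the mean of the products of deviations from the means."""
    ep1 = np.mean(arr1)
    ep2 = np.mean(arr2)
    cov_manual = (arr1 - ep1) * (arr2 - ep2)
    return np.mean(cov_manual)

## Covariances between successive waiting times: all ~0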
print(compute_cov(p1, p2), compute_cov(p2, p3), compute_cov(p3, p4), compute_cov(p4, p5), compute_cov(p5, p6))

print(f'Simulated Variance: {np.var(X)}')
print(f'Theoretical Variance: {((1/6) / (5/6)**2) +  ((2/6) / (4/6)**2) + ((3/6) / (3/6)**2) + ((4/6) / (2/6)**2) + ((5/6) / (1/6)**2)}')

0.0 0.0007152566000000052 0.000578798699999982 -0.010648123399999983 0.014744102599999894
Simulated Variance: 38.886338635899996
Theoretical Variance: 38.99
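
The simulation above draws the six waiting times as independent geometric variables, so it cannot by itself confirm that they are uncorrelated. As a complementary sketch, the code below simulates the collecting process directly, one Happy Meal at a time, and records how long each new type took. The function name `collect_all`, the seed, and the number of trials are illustrative choices.

```python
import random
from fractions import Fraction

import numpy as np

def collect_all(n_types, rng):
    """Buy meals until every type is seen; return the wait for each new type."""
    seen = set()
    waits = []
    since_last = 0
    while len(seen) < n_types:
        since_last += 1
        toy = rng.randrange(n_types)  # each type is equally likely
        if toy not in seen:
            seen.add(toy)
            waits.append(since_last)
            since_last = 0
    return waits

rng = random.Random(0)
waits = np.array([collect_all(6, rng) for _ in range(20_000)])  # shape (20000, 6)
totals = waits.sum(axis=1)  # X for each simulated collector

# Exact variance from the formula: sum of (1 - p_i) / p_i**2 with p_i = (7 - i) / 6
exact_var = sum((1 - Fraction(7 - i, 6)) / Fraction(7 - i, 6) ** 2 for i in range(1, 7))

print(f"Exact variance: {exact_var} = {float(exact_var)}")
print(f"Simulated variance of X: {totals.var()}")
print(f"Simulated Cov[P_5, P_6]: {np.cov(waits[:, 4], waits[:, 5])[0, 1]}")
```

If the waiting times really are uncorrelated, the simulated variance of X should land near 38.99 and the sample covariance between stages should be near 0.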


### 5. At Diablo Canyon nuclear plant, radioactive particles hit a Geiger counter according to a Poisson process with a rate of 3.5 particles per second. Let X be the number of particles detected in the first 2 seconds. Let Z be the number of particles detected in the first 3 seconds. Find Cov[X,Z]. Hint: Note that X and Z are not independent. However, you should be able to write Z=X+Y, where Y is a random variable that is independent of X

- Let Z = X+Y, where Y is the number of particles detected in the time interval (2,3]
- $X \sim \text{Poisson}(7)$, since $3.5 \times 2 = 7$
- $Y \sim \text{Poisson}(3.5)$
- X and Y are independent because they count particles in disjoint time intervals

$\begin{align}
    Cov[X,Z] &= Cov[X, X+Y] \\
    &= Cov[X,X] + Cov[X,Y] \\
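    &= Cov[X,X] & \text{X and Y are independent, so } Cov[X,Y] = 0 \\
    &= Var[X] \\
    &= 3.5 \times 2 & \text{the variance of a Poisson equals its mean} \\
    &= 7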
\end{align}$
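
The result can be checked by simulating the Poisson process itself rather than assuming the decomposition Z = X + Y. The sketch below builds arrival times from exponential gaps with rate 3.5 per second and counts arrivals in the first 2 and first 3 seconds. The seed, the number of trials, and the cap `max_arrivals` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 3.5           # particles per second
n_trials = 100_000
max_arrivals = 60    # assumption: far more than will occur in 3 s (the mean is 10.5)

# Gaps between arrivals in a Poisson process are Exponential with mean 1 / rate
gaps = rng.exponential(scale=1 / rate, size=(n_trials, max_arrivals))
arrival_times = gaps.cumsum(axis=1)

X = (arrival_times <= 2).sum(axis=1)  # particles in the first 2 seconds
Z = (arrival_times <= 3).sum(axis=1)  # particles in the first 3 seconds

print(f"Simulated Cov[X, Z]: {np.cov(X, Z)[0, 1]} | theory: 7")
print(f"Simulated Var[X]: {X.var()} | Var[Z]: {Z.var()} | theory: 7 and 10.5")
```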