### Pair Problem

In today's pair problem, we will simulate drawing from a _binomial distribution_. The binomial distribution describes cases where we run a fixed number of trials, each trial has two possible outcomes, the probability of each outcome does not change from trial to trial, and the trials are independent. The typical example is flipping a coin a fixed number of times. A less common example is counting the parachutes that failed to open in a test batch at a parachute company.

For a binomial distribution problem, we normally phrase our questions as:
"If I do `N` trials, and each trial has a probability `prob` of succeeding, what is the probability I get exactly `X` successes?"
If you can answer this question, you can calculate the mean, variance, and other quantities of interest.
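
As a reference point for the question above, the probability of exactly `k` successes can be computed directly from the binomial formula. The sketch below uses only the standard library; the function name `binomial_pmf` is our own and is not part of the course code.

```python
from math import comb

def binomial_pmf(k, n_trials, prob):
    """Probability of exactly k successes in n_trials independent trials."""
    return comb(n_trials, k) * prob**k * (1 - prob) ** (n_trials - k)

n_trials, prob = 10, 0.5

# Probability of exactly 4 heads in 10 fair coin flips
p_four_heads = binomial_pmf(4, n_trials, prob)

# Mean and variance obtained by summing over every possible outcome
mean = sum(k * binomial_pmf(k, n_trials, prob) for k in range(n_trials + 1))
variance = sum(
    (k - mean) ** 2 * binomial_pmf(k, n_trials, prob) for k in range(n_trials + 1)
)

print(p_four_heads)    # equals 210 / 1024
print(mean, variance)  # close to N * prob and N * prob * (1 - prob)
```

Summing over all outcomes is only practical for small `N`, but it shows how the mean and variance follow from the answer to the question above.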

We have written a function for you that will draw a sample. To use it, import it in your code as follows:
```python
from generate_sample import get_sample_success

# to use it to generate the number of heads you would get flipping a coin 10 times
get_sample_success(0.5, 10)  # might return 4
get_sample_success(0.5, 10)  # might return 8

# to simulate counting the number of 6s you get after rolling a 1d6 15 times
get_sample_success(1/6.0, 15)  # might return 3
```
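
The source of `generate_sample.py` is not shown here. If you do not have the module to hand, a stand-in along the following lines should behave similarly for this exercise. It reflects our assumption about the function (argument order `prob, N`, returning a count of successes), not the course's actual implementation, and the name `get_sample_success_standin` is hypothetical.

```python
import random

def get_sample_success_standin(prob, n_trials, rng=random):
    """Hypothetical stand-in: count successes in n_trials independent trials."""
    # Each trial succeeds when a uniform draw on [0, 1) falls below prob
    return sum(rng.random() < prob for _ in range(n_trials))

rng = random.Random(0)  # seeded so repeated runs give the same draws
heads = get_sample_success_standin(0.5, 10, rng)
sixes = get_sample_success_standin(1 / 6, 15, rng)
print(heads, sixes)
```

This stand-in returns an `int`, whereas the provided function may return a float; that difference does not matter when counts are compared.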

For many processes, such as testing whether parachutes open, we don't know the probability `p` beforehand. Instead, we know the data we have collected, and we need to infer the probability `p` from our data.
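
One simple way to infer `p` from data is to use the observed proportion of successes, together with a standard error that indicates how uncertain the estimate is. The sketch below is illustrative only: `estimate_prob` is a hypothetical helper, and the counts are made-up inputs rather than real parachute data.

```python
from math import sqrt

def estimate_prob(n_success, n_trials):
    """Return the observed proportion and its approximate standard error."""
    p_hat = n_success / n_trials
    std_err = sqrt(p_hat * (1 - p_hat) / n_trials)
    return p_hat, std_err

# Made-up example: 3 failures observed in a test batch of 200
p_hat, std_err = estimate_prob(3, 200)
print(p_hat, std_err)
```

The standard error shrinks with the square root of the batch size, which is why small differences between two processes need large batches to detect.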

#### Your problem

We are trying to reduce the proportion of defects we have when manufacturing Banana cell phones. We have some data on two different processes we want to try, which we call `process0` (our current process) and `process1` (the new process). It is expensive to switch to `process1` unless we are reasonably sure it makes a substantial improvement in the defect rate. We will set up a small production line at one factory and run a batch of size `N` through both processes. We are employing you to help us scope out how large a batch `N` we need.

1) Suppose `p0 = 0.05` and `p1 = 0.03` (i.e. `p1` is better), and we make 1,000 phones through each process. Simulate this 10,000 times and tell us in how many of those simulations `p0` ends up with fewer defects than `p1` (i.e. how many times out of these 10,000 simulations did we get the wrong result?).

2) Suppose `p0 = 0.05` and `p1 = 0.04` (i.e. `p1` is better, but less so) and we make 1,000 phones through each process. Simulating 10,000 times: how many simulations did `p0` end up with fewer defects than `p1`? How does this compare to the previous result?

3) Suppose `p0 = 0.05` and `p1 = 0.04` and we make 20,000 phones through each process. Simulating 10,000 times, in what proportion of simulations did we end up with the wrong answer (i.e. claiming that we should stick with `p0`)?

4) We think that the differences are probably `p0 = 0.05` and `p1 = 0.048`. How many phones do we need to put in the batch to make sure the probability of making the wrong call is less than 1%?

In [4]:
from generate_sample import get_sample_success
from scipy.stats import binom

import random

In [14]:
# For coin flipping
print('----------')
print("The following represent a binomial distribution:")
print('----------')

print("# of Heads After 10 Trials:")
print('----------')
for _ in range(5):
    print(get_sample_success(0.5, 10))  # Number of heads we get after 10 trials from flipping a FAIR COIN (p=0.5)
    
print('----------')

print("# of 6's After 15 Trials:")
print('----------')
for _ in range(5):
    print(get_sample_success(1/6, 15))  # Number of 6s you get from rolling a die 15 times

----------
The following represent a binomial distribution:
----------
# of Heads After 10 Trials:
----------
8.0
5.0
5.0
3.0
5.0
----------
# of 6's After 15 Trials:
----------
5.0
5.0
0.0
3.0
4.0


## Class Solution

In [15]:
# To test if a new process is more viable than the current one

In [None]:
# p0 = 0.05 (probability of making a defective phone with the first process)
# p1 = 0.03 (probability of making a defective phone with the second process)

In [None]:
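import numpy as np  # needed for np.array below

def generate_sample(p_success, sample_size, n_trials):
    # One success count per simulated batch of sample_size trials
    return np.array([get_sample_success(p_success, sample_size) for _ in range(n_trials)])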
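
Each of the cases below asks the same question with different inputs, so the simulation can be wrapped in one function. The following is a self-contained sketch rather than the class solution: it draws binomial counts with the standard library instead of `get_sample_success`, and the names `draw_defects` and `wrong_call_rate` are our own. A wrong call is counted when `process0` shows strictly fewer defects than `process1`; ties are not counted.

```python
import random

def draw_defects(prob, batch_size, rng):
    """Number of defective phones in one batch (one binomial draw)."""
    return sum(rng.random() < prob for _ in range(batch_size))

def wrong_call_rate(p0, p1, batch_size, n_sims, seed=0):
    """Fraction of simulations in which process0 shows fewer defects than process1."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_sims):
        if draw_defects(p0, batch_size, rng) < draw_defects(p1, batch_size, rng):
            wrong += 1
    return wrong / n_sims

# Small run sizes keep this quick; the exercise itself uses 10,000 simulations
rate_big_gap = wrong_call_rate(0.05, 0.03, batch_size=1000, n_sims=200)
rate_small_gap = wrong_call_rate(0.05, 0.04, batch_size=1000, n_sims=200)
print(rate_big_gap, rate_small_gap)
```

Returning a rate rather than a count makes results comparable across runs with different numbers of simulations.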

In [24]:
# Case 1
# p0 = 0.05, p1 = 0.03, N = 1000, repeated 10,000 times
# A mistake is a simulation where process0 shows fewer defects than process1
mistakes = sum(get_sample_success(0.05, 1000) < get_sample_success(0.03, 1000) for _ in range(10000))
print("Case 1: Number of cases where we made a mistake: ", mistakes)

In [None]:
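# Case 2
# p0 = 0.05, p1 = 0.04, N = 1000, repeated 10,000 times
mistakes = sum(get_sample_success(0.05, 1000) < get_sample_success(0.04, 1000) for _ in range(10000))
print("Case 2: Number of cases where we made a mistake: ", mistakes)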

In [22]:
# Case 3
# p0 = 0.05, p1 = 0.04, N = 20000, repeated 10,000 times
n_sims = 10000
mistakes = sum(get_sample_success(0.05, 20000) < get_sample_success(0.04, 20000) for _ in range(n_sims))
print("Case 3: Proportion of cases where we made a mistake: ", mistakes / n_sims)
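
Case 4 is not worked above. Simulating ever larger batches is slow, so one way to get a starting estimate is a normal approximation to the difference in defect counts between the two processes. The sketch below is our own approach under that approximation, not the class solution; `batch_size_for_error` is a hypothetical name, and the result is a rough figure that should be confirmed by simulation.

```python
from math import ceil
from statistics import NormalDist

def batch_size_for_error(p0, p1, max_error):
    """Approximate batch size so that P(process0 shows fewer defects) <= max_error."""
    # Per-phone variance of the difference in defect counts
    var_per_phone = p0 * (1 - p0) + p1 * (1 - p1)
    # z-score that leaves max_error in the tail of a standard normal
    z = NormalDist().inv_cdf(1 - max_error)
    return ceil(z**2 * var_per_phone / (p0 - p1) ** 2)

n_needed = batch_size_for_error(0.05, 0.048, 0.01)
print(n_needed)  # on the order of 10**5 phones per process
```

The required batch size grows with the inverse square of the gap `p0 - p1`, so halving the gap roughly quadruples the number of phones needed.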