# 0) Instructions:
Please complete the workbook below. Some of the calculations are already ready to be "run". However, please read the text carefully to find questions that you should answer for credit. Remember that to answer a question in text, you click "insert", then "insert cell below", switch the input from "code" to "markdown", type your answer, and finally click the run button to set your text in stone. In a few questions, you will need to do some calculations on your own. Hint: for these calculations you can copy, paste, and modify code that is above the calculation that you need to do. You may work by yourself or in groups of 2. Please remember to put your name on top, remember to save the workbook, and remember to upload to Canvas.

# 1) The binomial distribution:
The first probability distribution that we discussed in lecture was the Binomial distribution. This distribution has the following probability distribution function:
$P(y) = \frac{n!}{y!(n-y)!} \pi^y (1-\pi)^{n-y}$, where $n$ is the number of trials, $\pi$ is the probability of success on a single trial, $1-\pi$ is the probability of failure on a single trial, $y$ is the number of successes in $n$ trials and $n! = n \times (n-1) \times (n-2) \times ... \times 3 \times 2 \times 1$.

## 1.1) Example: 
Let's work on problem 4.6 from the textbook as an example. It says: *Suppose an economics examination has 25 true-or-false questions and a passing grade is obtained with 17 or more correct answers. A student answers the 25 questions by flipping a fair coin and answering true if the coin shows a head and false if it shows a tail. Using the classical interpretation of probability, what is the chance the student will pass the exam?*

This problem has $\pi = 0.5$ and $n = 25$. We want to know $P(Y \geq 17)$, which is the probability of having 17 or more correct answers (successes) out of the 25 questions (trials).

The probability of having 17 or more successes out of 25 trials is: $P(Y \geq 17) = P(y = 17) + P(y = 18) + P(y = 19) + P(y = 20) + P(y = 21) + P(y = 22) + P(y = 23) + P(y = 24) + P(y = 25)$. If you do not understand why this is, please stop here, think about it, discuss it, or ask about it.

Let's use the binomial formula to figure out the probability of getting exactly 20 questions right, that is: $P(y = 20)$. Below you will see the formula with the values plugged in and then code for the same. Note that $x!$ in code is "factorial(x)".

$P(y = 20) = \frac{25!}{20!(25-20)!} 0.5^{20} (1-0.5)^{25-20}$

In [4]:
(factorial(25) / (factorial(20) * factorial(25-20))) * (0.50^20) * (1 - 0.5)^(25-20)

So $P(y = 20) = 0.0015834$

**Question for you!** What is the probability of getting exactly 21 questions right? Please insert a new code cell below, and repeat the same calculation that was just done (copy, paste, and modify).

Now, rather than using those ugly formulas, we can use R code. Take a look at the code below:

In [6]:
dbinom(20, 25, .50)

Does this number look familiar? That's right, it is $P(y = 20)$ given that $n = 25$ and $\pi = .50$. "dbinom" is a way of asking for the probability distribution function of a binomial distribution (the "d" stands for "density", the name R uses for this kind of function).
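
As an optional check, the sketch below compares the formula written out by hand with "dbinom". It uses R's built-in "choose(n, y)", which computes $\frac{n!}{y!(n-y)!}$ directly; the names `manual` and `builtin` are only illustrative and are not used elsewhere in this workbook.

```r
# Binomial probability of exactly 20 successes, written out from the formula.
# choose(25, 20) is the same quantity as factorial(25) / (factorial(20) * factorial(5)).
manual <- choose(25, 20) * 0.5^20 * (1 - 0.5)^(25 - 20)

# The same probability from R's built-in binomial function.
builtin <- dbinom(20, 25, 0.5)

# all.equal() compares the two numbers up to a small numerical tolerance.
all.equal(manual, builtin)
```

If the two approaches agree, "all.equal" returns TRUE.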

**Question for you!** Please calculate the probability of getting exactly 21 questions right using the "dbinom" code as was just done. Remember to add a new code cell below, then copy, paste, and modify.

Okay, so now we want to calculate $P(Y \geq 17) = P(y = 17) + P(y = 18) + P(y = 19) + P(y = 20) + P(y = 21) + P(y = 22) + P(y = 23) + P(y = 24) + P(y = 25)$. We could calculate this as follows (please scroll):

In [8]:
dbinom(17, 25, .50) + dbinom(18, 25, .50) + dbinom(19, 25, .50) + dbinom(20, 25, .50) + dbinom(21, 25, .50) + dbinom(22, 25, .50) + dbinom(23, 25, .50) + dbinom(24, 25, .50) + dbinom(25, 25, .50)

Okay so we have calculated $P(y \geq 17)$, which in our case is the probability of getting at least 17 questions right on the exam with 25 questions on it. This is the probability of the student passing by flipping a coin to answer each question.
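
The long sum of "dbinom" terms can also be written more compactly. The sketch below relies on the fact that "dbinom" accepts a whole vector of values for the number of successes, so `17:25` can be passed in at once and the results added with "sum". The names `ys` and `p_pass` are illustrative only.

```r
# All the passing scores: 17, 18, ..., 25.
ys <- 17:25

# dbinom() returns one probability per value in ys; sum() adds them up.
p_pass <- sum(dbinom(ys, 25, 0.50))
p_pass
```

This should give the same number as the long sum, since it adds exactly the same nine probabilities.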

There is another "code" that makes our life easier when working with binomial probabilities. To calculate $P(Y \geq 17)$ we can do this:

In [14]:
pbinom(16, 25, 0.50, lower.tail = FALSE)

Above, we used the "pbinom" code to calculate $P(Y \geq 17)$, given $\pi = 0.50$ and $n = 25$. A few things to note: "lower.tail = FALSE" means that we are looking for $P(Y > a)$, where $a$ is some number. If we said "lower.tail = TRUE" (the default), we would instead be looking for $P(Y \leq a)$. Notice that rather than using 17, we used 16 in our "pbinom" command. This is because the command computes $P(Y > a)$ instead of $P(Y \geq a)$. We use the fact that $P(Y > 16) = P(Y \geq 17)$. If you don't understand why this is, please stop, think about it, talk about it, or ask about it.
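
One way to convince yourself that the "pbinom" call is doing what we want is to compute the same probability three ways and compare. The sketch below is only a check, with illustrative variable names: the upper tail from "pbinom", one minus the lower tail $P(Y \leq 16)$, and the direct sum of "dbinom" values from 17 to 25.

```r
# Upper tail: P(Y > 16), which equals P(Y >= 17).
upper <- pbinom(16, 25, 0.50, lower.tail = FALSE)

# Complement rule: 1 - P(Y <= 16). lower.tail = TRUE is the default.
complement <- 1 - pbinom(16, 25, 0.50)

# Direct sum of the individual probabilities P(y = 17) + ... + P(y = 25).
summed <- sum(dbinom(17:25, 25, 0.50))

# Both comparisons allow for tiny numerical rounding differences.
all.equal(upper, complement)
all.equal(upper, summed)
```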

## 1.2) Problem:
Suppose a statistics exam has 10 multiple choice questions (with options {a, b, c, and d}) and a passing grade is obtained with 7 or more correct answers. A student answers the 10 questions by flipping two fair coins and answering "a" if the student gets (Heads, Heads) on the coin tosses, "b" if the student gets (Heads, Tails), "c" if the student gets (Tails, Heads), and "d" if the student gets (Tails, Tails). Using the classical interpretation of probability, what is the chance the student will pass the exam? Answer this problem by inserting new cells below this one (not above!). Please use the "dbinom" or "pbinom" codes as you have seen above. Remember to switch between "markdown" and "code" depending on if you want to write text (markdown) or do calculations (code).

## 1.3) Generating binomial distribution random values
When we have a probability distribution such as the binomial probability distribution, we can generate "random" values from it. For example, let's say that 100 students are going to use coin tossing, as in "1.1) Example" above, to try to pass the test. We can simulate the number of questions that each student would get correct (successes) out of 25 problems (trials). Please run the code below and see the output of the "rbinom" function.

In [23]:
rbinom(100, 25, 0.50)

The code above generated 100 random values, each one the number of problems answered correctly on the 25-problem exam using the coin-toss process. Each time you run that code you will get a different set of "number of correct problems".
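
If you ever want the same "random" values twice (for example, so that a group partner sees the same numbers), R's "set.seed" function fixes the starting point of the random number generator. The sketch below is optional; the seed value 123 is arbitrary and the variable names are illustrative.

```r
# Fix the random number generator, then draw 100 simulated exam scores.
set.seed(123)   # 123 is an arbitrary choice; any whole number works
first_draw <- rbinom(100, 25, 0.50)

# Resetting the same seed reproduces exactly the same draws.
set.seed(123)
second_draw <- rbinom(100, 25, 0.50)

identical(first_draw, second_draw)
```

Without "set.seed", the two draws would almost certainly differ.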

Let's now simulate 2,000 students who are using coin tossing to answer the problems on this exam and see how many of them pass. To do this we will first generate 2,000 binomial distribution random values (each one is a single student's \# of correct answers). We will then count how many students get 17 or more answers correct. In this process we will save the random binomial distribution values as "x" so we can use them later.

In [31]:
x <- rbinom(2000, 25, .50)

In [32]:
x

In [33]:
as.data.frame(table(x))

| x | Freq |
|---|---|
| 5 | 1 |
| 6 | 5 |
| 7 | 33 |
| 8 | 64 |
| 9 | 131 |
| 10 | 194 |
| 11 | 263 |
| 12 | 299 |
| 13 | 307 |
| 14 | 274 |


Above we have used the "table" code to make a frequency table that shows "x" the number of questions answered correctly, and "Freq" the frequency or number of students out of the 2,000 that got "x" number of questions correct. The "as.data.frame" code simply makes the result into a nice looking table. Please insert a cell below and answer (in markdown) the number of students who got 17 or more questions correct.

We know from "1.1) Example" above that the probability of passing the test using the coin flipping strategy is 0.053876. Do you see about 5.3876\% of the students passing the test? Is it close to this percentage or far off? Please answer below (insert new cell, write answer as "markdown").
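
As an optional cross-check on a comparison like this, the sketch below runs its own small simulation and puts the simulated share of passing students next to the theoretical probability. It assumes a fresh set of draws stored under the hypothetical name `sim` (so your saved "x" is left alone) and an arbitrary seed; your own numbers will differ somewhat because the draws are random.

```r
# Arbitrary seed, used only so this sketch is reproducible.
set.seed(2024)

# 2,000 simulated students, each answering 25 questions by coin toss.
sim <- rbinom(2000, 25, 0.50)

# sim >= 17 is TRUE for each passing student; mean() of TRUE/FALSE values
# gives the share of TRUEs.
sim_share <- mean(sim >= 17)

# Theoretical probability of passing, P(Y >= 17).
theory <- pbinom(16, 25, 0.50, lower.tail = FALSE)

c(simulated = sim_share, theoretical = theory)
```

The two numbers should be close but not identical; the gap tends to shrink as the number of simulated students grows.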

# 2) The Poisson Distribution
We defined the probability distribution function for the Poisson distribution as follows. Let $y$ be the number of events occurring during a fixed interval of time. Then the probability distribution function for a Poisson random variable is $P(y) = \frac{\mu ^ y e^{-\mu}}{y!}$, where $\mu$ is the mean number of events per interval.
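
Just as with the binomial distribution, the Poisson formula can be typed out by hand or computed with a built-in R function, "dpois". The sketch below uses made-up values ($\mu = 3$ events per interval and $y = 2$ events) purely for illustration; they do not come from the textbook or lecture.

```r
# Hypothetical values chosen only for illustration.
mu <- 3   # mean number of events per interval
y  <- 2   # number of events we want the probability of

# Poisson probability written out from the formula.
manual_pois <- mu^y * exp(-mu) / factorial(y)

# The same probability from R's built-in Poisson function.
builtin_pois <- dpois(y, mu)

all.equal(manual_pois, builtin_pois)
```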

# 3) The Normal Distribution