<a href="https://colab.research.google.com/github/yardsale8/probability_simulations_in_R/blob/main/2_4_simulating_the_Poisson_distribution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
library(dplyr)
library(tidyr)
library(purrr)
library(devtools)
install_github('yardsale8/purrrfect', force = TRUE)
library(purrrfect)

# Simulating the Poisson Distribution

There are two main ways to simulate a Poisson random variable.

1. Simulating the raw outcomes by converting $Uniform \rightarrow Exponential\rightarrow Poisson$, which is covered in chapter 5.
2. Simulating the number of events directly using `rpois`, which we cover here.

Finally, we will illustrate how to use simulations to estimate the expected value.

### Review - Poisson Process

A sequence of outcomes, each representing a discrete event happening somewhere in a continuous span of time or region of space, is generated by a [Poisson point process](https://www.google.com/search?client=safari&rls=en&q=wikipedia+poisson+process&ie=UTF-8&oe=UTF-8), provided
1. Counts of events in non-overlapping intervals are independent,
2. The probability of an event in a short interval is proportional to the length of the interval, and
3. The probability of two events happening at the same time is zero.

### The Poisson distribution

Suppose that we are counting the events of a Poisson process over a fixed interval of time, and let $X$ be the number of events in that interval. Then $X$ has a [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution).
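
For reference, the standard probability mass function of a Poisson random variable with mean rate $\lambda$ per interval is shown below. The symbol $x$ (a possible count) is introduced here for illustration.

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

In R, this is the quantity computed by `dpois(x, lambda)`.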

## Strategies for simulating the Poisson distribution

1. Simulate the raw events of the underlying Poisson process (see chapter 5).
2. Simulate the number of events directly using `rpois`.

### Example - Earthquake
Earthquakes occurring in a particular region can be modeled as a Poisson process with a mean rate of $\lambda = 2$ earthquakes per year. Let $X$ represent the number of earthquakes in a randomly selected year. Answer the following.

1. Find $P(X = 3)$,
2. Find $P(X \ge 3)$,
3. Find the cutoff for the top 25% of years in terms of number of earthquakes, and
4. Estimate the mean and variance of the number of earthquakes.


#### Setting up the simulations

We need to simulate a Poisson random variable with a mean rate of $\lambda = 2$, which can be accomplished with the `rpois` function that ships with R.

In [4]:
?rpois

In [3]:
replicate_int(10, rpois(1, 2))

.trial,.outcome
<dbl>,<int>
1,1
2,0
3,0
4,2
5,0
6,0
7,2
8,5
9,3
10,1
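
The first argument of `rpois` is the number of draws, so a sample like the one above can also be produced in a single vectorized call. The sketch below uses only base R and does not rely on `purrrfect`; the seed value and the name `draws` are arbitrary choices for illustration.

```r
set.seed(2024)               # arbitrary seed so the draws are reproducible
lambda <- 2                  # mean rate: 2 earthquakes per year
draws <- rpois(10, lambda)   # ten independent Poisson(2) counts in one call
draws                        # the raw counts, one per simulated year
table(draws)                 # how often each count occurred
```

The `replicate_int` form is still convenient in this chapter because it returns a data frame with `.trial` and `.outcome` columns that feeds directly into `mutate` and `summarise`.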


#### a. & b. Estimate $P(X = 3)$ and $P(X \ge 3)$

In [14]:
lambda <- 2
num.trials <- 100000
(replicate_int(num.trials, rpois(1, lambda))
 %>% mutate(is.three = .outcome == 3,
            at.least.three = .outcome >= 3)
 %>% estimate_all_prob
 %>% mutate(exact.is.three = dpois(3, lambda),
            exact.at.least.three = 1 - sum(dpois(0:2, lambda)))
 %>% relocate(exact.is.three, .after = is.three)
 %>% relocate(exact.at.least.three, .after = at.least.three)
 )

is.three,exact.is.three,at.least.three,exact.at.least.three
<dbl>,<dbl>,<dbl>,<dbl>
0.1796,0.180447,0.32103,0.3233236
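
The exact columns above can also be computed on their own with base R. The following is a minimal sketch: `dpois` gives $P(X = x)$, and `ppois` with `lower.tail = FALSE` gives the upper tail $P(X > x)$, so $P(X \ge 3) = P(X > 2)$. The names `p.is.three` and `p.at.least.three` are made up for this example.

```r
lambda <- 2
p.is.three <- dpois(3, lambda)                            # P(X = 3)
p.at.least.three <- ppois(2, lambda, lower.tail = FALSE)  # P(X > 2) = P(X >= 3)
c(is.three = p.is.three, at.least.three = p.at.least.three)
```

Both values should agree with the `exact.is.three` and `exact.at.least.three` columns in the output above.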


#### c. Cut off for largest 25%

In [15]:
lambda <- 2
left.tail <- 1 - 0.25
num.trials <- 100000
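# quantile() gives the 75th percentile of the simulated counts; adding 1 moves to
# the next count, so that fewer than 25% of simulated years reach the cutoff.
(replicate_int(num.trials, rpois(1, lambda))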
 %>% summarise(x.largest.25.percent = quantile(.outcome, left.tail) + 1)
 )

x.largest.25.percent
<dbl>
4
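
Because the Poisson distribution is discrete, no count cuts off exactly 25% of years, so the cutoff depends on a convention. The sketch below checks the simulated answer against the exact distribution using base R, assuming the same "75th percentile plus one" convention as the simulation. The variable names are hypothetical.

```r
lambda <- 2
q75 <- qpois(0.75, lambda)   # smallest x with P(X <= x) >= 0.75
cutoff <- q75 + 1            # same "+ 1" convention as the simulation

# Upper-tail probabilities on either side of the cutoff
p.at.or.above.q75 <- ppois(q75 - 1, lambda, lower.tail = FALSE)        # P(X >= q75)
p.at.or.above.cutoff <- ppois(cutoff - 1, lambda, lower.tail = FALSE)  # P(X >= cutoff)

c(q75 = q75, cutoff = cutoff,
  p.at.or.above.q75 = p.at.or.above.q75,
  p.at.or.above.cutoff = p.at.or.above.cutoff)
```

Years with at least `q75` earthquakes make up more than 25% of all years, while years with at least `cutoff` earthquakes make up less than 25%, which is one way to justify the added 1.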


#### d. Estimate the mean and variance.

In [18]:
lambda <- 2
num.trials <- 100000
(replicate_int(num.trials, rpois(1, lambda))
 %>% summarise(mu = mean(.outcome),
               sigma.sqr = var(.outcome),
               sigma = sqrt(sigma.sqr))
 )

mu,sigma.sqr,sigma
<dbl>,<dbl>,<dbl>
1.99929,1.984889,1.408861
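
These estimates can be checked against a standard fact about the Poisson distribution: its mean and variance both equal $\lambda$. With $\lambda = 2$ this gives

$$E(X) = \lambda = 2, \qquad Var(X) = \lambda = 2, \qquad \sigma = \sqrt{\lambda} = \sqrt{2} \approx 1.414$$

which is close to the simulated values above.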


### Adjusting $\lambda$

It is important that the value of $\lambda$ matches the time units of the fixed interval. For example, if we were to ask questions about earthquakes in a given month, then we need the units of $\lambda$ to be *earthquakes per month*.

**Important.** Pay close attention to the units and convert as needed.
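
As a small illustration of the conversion, the sketch below rescales the yearly rate of 2 earthquakes per year to a monthly rate and uses it in an exact calculation. The question asked here (the chance of at least one earthquake in a month) and the variable names are made up for this example, and the code treats every month as 1/12 of a year.

```r
lambda.per.year <- 2
months.per.year <- 12

# (2 earthquakes / 1 year) * (1 year / 12 months) = 1/6 earthquakes per month
lambda.per.month <- lambda.per.year / months.per.year

p.none <- dpois(0, lambda.per.month)   # P(no earthquakes in a month)
p.at.least.one <- 1 - p.none           # P(at least one earthquake in a month)

c(lambda.per.month = lambda.per.month, p.at.least.one = p.at.least.one)
```

Using the yearly rate in `dpois` here would answer a question about a year, not a month.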

### Example - Earthquakes per century

Again we have earthquakes occurring in a particular region with a mean rate of $\lambda = 2$ earthquakes per year. Let $Y$ represent the number of earthquakes in a randomly selected century. Estimate the mean and variance of $Y$.

#### Step 1. Selecting $\lambda$

By [dimensional analysis](https://en.wikipedia.org/wiki/Dimensional_analysis), we convert 2 earthquakes per year into 200 earthquakes per century.

$$\lambda = \frac {??\:earthquakes}{century} = \frac {2\;earthquakes}{1\;year}\frac {100 \;years}{1\;century}=\frac {200\;earthquakes}{1\;century}$$



#### Step 2. Simulate and estimate

In [22]:
lambda <- 200
num.trials <- 100000
(replicate_int(num.trials, rpois(1, lambda))
 %>% summarise(mu = mean(.outcome),
               sigma.sqr = var(.outcome),
               sigma = sqrt(sigma.sqr))
 )

mu,sigma.sqr,sigma
<dbl>,<dbl>,<dbl>
200.0526,199.5205,14.12517
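
One way to sanity-check the rescaled rate is to build each century out of 100 independent yearly counts. A sum of independent Poisson counts is again Poisson with the rates added, so the totals should behave like draws from a Poisson distribution with $\lambda = 200$. The sketch below uses base R only; the seed, the smaller number of trials, and the name `century.counts` are arbitrary choices for illustration.

```r
set.seed(2024)                 # arbitrary seed for reproducibility
num.trials <- 10000            # fewer trials, since each one draws 100 values
years.per.century <- 100
lambda.per.year <- 2

# Each trial: simulate 100 yearly counts and add them up
century.counts <- replicate(num.trials,
                            sum(rpois(years.per.century, lambda.per.year)))

c(mu = mean(century.counts),
  sigma.sqr = var(century.counts),
  sigma = sd(century.counts))
```

The estimated mean and variance should both land near 200, in line with the direct simulation using `rpois(1, 200)`.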


### <font color="red"> Exercise 2.4.1 - Flaws in a sheet of metal</font>

Flaws in metal sheeting produced by a high temperature roller occur at the rate of one per 10 square feet.  

1. What value of $\lambda$ should be used? Explain.
2. What is the probability that exactly 3 flaws will appear in a 5-by-8 foot panel?
3. Find the cutoff for the lowest 10% of flaw counts.

In [None]:
# Your code here