# Computational Probability Theory

- Basic concepts of probability models need to be reviewed to solve the problems below.
- Consult google.com, wikipedia.org, or any other helpful source to derive the correct answers.
- Due: Oct. 14 Monday, 2019

You need to use the `scipy` package to solve the problems. If `scipy` is not already installed on your computer, install it with
```
pip3 install scipy
```

In [1]:
import scipy
print(scipy.__version__)

1.3.1


In [2]:
import numpy as np
print(np.__version__)

1.17.2


### We use the submodules `scipy.stats` and `scipy.special`, imported below.

In [3]:
import scipy.stats
import scipy.special

## Normal distribution
The function `scipy.stats.norm.cdf(a)` gives the probability $Prob[X<a]$ for the standard normal distribution.
$$
    Prob[X < a] = \int_{-\infty}^{a} p(x) dx
$$
where $p(x)$ is the pdf of the standard normal distribution, i.e. the normal pdf below with mean $m=0$ and standard deviation $\sigma=1$:
$$
    p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\bigg\{ -\frac{1}{2}\frac{(x-m)^2}{\sigma^2}  \bigg\}
$$
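The density formula above can be checked numerically. The following is a minimal sketch, assuming only NumPy and SciPy: the hypothetical helper `normal_pdf` codes the formula term by term and is compared against `scipy.stats.norm.pdf`, which takes the mean as `loc` and the standard deviation as `scale`.

```python
import numpy as np
import scipy.stats

def normal_pdf(x, m=0.0, sigma=1.0):
    """Normal pdf written directly from the formula (hypothetical helper)."""
    return np.exp(-0.5 * (x - m) ** 2 / sigma ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-4, 4, 81)

# Standard normal: m = 0, sigma = 1
print(np.allclose(normal_pdf(x), scipy.stats.norm.pdf(x)))

# General normal: loc is the mean m, scale is the standard deviation sigma
print(np.allclose(normal_pdf(x, m=1.5, sigma=2.0),
                  scipy.stats.norm.pdf(x, loc=1.5, scale=2.0)))
```

Both comparisons should print `True`; the values `m=1.5`, `sigma=2.0` are arbitrary and chosen only for illustration.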

For example, $Prob[X < 0]$ for the standard normal distribution is 0.5, which can be obtained as follows:

In [4]:
scipy.stats.norm.cdf(0)

0.5

The inverse operation, finding the point $x$ at which $\mathrm{cdf}(x) = p$, is performed by `scipy.stats.norm.ppf(p)` (the percent point function). For example, the standard normal cdf equals 0.5 at $x = 0$, so `ppf(0.5)` returns 0:

In [5]:
scipy.stats.norm.ppf(0.5)

0.0
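A short sketch of two facts used later in the problem, assuming only SciPy and NumPy: `ppf` and `cdf` undo each other, and for a normal distribution with mean $m$ and standard deviation $s$, $Prob[X < a]$ equals the standard normal cdf at $(a - m)/s$. The values `m = 70`, `s = 15`, `a = 85` are hypothetical and chosen only for illustration.

```python
import numpy as np
import scipy.stats

p = np.array([0.1, 0.3, 0.5, 0.75, 0.9])
x = scipy.stats.norm.ppf(p)                       # standard normal quantiles
print(np.allclose(scipy.stats.norm.cdf(x), p))    # cdf undoes ppf

# Hypothetical normal with mean m and standard deviation s
m, s = 70.0, 15.0
a = 85.0
# Two equivalent ways to get Prob[X < a]:
print(scipy.stats.norm.cdf(a, loc=m, scale=s))    # pass loc/scale directly
print(scipy.stats.norm.cdf((a - m) / s))          # standardize first
```

Here $(85 - 70)/15 = 1$, so both lines print about 0.8413, the standard normal cdf at 1.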

### Binomial distribution

- https://en.wikipedia.org/wiki/Binomial_distribution
- PMF (Probability Mass Function)
$$
    Pr[k | n, p] = {n \choose k} p^k (1-p)^{n-k}
$$

- use `scipy.stats.binom.cdf()` for CDF
- use `scipy.stats.binom.pmf()` for PMF

In [6]:
scipy.stats.binom.cdf(k=3, n=10, p=0.4)

0.3822806016000001

In [7]:
scipy.stats.binom.pmf(k=3, n=10, p=0.4)

0.21499084800000012
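The PMF formula can be evaluated directly with `scipy.special.comb` and compared with `scipy.stats.binom.pmf`; summing the PMF up to $k$ reproduces the CDF. A minimal sketch using the same parameters as the cells above ($n = 10$, $p = 0.4$):

```python
import numpy as np
import scipy.stats
import scipy.special

n, p = 10, 0.4
k = np.arange(n + 1)                     # k = 0, 1, ..., n

# PMF from the formula: C(n, k) p^k (1 - p)^(n - k)
pmf_formula = scipy.special.comb(n, k) * p**k * (1 - p)**(n - k)
print(np.allclose(pmf_formula, scipy.stats.binom.pmf(k, n, p)))

# CDF at k = 3 is the sum of the PMF over k = 0, 1, 2, 3
print(pmf_formula[:4].sum(), scipy.stats.binom.cdf(3, n, p))
```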

### Poisson distribution
- https://en.wikipedia.org/wiki/Poisson_distribution
- PMF
$$
    Pr[k | \mu] = Pr[ k \mbox{ events in interval }] = \frac{\mu^k e^{-\mu}}{k!}
$$
    where 
    - $\mu$ is the average number of events per interval
    - wikipedia.org uses $\lambda$ instead of $\mu$
    - $k$ takes values 0,1,2, ...
- use `scipy.stats.poisson.cdf()` for CDF
- use `scipy.stats.poisson.pmf()` for PMF

In [8]:
scipy.stats.poisson.pmf(k=90, mu=100)

0.025038944623030353

In [9]:
scipy.stats.poisson.cdf(k=90, mu=100)

0.17138511932176148
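The Poisson PMF can likewise be evaluated from its formula. This sketch works in log space with `math.lgamma` (using $\log k! = \log\Gamma(k+1)$), a choice made here to keep $\mu^k$ and $k!$ from overflowing for large $k$; the helper `poisson_pmf` is hypothetical, not part of SciPy.

```python
import math
import scipy.stats

def poisson_pmf(k, mu):
    """mu^k e^{-mu} / k!, computed as exp(k log mu - mu - log k!)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

mu = 100
print(poisson_pmf(90, mu), scipy.stats.poisson.pmf(90, mu))

# The CDF is the sum of the PMF over k = 0, 1, ..., 90
cdf_90 = sum(poisson_pmf(k, mu) for k in range(91))
print(cdf_90, scipy.stats.poisson.cdf(90, mu))
```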

### Bayes' Theorem
- https://en.wikipedia.org/wiki/Bayes%27_theorem

- product rule of probability
$$
    P(X=x_i, Y=y_j) = P(Y=y_j | X=x_i) P(X=x_i)
$$
    - $P(X=x_i, Y=y_j)$ is called the joint probability.

- sum rule of probability
$$
    P(X = x_i) = \sum_{j=1}^L P(X=x_i, Y=y_j)
$$
    - $P(X=x_i)$ is called the marginal probability.

- Bayes' theorem
$$
    P(Y=y_j | X=x_i) = \frac{ P(X=x_i | Y=y_j) P(Y=y_j) } {P(X=x_i)} = \frac{ P(X=x_i | Y=y_j) P(Y=y_j) } {\sum_k P(X=x_i | Y=y_k) P(Y=y_k)}
$$
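The three rules can be chained in a few lines of NumPy. The prior and likelihood values below are hypothetical and serve only to show the mechanics: the product rule gives the joint probabilities, the sum rule gives the marginal $P(X=x_i)$ in the denominator, and dividing gives the posterior.

```python
import numpy as np

# Hypothetical numbers for illustration only
prior = np.array([0.5, 0.3, 0.2])        # P(Y = y_k), k = 1, 2, 3
likelihood = np.array([0.1, 0.4, 0.7])   # P(X = x_i | Y = y_k) for one observed x_i

# Product rule: P(X = x_i, Y = y_k) = P(X = x_i | Y = y_k) P(Y = y_k)
joint = likelihood * prior
# Sum rule: P(X = x_i) = sum over k of the joint probabilities
evidence = joint.sum()
# Bayes' theorem: P(Y = y_k | X = x_i)
posterior = joint / evidence

print(evidence)
print(posterior)
```

With these numbers the marginal is about 0.31, and the posterior sums to 1.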

---
## Problem
---

> A survey is conducted in a large office building. It is found that 30% of the office workers weigh less than 62kg and that 25% of the office workers weigh more than 98kg.
>
> The weights of the office workers may be modelled by a normal distribution with mean $m$ and standard deviation $s$.

(a)

1. Determine two simultaneous linear equations satisfied by $m$ and $s$.
2. Compute the values of $m$ and $s$

(b)
Find the probability that an office worker weighs more than 100kg.

> There are elevators in the office building that take the office workers to their offices.
> Given that there are 10 workers in a particular elevator,

(c)
find the probability that at least four of the workers weigh more than 100kg.

> Given that there are 10 workers in an elevator and at least one weighs more than 100kg,

(d)
find the probability that there are fewer than four workers exceeding 100kg.

> The arrival of the elevators at the ground floor between 08:00 and 09:00 can be modelled by a Poisson distribution.
>
> Elevators arrive on average every 36 seconds.

(e)
Find the probability that in any half hour period between 08:00 and 09:00 more than 60 elevators arrive at the ground floor.

> An elevator can take a maximum of 10 workers. Given that 400 workers arrive in a half hour period independently of each other,

(f)
find the probability that there are sufficient elevators to take them to their offices.

## END.

In [10]:
import scipy.stats as st
import scipy.special as ssp

In [11]:
a, b = st.norm.ppf(0.3), st.norm.ppf(0.75)  # (a) z-scores of 62kg (P = 0.30) and 98kg (P = 1 - 0.25)

In [12]:
s = (98 - 62) / (b - a)  # subtract m + a*s = 62 from m + b*s = 98
s

30.02776910773775

In [13]:
m = 62 - a * s  # back-substitute into m + a*s = 62
m

77.74657751557635
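Part (a) asks for the two simultaneous linear equations explicitly. Writing $a = \Phi^{-1}(0.30)$ and $b = \Phi^{-1}(0.75)$ (with $\Phi^{-1}$ standing for `scipy.stats.norm.ppf`), standardizing gives $(62 - m)/s = a$ and $(98 - m)/s = b$, i.e. $m + a s = 62$ and $m + b s = 98$. As a cross-check, this sketch solves that $2 \times 2$ system with `numpy.linalg.solve` instead of by hand elimination:

```python
import numpy as np
import scipy.stats as st

# P(X < 62) = 0.30 and P(X < 98) = 1 - 0.25 = 0.75
a, b = st.norm.ppf(0.30), st.norm.ppf(0.75)

# m + a*s = 62
# m + b*s = 98
A = np.array([[1.0, a],
              [1.0, b]])
rhs = np.array([62.0, 98.0])
m, s = np.linalg.solve(A, rhs)
print(m, s)
```

The result should agree with the values of `m` and `s` computed in the cells above.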

In [14]:
z0 = (100 - m) / s  # (b) standardize 100kg
z0

0.7410947648018662

In [15]:
Pz0 = st.norm.cdf(z0)  # P(weight < 100kg)

In [16]:
t = 1 - Pz0  # (b) P(weight > 100kg)
t

0.22931799182323287
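The same answer to (b) can be obtained in one call. This sketch uses `scipy.stats.norm.sf` (the survival function, $1 - \mathrm{cdf}$) with `loc=m` and `scale=s`, so the standardization step is done internally; `m` and `s` are recomputed from part (a) to keep the block self-contained.

```python
import scipy.stats as st

# m and s as found in part (a)
a, b = st.norm.ppf(0.30), st.norm.ppf(0.75)
s = (98 - 62) / (b - a)
m = 62 - a * s

# sf(x) = 1 - cdf(x) = P(weight > x)
t = st.norm.sf(100, loc=m, scale=s)
print(t)
```

Using `sf` rather than `1 - cdf` also avoids losing precision when the tail probability is very small, although that does not matter here.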

In [17]:
ssp.comb(10, 4) * t**4 * (1-t)**6  # P(exactly 4 of 10 exceed 100kg), not the "at least four" asked in (c)

0.12168110266158293

In [18]:
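# (c) P(at least 4) = 1 - P(0, 1, 2 or 3 workers exceed 100kg)
p_lt4 = 0  # avoid shadowing the built-in sum()
for i in range(4):
    p_lt4 += ssp.comb(10, i) * t**i * (1-t)**(10-i)
1 - p_lt4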

0.1779522510824021

In [19]:
# (c) the same probability, summed directly over i = 4, ..., 10
p_ge4 = 0
for i in range(4, 11):
    p_ge4 += ssp.comb(10, i) * t**i * (1-t)**(10-i)

In [20]:
p_ge4

0.17795225108240206

In [21]:
1 - st.binom.cdf(3, n=10, p=t)  # (c) via the binomial CDF: P(X >= 4) = 1 - P(X <= 3)

0.17795225108240198
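For (c), `scipy.stats.binom.sf(k, n, p)` returns $P(X > k)$, so $P(X \ge 4)$ is `sf(3)`. A minimal self-contained sketch, recomputing `t` from parts (a) and (b):

```python
import scipy.stats as st

# Parts (a) and (b)
a, b = st.norm.ppf(0.30), st.norm.ppf(0.75)
s = (98 - 62) / (b - a)
m = 62 - a * s
t = st.norm.sf(100, loc=m, scale=s)     # P(one worker weighs > 100kg)

# binom.sf(3) = P(X > 3) = P(X >= 4) for X ~ Binomial(10, t)
p_c = st.binom.sf(3, n=10, p=t)
print(p_c)
```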

In [22]:
# (d) numerator: P(1 <= X <= 3)
sum123 = 0
for i in [1, 2, 3]:
    sum123 += ssp.comb(10, i) * t**i * (1-t)**(10-i)
sum123

0.7481294092068188

In [23]:
nu = st.binom.cdf(3, n=10, p=t) - st.binom.cdf(0, n=10, p=t)  # P(1 <= X <= 3)

In [24]:
deno = 1 - st.binom.cdf(0, n=10, p=t)  # P(X >= 1)

In [25]:
nu, deno

(0.7481294092068189, 0.9260816602892209)

In [26]:
nu / deno  # (d) P(X <= 3 | X >= 1) = P(1 <= X <= 3) / P(X >= 1)

0.807843888165514
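The conditional probability in (d) can be sanity-checked by simulation. This is a Monte Carlo sketch, not part of the original solution: it draws weights for many hypothetical elevators of 10 workers from the fitted normal distribution, keeps only elevators with at least one worker over 100kg, and measures how often fewer than four exceed 100kg. The seed and trial count are arbitrary choices.

```python
import numpy as np
import scipy.stats as st

# m and s from part (a)
a, b = st.norm.ppf(0.30), st.norm.ppf(0.75)
s = (98 - 62) / (b - a)
m = 62 - a * s

rng = np.random.default_rng(0)               # fixed seed for reproducibility
n_trials = 200_000
weights = rng.normal(loc=m, scale=s, size=(n_trials, 10))  # 10 workers per elevator
heavy = (weights > 100).sum(axis=1)          # workers over 100kg in each elevator

# Condition on "at least one over 100kg", then estimate P(fewer than four)
at_least_one = heavy >= 1
estimate = np.mean(heavy[at_least_one] < 4)
print(estimate)
```

The estimate should land close to `nu / deno`, differing only by sampling noise of roughly 0.001.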

In [27]:
1 - st.poisson.cdf(60, mu=50)  # (e) mu = 1800 s / 36 s = 50 arrivals per half hour; P(N > 60)

0.07216017981325695

In [28]:
1 - st.poisson.cdf(39, mu=50)  # (f) 400 workers / 10 per elevator = at least 40 elevators; P(N >= 40)

0.935429631078867
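Parts (e) and (f) can be written with `scipy.stats.poisson.sf`, which returns $P(N > k)$, and with the mean $\mu$ derived from the 36-second average spacing rather than typed in. Like the cells above, this sketch assumes that in (f) every elevator can be filled to its capacity of 10, so 400 workers need at least 40 arrivals.

```python
import scipy.stats as st

# Mean number of arrivals in a half hour: (30 * 60 s) / 36 s = 50
mu = 30 * 60 / 36

# (e) more than 60 arrivals: P(N > 60) = sf(60)
p_e = st.poisson.sf(60, mu=mu)

# (f) at least 400 / 10 = 40 arrivals: P(N >= 40) = P(N > 39) = sf(39)
p_f = st.poisson.sf(39, mu=mu)
print(p_e, p_f)
```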