**There are 4 problems. Please read the questions carefully and complete the tasks below. Upload your code together with its output, using the naming convention "SudarsanAcharya_MCL605FinalLabExam.ipynb", to https://tinyurl.com/17ohpzxi**

In [None]:
library(ggplot2)

**Problem-1**: We have a box with $20$ balls in four different colors. There are $5$ balls of each color, numbered from $1$ to $5$ within each color. We pick $4$ balls from the box without replacement. Compute the following probabilities using simulation:
1. the probability that we get exactly two different colors, with two balls of each color;
2. the probability that we get four different colors and the sum of the numbers on the balls is equal to $8$.

In [None]:
# Sample space (consider the colors as r, g, b, y and the balls numbered
# as 1, 2, 3, 4, 5. So we have r1, g1, b1, y1, r2, g2, b2, y2, etc.)
S = paste0(c('r', 'g', 'b', 'y'), rep(1:5, each = 4))

# Corresponding probabilities for the outcomes in the sample space
p = rep(1/length(S), length(S))

simulationsize = 1e03

# Each column of simulatedData is one draw of 4 balls without replacement
simulatedData = replicate(simulationsize, sample(S, size = 4, prob = p))
#print(simulatedData) # use a small simulationsize for checking purposes

# Function to check if we get two different colors, two balls each
checkEvent1 = function(data){
  # Extract the first character (the color) of each drawn ball
  colors = substr(data, 1, 1)
  #print(colors) # use for checking purposes
  # Count how many balls of each color were drawn
  color_counts = table(colors)
  if (length(color_counts) == 2 && all(color_counts == 2)){
    return(1)
  }
  return(0)
}

# MARGIN = 2 applies the check to each column, i.e. to each simulated draw
approximate_probability = sum(apply(simulatedData, 2, checkEvent1)) / simulationsize
cat('Approximate probability of getting 2 different colors, 2 balls each = ', approximate_probability, '\n')

# Function to check if we get four different colors and sum on the balls is 8
checkEvent2 = function(data){
  # Extract the first character (the color) of each drawn ball
  colors = substr(data, 1, 1)
  unique_colors = unique(colors)
  # Extract the second character (the number) of each drawn ball
  numbers = as.numeric(substr(data, 2, 2))
  if (length(unique_colors) == 4 && sum(numbers) == 8){
    return(1)
  }
  else{
    return(0)
  }
}
approximate_probability = sum(apply(simulatedData, 2, checkEvent2)) / simulationsize
cat('Approximate probability of getting 4 different colors and sum 8 = ', approximate_probability, '\n')



Approximate probability of getting 2 different colors, 2 balls each =  0.004 
Approximate probability of getting 4 different colors and sum 8 =  0 
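
As a cross-check on the simulation, both probabilities can also be obtained exactly by counting. The sketch below is not part of the original solution; the names `total_draws`, `exact_p1`, `number_grid`, `favourable` and `exact_p2` are introduced here for illustration. It gives roughly $0.124$ for the first event and $0.0072$ for the second. With `simulationsize = 1e03` the standard error of the first estimate is about $0.01$, so an estimate such as $0.004$ would point to a problem in the event check rather than to sampling noise.

```r
# Exact probabilities by counting, for comparison with the simulated estimates.
total_draws = choose(20, 4) # number of ways to pick 4 of the 20 balls

# Event 1: choose the 2 colors, then 2 of the 5 balls within each chosen color
exact_p1 = choose(4, 2) * choose(5, 2)^2 / total_draws

# Event 2: one ball per color, so enumerate the number shown on each color's ball
number_grid = expand.grid(r = 1:5, g = 1:5, b = 1:5, y = 1:5)
favourable = sum(rowSums(number_grid) == 8)
exact_p2 = favourable / total_draws

cat('Exact probability of getting 2 different colors, 2 balls each = ', exact_p1, '\n')
cat('Exact probability of getting 4 different colors and sum 8 = ', exact_p2, '\n')
```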


**Problem-2**: A machine produces items in batches. For each batch, the machine can be in control or out of control. Suppose the machine is in control for $98\%$ of the production batches. The production defect rate is $0.05$ when the machine is in control and $0.2$ when the machine is out of control.

Suppose we want to update our prior knowledge of the probability that the machine is in control or out of control. The updated probabilities are called posterior probabilities. To that end, we select a small random sample of $5$ items from a production batch and inspect how many among them are defective, which is represented by the random variable $X.$ Compute the missing entries in the following table:
![](https://bl3302files.storage.live.com/y4mak7FXQkOYKK6NgHFgz4yyatR7pkxfh8X9_mEP8QerzWCxC7CtfGnDDq1vvtjZUj59TGJG6guRLQXyM98iOTskbRfL4gftgeF3Lc5beimrZA89Y_1yl0K1By_fNDbm6IHHHumF25i_vHFN69wSYmAuawAIpFx0jFf5zBUGTnsoxyhPRYnzTZ8nPSvrSSDVvQW?width=660&height=186&cropmode=none)

Suppose we want to interrupt the production process when we are suspicious that the production process is out of control based on the number of defective items we see in a sample of 5. Let us quantify our suspicion as a $30\%$ or greater chance. Would you stop production if 3 defective items are observed?

**Hint**: Apply Bayes' theorem to $P(\text{in control}\,|\,X =j)$ and identify what kind of random variable $X$ is when the machine is in control and when it is out of control, since this determines $P(X = j\,|\,\text{in control})$ and $P(X = j\,|\,\text{out of control}).$
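
Written out, the hint amounts to the following. This assumes that the $5$ inspected items are defective independently of one another, so that $X$ is binomial given the state of the machine:

$$P(\text{in control}\,|\,X=j) = \frac{P(X=j\,|\,\text{in control})\,P(\text{in control})}{P(X=j\,|\,\text{in control})\,P(\text{in control}) + P(X=j\,|\,\text{out of control})\,P(\text{out of control})},$$

with

$$P(X=j\,|\,\text{in control}) = \binom{5}{j}(0.05)^j(0.95)^{5-j}, \qquad P(X=j\,|\,\text{out of control}) = \binom{5}{j}(0.2)^j(0.8)^{5-j},$$

and $P(\text{out of control}\,|\,X=j) = 1 - P(\text{in control}\,|\,X=j).$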

In [None]:
n = 5 # number of items inspected per batch
samplesize = c(0:5) # possible numbers of defective items in the sample
p_ic = 0.05 # defect rate when machine is in control
p_oc = 0.2 # defect rate when machine is out of control
icp = 0.98 # prior in control probability
ocp = 1-icp # prior out of control probability
for (j in samplesize) {
  # Joint probabilities P(X = j and machine state); X is binomial given the state
  joint_ic = icp * dbinom(j, size=n, prob=p_ic)
  joint_oc = ocp * dbinom(j, size=n, prob=p_oc)
  uicp = joint_ic / (joint_ic + joint_oc) # posterior in control probability
  uocp = joint_oc / (joint_ic + joint_oc) # posterior out of control probability
  print(j)
  print(uicp)
  print(uocp)
}

[1] 0
[1] 0.9914316
[1] 0.008568378
[1] 1
[1] 0.9605672
[1] 0.03943276
[1] 2
[1] 0.8368237
[1] 0.1631763
[1] 3
[1] 0.5191501
[1] 0.4808499
[1] 4
[1] 0.1851999
[1] 0.8148001
[1] 5
[1] 0.04566636
[1] 0.9543336
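
The same posterior probabilities can be collected into a single table, which makes the $30\%$ stopping rule easier to read off. This is a minimal vectorized sketch; the names `posterior_table`, `posterior_oc` and `threshold` are introduced here for illustration and do not come from the original cell.

```r
n = 5           # number of items inspected per batch
j = 0:n         # possible numbers of defective items
prior_ic = 0.98 # prior probability that the machine is in control
threshold = 0.30 # stop when P(out of control | X = j) is at least this large

# Joint probabilities of (X = j, machine state); X is binomial given the state
joint_ic = prior_ic * dbinom(j, size = n, prob = 0.05)
joint_oc = (1 - prior_ic) * dbinom(j, size = n, prob = 0.2)

# Bayes' theorem: normalize by the total probability of observing X = j
posterior_oc = joint_oc / (joint_ic + joint_oc)

posterior_table = data.frame(
  defects = j,
  in_control = 1 - posterior_oc,
  out_of_control = posterior_oc,
  stop = posterior_oc >= threshold
)
print(posterior_table)
```

The `stop` column is `TRUE` from $3$ defective items upward, where the posterior probability of being out of control first rises above $0.30.$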


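**Yes, we would stop production: with 3 defective items observed, the posterior probability that the machine is out of control is about $0.48$, which exceeds the $30\%$ threshold.**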

**Problem-3**: A man claims to have extrasensory perception (ESP). As a test, a fair coin is flipped $10$ times and the man is asked to predict the outcome in advance. He gets $7$ out of $10$ correct. What is the probability that he would have done *at least* this well if he had no ESP?

**Hint**: If he had no ESP, what would be the probability of success, i.e. guessing an outcome? You can treat the number of correct guesses if he had no ESP as an appropriate discrete random variable $X.$

In [None]:
n = 10
p = 0.5
j = c(7:10)
sum(dbinom(j, n, p))
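
The tail probability can be double-checked against the binomial distribution function. In the sketch below, `tail_sum` and `tail_cdf` are names introduced for illustration; both should equal $176/1024 = 0.171875.$

```r
n = 10  # number of coin flips
p = 0.5 # probability of a correct guess without ESP

# P(X >= 7) as a sum of point probabilities
tail_sum = sum(dbinom(7:10, n, p))

# The same probability as an upper tail: P(X > 6) = P(X >= 7)
tail_cdf = pbinom(6, n, p, lower.tail = FALSE)

cat('P(X >= 7) via dbinom = ', tail_sum, '\n')
cat('P(X >= 7) via pbinom = ', tail_cdf, '\n')
```

A probability of about $17\%$ means that guessing alone produces a result this good fairly often, so $7$ correct out of $10$ is weak evidence of ESP.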

**Problem-4**: Suppose a random number of $K$ customers shop at a supermarket in a day. Let $X_1,X_2,\ldots,X_K$ represent the random number of items purchased independently by the $1$st, $2$nd, $\ldots,$ $K$th customer. The total number of items sold in a day is a random number $Y$ such that: $$ Y = X_1+X_2+\cdots+X_K.$$ Suppose that, on average, $30$ customers visit the supermarket per day and that each customer buys $3$ items on average.

Suppose you are the supermarket owner and want to increase the *expected* total number of items sold in a day. You have two options: (a) increase the *expected* number of customers by $10\%$ by spending on external advertisement, or (b) increase the *expected* number of items purchased by each customer by $10\%$ by spending on internal (in-shop) advertisement.
1. Do both options result in an increase in the expected total number of items sold in a day? **Hint**: use simulation, and recall what is the expected value of a Poisson random variable.
2. Which option results in the greatest increase in the expected total number of items sold in a day?
3. Which option is least risky? **Hint**: standard deviation

In [None]:
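simulationsize = 1e04
lambda_K = 30 # expected number of customers per day
mu_X = 3      # expected number of items bought by each customer

# Assumption (suggested by the hint, not stated in the problem):
# K ~ Poisson(lambda_K) and, independently, each X_i ~ Poisson(mu_X)
K = rpois(simulationsize, lambda_K)
Y = numeric(simulationsize) # total items per simulated day; stays 0 if K = 0

for (i in 1:simulationsize){
  if (K[i] != 0){
    Y[i] = sum(rpois(K[i], mu_X))
  }
}

# Expected number of items sold in a day
mean(Y)

# Standard deviation of items sold in a day
sd(Y)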
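
The three questions can be explored with one simulation per scenario. The sketch below assumes, following the hint, that $K \sim \text{Poisson}(\lambda)$ and that each $X_i \sim \text{Poisson}(\mu)$ independently; the problem statement does not fix these distributions, so treat them as a modeling assumption. Under it, $E[Y] = \lambda\mu$ and, by the law of total variance, $\text{Var}(Y) = E[K]\,\text{Var}(X) + \text{Var}(K)\,E[X]^2 = \lambda(\mu + \mu^2).$ The helper `simulate_total`, the `scenarios` list and the seed are introduced here for illustration.

```r
set.seed(605) # arbitrary seed so that the run is reproducible

# Simulate the daily total Y = X_1 + ... + X_K for n_days independent days.
# Assumption: K ~ Poisson(lambda_K), X_i ~ Poisson(mu_X), all independent.
simulate_total = function(lambda_K, mu_X, n_days = 1e4){
  K = rpois(n_days, lambda_K)
  # sum(rpois(0, mu_X)) is 0, so days without customers need no special case
  sapply(K, function(k) sum(rpois(k, mu_X)))
}

# c(expected customers per day, expected items per customer)
scenarios = list(
  baseline = c(30, 3),
  option_a = c(33, 3),  # 10% more customers
  option_b = c(30, 3.3) # 10% more items per customer
)

summary_table = do.call(rbind, lapply(names(scenarios), function(name){
  lambda_K = scenarios[[name]][1]
  mu_X = scenarios[[name]][2]
  Y = simulate_total(lambda_K, mu_X)
  data.frame(
    scenario = name,
    sim_mean = mean(Y),
    sim_sd = sd(Y),
    exact_mean = lambda_K * mu_X,
    exact_sd = sqrt(lambda_K * (mu_X + mu_X^2))
  )
}))
print(summary_table)
```

Under this assumption both options raise the expected total from $90$ to $99$ items, so neither gives a greater increase in the mean. The standard deviation is about $19.9$ for option (a) and about $20.6$ for option (b), which makes option (a) the less risky of the two. A different distribution for $X_i$ would leave the means unchanged but could change the comparison of standard deviations.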