#### A simulation exercise

- Did you show where the distribution is centered and compare it to the theoretical center of the distribution?

- Did you show how variable it is and compare it to the theoretical variance of the distribution?

- Did you perform an exploratory data analysis consisting of at least one plot or table highlighting basic features of the data?

- Did you construct relevant confidence intervals and/or perform relevant tests?

- Were the results of the tests and/or intervals correctly interpreted in the context of the problem?

- Did you describe the assumptions needed for your conclusions?

--- 

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with `rexp(n, lambda)`, where `lambda` is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. Set `lambda = 0.2` for all of the simulations. You will investigate the distribution of averages of 40 exponentials, using a thousand simulations.
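As a quick sanity check of the stated moments, a large sample from `rexp()` should have a mean and a standard deviation that are both close to 1/lambda = 5. A minimal sketch, assuming an arbitrary seed and a sample size of 100,000 (neither is part of the assignment; `big_sample` is an illustrative name):

```r
set.seed(1)                               # arbitrary seed, for reproducibility only
lambda <- 0.2
big_sample <- rexp(1e5, rate = lambda)    # 100,000 exponential draws (sample size is an assumption)

# Both should be close to 1 / lambda = 5
c(theoretical = 1 / lambda, sample_mean = mean(big_sample))
c(theoretical = 1 / lambda, sample_sd = sd(big_sample))
```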

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should:

- Show the sample mean and compare it to the theoretical mean of the distribution.

- Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

- Show that the distribution is approximately normal.

- In point 3, focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.
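For reference, the theoretical targets for these comparisons follow from the independence of the draws. If $X_1, \dots, X_n$ are i.i.d. $\text{Exp}(\lambda)$ with mean $1/\lambda$ and variance $1/\lambda^2$, then for $n = 40$ and $\lambda = 0.2$:

$$
\begin{aligned}
E[\bar X_n] &= \frac{1}{\lambda} = 5, \\
\operatorname{Var}(\bar X_n) &= \frac{1}{n\lambda^2} = \frac{25}{40} = 0.625, \\
\operatorname{SD}(\bar X_n) &= \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791, \\
\operatorname{skew}(X_i) &= 2, \qquad \operatorname{skew}(\bar X_n) = \frac{2}{\sqrt{n}} \approx 0.316.
\end{aligned}
$$

By the CLT, $(\bar X_n - 1/\lambda)\,\lambda\sqrt{n}$ is approximately $N(0, 1)$. The contrast asked for in point 3 is already visible in these numbers: individual exponentials have variance 25 and skewness 2, while averages of 40 have variance 0.625 and much smaller skewness.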

As a motivating example, compare the distribution of 1000 random uniforms with the distribution of 1000 averages of 40 random uniforms.
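This comparison can be sketched as a histogram of 1000 individual uniforms next to a histogram of 1000 averages of 40 uniforms. The seed, the use of `replicate()`, and the name `mns` are choices of this sketch, not requirements of the assignment:

```r
set.seed(2)                      # arbitrary seed
par(mfrow = c(1, 2))             # two panels side by side

# 1000 individual uniforms: roughly flat on [0, 1]
hist(runif(1000), main = "1000 random uniforms", xlab = "x")

# 1000 averages of 40 uniforms each: concentrated near 0.5 and bell-shaped
mns <- replicate(1000, mean(runif(40)))
hist(mns, main = "1000 averages of 40 uniforms", xlab = "mean")

par(mfrow = c(1, 1))             # restore the single-panel layout
```

The averages should have standard deviation close to sqrt(1/12)/sqrt(40), about 0.046, versus about 0.289 for a single uniform.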

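```r
set.seed(1)              # arbitrary seed, added for reproducibility
n <- 40                  # number of exponentials averaged in each simulation
nsim <- 1000             # number of simulations
lambda <- 0.2            # rate parameter
theor_mean <- 1/lambda   # theoretical mean of Exp(lambda): 5
theor_sd <- 1/lambda     # theoretical SD of Exp(lambda): 5
```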

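```r
# Preallocate result vectors instead of growing them with c() on every iteration
sims_mean <- numeric(nsim)
sims_sd <- numeric(nsim)
for (i in seq_len(nsim)) {
    draws <- rexp(n, lambda)      # one sample of 40 exponentials
    sample_mean <- mean(draws)
    sample_sd <- sd(draws)        # equivalent to sqrt(var(draws))
    sims_mean[i] <- sample_mean
    sims_sd[i] <- sample_sd
}
```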
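The loop can also be written without explicit iteration by drawing all 40,000 values at once into a 1000 x 40 matrix and summarizing each row. This is a sketch of an equivalent vectorized alternative, not the method used above; the seed and the names `draws`, `sims_mean_vec`, and `sims_sd_vec` are illustrative:

```r
set.seed(3)                                   # arbitrary seed
n <- 40; nsim <- 1000; lambda <- 0.2

# One row per simulation, one column per draw: an nsim x n matrix
draws <- matrix(rexp(n * nsim, rate = lambda), nrow = nsim, ncol = n)

sims_mean_vec <- rowMeans(draws)              # 1000 sample means
sims_sd_vec <- apply(draws, 1, sd)            # 1000 sample SDs
```

`rowMeans()` is implemented in compiled code, so this form is typically faster than the loop for large `nsim`, at the cost of holding all draws in memory at once.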

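```r
# Fraction of the 1000 known-sigma 95% intervals, mean +/- 1.96 * sigma / sqrt(n),
# that contain the theoretical mean (should be close to 0.95)
half_width <- qnorm(0.975) * theor_sd / sqrt(n)
mean(theor_mean > sims_mean - half_width & theor_mean < sims_mean + half_width)
```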
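The interval above treats sigma as known (it uses `theor_sd`). In practice sigma is estimated from each sample, which leads to the usual t interval. A hedged, self-contained sketch using `t.test()`, re-simulating with an arbitrary seed (the names `mu` and `covered` are illustrative). Because the exponential is right-skewed, the empirical coverage may come out a little below the nominal 95%:

```r
set.seed(4)                                   # arbitrary seed
n <- 40; nsim <- 1000; lambda <- 0.2
mu <- 1 / lambda                              # theoretical mean, 5

covered <- logical(nsim)
for (i in seq_len(nsim)) {
  draws <- rexp(n, rate = lambda)
  ci <- t.test(draws, conf.level = 0.95)$conf.int   # two-sided t interval for the mean
  covered[i] <- ci[1] < mu && mu < ci[2]
}
mean(covered)                                 # empirical coverage; nominal level is 0.95
```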

```r
# Standardize the simulated means: Z = (mean - mu) * sqrt(n) / sigma
normed_sim <- (sims_mean - theor_mean) * sqrt(n) / theor_sd
```

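```r
# Histogram of the simulated means, with the theoretical mean marked
jpeg('meanplot.jpg')
hist(sims_mean, main = "Means of 40 exponentials", xlab = "Sample mean")
abline(v = theor_mean, col = 'red', lwd = 5, lty = 3)
legend("topright", legend = c("Sample mean hist", "Theoretical mean"),
       col = c("black", "red"), lty = c(1, 3), cex = 0.8)
dev.off()
```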
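The histogram shows the center visually; the first rubric item also calls for a direct numeric comparison. A minimal self-contained sketch (arbitrary seed; `means_check` is an illustrative name) that re-runs the 1000 simulations and prints the average of the simulated means next to 1/lambda:

```r
set.seed(5)                                   # arbitrary seed
n <- 40; nsim <- 1000; lambda <- 0.2
means_check <- replicate(nsim, mean(rexp(n, rate = lambda)))

# The average of the simulated means should be close to 1 / lambda = 5
c(theoretical = 1 / lambda, simulated = mean(means_check))
```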

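```r
# Histogram of the per-simulation sample SDs, with the theoretical SD (1/lambda) marked
jpeg('sdplot.jpg')
hist(sims_sd, main = "SDs of samples of 40 exponentials", xlab = "Sample SD")
abline(v = theor_sd, col = 'red', lwd = 5, lty = 3)
legend("topright", legend = c("Sample SD hist", "Theoretical SD"),
       col = c("black", "red"), lty = c(1, 3), cex = 0.8)
dev.off()
```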
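The plot above compares each sample's own SD with the population SD 1/lambda. The second rubric item concerns the variance of the distribution of the *means*, which should be close to (1/lambda)^2 / n = 25/40 = 0.625. A self-contained sketch with an arbitrary seed (`means_check` and `theor_var` are illustrative names):

```r
set.seed(6)                                   # arbitrary seed
n <- 40; nsim <- 1000; lambda <- 0.2
means_check <- replicate(nsim, mean(rexp(n, rate = lambda)))

theor_var <- (1 / lambda)^2 / n               # 25 / 40 = 0.625
c(theoretical = theor_var, simulated = var(means_check))
```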

```r
# Standardized means using the theoretical SD; approximately N(0, 1) by the CLT
norm_mean <- (sims_mean - theor_mean)*sqrt(n)/theor_sd
```

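```r
# Studentized means: each simulation is scaled by its own sample SD (sims_sd),
# not by the scalar sample_sd left over from the last loop iteration.
# For normal data this would be t with n - 1 df; here it is only roughly so.
norm_mean2 <- (sims_mean - theor_mean)*sqrt(n)/sims_sd
```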

```r
# Grid and standard normal density, for overlaying on the histograms below
x <- seq(-5, 5, length.out = 1000)
hx <- dnorm(x)
```

```r
# Normal Q-Q plot of the standardized means
qqnorm(norm_mean)
qqline(norm_mean)
```

```r
# Density histogram of the studentized means with the N(0, 1) density overlaid
hist(norm_mean2, freq = FALSE, col = "white", xlab = "x value")
lines(x, hx, col = "red", lty = 2)
```

```r
# Save the histogram of standardized means with the N(0, 1) density overlaid
jpeg('normplot.jpg')
hist(norm_mean, freq = FALSE, col = "white", xlab = "x value")
lines(x, hx, col = "red", lty = 2)
legend("topright", legend = c("Normed sample mean", "Standard normal distribution"),
       col = c("black", "red"), lty = c(1, 2), cex = 0.8)
dev.off()
```
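Point 3 asks for the difference between a large collection of raw exponentials and a large collection of averages of 40 exponentials. Besides histograms, one numeric way to show it is to compare variance and skewness: theory gives variance 25 and skewness 2 for single draws, versus 0.625 and about 0.316 for averages. A minimal sketch, assuming an arbitrary seed and an illustrative helper `sample_skewness` (a plain moment-based estimator written here, not a function from any package):

```r
set.seed(7)                                   # arbitrary seed
n <- 40; nsim <- 1000; lambda <- 0.2

# Hypothetical helper: moment-based sample skewness, no bias correction
sample_skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

raw_exps <- rexp(nsim * n, rate = lambda)                  # 40,000 individual exponentials
avg_exps <- replicate(nsim, mean(rexp(n, rate = lambda)))  # 1000 averages of 40

rbind(variance = c(raw = var(raw_exps), averages = var(avg_exps)),
      skewness = c(raw = sample_skewness(raw_exps), averages = sample_skewness(avg_exps)))
```

Averaging shrinks the variance by a factor of n and the skewness by a factor of sqrt(n), which is why the histogram of means looks far more normal than the histogram of raw draws.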