## Question 1

One of the classic methods for measuring the acceleration due to gravity is to use a simple pendulum. One measures the period, T, of the oscillations of the pendulum, which consists of a mass, m, suspended from a cable of length, l. If the oscillations are small (less than a few degrees), then the period of oscillation is given by:

$T = 2 \pi \sqrt{\frac{l}{g}}$
 
Suppose that you hang a cat, with a mass of 3.29 +/- 0.38 kg from a cable whose length is measured to be 9.66 +/- 0.03 m. The experiment is performed, and you measure the time for 100 oscillations of the pendulum. The TOTAL time for these 100 oscillations is measured to be 629.0 +/- 0.3 s.

(a) What is the average value for the period of oscillation of the pendulum?

(b) What is the uncertainty in the average value of the period of oscillation of the pendulum?

(c) What is the average value of the acceleration due to gravity, as measured in this experiment?

(d) Assuming that the uncertainties quoted above follow a uniform error distribution, what is the measured uncertainty in the acceleration due to gravity?

(e) Assuming that the uncertainties quoted above follow a Gaussian error distribution, what is the measured uncertainty in the acceleration due to gravity?


### Solution

This problem is quite similar to the first problem on the first assignment.  Referring to
the solution that I already posted for that problem, the only things that we really have
to change are:

(a) the expression for g
(b) the variables used

To get the expression for g, we just solve the expression for the period given in the question for g. Note that the mass of the cat does not appear in it, so the mass and its uncertainty play no role in the result:

$g = \frac{4 \pi^2 l}{T^2}$
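
For completeness, here is the intermediate algebra, together with the period and its uncertainty for parts (a) and (b). The symbol $N$ for the number of oscillations (treated as exact) and the $\delta$ notation for uncertainties are just notation introduced here:

$T^2 = 4 \pi^2 \frac{l}{g} \quad \Rightarrow \quad g = \frac{4 \pi^2 l}{T^2}$

$T = \frac{T_{total}}{N} = \frac{629.0}{100} = 6.29 \ \rm{s}, \qquad \delta T = \frac{\delta T_{total}}{N} = \frac{0.3}{100} = 0.003 \ \rm{s}$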

In [33]:
import numpy as np
import scipy.stats as stats

from sympy import *

l = 9.66
dl = 0.03
Nd = 100
T_total = 629.0
dT_total = 0.3

T = T_total/Nd
dT = dT_total/Nd

print ("a: %0.2f" % T)
print ("b: %0.3f" % dT)

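# From here on l, T, dl and dT are sympy symbols (this shadows the numeric
# values assigned above), so that g can be differentiated symbolically.
g, l, T, dg, dl, dT = symbols("g, l, T, dg, dl, dT")
g = 4*pi**2*l/T**2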

g

a: 6.29
b: 0.003


4*pi**2*l/T**2

In [34]:
dgdl = abs(diff(g,l))

dgdl

4*pi**2*Abs(T**(-2))

In [35]:
dgdT = abs(diff(g,T))

dgdT

8*pi**2*Abs(l/T**3)
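
In more conventional notation (writing $\delta l$ and $\delta T$ for the uncertainties that the code calls dl and dT), the two derivatives above are:

$\left| \frac{\partial g}{\partial l} \right| = \frac{4 \pi^2}{T^2} = \frac{g}{l}$

$\left| \frac{\partial g}{\partial T} \right| = \frac{8 \pi^2 l}{T^3} = \frac{2 g}{T}$

For uniform uncertainties the two contributions are added linearly, while for Gaussian uncertainties they are added in quadrature:

$\delta g_{uniform} = \left| \frac{\partial g}{\partial l} \right| \delta l + \left| \frac{\partial g}{\partial T} \right| \delta T$

$\delta g_{Gaussian} = \sqrt{ \left( \frac{\partial g}{\partial l} \right)^2 \delta l^2 + \left( \frac{\partial g}{\partial T} \right)^2 \delta T^2 }$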

In [36]:
# Calculate the value of the function
g = N(g.subs({l:9.66, T:6.29}))
print ('g = %0.2f ' % g)

g = 9.64 


In [38]:
# Calculate the uncertainty in g, assuming uniform uncertainties.
dg = dgdl*dl + dgdT*dT

dg

8*pi**2*dT*Abs(l/T**3) + 4*pi**2*dl*Abs(T**(-2))

In [42]:
# Evaluate numerically
dg = N(dg.subs({l:9.66, T:6.29, dl:0.03, dT:0.003}))

print ("g = %0.2f +/- %.4f" % (g,dg))

g = 9.64 +/- 0.0391


In [43]:
# Calculate the uncertainty in g, assuming Gaussian uncertainties.
dg = sqrt( dgdl**2*dl**2 + dgdT**2*dT**2 )

dg

sqrt(64*pi**4*dT**2*Abs(l/T**3)**2 + 16*pi**4*dl**2*Abs(T**(-2))**2)

In [44]:
# Evaluate numerically
dg = N(dg.subs({l:9.66, T:6.29, dl:0.03, dT:0.003}))

print ("g = %0.2f +/- %.4f" % (g,dg))

g = 9.64 +/- 0.0313
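
As a cross-check on the symbolic calculation, the same numbers can be reproduced with plain floating-point arithmetic. This is only a minimal sketch using the standard library; the variable names (`length`, `period`, `term_l`, and so on) and the helper `g_from` are illustrative choices, not part of the original notebook.

```python
import math

# Measured values from the question (100 oscillations, treated as exact).
length, d_length = 9.66, 0.03            # cable length and its uncertainty, in m
t_total, d_t_total, n_osc = 629.0, 0.3, 100

period = t_total / n_osc                 # part (a)
d_period = d_t_total / n_osc             # part (b)


def g_from(l, t):
    """Acceleration due to gravity from pendulum length l and period t."""
    return 4 * math.pi**2 * l / t**2


g = g_from(length, period)               # part (c)

# Magnitudes of the partial derivatives, evaluated at the measured values.
dg_dl = 4 * math.pi**2 / period**2
dg_dt = 8 * math.pi**2 * length / period**3

# Contribution of each measured quantity to the uncertainty in g.
term_l = dg_dl * d_length
term_t = dg_dt * d_period

dg_uniform = term_l + term_t               # part (d): linear sum
dg_gaussian = math.hypot(term_l, term_t)   # part (e): sum in quadrature

print("g = %0.2f, uniform +/- %0.4f, Gaussian +/- %0.4f" % (g, dg_uniform, dg_gaussian))
```

Printing `term_l` and `term_t` separately shows that the length measurement dominates the uncertainty in g, which is why the uniform and Gaussian results are fairly close to each other.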


## Question 2

When circuit boards used in the manufacture of compact disc players are tested, the long-run percentage of defectives is 5%. Let X = the number of defective boards in a random sample of size n = 25, so P(X) ~ BINOMDIST(X,25,0.05,0).

(a) Determine P(X <= 2).

(b) Determine P(X >= 5).

(c) Determine P(1 <= X <= 4).

(d) What is the probability that none of the 25 boards is defective?

(e) Calculate the expected value and standard deviation of X.

### Solution

This question is $exactly$ the same as one of the homework problems, so I am not going to give a detailed explanation of its solution, given that we already went over it in class.

In [23]:
n = 25
p = 0.05

X = stats.binom(n,p)

print ("a: %0.3f" % X.cdf(2))

print ("b: %0.5f" % (1-X.cdf(4)))

print ("c: %0.3f" % (X.cdf(4)-X.cdf(0)))

print ("d: %0.3f" % X.cdf(0))

print ("e: %0.2f, %0.4f" % (X.mean(),X.std()))

a: 0.873
b: 0.00716
c: 0.715
d: 0.277
e: 1.25, 1.0897
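
For anyone who wants to see what `cdf`, `mean` and `std` are doing under the hood, here is a minimal sketch that builds the same answers directly from the binomial probability mass function, using only the standard library. The helper names `binom_pmf` and `binom_cdf` are made up for this illustration.

```python
import math

n, p = 25, 0.05


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)


def binom_cdf(k, n, p):
    """P(X <= k), summed directly from the pmf."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


p_a = binom_cdf(2, n, p)                        # P(X <= 2)
p_b = 1 - binom_cdf(4, n, p)                    # P(X >= 5) = 1 - P(X <= 4)
p_c = binom_cdf(4, n, p) - binom_cdf(0, n, p)   # P(1 <= X <= 4)
p_d = binom_pmf(0, n, p)                        # P(X = 0) = (1 - p)**n
mean = n * p                                    # expected value of X
std = math.sqrt(n * p * (1 - p))                # standard deviation of X

print("a: %0.3f  b: %0.5f  c: %0.3f  d: %0.3f" % (p_a, p_b, p_c, p_d))
print("e: %0.2f, %0.4f" % (mean, std))
```

Note the off-by-one in parts (b) and (c): because X is discrete, P(X >= 5) is the complement of P(X <= 4), and P(1 <= X <= 4) subtracts P(X <= 0), not P(X <= 1).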


## Question 3

There are two machines available for cutting corks intended for use in wine bottles. Measurements of 25 corks from the first machine indicate that it produces corks with diameters that are distributed with a sample mean of 3.02 cm and a sample standard deviation of 0.08 cm. Measurements of 30 corks from the second machine reveal that it produces corks with diameters that have a distribution with a sample mean of 3.03 cm and a sample standard deviation of 0.05 cm. Acceptable corks have diameters between 2.9 cm and 3.1 cm.

(a) What is the uncertainty in the true mean cork diameter for the first machine? (Round your answer to four decimal places.)

(b) What is the uncertainty in the true mean cork diameter for the second machine? (Round your answer to four decimal places.)

(c) What is the probability that the first machine will produce an acceptable cork? (Round your answer to four decimal places.)

(d) What is the probability that the second machine will produce an acceptable cork? (Round your answer to four decimal places.)

### Solution

People had a lot of difficulty with this question, which honestly surprised me, but that happens!  One of the crucial concepts in this course is understanding the difference between the parameters associated with the $population$ and those associated with a $sample$ of the population.

In the homework problem that was very similar to this question, the second and third sentences are:

"The first produces corks with diameters that are normally distributed with mean 3.02 cm and standard deviation 0.08 cm. The second machine produces corks with diameters that have a normal distribution with mean 3.03 cm and standard deviation 0.05 cm."

The wording of the homework problem is clear; they are telling us the values of $\mu$ and $\sigma$, the $population$ mean and standard deviation, respectively.

Now, compare that to the wording of this question, where it $explicitly$ states that there was an experiment carried out, and the results of the experiment are given, for the $samples$ taken in that experiment.  Thus again, the wording is clear. In this problem, they are telling you the values of the $sample$ mean and standard deviation, $\bar{x}$ and $s$.

The conclusion, then, is that we must use the t-distribution to calculate probabilities associated with the mean of the distribution, and $not$ the z-distribution, as we did in solving the homework problem.

In [29]:
# As we do NOT know the value of sigma, we MUST use the t-distribution!!!!!

n1=25
n2=30
df1=n1-1
df2=n2-1

xb1=3.02
xb2=3.03
s1=0.08
s2=0.05

X1 = stats.t(df=df1,loc=xb1,scale=s1)
X2 = stats.t(df=df2,loc=xb2,scale=s2)
#
# Note:  in the file called Assignment2_Solutions.ipynb in my github repo, these two lines
# were written as:
#
# X1 = stats.norm(3.02,0.08)
# X2 = stats.norm(3.03,0.02)
#
# So, here, we are literally just replacing those two lines with the corresponding
# expression for the t-distribution, which requires the additional parameter of the
# number of degrees of freedom.  Then we solve parts (c) and (d) in exactly the same
# way as in the homework problem.
#

xlow = 2.9
xhigh = 3.1

# The uncertainty is the standard error in the mean = s/sqrt(n)  (this is for parts a and b)
print ("a: %0.4f" % (s1/np.sqrt(n1)))
print ("b: %0.4f" % (s2/np.sqrt(n2)))

# The probability of an acceptable cork is the integral of the t-distribution (cdf) between
# the high and low acceptable values.
print ("c: %0.4f" % (X1.cdf(xhigh)-X1.cdf(xlow)))
print ("d: %0.4f" % (X2.cdf(xhigh)-X2.cdf(xlow)))


a: 0.0160
b: 0.0091
c: 0.7630
d: 0.9067
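
To make it explicit what `X1.cdf(xhigh)-X1.cdf(xlow)` computes, here is a minimal sketch that standardizes the limits with $\bar{x}$ and $s$ and then integrates the Student t density numerically. It uses only the standard library; the function names and the choice of Simpson's rule with 2000 steps are assumptions made for this illustration, and `stats.t` remains the more practical tool.

```python
import math


def t_pdf(x, df):
    """Probability density of the standard Student t-distribution."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_prob_between(lo, hi, df, steps=2000):
    """P(lo < t < hi) by composite Simpson integration (steps must be even)."""
    h = (hi - lo) / steps
    total = t_pdf(lo, df) + t_pdf(hi, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(lo + i * h, df)
    return total * h / 3


def acceptable_fraction(xbar, s, n, xlow=2.9, xhigh=3.1):
    """Same quantity as cdf(xhigh) - cdf(xlow) for stats.t(df=n-1, loc=xbar, scale=s)."""
    t_lo = (xlow - xbar) / s     # lower limit in standardized units
    t_hi = (xhigh - xbar) / s    # upper limit in standardized units
    return t_prob_between(t_lo, t_hi, n - 1)


prob_c = acceptable_fraction(3.02, 0.08, 25)   # first machine
prob_d = acceptable_fraction(3.03, 0.05, 30)   # second machine

print("c: %0.4f" % prob_c)
print("d: %0.4f" % prob_d)
```

The results should agree with the `stats.t` values above to within rounding. For the first machine the limits sit at -1.5 and +1.0 in standardized units, so the asymmetry of the acceptance window about the sample mean is easy to see.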


## Question 4

Based on an extensive series of experiments carried out over many years, it has been found that the true mean survival rate (μ) for cats dropped from the Trible Library is 95%, with a standard deviation (σ) of 2%. Recent meetings of the CNU administration have resulted in a report that states that in any future experiments, the mean survival rate must be greater than 90.0%, but at the same time must not be greater than 98.0%. The rationale for these limits is unclear. It is suspected that representatives from the Faculty of Arts and Humanities were involved. You have been assigned the job of supervisor of cat dropping experiments at CNU. Your job depends on making sure these limits are not violated. As such, you need to write protocols for future experiments, and the most important thing to decide is: How many cats must be dropped in each experiment? Because your job depends on it, you decide to set the Type-I error probability, α, at 0.001. Calculate the minimum value of N that will be required for you to keep your job.

### Solution

As we know the value of $\sigma$, we can use the z-distribution.  Again, typically the first thing that you should think about in solving any of these types of problems is: which distribution/test is appropriate for the question that you are trying to answer?

Looking at the data given, we see that we have been provided with the values of:

$\alpha$, $\mu$, $\sigma$, and then a lower limit ($\bar{x}_{low}$) and an upper limit 
($\bar{x}_{high}$).

Whenever you see that there are $limits$ provided, this should trigger in your mind the idea that we can use these values to calculate $critical$ values for the appropriate probability distribution.  So, the lower limit and the upper limit should correspond to
critical values for $z_{low}$ and $z_{high}$:

$z_{low} = \frac{(\bar{x}_{low} - \mu)}{\frac{\sigma}{\sqrt{n}}}$

$z_{high} = \frac{(\bar{x}_{high} - \mu)}{\frac{\sigma}{\sqrt{n}}}$

Looking at these equations, we see that they depend on $n$, which is what we are trying to find!!

So, how do we proceed?  Well, we need to find the values of $z_{low}$ and $z_{high}$ by using the value of $\alpha$, which we have not used yet!!

$z_{low} = \mathrm{stats.norm.ppf}(\alpha)$

$z_{high} = \mathrm{stats.norm.ppf}(1-\alpha)$

In [45]:
import numpy as np
import scipy.stats as stats

mu = 95.0
sigma = 2.0

xblow = 90.0
xbhigh = 98.0

alpha = 0.001

zlow = stats.norm.ppf(alpha)
zhigh = stats.norm.ppf(1-alpha)
print ("Critical values are %0.3f and %0.3f" % (zlow,zhigh))

Critical values are -3.090 and 3.090


Again, we turn to the definition of z:

$z = \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}$

Solving for n, we get that:

$ n = \left( \frac{\sigma z}{(\bar{x}-\mu)} \right)^2 $
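
Spelled out, the intermediate steps are to isolate $\sqrt{n}$ first and then square both sides:

$z \frac{\sigma}{\sqrt{n}} = \bar{x}-\mu \quad \Rightarrow \quad \sqrt{n} = \frac{\sigma z}{(\bar{x}-\mu)} \quad \Rightarrow \quad n = \left( \frac{\sigma z}{(\bar{x}-\mu)} \right)^2$

Because the result is squared, the sign of $z$ (negative for the lower limit, positive for the upper limit) does not matter, as long as each $z$ is paired with its own limit.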

So, we can calculate a value of n for the lower limit and the upper limit!

If we calculate a non-integer value of n, we have to round UP to the next highest integer.
So, we take the integer value of the quantity (which just cuts off the decimal places),
and then add one to that.

Finally, we need to take the larger of these two values of n that we calculate.

In [49]:
nlow = ((zlow*sigma)/(xblow-mu))**2
nhigh = ((zhigh*sigma)/(xbhigh-mu))**2

nlow = int(nlow)+1
nhigh = int(nhigh)+1

print ("Minimum required number of cats = ",max(nlow,nhigh))

Minimum required number of cats =  5
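
The whole calculation can also be sketched with just the standard library. Here `statistics.NormalDist().inv_cdf` plays the role of `stats.norm.ppf`, and `math.ceil` does the rounding up; the helper name `required_n` is made up for this illustration. Unlike `int(x)+1`, `math.ceil` does not add an extra cat in the special case where the computed value is already an exact integer.

```python
import math
from statistics import NormalDist

mu, sigma = 95.0, 2.0            # known population mean and standard deviation (%)
xb_low, xb_high = 90.0, 98.0     # allowed limits on the sample mean (%)
alpha = 0.001                    # Type-I error probability

# Critical z values for the lower and upper tails.
z_low = NormalDist().inv_cdf(alpha)
z_high = NormalDist().inv_cdf(1 - alpha)


def required_n(z, xbar_limit):
    """Smallest integer n with n >= (sigma * z / (xbar_limit - mu))**2."""
    return math.ceil((sigma * z / (xbar_limit - mu)) ** 2)


n_low = required_n(z_low, xb_low)
n_high = required_n(z_high, xb_high)
n_min = max(n_low, n_high)       # both limits must be satisfied

print("n_low = %d, n_high = %d" % (n_low, n_high))
print("Minimum required number of cats =", n_min)
```

The upper limit drives the answer because 98% is only 3 percentage points from $\mu$, while the lower limit is 5 points away, so a narrower sampling distribution (larger n) is needed on that side.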
