# Appendix C. Fundamentals of Mathematical Statistics

Statistical inference involves learning something about a population given the availability of a sample from the population. By "learning", Wooldridge means several things, such as estimation and hypothesis testing.

Let Y be a random variable representing a population with a probability density function f(y;&theta;), which depends on the single parameter &theta;. The probability density function of Y is assumed to be known except for the value of &theta;. Different values of &theta; imply different population distributions, and therefore we are interested in the value of &theta;. If we can obtain certain kinds of samples from the population, then we can learn something about &theta;.

The easiest sampling scheme to deal with is random sampling.

Random Sampling. If Y1,Y2,...,Yn are independent random variables with a common probability density function f(y;&theta;), then {Y1,Y2,...,Yn} is said to be a random sample from f(y;&theta;).

When {Y1,Y2,...,Yn} is a random sample from the density f(y;&theta;), we also say that the Yi are independent, identically distributed (i.i.d.) random variables from f(y;&theta;).

## C.2. Finite Sample Properties of Estimators

In this section we are concerned with finite sample properties of estimators. The term "finite sample" comes from the fact that these properties hold for a sample of any size, no matter how large or small. Sometimes they are called small sample properties.

### C-2a Estimators and Estimates

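Given a random sample {Y1,Y2,...,Yn} drawn from a population distribution that depends on an unknown parameter &theta;, an estimator of &theta; is a rule that assigns each possible outcome of the sample a value of &theta;. When a particular set of observed values is plugged into the rule, the resulting number is called an estimate of &theta;.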

As an example of an estimator, let {Y1,Y2,...,Yn} be a random sample from a population with mean &mu;. A natural estimator of &mu; is the average of the random sample: 

\begin{align}
\bar{Y} & = n^{-1} \sum_{i=1}^n Y_i
\end{align}

$\bar{Y}$ is called the sample average.
The distribution of an estimator is often called its sampling distribution.

### C-2b Unbiasedness

An estimator W of &theta; is an unbiased estimator if E(W) = &theta; for all possible values of &theta;.

If an estimator is unbiased, then its probability distribution has an expected value equal to the parameter it is supposed to be estimating. 
Unbiasedness does not mean that the estimate we get with any particular sample is equal to &theta;, or even very close to &theta;.

For an estimator that is not unbiased, we define its bias as follows.

If W is a biased estimator of &theta;, its bias is defined as Bias(W) = E(W) - &theta;.

Some estimators can be shown to be unbiased quite generally. The sample average is an unbiased estimator of the population mean &mu;, regardless of the underlying population distribution. By linearity of expectation, $E(\bar{Y}) = n^{-1}\sum_{i=1}^n E(Y_i) = n^{-1}(n\mu) = \mu$. Refer to Wooldridge, page 677, for a formal proof.

For hypothesis testing, we will need to estimate the variance &sigma;^2 of a population with mean &mu;. Let {Y1,...,Yn} denote a random sample from the population with E(Y) = &mu; and Var(Y) = &sigma;^2. We define the estimator as

\begin{align}
S^2 & = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})^2
\end{align}

which is usually called the sample variance. It can be shown that S^2 is an unbiased estimator of &sigma;^2: E(S^2) = &sigma;^2.

The division by n-1, rather than n, accounts for the fact that the mean &mu; is estimated rather than known.
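A quick Monte Carlo check of this claim can be sketched in R. The population (normal, mean 10, variance 4), the sample size n = 5 and the number of replications are arbitrary assumptions chosen for illustration. The sketch compares S^2 with the alternative that divides by n. That alternative has expected value (n - 1)/n times &sigma;^2, so it is biased.

```r
# Monte Carlo sketch: compare E(S^2) with the n-divisor alternative.
# All settings are hypothetical choices for illustration.
set.seed(123)
sigma2 <- 4        # true population variance (normal population with mean 10)
n      <- 5        # small sample size, so the bias of the n-divisor is visible
reps   <- 100000   # number of simulated samples

# Each row of y is one random sample of size n
y  <- matrix(rnorm(reps * n, mean = 10, sd = sqrt(sigma2)), nrow = reps)
ss <- rowSums((y - rowMeans(y))^2)  # sum of squared deviations from each sample average

s2_unbiased <- ss / (n - 1)  # S^2, as defined above
s2_biased   <- ss / n        # divides by n instead of n - 1

cat("Average of S^2 over samples:", mean(s2_unbiased), "\n")  # should be near sigma2 = 4
cat("Average with the n divisor: ", mean(s2_biased), "\n")    # should be near (n - 1)/n * sigma2 = 3.2
```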

### C-2d The sampling variance of estimators

Unbiasedness only ensures that the sampling distribution of an estimator has a mean value equal to the parameter it is supposed to be estimating. This is fine, but we also need to know how spread out the distribution of an estimator is. An estimator can be equal to &theta; on average, but it can also be very far away with large probability. Refer to Wooldridge, page 679, for a visual example.


The variance of an estimator is often called its sampling variance and provides a single measure of the dispersion in the distribution.

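For the sample average, the independence of the Yi means that the variances of the terms simply add:

\begin{equation*}
Var(\bar{Y}) = Var\left(n^{-1}\sum_{i=1}^n Y_i\right) = n^{-2}\sum_{i=1}^n Var(Y_i) = n^{-2}(n\sigma^2) = \sigma^2/n \qquad \text{(C.6)}
\end{equation*}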

Therefore, if {Yi: i = 1,2,...,n} is a random sample from a population with mean &mu; and variance &sigma;^2, then the sample average has the same mean as the population, but its sampling variance equals the population variance, &sigma;^2, divided by the sample size.

An important implication of expression C.6 is that the sampling variance of the sample average can be made very close to zero by increasing the sample size n. This is a key feature of a reasonable estimator.
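The formula Var($\bar{Y}$) = &sigma;^2/n can be checked by simulation. The sketch below assumes a normal population with variance 9 and arbitrary sample sizes. For each n, it draws many samples, computes their averages, and compares the variance of those averages with &sigma;^2/n.

```r
# Simulation sketch of Var(Ybar) = sigma^2 / n (hypothetical settings)
set.seed(42)
sigma2 <- 9                 # population variance (normal population, mean 0)
reps   <- 20000             # number of simulated samples per sample size
ns     <- c(5, 25, 100)     # sample sizes to compare

sim_var <- sapply(ns, function(n) {
  samples <- matrix(rnorm(reps * n, sd = sqrt(sigma2)), nrow = reps)  # one sample per row
  var(rowMeans(samples))    # Monte Carlo sampling variance of the sample average
})

print(cbind(n = ns, simulated = sim_var, theory = sigma2 / ns))
```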

### C-2e Efficiency

Relative efficiency. If W1 and W2 are two unbiased estimators of &theta;, W1 is efficient relative to W2 when Var(W1) &le; Var(W2) for all &theta;, with strict inequality for at least one value of &theta;.
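To illustrate relative efficiency, the sketch below compares two unbiased estimators of &mu;. The first is the sample average $\bar{Y}$. The second, a deliberately crude alternative chosen for illustration, is the first observation Y1 alone. Both have expected value &mu;, but Var(Y1) = &sigma;^2 while Var($\bar{Y}$) = &sigma;^2/n. So $\bar{Y}$ is efficient relative to Y1 for any n > 1. The normal population and all numeric settings are assumptions.

```r
# Sketch: relative efficiency of Ybar versus Y1 (hypothetical settings)
set.seed(7)
mu <- 2; sigma <- 3; n <- 10; reps <- 50000
y <- matrix(rnorm(reps * n, mean = mu, sd = sigma), nrow = reps)  # one sample per row

w1 <- rowMeans(y)  # W1: the sample average
w2 <- y[, 1]       # W2: only the first observation of each sample

cat("Means:    ", mean(w1), mean(w2), "\n")  # both should be near mu = 2
cat("Variances:", var(w1), var(w2), "\n")    # should be near sigma^2/n = 0.9 and sigma^2 = 9
```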

## C.3. Asymptotic or Large Sample Properties of Estimators

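Asymptotic analysis studies how the sampling distribution of an estimator behaves as the sample size grows without bound. The results are used to approximate the features of the sampling distribution in large samples.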

### C-3a Consistency

Let Wn be an estimator of &theta; based on a sample Y1,Y2,...,Yn of size n. Then Wn is a consistent estimator of &theta; if, for every &epsilon; > 0,

\begin{equation*}
P(\left| W_n - \theta \right| \ge \epsilon) \rightarrow 0 \quad \text{as } n \rightarrow \infty
\end{equation*}

If Wn is not consistent for &theta;, then we say it is inconsistent.

Unlike unbiasedness, which is a feature of an estimator for a given sample size, consistency involves the behavior of the sampling distribution of the estimator as the sample size n gets large.

If an estimator is not consistent, then it does not help us learn about &theta;, even with an unlimited amount of data.

An example of a consistent estimator is the average of a random sample drawn from a population with mean &mu; and variance &sigma;^2. We have already shown that the sample average is unbiased for &mu;. By expression C.6, Var($\bar{Y}$) = &sigma;^2/n tends to zero as n increases, so this estimator is also consistent.

The conclusion that $\bar{Y}$ is consistent for &mu; holds even if Var($\bar{Y}$) does not exist. This classic result is known as the law of large numbers (LLN).
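The definition of consistency can be illustrated directly by estimating P(|$\bar{Y}$ - &mu;| &ge; &epsilon;) for increasing n. The sketch assumes an exponential population with mean &mu; = 1, a skewed and clearly nonnormal choice, and sets &epsilon; = 0.1. The estimated probability should shrink toward zero as n grows, as the law of large numbers predicts.

```r
# Sketch: estimate P(|Ybar_n - mu| >= eps) for growing n (hypothetical settings)
set.seed(2024)
mu   <- 1                # mean of an Exponential(rate = 1) population
eps  <- 0.1
reps <- 5000             # simulated samples per sample size
ns   <- c(10, 100, 1000)

prob_far <- sapply(ns, function(n) {
  ybar <- rowMeans(matrix(rexp(reps * n, rate = 1), nrow = reps))  # one average per row
  mean(abs(ybar - mu) >= eps)  # share of samples whose average misses mu by at least eps
})

print(cbind(n = ns, P_far = prob_far))
```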

### C-3b Asymptotic normality

Consistency informs us that the distribution of the estimator is collapsing around the parameter as the sample size gets large. It does not, however, tell us anything about the shape of that distribution for a given sample size.

For constructing interval estimators and testing hypotheses we need a way to approximate the distribution of our estimators. Most econometric estimators have distributions that are well approximated by a normal distribution for large samples.

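The asymptotic normality property concerns a sequence of random variables {Z_n}. It means that the cumulative distribution function (CDF) of Z_n gets closer and closer to the CDF of the standard normal distribution as the sample size n gets large: $P(Z_n \le z) \rightarrow \Phi(z)$ for all z, where $\Phi$ denotes the standard normal CDF.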

The central limit theorem (CLT), one of the most powerful results in probability and statistics, states that the average from a random sample for any population (with finite variance), when standardized, has an asymptotic standard normal distribution:

Central Limit Theorem (CLT). Let {Y1,Y2,..,Yn} be a random sample with mean &mu; and variance &sigma;^2. Then,

\begin{align}
Z_n = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}
\end{align}

has an asymptotic standard normal distribution.
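A simulation sketch of the CLT uses the following assumed settings. The population is exponential with mean 1 and standard deviation 1, which is far from normal, and n = 200. If the normal approximation is good, about 95% of the standardized averages Z_n should fall in [-1.96, 1.96]. Their mean and standard deviation should also be close to 0 and 1.

```r
# Simulation sketch of the CLT for a skewed population (hypothetical settings):
# Exponential(rate = 1) has mean 1 and standard deviation 1.
set.seed(99)
mu <- 1; sigma <- 1
n <- 200; reps <- 20000

ybar <- rowMeans(matrix(rexp(reps * n, rate = 1), nrow = reps))  # one sample average per row
z    <- (ybar - mu) / (sigma / sqrt(n))                          # standardized averages Z_n

cat("Share with |Z_n| <= 1.96:", mean(abs(z) <= 1.96), "\n")  # near 0.95 if the approximation is good
cat("Mean and sd of Z_n:", mean(z), sd(z), "\n")              # near 0 and 1
```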

Most estimators encountered in statistics and econometrics can be written as functions of sample averages, in which case we can apply the law of large numbers and the central limit theorem.

By replacing &sigma; with its consistent estimator S_n (the sample standard deviation) in the previous equation, we obtain

\begin{align}
\frac{\bar{Y} - \mu}{S_n/\sqrt{n}}
\end{align}

which also has an approximate standard normal distribution for large n.

## C-4. General approaches to parameter estimation

Three general approaches to parameter estimation are the method of moments, maximum likelihood and least squares. Refer to Wooldridge for more details.
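As a small illustration, and not a substitute for Wooldridge's treatment, all three approaches lead to the sample average when the parameter is a population mean:

- Least squares picks the value m that minimizes the sum of squared deviations.
- Maximum likelihood, under a normal population with known standard deviation, maximizes the log-likelihood.
- The method of moments replaces E(Y) with the sample average.

The sketch below checks this numerically with R's `optimize()`. The simulated sample and the assumed known standard deviation of 2 are hypothetical.

```r
# Sketch: least squares, maximum likelihood and method of moments for a population mean.
set.seed(1)
y <- rnorm(50, mean = 5, sd = 2)   # hypothetical sample

# Least squares: m minimizing the sum of squared deviations sum((y - m)^2)
ls_est <- optimize(function(m) sum((y - m)^2), interval = range(y))$minimum

# Maximum likelihood: normal population with sd assumed known and equal to 2
ml_est <- optimize(function(m) sum(dnorm(y, mean = m, sd = 2, log = TRUE)),
                   interval = range(y), maximum = TRUE)$maximum

# Method of moments: replace E(Y) with the sample average
mm_est <- mean(y)

cat("LS:", ls_est, " ML:", ml_est, " MM:", mm_est, "\n")  # agree up to optimize()'s tolerance
```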

## C-5. Interval Estimation and Confidence Intervals

Suppose we have a population with a N(&mu;,1) distribution, and let {Y1,Y2,...,Yn} be a random sample from this population. The sample average $\bar{Y}$ has a normal distribution with mean &mu; and variance 1/n. Therefore, the standardized sample average has a standard normal distribution, and we have:

\begin{equation*}
P\left(-1.96 \le \frac{\bar{Y} - \mu}{1/\sqrt{n}} \le 1.96\right) = 0.95
\end{equation*}

which is equivalent to:

\begin{equation*}
P(\bar{Y} - 1.96/\sqrt{n} \le \mu \le \bar{Y} + 1.96/\sqrt{n}) = 0.95
\end{equation*}

The previous equation tells us that the probability that the random interval contains the population mean &mu; is 0.95, or 95%.

When we say that this interval is a 95% confidence interval for &mu;, we mean that the random interval

\begin{equation*}
[\bar{Y} - 1.96/\sqrt{n},\ \bar{Y} + 1.96/\sqrt{n}]
\end{equation*}

contains &mu; with probability 0.95. In other words, before the random sample is drawn, there is a 95% chance that the interval contains &mu;. This is an interval estimator, and it is a random interval because its endpoints change with different samples.
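The "before the sample is drawn" interpretation can be made concrete by simulation. The sketch below draws many samples from a N(&mu;,1) population, builds the interval from each one, and records how often it contains &mu;. The values &mu; = 3 and n = 25 are hypothetical. The share should be close to 0.95.

```r
# Simulation sketch: coverage of [Ybar - 1.96/sqrt(n), Ybar + 1.96/sqrt(n)]
# for a N(mu, 1) population; mu and n are hypothetical choices.
set.seed(11)
mu <- 3; n <- 25; reps <- 10000

covered <- replicate(reps, {
  ybar <- mean(rnorm(n, mean = mu, sd = 1))
  (ybar - 1.96 / sqrt(n) <= mu) && (mu <= ybar + 1.96 / sqrt(n))  # did this interval contain mu?
})
cat("Share of intervals containing mu:", mean(covered), "\n")  # should be near 0.95
```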

### C-5b Confidence Intervals for the Mean from a Normally Distributed Population

The previous interval is not very useful in practice, because it assumes that the variance is known to be unity. It is easy to extend it to the case where the standard deviation &sigma; is known to be any value. In this case, the 95% confidence interval is

\begin{equation*}
[\bar{y} - 1.96 \cdot \sigma/\sqrt{n},\ \bar{y} + 1.96 \cdot \sigma/\sqrt{n}]
\end{equation*}

Therefore, provided &sigma; is known, a confidence interval for &mu; is readily computed. To allow for unknown &sigma;, we must first use an estimate. Let

\begin{align}
s = \left[\frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2\right]^{1/2}
\end{align}

denote the sample standard deviation. Replacing &sigma; with its estimate s gives a confidence interval that depends entirely on the observed data. However, this interval no longer has an exact 95% level of confidence, because s varies from sample to sample.

In this case we rely on the t distribution. The t distribution arises from the fact that

\begin{equation*}
\frac{\bar{Y} - \mu}{S/\sqrt{n}} \sim t_{n-1}
\end{equation*}

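where $\bar{Y}$ is the sample average and S is the sample standard deviation of the random sample {Y1,Y2,...,Yn}. Consequently, a 95% confidence interval for &mu; is $[\bar{y} \pm c \cdot s/\sqrt{n}]$, where c is the 97.5th percentile of the $t_{n-1}$ distribution.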
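The effect of replacing &sigma; with s can be seen in a simulation. The sketch below assumes a normal population and a small sample (n = 5). It builds two intervals around s, one with 1.96 and one with the $t_{n-1}$ critical value, and records how often each contains the true mean. The first should cover noticeably less than 95% of the time. The second should be close to 95%.

```r
# Simulation sketch: 1.96 versus the t_{n-1} critical value when s replaces sigma.
# Normal population, mu = 0, sigma = 2 and n = 5 are hypothetical choices.
set.seed(5)
mu <- 0; n <- 5; reps <- 20000

cover <- replicate(reps, {
  y  <- rnorm(n, mean = mu, sd = 2)
  se <- sd(y) / sqrt(n)   # estimated standard error s / sqrt(n)
  c(z = abs(mean(y) - mu) <= qnorm(0.975) * se,            # normal critical value
    t = abs(mean(y) - mu) <= qt(0.975, df = n - 1) * se)   # t_{n-1} critical value
})
coverage <- rowMeans(cover)   # share of intervals containing mu, by method
print(coverage)
```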

### Wooldridge. Example C.2. Effect of Job Training Grants on Worker Productivity

We are analyzing scrap rates for firms that received a job training grant in 1988. The scrap rates for 1987 and 1988 are printed in Wooldridge (Table C.3). We are interested in the change between the two years.

In [1]:
# Manually enter raw data from Wooldridge, Table C.3:
SR87<-c(10,1,6,.45,1.25,1.3,1.06,3,8.18,1.67,.98,1,.45,
                                      5.03,8,9,18,.28,7,3.97)
SR88<-c(3,1,5,.5,1.54,1.5,.8,2,.67,1.17,.51,.5,.61,6.7,
                                            4,7,19,.2,5,3.83)
# Calculate Change (the parentheses just display the results):
cat(" Change:",(Change <- SR88 - SR87))



# Ingredients to CI formula
cat("\n Sample Average:",(avgCh<- mean(Change)))
cat("\n Sample Size:", n    <- length(Change))
cat("\n Standard Deviation:",(sdCh <- sd(Change)))
cat("\n Standard Error:",(se   <- sdCh/sqrt(n)))
cat("\n 97.5% percentile (C) with n-1 degrees of freedom:",(c    <- qt(.975, n-1)))

# Confidence interval:
cat("\n confidence interval: [",c( avgCh - c*se, avgCh + c*se ),"]")


 Change: -7 0 -1 0.05 0.29 0.2 -0.26 -1 -7.51 -0.5 -0.47 -0.5 0.16 1.67 -4 -2 1 -0.08 -2 -0.14
 Sample Average: -1.1545
 Sample Size: 20
 Standard Deviation: 2.400639
 Standard Error: 0.5367992
 97.5% percentile (C) with n-1 degrees of freedom: 2.093024
 confidence interval: [ -2.278034 -0.03096631 ]

Given the previous confidence interval, we can conclude with 95% confidence that the average change in scrap rates in the population is not zero. Because the entire interval lies below zero, the data point to a fall in scrap rates, which is consistent with job training improving worker productivity.
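As a cross-check, R's built-in `t.test()` computes the same t-based interval from these data. By default it reports a two-sided 95% interval. This sketch re-enters the Table C.3 data so that it runs on its own.

```r
# Data from Wooldridge, Table C.3 (as entered above)
SR87 <- c(10,1,6,.45,1.25,1.3,1.06,3,8.18,1.67,.98,1,.45,
          5.03,8,9,18,.28,7,3.97)
SR88 <- c(3,1,5,.5,1.54,1.5,.8,2,.67,1.17,.51,.5,.61,6.7,
          4,7,19,.2,5,3.83)
Change <- SR88 - SR87

# mean(Change) +/- qt(.975, n - 1) * sd(Change) / sqrt(n), computed by t.test()
ci <- t.test(Change, conf.level = 0.95)$conf.int
print(ci)
```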

### C-5c A Simple Rule of Thumb for a 95% Confidence Interval

The previous confidence interval can be computed for any sample size and any confidence level. Because the t distribution approaches the standard normal distribution as the degrees of freedom get large, a rule of thumb for an approximate 95% confidence interval is

\begin{equation*}
[\bar{y} \pm 2 \cdot se(\bar{y})]
\end{equation*}

That is, we obtain $\bar{y}$ and its standard error, and then compute $\bar{y}$ plus or minus twice its standard error to obtain the confidence interval.
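The rule of thumb works because the 97.5th percentile of the t distribution is close to 2 for moderate degrees of freedom. As the degrees of freedom grow, it approaches the normal value 1.96. The degrees of freedom listed below are arbitrary choices for illustration.

```r
# Sketch: 97.5th percentile of t_df compared with the rule-of-thumb value 2
df   <- c(5, 10, 20, 30, 60, 120, 1000)   # illustrative degrees of freedom
crit <- qt(0.975, df)                     # t critical values for a 95% interval
print(round(cbind(df, t_crit = crit, normal = qnorm(0.975)), 3))
```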

### C-5d Asymptotic Confidence Intervals for Nonnormal Populations

In some applications the population is clearly nonnormal. A leading case is the Bernoulli distribution, where the random variable takes on only the values zero and one. In other cases the nonnormal population has no standard distribution. This does not matter, provided the sample size is large enough for the central limit theorem to give a good approximation to the distribution of the sample average $\bar{Y}$.

For large n, an approximate 95% confidence interval is

\begin{equation*}
[\bar{y} \pm 1.96 \cdot se(\bar{y})]
\end{equation*}

where the value 1.96 is the 97.5th percentile in the standard normal distribution. 

### Wooldridge. Example C.3 Race Discrimination in Hiring

We are looking into race discrimination in hiring using the dataset AUDIT.DTA. Each observation pairs a black and a white applicant with identical résumés (CVs). The variable y = b - w is the difference between their job-offer indicators, so its average estimates the difference in hiring rates between black and white applicants. The code below first calculates the average, sample size, standard deviation and standard error of the sample average. It then sets the factor c to the 97.5th percentile of the standard normal distribution, which is approximately 1.96. Finally, the 95% and 99% confidence intervals are reported.

In [2]:
library(foreign)
audit <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/audit.dta?raw=true")


In [3]:
# w = 1 if the white person got a job offer from employer i, 0 otherwise
# b = 1 if the black person got a job offer from employer i, 0 otherwise
# y=b-w
head(audit,4)

| w | b | y |
|---|---|---|
| 1 | 1 | 0 |
| 1 | 1 | 0 |
| 1 | 1 | 0 |
| 1 | 1 | 0 |


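The distribution of y is clearly not normal: it is discrete and takes only the three values -1, 0 and 1. Let &theta;b and &theta;w be the probabilities that the black and the white applicant, respectively, receive a job offer. An approximate confidence interval for &theta;b - &theta;w can nevertheless be obtained by using large-sample methods.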

Using the 241 observed data points, we get:

In [4]:
cat("average w:",(avgw<- mean(audit$w)))
cat("\naverage b:",(avgb<- mean(audit$b)))
cat("\naverage y:",(avgy<- mean(audit$y)))

average w: 0.3568465
average b: 0.2240664
average y: -0.1327801

Given that $\bar{b}$ = .224 and $\bar{w}$ = .357, the data suggest discrimination against black applicants. We can learn much more, however, by computing a 95% confidence interval for &mu; = &theta;b - &theta;w.

In [5]:
n   <- length(audit$y)
sdy <- sd(audit$y)
se  <- sdy/sqrt(n)
c   <- qnorm(.975)

# 95% Confidence interval:
cat("[",avgy + c * c(-se,+se),"]")


[ -0.1936301 -0.07193011 ]

Similarly, the 99% confidence interval for &mu; is:

In [6]:
# 99% Confidence interval:
cat("[",avgy + qnorm(.995) * c(-se,+se),"]")

[ -0.2127505 -0.05280966 ]

Neither interval contains zero. Both lie entirely below it, which is evidence of discrimination against black applicants.

## C-6. Hypotheses Testing

So far we have reviewed how to evaluate point estimators, and we have seen how to construct and interpret confidence intervals. Sometimes, however, we are interested in a yes/no answer to a question about a population parameter. Devising methods for answering such questions, using a sample of data, is known as hypothesis testing.

### C-6a Fundamentals of Hypothesis Testing

Following Wooldridge, we illustrate the concept with an election example. Suppose there are two candidates in an election, Candidate A and Candidate B. Candidate A is reported to have received 42% of the popular vote, while Candidate B received 58%. These are supposed to represent the true percentages in the voting population.

Candidate A is convinced that more people must have voted for him, so he would like to investigate whether the election was rigged.

One way to proceed is to set up a hypothesis test. Let &theta; denote the true proportion of the population voting for Candidate A. The hypothesis that the reported results are accurate can be stated as

\begin{equation*}
H_0:\theta =.42
\end{equation*}

This is an example of a null hypothesis. The null hypothesis is presumed to be true until the data strongly suggest otherwise. In the current example, Candidate A must present fairly strong evidence against the null hypothesis in order to win a recount.

The alternative hypothesis in the election example is that the true proportion voting for Candidate A in the election is greater than .42:

\begin{equation*}
H_1:\theta \gt .42
\end{equation*}

To conclude that H0 is false and that H1 is true, we must have evidence "beyond reasonable doubt" against H0.

In hypothesis testing we can make two kinds of mistakes. First, we can reject the null hypothesis when it is in fact true; this is called a Type I error. In the election example, a Type I error occurs if we reject H0 when the true proportion of people voting for Candidate A is in fact .42.

The second kind of error is failing to reject Ho when it is actually false. This is called a Type II error.

We will never know with certainty whether an error was committed. However, we can compute the probability of making either a Type I or a Type II error. We define the significance level of a test as the probability of a Type I error, typically denoted by &alpha;:

\begin{equation*}
\alpha = P(\text{Reject } H_0 \mid H_0)
\end{equation*}

This is read: the probability of rejecting H0 given that H0 is true. If &alpha; = .05, then the researcher is willing to falsely reject H0 5% of the time in order to detect deviations from H0.

Once we have chosen the significance level, we would then like to minimize the probability of a Type II error or, equivalently, maximize the power of the test against all relevant alternatives. The power of a test is one minus the probability of a Type II error. Mathematically:

\begin{equation*}
\pi(\theta) = 1 - P(\text{Type II} \mid \theta)
\end{equation*}

where &theta; denotes the actual value of the parameter.
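Size and power can be made concrete in the election example under a hypothetical setup, since the text does not say how the evidence would be collected. Suppose Candidate A could obtain a random sample of n = 500 voters, and let X be the number who voted for him. Consider a test of H0: &theta; = .42 against H1: &theta; > .42 that rejects when X &ge; k. Its power is &pi;(&theta;) = P(X &ge; k | &theta;).

The sketch picks the smallest k whose Type I error probability is at most 5%. It then evaluates the power function at a few values of &theta;. Because X is discrete, the actual size comes out somewhat below .05. The function name `power_at` is a hypothetical helper.

```r
# Sketch: size and power of an exact binomial test in the election example.
# n = 500 sampled voters is a hypothetical assumption.
n      <- 500
theta0 <- 0.42   # value of theta under H0
alpha  <- 0.05   # chosen significance level

# qbinom() returns the smallest q with P(X <= q) >= 1 - alpha,
# so rejecting when X >= q + 1 gives P(Type I error) <= alpha.
k    <- qbinom(1 - alpha, n, theta0) + 1
size <- pbinom(k - 1, n, theta0, lower.tail = FALSE)  # P(X >= k | H0 true)

# Power function: probability of rejecting H0 when the true proportion is theta
power_at <- function(theta) pbinom(k - 1, n, theta, lower.tail = FALSE)

cat("Reject H0 when X >=", k, "; actual size =", size, "\n")
thetas <- c(0.42, 0.45, 0.50, 0.55)
print(cbind(theta = thetas, power = power_at(thetas)))
```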

### C-6b Testing hypotheses about the mean in a Normal Population

In order to test a null hypothesis against an alternative, we need to choose a test statistic and a critical value. The choices for the statistic and critical value are based on convenience and on the desire to maximize power given a significance level for the test.

Testing hypotheses about the mean &mu; from a normal population is straightforward. The null hypothesis is stated as:

\begin{equation*}
H_0:\mu = \mu_0
\end{equation*}

The rejection rule we choose depends on the nature of the alternative hypothesis. If, for instance, we are only interested in alternatives where &mu; is greater than &mu;0, then

\begin{equation*}
H_1:\mu \gt \mu_0
\end{equation*}

Intuitively, we should reject H0 in favor of H1 when the value of the sample average, $\bar{y}$, is "sufficiently" greater than &mu;0. To determine when $\bar{y}$ is large enough for H0 to be rejected at the chosen significance level, we use the following test statistic:

\begin{equation*}
t = \sqrt{n}(\bar{y}-\mu_0)/s = (\bar{y}-\mu_0)/se(\bar{y})
\end{equation*}



where se($\bar{y}$) = s/$\sqrt{n}$ is the standard error of $\bar{y}$. Given the sample of data, it is easy to obtain t. We work with t because, under the null hypothesis, the random variable

\begin{equation*}
T = \sqrt{n}(\bar{Y}-\mu_0)/S
\end{equation*}

has a $t_{n-1}$ distribution. Now suppose we have settled on a 5% significance level. Then the critical value c is chosen so that P(T > c | H0) = .05; that is, the probability of a Type I error is 5%. Once we have found c, the rejection rule is

\begin{equation*}
t \gt c
\end{equation*}

where c is the 100(1 - &alpha;)th percentile of the $t_{n-1}$ distribution. This is an example of a one-tailed test, because the rejection region is in one tail of the t distribution.
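The rejection rule above can be sketched in R on a simulated sample. The data, &mu;0 = 10 and &alpha; = .05 are hypothetical. The statistic is computed by hand and then cross-checked against R's `t.test()` with `alternative = "greater"`, which tests H0: &mu; = &mu;0 against H1: &mu; > &mu;0.

```r
# Sketch: one-tailed t test of H0: mu = mu0 against H1: mu > mu0
set.seed(3)
y     <- rnorm(30, mean = 10.5, sd = 2)  # hypothetical sample
mu0   <- 10                              # value of mu under H0
alpha <- 0.05
n     <- length(y)

t_stat <- sqrt(n) * (mean(y) - mu0) / sd(y)  # t = sqrt(n) (ybar - mu0) / s
c_crit <- qt(1 - alpha, df = n - 1)          # 100(1 - alpha)th percentile of t_{n-1}
cat("t =", t_stat, "; c =", c_crit, "; reject H0:", t_stat > c_crit, "\n")

# Cross-check with the built-in one-sided, one-sample t test
tt <- t.test(y, mu = mu0, alternative = "greater")
cat("t.test statistic:", unname(tt$statistic), "\n")
```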

### Wooldridge. Example C.5 Race Discrimination in Hiring

In the Urban Institute study of discrimination in hiring (see Example C.3), using the data in AUDIT, we are primarily interested in testing H0: &mu; = 0 against H1: &mu; < 0. Here &mu; = &theta;b - &theta;w is the difference in the probabilities that blacks and whites receive job offers.

In [7]:
library(foreign)
audit <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/audit.dta?raw=true")
head(audit,4)

| w | b | y |
|---|---|---|
| 1 | 1 | 0 |
| 1 | 1 | 0 |
| 1 | 1 | 0 |
| 1 | 1 | 0 |


Using the 241 paired comparisons in the data file AUDIT, we obtain:

In [8]:
cat("\nAverage y:",(avgy<- mean(audit$y)))
n   <- length(audit$y)
sdy <- sd(audit$y)
cat('\nStandard Error:',se  <- sdy/sqrt(n))


Average y: -0.1327801
Standard Error: 0.03104648

The t statistic for testing H0: &mu; = 0 is the sample average divided by its standard error:

In [9]:
cat('t statistic aproximation',(t <- avgy/se))

t statistic aproximation -4.276816

The value is so far out in the left tail of the distribution that we reject Ho at any reasonable significance level. 

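The following table lists the right-tail critical values of the t distribution with n - 1 = 240 degrees of freedom. By symmetry, the left-tail critical values are their negatives. The .005 critical value for the one-sided test is therefore about -2.60, so a t value of -4.28 is very strong evidence against H0 in favor of H1. We conclude that there is discrimination in hiring.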

In [10]:
alpha.one.tailed = c(0.1, 0.05, 0.025, 0.01, 0.005, .001)
CV <- qt(1 - alpha.one.tailed, n-1)
cbind(alpha.one.tailed, CV)

| alpha.one.tailed | CV |
|---|---|
| 0.1 | 1.285089 |
| 0.05 | 1.651227 |
| 0.025 | 1.969898 |
| 0.01 | 2.341985 |
| 0.005 | 2.596469 |
| 0.001 | 3.124536 |


### C-6d Computing and Using p-Values

The p-value of a test is the largest significance level at which we could carry out the test and still fail to reject the null hypothesis. Equivalently, it is the smallest significance level at which the null hypothesis would be rejected.

Generally, small p-values are evidence against H0, since they indicate that the observed outcome of the data occurs with small probability if H0 is true.

On the other hand, a large p-value is weak evidence against Ho.
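For a t statistic with n - 1 degrees of freedom, the p-value is a tail probability of the t distribution, and which tail depends on the alternative. The helper below is a hypothetical sketch that uses R's `pt()` for each case. The name `p_value` and the illustrative values t = &plusmn;2.15 with 19 degrees of freedom are assumptions.

```r
# Hypothetical helper: p-value of an observed t statistic with df degrees of freedom
p_value <- function(t, df, alternative = c("less", "greater", "two.sided")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         less      = pt(t, df),                      # H1: mu < mu0 (left tail)
         greater   = pt(t, df, lower.tail = FALSE),  # H1: mu > mu0 (right tail)
         two.sided = 2 * pt(-abs(t), df))            # H1: mu != mu0 (both tails)
}

p_value(-2.15, df = 19, alternative = "less")
p_value( 2.15, df = 19, alternative = "greater")    # same value, by symmetry
p_value(-2.15, df = 19, alternative = "two.sided")  # twice the one-sided value
```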

### Wooldridge. Example C.6. Effect of Job Training Grants on Worker Productivity (Revisited)

Consider again Example C.2. From a policy perspective, there are two questions of interest. First, what is our best estimate of the mean change in scrap rates, &mu;?

We have already obtained this for the sample of 20 firms under study. The sample average of the change in scrap rates is:

In [11]:
# Manually enter raw data from Wooldridge, Table C.3:
SR87<-c(10,1,6,.45,1.25,1.3,1.06,3,8.18,1.67,.98,1,.45,
                                      5.03,8,9,18,.28,7,3.97)
SR88<-c(3,1,5,.5,1.54,1.5,.8,2,.67,1.17,.51,.5,.61,6.7,
                                            4,7,19,.2,5,3.83)
# Calculate Change (the parentheses just display the results):
Change <- SR88 - SR87



# Ingredients to CI formula
cat("\n Sample Average:",(avgCh<- mean(Change)))


 Sample Average: -1.1545

Relative to the average scrap rate in 1987, this represents a fall in the scrap rate of about 26.3% (-1.15/4.38 = -.263), which is a nontrivial effect.
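The percentage in the previous sentence is the sample average change divided by the average 1987 scrap rate. Below is a self-contained sketch of that arithmetic, re-entering the Table C.3 data used above.

```r
# Data from Wooldridge, Table C.3 (as entered above)
SR87 <- c(10,1,6,.45,1.25,1.3,1.06,3,8.18,1.67,.98,1,.45,
          5.03,8,9,18,.28,7,3.97)
SR88 <- c(3,1,5,.5,1.54,1.5,.8,2,.67,1.17,.51,.5,.61,6.7,
          4,7,19,.2,5,3.83)
Change <- SR88 - SR87

base_rate  <- mean(SR87)                 # average 1987 scrap rate
rel_change <- mean(Change) / base_rate   # average change relative to the 1987 level
cat("Average 1987 scrap rate:", base_rate, "\n")
cat("Relative change:", rel_change, "\n")
```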

Second, we would also like to know whether the sample provides strong evidence for an effect in the population of manufacturing firms that could have received grants. The null hypothesis is H0: &mu; = 0, and we test it against H1: &mu; < 0, where &mu; is the average change in scrap rates.

In [12]:
# Manually enter raw data from Wooldridge, Table C.3:
SR87<-c(10,1,6,.45,1.25,1.3,1.06,3,8.18,1.67,.98,1,.45,
                                      5.03,8,9,18,.28,7,3.97)
SR88<-c(3,1,5,.5,1.54,1.5,.8,2,.67,1.17,.51,.5,.61,6.7,
                                            4,7,19,.2,5,3.83)
# Calculate Change (the parentheses just display the results):
cat(" Change:",(Change <- SR88 - SR87))



# Ingredients to CI formula
cat("\n Sample Average:",(avgCh<- mean(Change)))
cat("\n Sample Size:", n    <- length(Change))
cat("\n Standard Deviation:",(sdCh <- sd(Change)))
cat("\n Standard Error:",(se   <- sdCh/sqrt(n)))
cat("\n 97.5% percentile (C) with n-1 degrees of freedom:",(c    <- qt(.975, n-1)))

# Confidence interval:
cat("\n confidence interval: [",c( avgCh - c*se, avgCh + c*se ),"]")


 Change: -7 0 -1 0.05 0.29 0.2 -0.26 -1 -7.51 -0.5 -0.47 -0.5 0.16 1.67 -4 -2 1 -0.08 -2 -0.14
 Sample Average: -1.1545
 Sample Size: 20
 Standard Deviation: 2.400639
 Standard Error: 0.5367992
 97.5% percentile (C) with n-1 degrees of freedom: 2.093024
 confidence interval: [ -2.278034 -0.03096631 ]

In [13]:
cat('t-statistic',(t <- avgCh/se))
# p value
cat('\np-value',(p <- pt(t,n-1)))

t-statistic -2.150711
p-value 0.02229063

This small p-value gives reasonable evidence against H0. Because .022 < .025, it is enough to reject the null hypothesis that the training grants had no effect at the 2.5% significance level, and hence at any higher level, such as 5%.
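The same one-sided test can be run with R's built-in `t.test()`, setting `alternative = "less"` for H1: &mu; < 0. This sketch re-enters the Table C.3 data so it runs on its own. The statistic and p-value should match the hand calculation above.

```r
# Data from Wooldridge, Table C.3 (as entered above)
SR87 <- c(10,1,6,.45,1.25,1.3,1.06,3,8.18,1.67,.98,1,.45,
          5.03,8,9,18,.28,7,3.97)
SR88 <- c(3,1,5,.5,1.54,1.5,.8,2,.67,1.17,.51,.5,.61,6.7,
          4,7,19,.2,5,3.83)
Change <- SR88 - SR87

tt <- t.test(Change, mu = 0, alternative = "less")   # H0: mu = 0 vs H1: mu < 0
cat("t-statistic:", unname(tt$statistic), "\n")
cat("p-value:", tt$p.value, "\n")
```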

### Wooldridge. Example C.7 Race Discrimination in Hiring (revisited)

Using the matched pairs data from the Urban Institute in the AUDIT data file (n=241), we obtained the following t statistic

In [14]:
library(foreign)
audit <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/audit.dta?raw=true")
avgy<- mean(audit$y)
n   <- length(audit$y)
sdy <- sd(audit$y)
se  <- sdy/sqrt(n)

In [15]:
cat('t statistic aproximation',(t <- avgy/se))

t statistic aproximation -4.276816

In [16]:
cat('\np-value',(p <- pt(t,n-1)))


p-value 1.369271e-05

This extremely low p-value provides strong evidence against H0.

### Summary of how to Use p-values

(i) Choose a test statistic T and decide on the nature of the alternative. This determines whether the rejection rule is t > c, t < -c or |t| > c.

(ii) Use the observed value of the t-statistic as the critical value and compute the corresponding significance level of the test. This is the p-value. 

(iii) If a significance level &alpha; has been chosen, then we reject H0 at the 100&alpha;% level if p-value < &alpha;, and we fail to reject H0 at that level if p-value &ge; &alpha;. Therefore, it is a small p-value that leads to rejection of the null hypothesis.
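Step (iii) is equivalent to the critical-value approach. For a left-tail test, p-value < &alpha; exactly when t < -c, where c is the 100(1 - &alpha;)th percentile of the t distribution. The sketch below checks this over a grid of t values. The 240 degrees of freedom (as in the AUDIT example) and &alpha; = .05 are chosen for illustration.

```r
# Sketch: the p-value rule and the critical-value rule give the same decisions
df     <- 240                     # degrees of freedom, as with the n = 241 AUDIT pairs
alpha  <- 0.05
t_grid <- seq(-5, 5, by = 0.01)   # hypothetical observed t values

reject_by_p   <- pt(t_grid, df) < alpha   # p-value rule for H1: mu < mu0
reject_by_cv  <- t_grid < qt(alpha, df)   # critical-value rule t < -c, with -c = qt(alpha, df)
same_decision <- all(reject_by_p == reject_by_cv)
same_decision
```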

## Summary

An estimator is a rule for combining data to estimate a population parameter. Two core finite sample properties of estimators are unbiasedness and efficiency. Any useful estimator should also be consistent.

The central limit theorem implies that, in large samples, the sampling distribution of most estimators is approximately normal.

The sampling distribution of an estimator can be used to construct confidence intervals. Classical hypothesis testing requires specifying a null hypothesis, an alternative hypothesis and a significance level, and is carried out by comparing a test statistic to a critical value. Alternatively, a p-value can be computed that allows us to carry out a test at any significance level.