$\textbf{DS121 - STATISTICAL METHODS II} \\ \texttt{2Q SY2324}$

Instructor: EDGAR M. ADINA

# <center> Hypothesis Testing of Proportion

A population proportion is the share of a population that belongs to a particular category.

Hypothesis tests are used to check a claim about the size of that population proportion.

#### Example

Consider the population of freshmen students of Mapua.

     Population: Freshmen students of Mapua
     Category: Graduate of Mapua Senior High School
     
Let us check the claim that "more than 50% of freshmen students of Mapua are graduates of Mapua Senior High School".

From a random sample of 400 freshmen students of Mapua, we obtained the following information:

     215 out of 400 freshmen students of Mapua were graduates of Mapua Senior High School


The **sample proportion** is then: $\frac{215}{400}=0.5375$  or 53.75%.

We use this sample data to check the claim.
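The z test used in these notes relies on the normal approximation to the sampling distribution of the sample proportion. A common rule of thumb, added here as an assumption and not part of the original notes, is to check that the expected counts $np_0$ and $n(1-p_0)$ are both at least 10 (some texts use 5), where $p_0 = 0.50$ is the value in the claim. The helper name `normal_approx_ok` is hypothetical.

```python
# Rule-of-thumb check (an added assumption, not from the original notes):
# the normal approximation is commonly considered adequate when
# n * p0 >= threshold and n * (1 - p0) >= threshold, with threshold = 10.

def normal_approx_ok(n, p0, threshold=10):
    """Return True if both expected counts under H0 meet the threshold."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold

n = 400    # sample size
p0 = 0.50  # proportion in the claim (null hypothesis value)

print("Expected successes under H0:", n * p0)        # 200.0
print("Expected failures under H0:", n * (1 - p0))   # 200.0
print("Normal approximation OK:", normal_approx_ok(n, p0))  # True
```

With 200 expected successes and 200 expected failures, the condition is easily met for this sample.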

### Defining the Claims

We need to define a **null hypothesis** ($H_0$) and an **alternative hypothesis** ($H_1$) based on the claim we are checking.

The claim was: "**More than 50%** of freshmen students of Mapua are graduates of Mapua Senior High School"

In this case, the parameter is the proportion of freshmen students of Mapua who are graduates of Mapua Senior High School ($p$).

The null and alternative hypotheses are then:

     Null hypothesis: 50% of freshmen students of Mapua are graduates of Mapua Senior High School.

     Alternative hypothesis: More than 50% of freshmen students of Mapua are graduates of Mapua Senior High School.

These can be expressed in symbols as:

$$
\begin{aligned}
H_0 &: p = 0.50 \\
H_1 &: p > 0.50
\end{aligned}
$$

This is a **right-tailed** test, because the alternative hypothesis claims that the proportion is **more than** the value stated in the null hypothesis.

If the sample data provide sufficient evidence for the alternative hypothesis, we **reject** the null hypothesis in favor of the alternative hypothesis.
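To make "right-tailed" concrete, here is a small sketch, assuming $\alpha = 0.05$: the critical value is the z-value that leaves an area of $\alpha$ in the right tail of the standard normal curve, and only test statistics beyond it lead to rejection. The function name `reject_right_tailed` is hypothetical.

```python
import scipy.stats as stats

alpha = 0.05  # significance level used in these notes

# Right-tailed test (H1: p > 0.50): the critical value leaves an area of
# alpha in the right tail of the standard normal curve.
z_crit = stats.norm.ppf(1 - alpha)
print("Critical value:", z_crit)                    # about 1.645
print("Area to the right:", stats.norm.sf(z_crit))  # about 0.05

def reject_right_tailed(z, alpha=0.05):
    """Reject H0 in a right-tailed z test only when z exceeds the critical value."""
    return z > stats.norm.ppf(1 - alpha)

print(reject_right_tailed(2.0))   # True: 2.0 lies in the rejection region
print(reject_right_tailed(-2.0))  # False: a negative z never supports H1: p > 0.50
```

Note that in a right-tailed test a large negative z-value, however extreme, never leads to rejecting $H_0$.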

### Classical Approach

In Python, we use the scipy and math libraries to calculate the test statistic for a proportion.
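For reference, the code cell that follows computes the one-proportion z statistic. Written out with $\hat{p} = x/n$ and $p_0 = 0.50$ (the null-hypothesis proportion, stored as `p` in the code), and with the sample values from the example:

$$
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}
  = \frac{0.5375 - 0.50}{\sqrt{\dfrac{0.50(1 - 0.50)}{400}}}
  = \frac{0.0375}{0.025}
  = 1.5
$$

The printed value 1.4999999999999991 differs from 1.5 only by floating-point rounding.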

In [1]:
import scipy.stats as stats
import math

# Specify the number of occurrences (x), the sample size (n), and the proportion claimed in the null-hypothesis (p)
x = 215
n = 400
p = 0.50

# Calculate the sample proportion
p_hat = x/n

# Calculate and print the test statistic
print((p_hat-p)/(math.sqrt((p*(1-p))/(n))))

1.4999999999999991


Now, we use the SciPy Stats library **norm.ppf()** function to find the Z-value (critical value) that leaves an area of $\alpha = 0.05$ in the right tail.

In [2]:
import scipy.stats as stats
print(stats.norm.ppf(1-0.05))

1.6448536269514722


The computed test statistic ($z \approx 1.50$) is less than the critical value ($z_{0.05} \approx 1.645$), so it does not fall in the rejection region. This means that we fail to reject the null hypothesis.

Thus, at $\alpha = 5\%$, there is not sufficient evidence to conclude that more than 50% of freshmen students of Mapua are graduates of Mapua Senior High School.
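The classical steps above (sample proportion, test statistic, critical value, decision) can be collected into one function. This is a sketch for the right-tailed case only, and the name `one_proportion_ztest_right` is hypothetical.

```python
import math
import scipy.stats as stats

def one_proportion_ztest_right(x, n, p0, alpha=0.05):
    """Classical right-tailed one-proportion z test (hypothetical helper name).

    Returns the test statistic, the critical value, and whether H0 is rejected.
    """
    p_hat = x / n                                    # sample proportion
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # test statistic
    z_crit = stats.norm.ppf(1 - alpha)               # right-tail critical value
    return z, z_crit, z > z_crit

z, z_crit, reject = one_proportion_ztest_right(215, 400, 0.50)
print("Test statistic:", z)       # about 1.50
print("Critical value:", z_crit)  # about 1.645
print("Reject H0:", reject)       # False
```

The same call with the numbers from the exercises reproduces their decisions.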

### p-value Approach

Similarly, we use the SciPy Stats library **norm.cdf()** function to find the p-value, that is, the probability of a Z-value greater than the computed test statistic ($z \approx 1.50$):

In [3]:
import scipy.stats as stats
print(1-stats.norm.cdf(1.499999999999991))

0.0668072012688592


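The p-value (0.0668) is greater than the significance level $\alpha = 0.05$. In other words, if the null hypothesis were true, a sample proportion at least as large as the observed 0.5375 would occur about 6.68% of the time, which is not unusual enough at the 5% level. Thus, we fail to reject the null hypothesis.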

This agrees with the result in the classical approach.
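As a variation, the p-value can be computed from the stored test statistic instead of a retyped decimal. The sketch below uses `norm.sf()`, SciPy's survival function, which equals `1 - norm.cdf()` and stays accurate far out in the tail.

```python
import math
import scipy.stats as stats

x, n, p0 = 215, 400, 0.50
z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)  # test statistic, about 1.50

# Right-tail p-value: P(Z >= z); norm.sf(z) equals 1 - norm.cdf(z)
p_value = stats.norm.sf(z)
print("p-value:", p_value)                           # about 0.0668
print("Reject H0 at alpha = 0.05:", p_value < 0.05)  # False
```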

## Exercise

**No. 1.**  Chronic pain is often defined as pain that occurs constantly and flares up frequently, is not caused by cancer, and is experienced at least once a month over a 1-year period. Many articles have been written about the relationship between chronic pain and the age of the patient. A survey conducted in 2004 on behalf of the **American Chronic Pain Association**, covering a random cross section of 800 adults who suffer from chronic pain, found that 424 of the 800 participants were above the age of 50. Using the data in the survey, is there substantial evidence ($\alpha = 0.05$) that more than half of persons suffering from chronic pain are over 50 years of age?
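Stated in the same form as the worked example, with $p$ the proportion of persons suffering from chronic pain who are over 50 years of age, the hypotheses for this right-tailed test are:

$$
\begin{aligned}
H_0 &: p = 0.50 \\
H_1 &: p > 0.50
\end{aligned}
$$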

In [5]:
import scipy.stats as stats
import math

# Specify the number of occurrences (x), the sample size (n), and the proportion claimed in the null-hypothesis (p)
x = 424
n = 800
p = 0.50

# Calculate the sample proportion
p_hat = x/n

# Calculate and print the test statistic
z = (p_hat-p)/(math.sqrt((p*(1-p))/(n)))
print("Z-score: ", z)

# Calculate and print the critical value
critical_value = stats.norm.ppf(1-0.05)
print("Critical value: ", critical_value)

# Check if we can reject the null hypothesis
if z > critical_value:
    print("We reject the null hypothesis. There is enough evidence that more than half of persons suffering from chronic pain are over 50 years of age.")
else:
    print("We fail to reject the null hypothesis. There is not substantial evidence that more than half of persons suffering from chronic pain are over 50 years of age.")


Z-score:  1.6970562748477156
Critical value:  1.6448536269514722
We reject the null hypothesis. There is enough evidence that more than half of persons suffering from chronic pain are over 50 years of age.
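The p-value approach gives the same conclusion for this exercise. In this sketch, `norm.sf()` gives the right-tail area $P(Z \ge z)$.

```python
import math
import scipy.stats as stats

x, n, p0 = 424, 800, 0.50
z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)  # about 1.697
p_value = stats.norm.sf(z)                       # right-tail area P(Z >= z)
print("p-value:", p_value)                           # about 0.045
print("Reject H0 at alpha = 0.05:", p_value < 0.05)  # True
```

Since the p-value is below 0.05, the null hypothesis is rejected, matching the classical approach.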


**No. 2.** National public opinion polls are often based on as few as 1,500 persons in a random sampling of public sentiment on issues of public interest. These surveys are often done in person because the response rate for a mailed survey is very low and telephone interviews tend to reach a larger proportion of older persons than would be represented in the public as a whole. Suppose a random sample of 1,500 registered voters was surveyed about energy issues. Results showed that 230 of the 1,500 responded that they would favor drilling for oil in national parks. 

A congressman has claimed that over half of all registered voters would support drilling in national parks. Use the survey data to evaluate the congressman’s claim. Use $\alpha = 0.05$.
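With $p$ the proportion of all registered voters who would favor drilling for oil in national parks, the congressman's claim ("over half") becomes the alternative hypothesis of a right-tailed test:

$$
\begin{aligned}
H_0 &: p = 0.50 \\
H_1 &: p > 0.50
\end{aligned}
$$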

In [7]:
import scipy.stats as stats
import math

# Specify the number of occurrences (x), the sample size (n), and the proportion claimed in the null-hypothesis (p)
x = 230
n = 1500
p = 0.50

# Calculate the sample proportion
p_hat = x/n

# Calculate and print the test statistic
z = (p_hat-p)/(math.sqrt((p*(1-p))/(n)))
print("Z-score: ", z)

# Calculate and print the critical value
critical_value = stats.norm.ppf(1-0.05)
print("Critical value: ", critical_value)

# Check if we can reject the null hypothesis
if z < critical_value:
    print("We fail to reject the null hypothesis. There is not substantial evidence that more than half of all registered voters would support drilling in national parks.")
else:
    print("We reject the null hypothesis. There is substantial evidence that more than half of all registered voters would support drilling in national parks.")

Z-score:  -26.852684533704757
Critical value:  1.6448536269514722
We fail to reject the null hypothesis. There is not substantial evidence that more than half of all registered voters would support drilling in national parks.
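The p-value approach agrees. The sample proportion $230/1500 \approx 0.153$ is far below 0.50, so the right-tail p-value is essentially 1, and the data give no support to the congressman's claim. A sketch:

```python
import math
import scipy.stats as stats

x, n, p0 = 230, 1500, 0.50
z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)  # about -26.85
p_value = stats.norm.sf(z)                       # right-tail area P(Z >= z)
print("p-value:", p_value)                           # essentially 1.0
print("Reject H0 at alpha = 0.05:", p_value < 0.05)  # False
```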
