# Z-Test
A Z-Test is a statistical test used to determine whether the mean of a sample differs significantly from a known population mean when the population standard deviation is known. It is a common method for comparing a sample mean to a population mean and is typically used when the sample size is large enough to assume that the sample mean follows a normal distribution.

The steps for conducting a Z-Test are,
1. Formulate the null hypothesis (H0) and the alternative hypothesis (H1),
    - H0: The population mean is equal to the hypothesized value ($\mu = \mu_0$).
    - H1: The population mean is not equal to the hypothesized value ($\mu \neq \mu_0$).
2. Collect a random sample from the population of interest and calculate the sample mean ($\bar{x}$).
3. Determine the population standard deviation ($\sigma$) and sample size ($n$).
4. Calculate the Z-Score using the formula, $Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$.
5. Determine the critical values or the critical region based on the chosen significance level ($\alpha$). Common values for $\alpha$ are 0.05 and 0.01.
6. Compare the calculated Z-Score to the critical values,
    - If the Z-Score falls within the critical region, the null hypothesis is rejected in favor of the alternative hypothesis. This indicates that the sample mean is significantly different from the population mean.
    - If the Z-Score falls outside the critical region, the null hypothesis is not rejected, which suggests that there is not enough evidence to conclude a significant difference between the sample mean and the population mean.
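The steps above can be sketched in Python for the two-tailed case. This is a minimal sketch, not a definitive implementation: the function name `two_tailed_z_test` and the input numbers are made up for illustration, and the critical value is taken from `scipy.stats.norm.ppf`.

```python
from math import sqrt
from scipy.stats import norm

def two_tailed_z_test(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Test H0: mu = mu_0 against H1: mu != mu_0 (hypothetical helper)."""
    standard_error = sigma / sqrt(n)                 # sigma / sqrt(n)
    z_score = (sample_mean - mu_0) / standard_error  # step 4
    critical_value = norm.ppf(1 - alpha / 2)         # step 5: alpha is split over both tails
    reject_h0 = bool(abs(z_score) > critical_value)  # step 6: inside the critical region?
    return z_score, critical_value, reject_h0

# Illustrative (made-up) numbers, not taken from any real data set
z, z_crit, reject = two_tailed_z_test(sample_mean=52.0, mu_0=50.0, sigma=5.0, n=36, alpha=0.05)
print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}, reject H0: {reject}")
# prints: z = 2.40, critical value = ±1.96, reject H0: True
```

Here the Z-Score lies beyond the critical value, so it falls in the critical region and H0 is rejected.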

Z-Tests are often used in quality control, clinical trials and various research studies when there is access to a large enough sample and the population standard deviation is known. If the population standard deviation is unknown or the sample size is small, a T-Test may be more appropriate.

It is important to note that the assumptions of the Z-Test, such as normally distributed sample means and a known population standard deviation, need to be met for the test to provide accurate results. Violating these assumptions may lead to incorrect conclusions.

# Right-Tailed Z-Test
The right-tailed Z-Test is used when the claim to be tested is that the population mean is greater than the given (hypothesized) mean.
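In the notation of the steps above, the right-tailed test can be written roughly as follows. The symbol $\Phi$ (the standard normal cumulative distribution function) and the lowercase $z$ for the observed Z-Score are introduced here for illustration only.

$$
H_0: \mu = \mu_0, \qquad H_1: \mu > \mu_0, \qquad p\text{-value} = P(Z \geq z) = 1 - \Phi(z)
$$

H0 is rejected when the p-value is smaller than $\alpha$.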

### Implementation
1. Set up H0 and H1.
2. Collect the population mean, sample mean, population standard deviation and sample size, and calculate the standard error of the sample mean.
    ```Python
    from math import sqrt

    population_mean = <given>
    population_sd = <given>
    sample_mean = <given>
    n = <given>  # sample size
    sample_sd = population_sd / sqrt(n)  # standard error of the sample mean
    ```
3. Find the Z-Score,
    ```Python
    z_score = (sample_mean - population_mean)/ sample_sd
    ```
4. Find the p-value,
    ```Python
    p_value = 1 - norm.cdf(z_score)
    ```
5. Determine whether to reject or fail to reject the H0.
    ```Python
    if p_value < alpha:
        print("Reject H0")
    else:
        print("Fail to reject H0")
    ```

### Overall code
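```Python
from math import sqrt
from scipy.stats import norm

# H0: the null hypothesis (mean equals population_mean)
# H1: the alternative hypothesis (mean is greater than population_mean)

population_mean = <given>
population_sd = <given>
sample_mean = <given>
n = <given>      # sample size
alpha = <given>  # significance level

# Standard error of the sample mean
sample_sd = population_sd / sqrt(n)

z_score = (sample_mean - population_mean) / sample_sd

# Right-tailed p-value: P(Z >= z_score)
p_value = 1 - norm.cdf(z_score)

if p_value < alpha:
    print("Reject H0")
else:
    print("Fail to reject H0")
```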

# Left-Tailed Z-Test
The left-tailed Z-Test is used when the claim to be tested is that the population mean is less than the given (hypothesized) mean.
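In the notation of the steps above, the left-tailed test can be written roughly as follows, with $\Phi$ standing for the standard normal cumulative distribution function and $z$ for the observed Z-Score (both symbols are introduced here for illustration only).

$$
H_0: \mu = \mu_0, \qquad H_1: \mu < \mu_0, \qquad p\text{-value} = P(Z \leq z) = \Phi(z)
$$

H0 is rejected when the p-value is smaller than $\alpha$.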

### Implementation
1. Set up H0 and H1.
2. Collect the population mean, sample mean, population standard deviation and sample size, and calculate the standard error of the sample mean.
    ```Python
    from math import sqrt

    population_mean = <given>
    population_sd = <given>
    sample_mean = <given>
    n = <given>  # sample size
    sample_sd = population_sd / sqrt(n)  # standard error of the sample mean
    ```
3. Find the Z-Score,
    ```Python
    z_score = (sample_mean - population_mean)/ sample_sd
    ```
4. Find the p-value,
    ```Python
    p_value = norm.cdf(z_score)
    ```
5. Determine whether to reject or fail to reject the H0.
    ```Python
    if p_value < alpha:
        print("Reject H0")
    else:
        print("Fail to reject H0")
    ```

### Overall code
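```Python
from math import sqrt
from scipy.stats import norm

# H0: the null hypothesis (mean equals population_mean)
# H1: the alternative hypothesis (mean is less than population_mean)

population_mean = <given>
population_sd = <given>
sample_mean = <given>
n = <given>      # sample size
alpha = <given>  # significance level

# Standard error of the sample mean
sample_sd = population_sd / sqrt(n)

z_score = (sample_mean - population_mean) / sample_sd

# Left-tailed p-value: P(Z <= z_score)
p_value = norm.cdf(z_score)

if p_value < alpha:
    print("Reject H0")
else:
    print("Fail to reject H0")
```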
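Since the right-tailed and left-tailed versions differ only in how the p-value is read off the normal distribution, the two can be folded into one helper. The sketch below is one possible way to do this: the function name `one_tailed_z_test`, its `tail` parameter and the input numbers are hypothetical and chosen only for illustration. `norm.sf` is SciPy's survival function, equal to `1 - norm.cdf`.

```python
from math import sqrt
from scipy.stats import norm

def one_tailed_z_test(sample_mean, population_mean, population_sd, n, tail="right"):
    """Return (z_score, p_value) for a one-tailed Z-Test (hypothetical helper)."""
    standard_error = population_sd / sqrt(n)
    z_score = (sample_mean - population_mean) / standard_error
    if tail == "right":
        p_value = norm.sf(z_score)   # P(Z >= z), same as 1 - norm.cdf(z_score)
    elif tail == "left":
        p_value = norm.cdf(z_score)  # P(Z <= z)
    else:
        raise ValueError("tail must be 'right' or 'left'")
    return z_score, float(p_value)

# Illustrative (made-up) numbers, not taken from any real data set
alpha = 0.05
z, p = one_tailed_z_test(sample_mean=98.0, population_mean=100.0,
                         population_sd=10.0, n=64, tail="left")
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```

Keeping the tail as an explicit argument makes the direction of the alternative hypothesis visible at the call site, rather than hidden in which formula was copied.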

# Z-Test Examples

### Example 1
Consider a toy use case, where a retailer has 2000 outlets across India. 

The sales of shampoo bottles in one of the stores look as follows,
- Average sales per week, $\mu_{\text{weekly}}$ = 1800.
- Standard deviation, $\sigma$ = 100.

The above figures come from historical data.

As an owner of this store with all the above information, the intent would be to increase the sales figures. Hiring a marketing team would be the first thing a manager or a sales person would do. Now this marketing team will have a record of their historical performance.

Say that the marketing team charges a good amount of money because they claim to be very good at what they do. With this information, would it be a good idea to deploy them across all 2000 stores? Definitely not! Why? Because an advertisement run in Delhi will have little to no impact on the sales in Hyderabad. So what should be done?

A sensible proposal is that the marketing firm should first demonstrate that their approach works, by running their campaign across 50 locations and showing that it has a positive impact on the sales.

Consider that an average of 1850 bottles were sold in the 50 stores. Can it be concluded that they did a good job? Actually, it is hard to say, because the sample is small and the campaign ran for only a short time.

Now consider another marketing team that has sold an average of 1900 bottles, but across only 5 stores. This team ran their campaign well, but only in 5 stores. Can it be concluded that this team did a good job?

NOTE: On Amazon, if one seller has a rating of 5 from 5 customers and another seller has a rating of 4.5 from 5000 customers, the latter would be preferred over the former.
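The intuition in the note, that a figure backed by more observations is more trustworthy, can be made quantitative through the standard error $\sigma / \sqrt{n}$ used in the Z-Score. The short sketch below uses the store's $\sigma$ = 100 and the two sample sizes from this example; the function name `standard_error` is an assumption made for illustration.

```python
from math import sqrt

population_sd = 100  # historical standard deviation of weekly sales

def standard_error(population_sd, n):
    """Standard deviation of the sample mean for a sample of size n."""
    return population_sd / sqrt(n)

for n in (5, 50):
    print(f"n = {n:>2}: standard error = {standard_error(population_sd, n):.2f}")
# prints:
# n =  5: standard error = 44.72
# n = 50: standard error = 14.14
```

An average taken over 5 stores fluctuates roughly three times as much as one taken over 50 stores, so a larger gap from 1800 is needed before it can be distinguished from chance.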

Is the improvement just chance, or is it statistically significant?

Statisticians routinely ask whether an observed improvement is merely the result of underlying randomness or is statistically significant. In this context, did the marketing team actually create an impact on the sales?

Whenever the words "statistically significant" are used, the words "significance level" are also used.

To be very sure before rejecting H0 in favor of H1, a very small value of the significance level ($\alpha$) is chosen.

The hypothesis testing framework is now applied to both teams to arrive at a conclusion on which marketing team to hire.
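To see what a smaller $\alpha$ demands in practice, the sketch below computes the smallest average that would lead to rejecting H0. It assumes a right-tailed Z-Test on the store figures above ($\mu$ = 1800, $\sigma$ = 100) with a sample of 50 stores; the function name `rejection_threshold` is hypothetical.

```python
from math import sqrt
from scipy.stats import norm

population_mean = 1800  # historical average weekly sales
population_sd = 100     # historical standard deviation
n = 50                  # number of stores in the sample (assumed for this sketch)

def rejection_threshold(alpha):
    """Sample mean above which H0 is rejected in a right-tailed Z-Test."""
    z_critical = norm.ppf(1 - alpha)  # critical Z-Score for the right tail
    return population_mean + z_critical * population_sd / sqrt(n)

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject H0 if the sample mean exceeds {rejection_threshold(alpha):.1f}")
```

The threshold moves further away from 1800 as $\alpha$ shrinks, which is what "being very sure" costs: stronger evidence is required before H0 is rejected.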

Analysis on team 1,
1. Claim: They will improve the shampoo sales.
    - H0: The sales will not be impacted, the average sales will continue to be the same, i.e., $\mu$ = 1800.
    - H1: The sales will improve, i.e., $\mu$ > 1800.
2. Distribution: Gaussian.
    - Sample mean, $m = \frac{x_1 + x_2 + \dots + x_{50}}{50}$. Where, $x_i$ = Sales of the i-th store.
    - Standard deviation of m (the standard error), $\sigma_m = \frac{100}{\sqrt{50}}$.
    - Expected value of m, $E(m)$ = 1800.
    - Observed value of m, $O(m)$ = 1850.
3. Based on the data, team 1 is claiming they sold an average of 1850 bottles of shampoo. 1850 is to the right of 1800. Therefore, a right-tailed Z-Test needs to be conducted.
4. To calculate p-value, i.e., P(m > 1850 | H0 is true),

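In [1]:
import numpy as np
from scipy.stats import norm

# Parenthesize the numerator so that (1850 - 1800) is divided by the standard error
z_score = (1850 - 1800) / (100 / np.sqrt(50))
1 - norm.cdf(z_score)

The Z-Score is about 3.54, which gives a p-value of approximately 0.0002.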

5. Compare the p-value with the given significance level ($\alpha$ = 0.01). Since the p-value is less than $\alpha$, H0 can be rejected in favor of H1.
6. Conclusion: Marketing team 1 has had an effect on the sales.

Analysis on team 2,
1. Claim: They will improve the shampoo sales.
    - H0: The sales will not be impacted, the average sales will continue to be the same, i.e., $\mu$ = 1800.
    - H1: The sales will improve, i.e., $\mu$ > 1800.
2. Distribution: Gaussian.
    - Sample mean, $m = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}$. Where, $x_i$ = Sales of the i-th store.
    - Standard deviation of m (the standard error), $\sigma_m = \frac{100}{\sqrt{5}}$.
    - Expected value of m, $E(m)$ = 1800.
    - Observed value of m, $O(m)$ = 1900.
3. Based on the data, team 2 is claiming they sold an average of 1900 bottles of shampoo. 1900 is to the right of 1800. Therefore, a right-tailed Z-Test needs to be conducted.
4. To calculate p-value, i.e., P(m > 1900 | H0 is true),

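In [2]:
import numpy as np
from scipy.stats import norm

# Parenthesize the numerator so that (1900 - 1800) is divided by the standard error
z_score = (1900 - 1800) / (100 / np.sqrt(5))
1 - norm.cdf(z_score)

The Z-Score is about 2.24, which gives a p-value of approximately 0.0127.

5. Compare the p-value with the given significance level ($\alpha$ = 0.01). Since the p-value is greater than $\alpha$, H0 cannot be rejected.
6. Conclusion: Although team 2 reported a higher average than team 1, a sample of only 5 stores does not provide enough evidence at this significance level that the marketing team has had an effect on the sales.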

### Example 2