## **t tests and z tests**

We have just learned two methods to compare a sample mean with some null hypothesis mean. Let $\bar{x}$ be the sample mean, $n$ the sample size, $\mu_0$ the null hypothesis mean, $s$ the sample standard deviation, and $\sigma$ the true standard deviation of the data $x$, assuming it is known.

<br>

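In a z test, we calculate the z score $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$. Then we plug that into the standard normal cdf to get a one-sided p value: the probability, under the null hypothesis, of seeing a z score at least as extreme as ours in the same direction. If $z > 0$, we define the p value as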

```1 - scipy.stats.norm.cdf(z)```

and if $z \leq 0$, we define the p value as

```scipy.stats.norm.cdf(z)```.

Note, z tests require that we know $\sigma$ to calculate the $z$ score. However, if $n$ is large we can safely assume that $s$ is close to $\sigma$ and use $s$ in the formula for the $z$ score instead of $\sigma$.
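As a minimal sketch, the two cases above can be wrapped in a single helper. The function name `z_test_p_value` and the input numbers are made up for illustration; they are not from the class data.

```python
import numpy as np
from scipy.stats import norm

def z_test_p_value(xbar, mu0, sigma, n):
    """Return the z score and the one-sided p value defined above."""
    z = (xbar - mu0) / (sigma / np.sqrt(n))
    # upper tail if z > 0, lower tail otherwise
    p = 1 - norm.cdf(z) if z > 0 else norm.cdf(z)
    return z, p

# hypothetical numbers: sample mean 10.5 vs null mean 10.0, sigma = 2, n = 64
z, p = z_test_p_value(xbar=10.5, mu0=10.0, sigma=2.0, n=64)
print(z, p)
```

If $\sigma$ is unknown and $n$ is large, the same helper can be called with $s$ passed in place of `sigma`.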

<br>

In a t test, we calculate the t score (or t statistic) as $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. As you can see, the only difference between the t score and the z score is the use of the sample standard deviation $s$ instead of the true standard deviation $\sigma$. Once we have the t statistic, we calculate the p value using the Student's t distribution with $n - 1$ degrees of freedom. If $t > 0$, we define the p value as

```1 - scipy.stats.t(df = n-1).cdf(t)```

and if $t \leq 0$, we define the p value as

```scipy.stats.t(df = n-1).cdf(t)```.
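The t test recipe above can be sketched the same way. The helper name `t_test_p_value` and the numbers are hypothetical, chosen only to show the mechanics.

```python
import numpy as np
from scipy.stats import t

def t_test_p_value(xbar, mu0, s, n):
    """Return the t score and the one-sided p value defined above."""
    t_score = (xbar - mu0) / (s / np.sqrt(n))
    dist = t(df=n - 1)  # Student's t with n - 1 degrees of freedom
    # upper tail if t > 0, lower tail otherwise
    p = 1 - dist.cdf(t_score) if t_score > 0 else dist.cdf(t_score)
    return t_score, p

# hypothetical numbers: sample mean 10.5 vs null mean 10.0, s = 2, n = 16
t_score, p = t_test_p_value(xbar=10.5, mu0=10.0, s=2.0, n=16)
print(t_score, p)
```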

<br>

When to use which test:

* If you know the true standard deviation $\sigma$ and $n$ is large (use feel, or use the arbitrary cutoff $n$ > 30), then do a z test.

* If you do not know $\sigma$ but you still think $n$ is large enough, you can probably get away with doing a z test where you use $s$ in the formula for $z$ score instead of $\sigma$.

* If $n$ is small (use feel), then do a t test.

* In general, you should probably just do a t test: as $n$ grows, the t distribution approaches the standard normal, so a t test is safe no matter how big $n$ is, and scipy has functions that do it for you. Anytime you do not know $\sigma$, it is safe to do a t test in Python with scipy. The relevant functions (you may need to google the documentation) are `scipy.stats.ttest_1samp` (one sample against a fixed mean) and `scipy.stats.ttest_ind` (two independent samples).
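To get a feel for the rules of thumb above, here is a small sketch that holds the test statistic fixed and compares the z test and t test p values at a few sample sizes. The score of 2.0 and the sample sizes are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

score = 2.0                 # hypothetical test statistic, same for both tests
p_z = 1 - norm.cdf(score)   # z test p value does not depend on n

gaps = []
for n in (5, 30, 200):
    p_t = 1 - t(df=n - 1).cdf(score)  # t test p value with n - 1 degrees of freedom
    gaps.append(p_t - p_z)
    print(n, p_z, p_t)
```

The t test p value is always the larger of the two because the t distribution has heavier tails, and the gap shrinks as $n$ grows, which is why the choice between the two tests matters less for large samples.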

In [None]:
import numpy as np
from scipy.stats import norm, gamma, beta, poisson, binom, t
from matplotlib import pyplot as plt

In [None]:
#figure from slides
x = np.linspace(-3,3,100)
#pass x explicitly so the horizontal axis shows x values rather than array indices
plt.plot(x, norm.pdf(x), label = 'standard normal')
plt.plot(x, t(df = 5).pdf(x), label = 'student t, df = 5')
plt.plot(x, t(df = 30).pdf(x), '--', label = 'student t, df = 30')
plt.legend()
plt.show()

## **Problem 1**

Remember that in the other notebook we looked at Aaron Judge's wOBA so far in 2025. Judge currently has a wOBA of 0.521 in $n = 191$ plate appearances. The population mean wOBA is $\mu_0 = 0.318$, and the true standard deviation is $\sigma = 0.513$. Judge's sample standard deviation is $s = 0.626$. Calculate a z test (or just copy it from the other notebook) and a t test comparing Aaron Judge's wOBA with the league average wOBA.

In [None]:
#mu and sigma
mu = 0.318
sigma = 0.513

#xbar and s and n
xbar = 0.521
s = 0.626
n = 191

#calculate z score and p value in z test
z_score = (xbar - mu) / (sigma / np.sqrt(n))
p_value_z = 1-norm.cdf(z_score) if z_score > 0 else norm.cdf(z_score)
print(p_value_z)

#calculate t score and p value in t test. . .


Perhaps a dumb question, but given the p values in the previous cell, is Aaron Judge significantly better than an average MLB hitter?

## **Problem 2**

More baseball!! Recently, Will Melville was watching a Rangers game. The announcer was talking about one of the Rangers players' performance against the opposing starting pitcher. I can't remember who it was, so let's assume the hitter was Marcus Semien and the pitcher was JP Sears. Semien and Sears had faced off against each other a number of times and Semien had performed well in those matchups. Suppose that they faced each other a total of $n = 25$ times, and Semien got 10 hits, so his batting average in those 25 at bats was $\bar{x} = 0.400$, and the sample standard deviation was $s = 0.49$. In his career, Semien has a batting average against left-handed pitchers like Sears of $\mu_0 = 0.268$ and a true standard deviation of $\sigma = 0.44$.

<br>

During the broadcast, the announcer said, "I was asking Boch [Rangers manager] today about Semien's great performance against Sears and Boch said, 'By the time you get to about 20 at bats between a batter and a pitcher, you start to conclude that a guy really knows how to hit the pitcher...'"

<br>

Will Melville rolled his eyes and muted the broadcast. Why? Calculate a p value in a z test and a t test to see if Semien's batting average against Sears is significantly better than his batting average against all left-handed pitchers. Is Will right to roll his eyes, or does Boch have the right idea about 20 at bats being enough to determine that a hitter knows how to beat a pitcher?

In [None]:
#mu and sigma
mu = 0.268
sigma = 0.44

#xbar and s and n
xbar = 0.400
s = 0.49
n = 25

#calculate z score and p value in z test . . .


#calculate the t score and p value in a t test. . .


What does the standard error tell us about our confidence in Semien's performance in 25 at bats against Sears?

## **Problem 3**



In class, think of another example to test; otherwise I'll do another baseball example (Joc Pederson's cold streak to start the year).



## **Bonus**

If there is time in class or if anyone is interested, talk about how you can use standard error as a quantification of uncertainty to do *regression to the mean* in a projection model. Credit to Tom Tango.

## **Final Note**

We've been doing t and z tests the hard way because I want you all to understand the math behind the magic. But in practice, when you want to do a t test, you should just use scipy's implementation. The idea of p values and statistical significance also applies to the other statistical tests we've learned about in class, namely KS tests. The only thing that changes is the assumption in the null hypothesis, so make sure you know (or look up) what the null hypothesis assumes before you use a test.

In [None]:
from scipy.stats import ttest_1samp, ttest_ind

#google the docs to see how to use these, or look in the github in notebooks/07-stat-significance
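As a rough sketch of how these two functions can be called, the example below runs them on synthetic data and checks `ttest_1samp` against the by-hand calculation. The arrays `sample_a` and `sample_b` are randomly generated, not real baseball data, and the `alternative` keyword assumes a reasonably recent SciPy version. Note that scipy's default p value is two-sided; `alternative='greater'` gives the upper-tail p value, which matches the hand definition used in this notebook when $t > 0$.

```python
import numpy as np
from scipy.stats import t, ttest_1samp, ttest_ind

rng = np.random.default_rng(0)                      # fixed seed so the sketch is repeatable
sample_a = rng.normal(loc=0.6, scale=1.0, size=25)  # made-up data
sample_b = rng.normal(loc=0.0, scale=1.0, size=25)  # made-up data

# one-sample t test of sample_a against a null hypothesis mean mu0
mu0 = 0.0
res = ttest_1samp(sample_a, popmean=mu0, alternative='greater')

# the same test by hand, using the sample standard deviation (ddof=1)
n = len(sample_a)
t_manual = (sample_a.mean() - mu0) / (sample_a.std(ddof=1) / np.sqrt(n))
p_manual = 1 - t(df=n - 1).cdf(t_manual)

print(res.statistic, res.pvalue)
print(t_manual, p_manual)

# two-sample t test comparing the means of two independent samples
# (two-sided p value by default)
res_two = ttest_ind(sample_a, sample_b)
print(res_two.statistic, res_two.pvalue)
```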