# Single-sample t-tests

Single-sample t-tests are used when you want to see if the mean of a list of values differs from an expected (hypothesized) mean.

Example: Do the favorability ratings for a political candidate differ from a neutral score?

```python
from scipy.stats import ttest_1samp
import numpy as np
```

```python
# Let's say 5 is very favorable, and 1 is very unfavorable,
# and 3 is neutral
ratings = [2, 3, 4, 2, 3, 5, 5, 1, 3, 4]

np.mean(ratings)
```

```
3.2
```

The average is higher than 3.0, but is it a statistically significant difference?

```python
ttest_1samp(ratings, 3.0)
```

```
Ttest_1sampResult(statistic=0.48038446141526187, pvalue=0.642415137720789)
```

In statistics, the t-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error. (https://en.wikipedia.org/wiki/T-statistic)
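Written out, that's `t = (mean - mu0) / (s / sqrt(n))`, where `mu0` is the hypothesized mean, `s` is the sample standard deviation, and `n` is the sample size. A quick sketch that computes it by hand for the ratings above and checks it against SciPy (`mu0` and `t_manual` are just illustrative names):

```python
import numpy as np
from scipy.stats import ttest_1samp

ratings = [2, 3, 4, 2, 3, 5, 5, 1, 3, 4]
mu0 = 3.0  # hypothesized mean (the neutral score)

x = np.asarray(ratings, dtype=float)
n = len(x)

# Standard error of the mean, using the sample standard deviation (ddof=1)
standard_error = x.std(ddof=1) / np.sqrt(n)

# Departure of the sample mean from mu0, in units of standard error
t_manual = (x.mean() - mu0) / standard_error

t_scipy = ttest_1samp(ratings, mu0).statistic
print(t_manual, t_scipy)
```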

The p-value is the probability of getting a result at least as extreme as yours through chance alone, assuming the true mean really is the hypothesized value. **If the p-value is low, you have a statistically significant result.**

In practice, the p-value is the main number I look at. For casual use, I'd say it can be as high as 0.10 (0.05 is the common default), but for serious decision-making you might want it to be 0.01 or even lower.
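If you're curious where the p-value comes from: for a two-sided test it's the area in both tails of a t distribution with `n - 1` degrees of freedom, beyond the observed t-statistic. A minimal sketch for the original 10 ratings, using `scipy.stats.t.sf` (the survival function, 1 - CDF), checked against the thresholds above:

```python
from scipy.stats import t as t_dist
from scipy.stats import ttest_1samp

ratings = [2, 3, 4, 2, 3, 5, 5, 1, 3, 4]
result = ttest_1samp(ratings, 3.0)

# Degrees of freedom for a single-sample test: n - 1
df = len(ratings) - 1

# Two-sided p-value: probability of a |t| at least this large by chance
p_value = 2 * t_dist.sf(abs(result.statistic), df)
print(p_value, result.pvalue)

# Compare against a few common significance thresholds
for alpha in (0.10, 0.05, 0.01):
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"alpha={alpha}: {verdict}")
```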

```python
# What if we get 10 more 4's? Is it significant then?
ratings += [4] * 10

print(ratings)
ttest_1samp(ratings, 3)
```

```
[2, 3, 4, 2, 3, 5, 5, 1, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]

Ttest_1sampResult(statistic=2.697516588397716, pvalue=0.014265492922827512)
```

You might use this for something like [NPS scores](https://www.medallia.com/net-promoter-score/).

# Related samples t-tests 

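Related samples (paired) t-tests are before-and-after tests on the same subjects. The order of your lists matters here: the *n*-th value in `before` and the *n*-th value in `after` must come from the same subject.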

Example: Did people lose weight while on this drug? (Note that you'd still want a control group to compare against.)
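Since the pairing is by position, a related samples t-test works out to the same thing as a single-sample t-test of the per-subject differences against 0. A small sketch checking this with the first set of before/after weights from the example:

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

before = [2, 3, 4, 5, 3, 2, 5, 6, 7, 8, 9, 3, 4]
after = [1, 2, 3, 4, 5, 6, 7, 2, 4, 3, 6, 7, 4]

# Pairing is by position: before[i] and after[i] belong to the same subject
differences = np.subtract(before, after)

paired = ttest_rel(before, after)
# Equivalent: is the mean per-subject change different from 0?
one_sample = ttest_1samp(differences, 0.0)

print(paired.statistic, one_sample.statistic)
```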

```python
from scipy.stats import ttest_rel
```

```python
# You can imagine these are rat weights or something
# I guess they're pretty fat rats
before = [2, 3, 4, 5, 3, 2, 5, 6, 7, 8, 9, 3, 4]
after  = [1, 2, 3, 4, 5, 6, 7, 2, 4, 3, 6, 7, 4]

# Order doesn't matter for the mean, but it will when we do our t-test
np.mean(before), np.mean(after)
```

```
(4.6923076923076925, 4.153846153846154)
```

They weigh less *on average*, but is it significant?

```python
ttest_rel(before, after)
# No statistically significant difference
```

```
Ttest_relResult(statistic=0.6751399510385769, pvalue=0.5123861808484163)
```

```python
# New comparison
before = [2, 3, 4, 5, 3, 2, 9, 6, 7, 8, 9, 9, 9]
after  = [1, 2, 3, 4, 5, 6, 7, 2, 4, 3, 6, 7, 4]

np.mean(before), np.mean(after)
```

```
(5.846153846153846, 4.153846153846154)
```

```python
ttest_rel(before, after)
# Significant at the 0.05 level now (though not at 0.01)
```

```
Ttest_relResult(statistic=2.381569860407206, pvalue=0.03466356528044936)
```

This test is used a lot in science, but I'm having trouble thinking of business applications.

# Independent sample t-tests

Independent sample t-tests are used when you're comparing the mean scores of two separate groups (different people or different things).

Example: Does phone A *really* have a higher rating than phone B, or is our sample size too small?

```python
from scipy.stats import ttest_ind
```

This test doesn't require the two arrays to be the same size. It's also far more useful than just comparing the averages: it accounts for both sample size and spread, so a high p-value can indirectly tell you that your sample is too small (or too noisy) to detect a difference.
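A rough way to see the sample-size effect: keep the averages the same but pretend we had four times as many responses (hypothetical data, made by simply repeating each list four times). The p-value drops even though the difference in means hasn't changed:

```python
from scipy.stats import ttest_ind

a = [2, 3, 5, 2, 3, 4, 2, 2, 4, 2, 1, 3, 4, 5, 3, 3, 4]
b = [2, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 4, 4, 4]

small = ttest_ind(a, b)

# Hypothetical: the same responses, each seen 4x as often
# (same means, roughly the same spread, much larger samples)
large = ttest_ind(a * 4, b * 4)

print(small.pvalue, large.pvalue)
```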

```python
a = [2, 3, 5, 2, 3, 4, 2, 2, 4, 2, 1, 3, 4, 5, 3, 3, 4]
b = [2, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 4, 4, 4]

np.mean(a), np.mean(b)
```

```
(3.0588235294117645, 3.4285714285714284)
```

```python
# Not significant
ttest_ind(a, b)
```

```
Ttest_indResult(statistic=-1.074383566184422, pvalue=0.29150920254640794)
```
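For reference, `ttest_ind` by default runs Student's t-test, which pools the two sample variances and assumes both groups have equal variance. Here the spreads look different (`b` only ranges from 2 to 4), so Welch's t-test (`equal_var=False`), which drops that assumption, may be the safer choice. A sketch that reproduces the pooled statistic by hand and runs both versions:

```python
import numpy as np
from scipy.stats import ttest_ind

a = np.array([2, 3, 5, 2, 3, 4, 2, 2, 4, 2, 1, 3, 4, 5, 3, 3, 4], dtype=float)
b = np.array([2, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 4, 4, 4], dtype=float)
n_a, n_b = len(a), len(b)

# Pooled variance: the two sample variances (ddof=1) weighted by their degrees of freedom
pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
t_pooled = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Default (equal_var=True): Student's pooled-variance t-test
student = ttest_ind(a, b)
# equal_var=False: Welch's t-test, no equal-variance assumption
welch = ttest_ind(a, b, equal_var=False)

print(t_pooled, student.statistic, welch.pvalue)
```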

# Chi-square / A-B Tests

I prefer to just use an online calculator for this. Here's my favorite:

https://abtestguide.com/calc/
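If you'd rather stay in Python, SciPy's `chi2_contingency` runs a chi-square test of independence on a table of counts. A minimal sketch with made-up conversion numbers (the counts are hypothetical, and this isn't necessarily the same method the calculator uses):

```python
from scipy.stats import chi2_contingency

# Hypothetical A/B results: [converted, did not convert] for each variant
observed = [
    [100, 900],  # variant A: 100 of 1,000 visitors converted
    [130, 870],  # variant B: 130 of 1,000 visitors converted
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value, dof)
```

For 2x2 tables like this one, SciPy applies Yates' continuity correction by default; pass `correction=False` to turn it off.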

When should you use a one- vs. two-tailed test?

https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-the-differences-between-one-tailed-and-two-tailed-tests/
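In SciPy, `ttest_1samp`, `ttest_rel`, and `ttest_ind` take an `alternative` argument (`'two-sided'` by default, or `'less'`/`'greater'`), assuming SciPy 1.6 or later. A sketch of a one-tailed test on the original 10 ratings, asking only whether the mean is *greater* than 3:

```python
from scipy.stats import ttest_1samp

ratings = [2, 3, 4, 2, 3, 5, 5, 1, 3, 4]

# Two-tailed: is the mean different from 3 in either direction?
two_tailed = ttest_1samp(ratings, 3.0)

# One-tailed: is the mean greater than 3? (requires the `alternative` argument)
one_tailed = ttest_1samp(ratings, 3.0, alternative="greater")

print(two_tailed.pvalue, one_tailed.pvalue)
```

Because the observed mean (3.2) is above 3, the one-tailed p-value here is exactly half the two-tailed one.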