# Tutorial 4 - Data Simulation, *t*-Test Power Analysis and Introduction to Regression


*Written and revised by Jozsef Arato, Mengfan Zhang, Dominik Pegler*  
Computational Cognition Course, University of Vienna  
https://github.com/univiemops/tewa1-computational-cognition

---

## 1. Import libraries

In [None]:
import numpy as np
from matplotlib import pyplot as plt
from scipy import linalg, stats

## 2. Data simulation for a *t*-test

In the first task, we simulate a study that compares the effect of a new drug on reaction time with that of a placebo.
We assume that the drug is better than the placebo: reaction times are sampled from a population with a mean of 500 ms for the target (drug) group and 520 ms for the control (placebo) group, with an SD of 100 ms in both groups.

The first task is to simulate 2 groups of 20 participants each, based on the above information (we assume normally distributed reaction times).

---



In [None]:
data1 = np.random.normal(500, 100, 20)
data2 = np.random.normal(520, 100, 20)
print(stats.ttest_ind(data1, data2))
print(stats.ttest_ind(data1, data2)[1])
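
The result of `stats.ttest_ind` can be indexed, as in the cell above, or read by attribute name. A minimal sketch, assuming a seeded generator from `np.random.default_rng` (the seed value 1 is an arbitrary choice, not part of the original tutorial), so that the same draw can be reproduced:

```python
import numpy as np
from scipy import stats

# seeded generator: rerunning this cell gives the same simulated data
rng = np.random.default_rng(seed=1)
data1 = rng.normal(500, 100, 20)  # target group: mean 500 ms, SD 100 ms, 20 participants
data2 = rng.normal(520, 100, 20)  # control group: mean 520 ms, SD 100 ms, 20 participants

result = stats.ttest_ind(data1, data2)
print("t =", result.statistic)  # same value as result[0]
print("p =", result.pvalue)     # same value as result[1]
```

Reading `result.pvalue` instead of `result[1]` tends to make later code easier to follow.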

In [None]:
plt.hist(data1, alpha=0.5, label="group 1")
plt.hist(data2, alpha=0.5, label="group 2")
plt.xlabel("reaction time")
plt.ylabel("number of participants")
plt.legend()

Once this is done, we can compare the two groups with the independent-samples *t*-test (`stats.ttest_ind`, as in the cell above).

Based on the above, repeat the process 1000 times
and store the p-value of each iteration's *t*-test in a numpy array.


In [None]:
nsim = 1000
pvalues = np.zeros(nsim)
for i in range(nsim):
    # draw a fresh sample of 20 participants per group in every iteration
    data1 = np.random.normal(500, 100, 20)
    data2 = np.random.normal(520, 100, 20)
    pvalues[i] = stats.ttest_ind(data1, data2)[1]  # keep only the p-value

1. Visualize the obtained p-values with a histogram.
2. Count how many times you obtained a significant difference (p-value below 0.05).
3. Add a vertical line for the significance threshold to the histogram.

In [None]:
plt.hist(pvalues, bins=20)
plt.axvline(0.05, color="k")  # vertical line at the significance threshold
plt.xlabel("p-value")
plt.ylabel("number of simulations")

print("significant tests (p < .05):", np.sum(pvalues < 0.05))
print("power of design:", np.sum(pvalues < 0.05) / nsim)
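
The proportion of significant tests is a simulation-based estimate of the power of the design. As a cross-check, the power of a two-sided independent-samples *t*-test with equal group sizes can also be computed from the noncentral *t* distribution. The sketch below assumes equal SDs in both groups; the function name `analytic_power` and its parameters are illustrative and not part of the original tutorial.

```python
import numpy as np
from scipy import stats


def analytic_power(mean1, mean2, sd, npart, alpha=0.05):
    """Power of a two-sided independent-samples t-test (equal n, equal SD)."""
    d = abs(mean2 - mean1) / sd              # standardized mean difference
    df = 2 * npart - 2                       # degrees of freedom
    ncp = d * np.sqrt(npart / 2)             # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical t value
    # probability that |t| exceeds the critical value when the effect is real
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)


for npart in (20, 100):
    print(npart, "participants per group:", analytic_power(500, 520, 100, npart))
```

With 1000 simulations, the simulated proportion should land close to the analytic value, but it will not match it exactly because of sampling noise.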

Write a function `simulate_t`, using the code above, that takes 4 input variables:
1. the mean of group 1,
2. the mean of group 2,
3. the SD for both groups,
4. and the number of participants (equal for both groups).

The function should simulate data 1000 times, perform the above analysis and return only the proportion of significant tests.


In [None]:
def simulate_t(mean1, mean2, sd, npart):
    nsim = 1000  # number of simulated experiments
    pvalues = np.zeros(nsim)
    for i in range(nsim):
        data1 = np.random.normal(mean1, sd, npart)
        data2 = np.random.normal(mean2, sd, npart)
        pvalues[i] = stats.ttest_ind(data1, data2)[1]
    return np.sum(pvalues < 0.05) / nsim  # proportion of significant tests

In [None]:
simulate_t(500, 520, 100, 20)
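
The loop in `simulate_t` can also be replaced by a single call that draws all simulated experiments at once. This is a sketch of one possible alternative, not the tutorial's required solution: the name `simulate_t_vectorized` and the extra `nsim` and `seed` parameters are assumptions added for illustration.

```python
import numpy as np
from scipy import stats


def simulate_t_vectorized(mean1, mean2, sd, npart, nsim=1000, seed=None):
    """Proportion of significant t-tests across nsim simulated experiments."""
    rng = np.random.default_rng(seed)
    # one row per simulated experiment, one column per participant
    data1 = rng.normal(mean1, sd, size=(nsim, npart))
    data2 = rng.normal(mean2, sd, size=(nsim, npart))
    # axis=1 runs one t-test per row, giving nsim p-values
    pvalues = stats.ttest_ind(data1, data2, axis=1).pvalue
    return np.mean(pvalues < 0.05)


print(simulate_t_vectorized(500, 520, 100, 20, seed=1))
```

Passing a `seed` makes the estimate reproducible; leaving it as `None` gives a different estimate on every call, like the loop version.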

### 2.1. Systematic simulation
Once this is done, we keep the means fixed at 500 and 520 ms, but systematically change the SD in 10 steps, from 20 to 200 ms.

1. Using your function above, calculate the proportion of significant tests for each SD.
2. Store the results in a numpy array.
3. Visualize the result with `plt.plot`.
4. Make the figure nice (ticks, labels, font size).

In [None]:
n_steps = 10
s_ds = np.linspace(20, 200, n_steps)

print(s_ds, len(s_ds))
powers = np.zeros(n_steps)
for i in range(n_steps):
    powers[i] = simulate_t(500, 520, s_ds[i], 20)

In [None]:
powers = np.zeros(n_steps)
for i, sd in enumerate(s_ds):
    powers[i] = simulate_t(500, 520, sd, 20)

In [None]:
plt.plot(s_ds, powers)
plt.xlabel("Standard deviation (ms)", fontsize=15)
plt.ylabel("Power", fontsize=15)


Using the same function, systematically change the number of participants per group in 8 steps, from 8 to 64 (the cell below fixes the SD at 50 ms),
and store the results in a numpy array.





In [None]:
n_steps = 8
numparts = np.linspace(8, 64, n_steps)
powers = np.zeros(n_steps)
for i in range(n_steps):
    powers[i] = simulate_t(500, 520, 50, int(numparts[i]))

In [None]:
np.intp(np.linspace(8, 64, 8))

In [None]:
?np.random.normal

In [None]:
plt.plot(numparts, powers)
plt.xlabel("Num of participants", fontsize=15)
plt.ylabel("Power", fontsize=15)

lecture continues here!

## 3. Demo of three options for counting and accessing elements of a list/array with nested `for` loops ...

... and using a 2D numpy array to store calculation results with indexing

In [None]:
list1 = ["dog", "cat", "mouse"]
list2 = ["vienna", "graz"]

### 3.1. Option 1: Iterate over list + counters

In [None]:
num_combined_char = np.zeros((len(list1), len(list2)))
c1 = 0
for l1 in list1:
    c2 = 0
    for l2 in list2:
        print(c1, c2, l1, l2)
        # calculate and store the combined word length
        num_combined_char[c1, c2] = len(l1) + len(l2)
        c2 += 1
    c1 += 1
print(num_combined_char)

### 3.2. Option 2: `range()` + indexing



In [None]:
num_combined_char = np.zeros((len(list1), len(list2)))
for c1 in range(len(list1)):
    for c2 in range(len(list2)):
        print(c1, c2, list1[c1], list2[c2])
        # calculate and store the combined word length
        num_combined_char[c1, c2] = len(list1[c1]) + len(list2[c2])
print(num_combined_char)

### 3.3. Option 3: `enumerate`


In [None]:
num_combined_char = np.zeros((len(list1), len(list2)))

for c1, l1 in enumerate(list1):
    for c2, l2 in enumerate(list2):
        print(c1, c2, l1, l2)
        # calculate and store the combined word length
        num_combined_char[c1, c2] = len(l1) + len(l2)
print(num_combined_char)
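
All three options fill the same 2D array. For a simple calculation like this one, the nested loops can also be avoided with numpy broadcasting. This is a sketch of a fourth variant added here for comparison; the names `len1` and `len2` are illustrative.

```python
import numpy as np

list1 = ["dog", "cat", "mouse"]
list2 = ["vienna", "graz"]

len1 = np.array([len(word) for word in list1])  # word lengths of list1
len2 = np.array([len(word) for word in list2])  # word lengths of list2

# a (3, 1) column plus a (1, 2) row broadcasts to a (3, 2) array of all sums
num_combined_char = len1[:, np.newaxis] + len2[np.newaxis, :]
print(num_combined_char)
```

The loop versions remain the more general pattern: they also work when each cell needs a calculation that cannot be vectorized, such as running a whole simulation.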

## 4. Simulating data with a linear regression model

growth of an infant: 0.5 cm/month (B1)

starting height: 50 cm (B0)

error (SD): 8 cm  

n = 40

simulate for ages 0 to 36 months

Write the linear equation with normal error to simulate data for Y; this is your first simulation with a regression model.
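
Written out, the model behind this simulation is the following, assuming that the 8 cm error is the standard deviation of a zero-mean normal error term:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),
\qquad \beta_0 = 50,\ \beta_1 = 0.5,\ \sigma = 8
```

Here $x_i$ is the age in months and $y_i$ is the height in cm of infant $i$.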

In [None]:
n = 40
b0 = 50
b1 = 0.5
x = np.linspace(0, 36, n)
y = b0 + b1 * x + np.random.normal(0, 8, n)

In [None]:
plt.scatter(x, y)
plt.xlabel("Age (months)", fontsize=14)
plt.ylabel("Height (cm)")

We will use the least squares method to fit a linear regression.


To fit the intercept, we need to add a column of ones to the predictors.

The resulting predictor matrix is called the design matrix.

In [None]:
from scipy import linalg

In [None]:
print(x)
xx = np.column_stack((np.ones(n), x))
print(xx)
pars = linalg.lstsq(xx, y)[0]

In [None]:
pars

In [None]:
# closed-form slope estimate: covariance of x and y divided by the variance of x
np.sum((x - np.mean(x)) * (y - np.mean(y))) / np.sum((x - np.mean(x)) ** 2)
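
The expression in the cell above is the textbook closed-form slope for simple linear regression, so it should agree with the second parameter returned by `linalg.lstsq`. The sketch below checks this on freshly simulated data and also solves the normal equations directly. The seed and the variable names `pars_lstsq`, `pars_normal`, `slope` and `intercept` are assumptions made for this illustration.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)  # seeded so the example is reproducible
n = 40
x = np.linspace(0, 36, n)
y = 50 + 0.5 * x + rng.normal(0, 8, n)

xx = np.column_stack((np.ones(n), x))  # design matrix: intercept column + age

# 1) least squares solver, as used in the tutorial
pars_lstsq = linalg.lstsq(xx, y)[0]

# 2) normal equations: solve (X'X) b = X'y
pars_normal = np.linalg.solve(xx.T @ xx, xx.T @ y)

# 3) closed-form estimates for simple linear regression
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(pars_lstsq)
print(pars_normal)
print(intercept, slope)
```

All three routes give the same intercept and slope up to floating-point rounding.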

plot the obtained regression line, together with the data

In [None]:
plt.scatter(x, y)
plt.plot(x, b0 + x * b1, color="k", alpha=0.2, label="data generating model")
plt.plot(x, pars[0] + x * pars[1], color="k", label="fitted model prediction")
plt.legend()
plt.xlabel("Age (months)", fontsize=14)
plt.ylabel("Height (cm)")

## 5. Polynomial regression

Regression prediction as matrix multiplication:

Linear algebra series by 3blue1brown: https://www.youtube.com/watch?v=kjBOesZCoqc&list=PL0-GT3co4r2y2YErbmuJw2L5tW4Ew2O5B

Matrix multiplication of the design matrix with the predictor weights results in the predicted Y values.
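
A small worked example may help here. The weights below are hypothetical values chosen for illustration, not fitted parameters; the sketch only shows that the matrix product gives the same predictions as writing out the polynomial term by term.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
# design matrix with one column per term: 1, x, x^2
xx = np.column_stack((np.ones(len(x)), x, x**2))

weights = np.array([50.0, 0.5, 0.1])  # hypothetical b0, b1, b2

y_pred_matmul = xx @ weights  # same as np.matmul(xx, weights)
y_pred_explicit = weights[0] + weights[1] * x + weights[2] * x**2

print(y_pred_matmul)
print(np.allclose(y_pred_matmul, y_pred_explicit))
```

Each row of the design matrix is multiplied element-wise with the weights and summed, which gives one predicted value per observation.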

In [None]:
xx = np.column_stack((np.ones(n), x, x**2))
pars = linalg.lstsq(xx, y)[0]
print(pars)

In [None]:
plt.scatter(x, y)
plt.plot(x, pars[0] + pars[1] * x + pars[2] * x**2, color="k")

In [None]:
plt.scatter(x, y)
plt.plot(x, np.matmul(xx, pars))

In [None]:
xx = np.column_stack((np.ones(n), x, x**2, x**3))
pars = linalg.lstsq(xx, y)[0]
plt.scatter(x, y)
plt.plot(x, np.matmul(xx, pars), linestyle="-", marker="o")
# plt.plot(x,np.matmul(xx,pars))
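
The design matrices above differ only in how many powers of `x` they contain, so they can be built in a loop. The sketch below uses `np.vander` with `increasing=True` as one possible way to do this and prints the residual sum of squares for each degree. The seed and the names `degree` and `rss` are assumptions added for illustration.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)  # seeded so the example is reproducible
n = 40
x = np.linspace(0, 36, n)
y = 50 + 0.5 * x + rng.normal(0, 8, n)

rss = {}
for degree in (1, 2, 3):
    # columns 1, x, ..., x**degree (same layout as np.column_stack above)
    xx = np.vander(x, degree + 1, increasing=True)
    pars = linalg.lstsq(xx, y)[0]
    rss[degree] = np.sum((y - xx @ pars) ** 2)  # residual sum of squares
    print(degree, pars, rss[degree])
```

Because each higher-degree model contains the lower-degree one as a special case, the residual sum of squares can only stay the same or decrease as terms are added, even though the data here were generated by a straight line.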

## Homework 1

Simulating the *t*-test for combinations of group size and standard deviation:
use all combinations of the SDs and group sizes above to simulate 80 scenarios, and store the proportion of significant *t*-tests in a 2D numpy array.

Use the `pcolor` function of pyplot to visualize the result,
adding x and y labels (for the parameters).


What do you observe and why? Write a few sentences.

## Homework 2
Simulation with no mean difference: make a similar systematic simulation,
but with no mean difference (e.g., an ineffective drug), and calculate the proportion of significant tests as you manipulate the SD and the sample size.
What do you observe and why could that be the case?



Once you are ready with the figure, compare it to the previous figure from the true-difference simulation.

What do you observe? Why?

Write a short answer (max 5 sentences) and submit it to the "texteingabe" in Moodle.

This time you do not need to submit the code, only the figure you have created!
So save the figure and upload it to the Moodle homework submission form.


