# Dataset description

Data were collected from gym teachers of high school boys.
For each boy, we know his height (in cm) and his 100 m sprint time (in s).
Additionally, each boy was asked whether he liked playing basketball, dodgeball and soccer in class.

# Analysis

We are interested in the effect of height on sprint times.

In [2]:
import pandas as pd
df = pd.read_csv('gym_class.csv')
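
The model further down indexes the columns `height`, `sprint`, `basketball`, `dodgeball` and `soccer`, so it can help to confirm they are present and numeric before fitting. The sketch below is one way to do that; the helper name `check_columns` and the toy frame are hypothetical, and it assumes the three sport columns are stored as numbers (for example 0/1), which the notebook does not state.

```python
import pandas as pd

# Column names taken from the model code in this notebook.
EXPECTED = ["height", "sprint", "basketball", "dodgeball", "soccer"]

def check_columns(frame):
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = [f"missing column: {c}" for c in EXPECTED if c not in frame.columns]
    for c in EXPECTED:
        if c in frame.columns:
            if not pd.api.types.is_numeric_dtype(frame[c]):
                problems.append(f"non-numeric column: {c}")
            elif frame[c].isna().any():
                problems.append(f"missing values in: {c}")
    return problems

# Toy frame with made-up values, only to show the helper in use.
toy = pd.DataFrame({
    "height": [170.0, 182.5],
    "sprint": [14.2, 13.1],
    "basketball": [1, 0],
    "dodgeball": [0, 0],
    "soccer": [1, 1],
})
print(check_columns(toy))  # prints []
```

On the real data the call would be `check_columns(df)`.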

Set some basic weakly informative priors and fit the linear model.
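
Written out, the model in the next cell can be read as follows. The symbols are notation introduced here, not names from the code: $\alpha$ stands for `icept`, $\beta_h$ for `hcoef`, $\gamma_k$ for the entries of `gcoef`, $\sigma$ for `err`, and $x_{i,k}$ for boy $i$'s value in the $k$-th sport column (assumed numeric). The second argument of each normal distribution is a standard deviation, following PyMC's positional `sigma`.

$$
\begin{aligned}
\alpha &\sim \mathcal{N}(10,\ 10), \qquad \beta_h \sim \mathcal{N}(0,\ 0.2), \qquad \gamma_k \sim \mathcal{N}(0,\ 1), \quad k = 1, 2, 3, \\
\sigma &\sim \text{HalfNormal}(1), \\
\mu_i &= \alpha + \beta_h \,\text{height}_i + \sum_{k=1}^{3} \gamma_k \, x_{i,k}, \\
\text{sprint}_i &\sim \mathcal{N}(\mu_i,\ \sigma).
\end{aligned}
$$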

In [None]:
import pymc as pm
import pytensor.tensor as pt

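with pm.Model() as model:

    # Priors: intercept (s), height slope (s per cm), one coefficient per sport
    icept = pm.Normal('icept', mu=10, sigma=10)
    hcoef = pm.Normal('hcoef', mu=0, sigma=0.2)
    gcoef = pm.Normal('gcoef', mu=0, sigma=1, size=3)

    # Linear predictor for the expected sprint time
    games = df[['basketball', 'dodgeball', 'soccer']].to_numpy()
    res = icept + hcoef * df['height'].to_numpy() + pt.dot(games, gcoef)

    # Residual standard deviation (HalfNormal with PyMC's default sigma=1) and likelihood
    err = pm.HalfNormal('err')
    pm.Normal('obs', mu=res, sigma=err, observed=df['sprint'])

    # Draw posterior samples with the default sampler settings
    trace = pm.sample()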

In [None]:
# Plot the posterior distributions
import arviz as az
import matplotlib.pyplot as plt

az.plot_posterior(trace)
plt.show()

In [None]:
# Show the numerical summary table
az.summary(trace)
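
Since the question here is the effect of height on sprint time, one number that may be worth reading off the posterior is the share of draws in which `hcoef` is negative (taller boys being faster). The helper below is a minimal sketch; the name `prob_below` is hypothetical, and the commented usage line assumes `trace` is the `InferenceData` object returned by `pm.sample()` above.

```python
import numpy as np

def prob_below(draws, threshold=0.0):
    """Fraction of posterior draws strictly below `threshold`."""
    draws = np.asarray(draws, dtype=float).ravel()  # flatten chains and draws
    return float(np.mean(draws < threshold))

# Assumed usage with the fitted model from this notebook:
# p_negative = prob_below(trace.posterior["hcoef"].values)
```

A value near 1 would suggest a negative slope, a value near 0 a positive one, and a value near 0.5 that the data say little about the sign.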