# squat deadlift regression
This notebook examines the relationship between back squat and deadlift for CrossFit athletes, based on the lifts reported in their athlete profiles. The division under examination is Women (45-49), as it was the largest complete dataset at the time of writing.

## imports

In [18]:
#pandas (dataframes)
import pandas as pd
#pymysql (mysql connector)
import pymysql as pms
#numpy (reshaping data)
import numpy as np
#scikit-learn (linear regression, train/test splits)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
#plotting


# use for implementing custom modules
import sys
sys.path.insert(0, "../custom_modules/")

#custom modules
from dataframer import grab_data

## getting database credentials

In [2]:
#this list object is passed around to functions requiring database access
#...the functions will open/close the connection themselves during calls
creds = []
#database credentials are stored in this file
with open("../database_credentials2.txt") as f:
    db_user = f.readline().strip()
    db_pass = f.readline().strip()
    db_name = f.readline().strip()
    db_host = f.readline().strip()
    creds = [db_host, db_user, db_pass, db_name]

## grabbing the data

In [3]:
#read into dataframe
#note: division 4 references women 45-49
squat_deadlift = grab_data(
    """
    SELECT back_squat_lbs AS squat, deadlift_lbs AS deadlift
    FROM athlete
    WHERE division_id=4
    AND back_squat_lbs > 0
    AND deadlift_lbs > 0;
    """,
    creds
)
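
The source of `grab_data` is not shown in this notebook, so the following is only a rough sketch of what a helper with that behaviour could look like. It assumes the `creds` list is ordered `[host, user, password, database]` as built above, and that the helper opens and closes its own connection on every call. The name `grab_data_sketch` and the `connect` parameter are hypothetical and exist only for illustration.

```python
import pandas as pd


def grab_data_sketch(query, creds, connect=None):
    """Hypothetical stand-in for dataframer.grab_data (the real module is not shown)."""
    if connect is None:
        # imported lazily so the sketch can be loaded without a MySQL driver installed
        import pymysql
        connect = pymysql.connect
    # assumed order, matching the creds list built earlier
    host, user, password, database = creds
    conn = connect(host=host, user=user, password=password, database=database)
    try:
        # run the query and return the result set as a dataframe
        return pd.read_sql(query, conn)
    finally:
        # the helper closes the connection it opened, even if the query fails
        conn.close()
```

Passing a different `connect` callable makes it possible to try the helper against another DB-API driver without touching the MySQL server.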

### quick summary

In [4]:
#first 5 entries
squat_deadlift.head(5)

,squat,deadlift
0,210,305
1,215,300
2,255,340
3,202,275
4,215,305


In [5]:
#describe() is pandas' built-in summary-statistics method for dataframes
squat_deadlift.describe()

,squat,deadlift
count,1075.0,1075.0
mean,198.645581,251.71814
std,38.392562,43.168152
min,70.0,110.0
25%,175.0,220.0
50%,198.0,245.0
75%,225.0,285.0
max,396.0,385.0


## splitting data into train/test sets

In [10]:
#fraction of the data held out for testing (.8 leaves only 20% for training)
test_ratio = .8
#seed for the random shuffle, so the split is reproducible
random_state = 0

In [11]:
#either lift could serve as the independent variable; here squat is x and deadlift is y
x_train, x_test, y_train, y_test = train_test_split(
    #x
    squat_deadlift.squat,
    #y
    squat_deadlift.deadlift,
    test_size=test_ratio,
    random_state=random_state
)
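
To make the effect of `test_size` and `random_state` concrete, here is a small sketch on made-up numbers (not the athlete data). The synthetic `demo` dataframe and its coefficients are assumptions chosen only so the example runs without the database.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# made-up lifts for illustration only, NOT the athlete data
rng = np.random.default_rng(0)
squat = rng.normal(200, 40, size=1000).round()
deadlift = (1.1 * squat + 30 + rng.normal(0, 25, size=1000)).round()
demo = pd.DataFrame({"squat": squat, "deadlift": deadlift})

# same call shape as above: 80% of rows go to the test set
x_train, x_test, y_train, y_test = train_test_split(
    demo.squat, demo.deadlift, test_size=.8, random_state=0
)
print(len(x_train), len(x_test))

# repeating the call with the same random_state selects the same rows
x_train_again, _, _, _ = train_test_split(
    demo.squat, demo.deadlift, test_size=.8, random_state=0
)
same_split = x_train.index.equals(x_train_again.index)
print(same_split)
```

With `test_size=.8` the model is fit on the smaller share of the rows, and the rest is kept for evaluation.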

## linear regression
Below we apply scikit-learn's [linear regression](http://scikit-learn.org/stable/auto_examples/plot_isotonic_regression.html#sphx-glr-auto-examples-plot-isotonic-regression-py) to the squat/deadlift data, using squat weight (in lbs) to predict deadlift weight (in lbs). Note that these values are one-rep maxes (the maximum weight an athlete reports having lifted for a single rep).

In [19]:
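regressor = LinearRegression()
#scikit-learn expects a 2D feature array, so add a column axis
#(.values converts the Series to a numpy array first; newer pandas versions no longer allow [:, np.newaxis] on a Series)
regressor.fit(x_train.values[:, np.newaxis], y_train)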

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
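
Once fitted, the regressor exposes the slope and intercept of the line, and can be scored on the held-out data. The sketch below shows one way to read these off. It uses made-up numbers rather than the athlete data, so the synthetic relationship (`1.1 * squat + 30` plus noise) is an assumption and says nothing about the real result.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# made-up lifts for illustration only, NOT the athlete data
rng = np.random.default_rng(0)
squat = rng.normal(200, 40, size=1000)
deadlift = 1.1 * squat + 30 + rng.normal(0, 25, size=1000)

x_train, x_test, y_train, y_test = train_test_split(
    squat, deadlift, test_size=.8, random_state=0
)

regressor = LinearRegression()
# features must be 2D: one row per athlete, one column per feature
regressor.fit(x_train[:, np.newaxis], y_train)

# fitted line: deadlift ~ slope * squat + intercept
slope = regressor.coef_[0]
intercept = regressor.intercept_
# coefficient of determination (R^2) on the held-out rows
r2 = regressor.score(x_test[:, np.newaxis], y_test)
# predicted deadlift for a hypothetical 200 lb squat
predicted = regressor.predict(np.array([[200.0]]))[0]

print(slope, intercept, r2, predicted)
```

`score` returns R^2, so values closer to 1 mean squat weight explains more of the variation in deadlift weight on the test set.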