# Jyo Sahoo

## Research question/interests

My area of interest is the relationship between a student's familial and social relationships and their alcohol consumption. Does the marital status of a student's parents (in this dataset, whether they live together) affect the student's relationship with their family, their significant other, or their friends? If so, do students turn to alcohol to cope with such familial or social discomfort? These questions hold personal value to me.


In [None]:
#imports
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

In [None]:
data = pd.read_csv("../data/raw/student-merged.csv")
print(f"number of columns in dataset : {len(data.columns)}")
print(f"number of rows in dataset : {len(data)}")
print(data.head()) # print the first few rows to see what the dataset contains
print(data.isnull().sum()) # there are 0 null values!
# show summary statistics in plain decimal notation instead of scientific notation
data.describe().apply(lambda s: s.apply(lambda x: format(x, 'f')))
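The merged file appears to hold most per-course attributes twice, once with a `.x` suffix and once with `.y` (the Task 2 code averages `G1.x` through `G3.x` into `math_avg` and `G1.y` through `G3.y` into `por_avg`). The sketch below uses a hypothetical helper, `suffix_pairs`, to list which base columns come in such pairs, which can help decide which duplicates to drop. The toy column names are placeholders, not read from the CSV; on the real data it would be called as `suffix_pairs(data.columns)`.

```python
import pandas as pd

def suffix_pairs(columns, left=".x", right=".y"):
    """Hypothetical helper: base names that appear with both suffixes,
    e.g. 'Walc' when both 'Walc.x' and 'Walc.y' are present."""
    cols = set(columns)
    bases = {c[: -len(left)] for c in cols if c.endswith(left)}
    return sorted(b for b in bases if b + right in cols)

# Toy frame: a few column names shaped like the merged file (no real data).
toy = pd.DataFrame(columns=["school", "Walc.x", "Walc.y", "G1.x", "G1.y", "nursery"])
print(suffix_pairs(toy.columns))
# prints ['G1', 'Walc']
```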

In [None]:
print(f"number of students with parents who are cohabiting = {len(data[data['Pstatus']=='T'])}") 
print(f"number of students with parents who are not cohabiting = {len(data[data['Pstatus']=='A'])}")
# I discovered that almost 90% of the students' parents live together.
print(data[data['Pstatus']=='T'].head())
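To check the "almost 90%" figure directly rather than by comparing the two printed counts, `value_counts(normalize=True)` returns each status code's share of students. A minimal sketch with a hypothetical helper, `pstatus_shares`, and made-up toy rows; on the real data, `pstatus_shares(data)` would give the actual shares.

```python
import pandas as pd

def pstatus_shares(df, col="Pstatus"):
    """Hypothetical helper: fraction of students per parental-status code
    ('T' = parents live together, 'A' = parents live apart)."""
    return df[col].value_counts(normalize=True)

# Toy rows for illustration only (not from the dataset).
toy = pd.DataFrame({"Pstatus": ["T", "T", "T", "A"]})
print(pstatus_shares(toy).round(2))
```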

In [None]:
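plt.boxplot(data[data['Pstatus']=='A']['Walc.x'])
plt.title("Weekend Alcohol Consumption by Students with Parents who do not Cohabit")
plt.ylabel("Weekend alcohol consumption (Walc.x)")
plt.show() # plt.plot() with no arguments draws nothing; show() renders the figure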

In [None]:
plt.boxplot(data[data['Pstatus']=='T']['Walc.x'])
plt.title("Weekend Alcohol Consumption by Students with Parents who do Cohabit")
plt.ylabel("Weekend alcohol consumption (Walc.x)")
plt.show()

Comparing these two boxplots, I infer that the maximum weekend alcohol consumption of students whose parents do not cohabit is higher than that of students whose parents do cohabit. However
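The maximum alone depends on a single student, and the two groups differ greatly in size, so a fuller comparison puts the group size, mean, and median next to the maximum. The sketch below does this with `groupby(...).describe()`; the helper name `walc_by_pstatus` is hypothetical and the toy values are not from the dataset.

```python
import pandas as pd

def walc_by_pstatus(df, group_col="Pstatus", value_col="Walc.x"):
    """Hypothetical helper: size, mean, median and max of weekend alcohol
    use for each parental-status group."""
    summary = df.groupby(group_col)[value_col].describe()
    return summary[["count", "mean", "50%", "max"]]

# Toy rows for illustration only (not from the dataset).
toy = pd.DataFrame({"Pstatus": ["A", "A", "T", "T", "T"],
                    "Walc.x": [1, 5, 1, 2, 3]})
print(walc_by_pstatus(toy))
```

On the real data this would be `walc_by_pstatus(data)`. Similarly, a single `sns.boxplot(data=data, x='Pstatus', y='Walc.x')` call draws both groups on one shared axis, which makes the two distributions easier to compare than two separate figures.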

In [None]:
sns.countplot(data=data, x='Pstatus', hue='famrel.x')
plt.show()


The graph above suggests that parents' cohabitation status does not noticeably affect students' relationships with their families, although the raw counts are dominated by the much larger group of students whose parents live together.
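Because students whose parents live together far outnumber the others, raw counts make every bar for the `A` group look small regardless of its shape. Comparing the share of each `famrel.x` rating within each group is a fairer test of the claim above. A minimal sketch using `pd.crosstab` with `normalize="index"`; the helper name `famrel_shares` and the toy rows are hypothetical.

```python
import pandas as pd

def famrel_shares(df, group_col="Pstatus", rel_col="famrel.x"):
    """Hypothetical helper: row-normalised crosstab, i.e. within each
    parental-status group, the share of students at each family-relationship
    rating, so groups of very different sizes can be compared."""
    return pd.crosstab(df[group_col], df[rel_col], normalize="index")

# Toy rows for illustration only (not from the dataset).
toy = pd.DataFrame({"Pstatus": ["A", "A", "T", "T", "T", "T"],
                    "famrel.x": [4, 5, 4, 4, 5, 5]})
print(famrel_shares(toy).round(2))
```

Each row sums to 1; on the real data, `famrel_shares(data)` gives the within-group distributions, and similar rows for `A` and `T` would support the conclusion.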


### Task 2: Analysis Pipeline

In [None]:
# data has been loaded already
# print(data.isnull().sum()) # there are zero null values
# drop the columns not needed for this research question
data_cleaned = data.copy().drop(['sex','school','address','Medu','Fedu','Mjob','Fjob','reason',
                                 'guardian.x','guardian.y','traveltime.x','traveltime.y',
                                 'schoolsup.x','schoolsup.y','paid.x','paid.y','famsup.x','famsup.y',
                                 'nursery','health.x','health.y','higher.x','higher.y',
                                 'Dalc.y','Walc.y','famrel.y','freetime.y','absences.y','studytime.y',
                                 'romantic.y','goout.y','failures.y','activities.y','studytime.x'],axis=1)

# using only one column to store the average of all three math grades
data_cleaned['math_avg'] = (data_cleaned['G1.x']+data_cleaned['G2.x']+data_cleaned['G3.x'])/3
data_cleaned = data_cleaned.drop(['G1.x','G2.x','G3.x'],axis=1)

# using only one column to store the average of all three Portuguese grades
data_cleaned['por_avg'] = (data_cleaned['G1.y']+data_cleaned['G2.y']+data_cleaned['G3.y'])/3
data_cleaned = data_cleaned.drop(['G1.y','G2.y','G3.y'],axis=1)

# rename 'Unnamed: 0' to student_id and give the remaining '.x' columns readable names
data_cleaned = data_cleaned.rename(columns={'Unnamed: 0':'student_id', 'failures.x':'failures',
                                            'activities.x':'activities', 'romantic.x':'romantic_relationship',
                                            'Walc.x':'Weekend_alcohol', 'Dalc.x':'Weekday_alcohol',
                                            'famrel.x':'family_relationship', 'freetime.x':'freetime',
                                            'goout.x':'socialising', 'absences.x':'absence'})
print(data_cleaned.columns.values.tolist())
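The research question asks whether family and social relationships relate to alcohol use. One way to quantify this on the cleaned columns is a rank (Spearman) correlation, which suits the 1 to 5 ordinal ratings. The sketch below computes it as the Pearson correlation of the ranked columns, which is how Spearman's coefficient is defined, and so avoids an extra SciPy dependency. The helper name `alcohol_correlations` and the choice of factors are assumptions for illustration, and the toy rows are not from the dataset. A correlation on its own cannot show that students drink to cope; it only shows whether the variables move together.

```python
import pandas as pd

def alcohol_correlations(df, alcohol_col="Weekend_alcohol",
                         factors=("family_relationship", "socialising")):
    """Hypothetical helper: Spearman correlation of each factor with alcohol
    use, computed as the Pearson correlation of average ranks."""
    ranked = df[list(factors) + [alcohol_col]].rank()  # ties get average ranks
    return ranked[list(factors)].corrwith(ranked[alcohol_col])

# Toy rows for illustration only (not from the dataset).
toy = pd.DataFrame({"family_relationship": [5, 4, 3, 2, 1],
                    "socialising": [1, 2, 3, 4, 5],
                    "Weekend_alcohol": [1, 2, 3, 4, 5]})
print(alcohol_correlations(toy).round(2))
```

On the cleaned data this would be `alcohol_correlations(data_cleaned)`; passing `alcohol_col="Weekday_alcohol"` repeats the check for weekday drinking.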

### Task 3: Method Chaining

In [None]:
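def load_and_process(url_or_path):
    # same steps as Task 2, written as a single method chain
    df = (
        pd.read_csv(url_or_path)
        .drop(columns=['sex','school','address','Medu','Fedu','Mjob','Fjob','reason',
                       'guardian.x','guardian.y','traveltime.x','traveltime.y',
                       'schoolsup.x','schoolsup.y','paid.x','paid.y','famsup.x','famsup.y',
                       'nursery','health.x','health.y','higher.x','higher.y',
                       'Dalc.y','Walc.y','famrel.y','freetime.y','absences.y','studytime.y',
                       'romantic.y','goout.y','failures.y','activities.y','studytime.x'])
        # average (not just sum) the three grades; use the lambda argument x, not an outer variable
        .assign(math_avg=lambda x: (x['G1.x'] + x['G2.x'] + x['G3.x']) / 3,
                por_avg=lambda x: (x['G1.y'] + x['G2.y'] + x['G3.y']) / 3)
        .drop(columns=['G1.x','G2.x','G3.x','G1.y','G2.y','G3.y'])
        .rename(columns={'Unnamed: 0':'student_id', 'failures.x':'failures',
                         'activities.x':'activities', 'romantic.x':'romantic_relationship',
                         'Walc.x':'Weekend_alcohol', 'Dalc.x':'Weekday_alcohol',
                         'famrel.x':'family_relationship', 'freetime.x':'freetime',
                         'goout.x':'socialising', 'absences.x':'absence'})
    )
    return df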
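Method chaining also works with reusable steps: `DataFrame.pipe` passes the current frame as the first argument to a function, so a repeated operation such as averaging one subject's grades can be written once and slotted into a chain. The helper `add_grade_average` below is hypothetical, and the toy grades are invented for illustration. When no grades are missing, `mean(axis=1)` over the three columns gives the same value as summing them and dividing by 3.

```python
import pandas as pd

def add_grade_average(df, suffix, new_col):
    """Hypothetical helper: average G1, G2, G3 carrying the given suffix
    ('.x' for math, '.y' for Portuguese in the merged file), store the result
    in new_col, and drop the three original grade columns."""
    grade_cols = [f"G{i}{suffix}" for i in (1, 2, 3)]
    return df.assign(**{new_col: df[grade_cols].mean(axis=1)}).drop(columns=grade_cols)

# Toy grades for illustration only (not from the dataset).
toy = pd.DataFrame({"G1.x": [10, 12], "G2.x": [11, 12], "G3.x": [12, 15],
                    "G1.y": [14, 9], "G2.y": [15, 9], "G3.y": [16, 12]})
result = (
    toy
    .pipe(add_grade_average, ".x", "math_avg")
    .pipe(add_grade_average, ".y", "por_avg")
)
print(result)
```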
    
