# APEX STATS Dataset
Prepared by Michelle Baca Reinke

## Source Attribution

Author: Data Copyright &copy; Cortez et al. (2008), All Rights Reserved

Title: Student Alcohol Consumption

Source: [Kaggle](https://www.kaggle.com/datasets/uciml/student-alcohol-consumption)

License: CC0: Public Domain

Changes: The data have been adapted for APEX STATS by Michelle Baca Reinke; the example version condenses them into a subset of selected variables.


## Description of the Original Data


The data were obtained from a survey of students in math and Portuguese language courses in secondary school. They contain social, gender, and study information about the students.

The original data consist of two sets with the same variables: one from a math course and one from a Portuguese language course.

The full dataset can be downloaded [here](https://www.kaggle.com/datasets/uciml/student-alcohol-consumption). The download contains both datasets and an R file. Note that 382 students appear in both datasets. They can be identified by searching for identical values of the attributes that characterize each student, as shown in the annexed R file.
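
The matching step described above can also be sketched in pandas. This is a minimal sketch, not a port of the annexed R file: the helper name `find_shared_students` is hypothetical, and the list of key columns is an assumption that should be checked against the R file in the download.

```python
import pandas as pd

# Assumed key columns: attributes taken to characterize a student.
# Verify this list against the R file shipped with the download.
KEY_COLUMNS = ["school", "sex", "age", "address", "famsize", "Pstatus",
               "Medu", "Fedu", "Mjob", "Fjob", "reason", "nursery", "internet"]


def find_shared_students(math_df, por_df, keys=KEY_COLUMNS):
    """Inner-join the two course tables on identical student attributes.

    Columns that are not join keys (for example, grades) appear twice,
    suffixed with `_math` and `_por`.
    """
    return math_df.merge(por_df, on=list(keys), suffixes=("_math", "_por"))
```

Once both course files are loaded into data frames, `len(find_shared_students(math_df, por_df))` gives the number of matched rows, which can be compared with the 382 students noted above.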

Citation:    

P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.

## Description of the Example

Access this example using the file `example.csv`.

The example file is a subset of the math course dataset only.

<br>

The following variables are included:


y: Final grade (numeric: see note below)       
y1: First semester grade (numeric: see note below)  
y2: Second semester grade (numeric: see note below)  
x: Weekday alcohol consumption (numeric: 1 very low to 5 very high)  
x1: Weekend alcohol consumption (numeric: 1 very low to 5 very high)   
x2: Absences (numeric: 0 to 93)        
x3: Sex (dichotomous: 'F' female or 'M' male)    
x4: Age at enrollment (numeric: 15 to 22)     
x5: Residential area type (dichotomous: 'U' urban or 'R' rural)  
x6: Family size (dichotomous: 'LE3' less than or equal to 3 or 'GT3' greater than 3)  
x7: Parental co-habitation (dichotomous: 'T' together or 'A' apart)   
x8: Mother's education (numeric: '0' none, '1' primary education up to 4th grade, '2' 5th to 9th grade, '3' secondary education, or '4' higher education)    
x9: Father's education (numeric: '0' none, '1' primary education up to 4th grade, '2' 5th to 9th grade, '3' secondary education, or '4' higher education)         
x10: Guardian (nominal: 'mother' / 'father' / 'other')    
x11: Commute to school (numeric: '1' less than 15 minutes, '2' 15 to 30 minutes, '3' 30 minutes to 1 hour, or '4' more than 1 hour)    
x12: Weekly study time (numeric: '1' less than 2 hours, '2' 2 to 5 hours, '3' 5 to 10 hours, or '4' more than 10 hours)       
x13: School educational support (dichotomous: yes or no)      
x14: Family educational support (dichotomous: yes or no)        
x15: Extracurricular activities (dichotomous: yes or no)    
x16: Home internet access (dichotomous: yes or no)   
x17: Family relationship quality (numeric: '1' very bad to '5' excellent)    
x18: Free time after school (numeric: '1' very low to '5' very high)    
x19: Social outings with friends (numeric: '1' very low to '5' very high)    
x20: Current health status (numeric: '1' very bad to '5' very good)


<br>

Note: Grades follow the Portuguese grading system, which runs from 0 to 20. More details on the grading system and its international equivalents can be found [here](https://www.upt.pt/en/home/internationals/portuguese-grading-system-2/#:~:text=Notes%3A%20A%20grade%20below%20to,are%20even%20more%20rarely%20used.).


## Discipline(s) Represented

- Education
- Sociology

## Dataset Preview

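```python
#@title Setup Example Data: Student Alcohol Consumption

# Import library
import pandas as pd

# Read data file: Student Alcohol Consumption
data = pd.read_csv('https://raw.githubusercontent.com/michellebacareinke/APEX/main/teen_alc/example.csv')

# Preview data
data.head()
```

| Unnamed: 0 | y | y1 | y2 | x | x1 | x2 | x3 | x4 | x5 | x6 | ... | x11 | x12 | x13 | x14 | x15 | x16 | x17 | x18 | x19 | x20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 5 | 6 | 1 | 1 | 6 | F | 18 | U | GT3 | ... | 2 | 2 | yes | no | no | no | 4 | 3 | 4 | 3 |
| 1 | 6 | 5 | 5 | 1 | 1 | 4 | F | 17 | U | GT3 | ... | 1 | 2 | no | yes | no | yes | 5 | 3 | 3 | 3 |
| 2 | 10 | 7 | 8 | 2 | 3 | 10 | F | 15 | U | LE3 | ... | 1 | 2 | yes | no | no | yes | 4 | 3 | 2 | 3 |
| 3 | 15 | 15 | 14 | 1 | 1 | 2 | F | 15 | U | GT3 | ... | 1 | 3 | no | yes | yes | yes | 3 | 2 | 2 | 5 |
| 4 | 10 | 6 | 10 | 1 | 2 | 4 | F | 16 | U | GT3 | ... | 1 | 2 | no | yes | no | no | 4 | 3 | 2 | 5 |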
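
The `Unnamed: 0` column in the preview is the name pandas gives to a CSV column with an empty header, most likely a row index saved along with the data. As a minimal sketch, assuming that column carries no information beyond the row number, it can be read as the index instead. The helper name `read_example` is hypothetical, and the in-memory CSV below is a small stand-in built from the first two preview rows so the snippet runs without network access.

```python
import io

import pandas as pd


def read_example(source):
    """Read the example CSV, using its first (unnamed) column as the row index."""
    return pd.read_csv(source, index_col=0)


# Stand-in for the first columns of example.csv (values from the preview above).
sample_csv = io.StringIO(",y,y1,y2\n0,6,5,6\n1,6,5,5\n")
sample = read_example(sample_csv)

print(sample.columns.tolist())  # ['y', 'y1', 'y2']
```

Passing the file URL from the cell above to `read_example` would apply the same treatment to the full example file.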


## Exploratory Analyses (untested)

The dataset combines numeric, dichotomous, and nominal variables, which supports a wide range of statistical analyses:


1. **Descriptive Statistics**:
* Calculate summary statistics (mean, median, and standard deviation) for the age variable (x4) to understand the age distribution at enrollment.
* Examine the distribution of 1st-semester, 2nd-semester, and final grades (y1, y2, y) using summary statistics and graphical representations (histograms, box plots).

2. **Gender Differences**:
* Perform a t-test to compare the mean age at enrollment (x4) between male and female students.
* Explore if there are significant differences in 1st-semester, 2nd-semester, and final grades (y1, y2, y) based on gender using t-tests.

3. **Correlation Analysis**:
* Compute Pearson's correlation between weekday or weekend alcohol consumption (x, x1) and academic performance (y1, y2, y).


4. **ANOVA**:
* Perform a one-way ANOVA to test whether mean final grades (y) differ significantly across guardian categories (x10).
* Perform a two-way ANOVA to examine the combined effects of mother's and father's education levels (x8 and x9) on final grades (y), including whether the two factors interact.
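
Several of the analyses listed above can be sketched with pandas and SciPy. Like the list itself, this sketch has not been run against the example file. The function names are hypothetical, the t-test uses Welch's unequal-variance variant (an assumption; the list does not name a variant), and the two-way ANOVA is left out because it needs a model-fitting library.

```python
import pandas as pd
from scipy import stats


def describe_numeric(df, column):
    """Mean, median, and sample standard deviation for one numeric column."""
    return df[column].agg(["mean", "median", "std"])


def t_test_by_sex(df, outcome):
    """Welch's t-test of `outcome` between female ('F') and male ('M') students."""
    female = df.loc[df["x3"] == "F", outcome]
    male = df.loc[df["x3"] == "M", outcome]
    return stats.ttest_ind(female, male, equal_var=False)


def pearson(df, a, b):
    """Pearson correlation coefficient and p-value between two columns."""
    return stats.pearsonr(df[a], df[b])


def anova_by_guardian(df, outcome="y"):
    """One-way ANOVA of `outcome` across the guardian categories (x10)."""
    groups = [group[outcome].to_numpy() for _, group in df.groupby("x10")]
    return stats.f_oneway(*groups)
```

With `data` loaded as in the Dataset Preview cell, `describe_numeric(data, "x4")` summarizes age at enrollment, `t_test_by_sex(data, "y")` compares final grades by sex, `pearson(data, "x", "y")` relates weekday alcohol consumption to final grades, and `anova_by_guardian(data)` compares final grades across guardians.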




## Potential activity starters

What factors influence academic performance and lifestyle choices? The dataset includes student alcohol consumption, academic performance, personal factors, lifestyle and extracurricular factors, and personal well-being.

1. Is there a relationship between weekday and weekend alcohol consumption and academic performance? i.e. is there a link between alcohol consumption and academic achievement?

2. How do personal factors (absences, sex, age, residential area type, family size, parental co-habitation, and parental education) influence a student's academic performance? i.e. do demographic and family-related factors relate to academic success?

3. Are there effects of family relationship quality, free time after school, social outings with friends, and current health status on student performance?
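
A first pass at any of these questions is a grouped summary of grades by the factor of interest. The helper below is a minimal sketch with a hypothetical name, `mean_grade_by`; it is descriptive only and does not test whether the group differences are significant.

```python
import pandas as pd


def mean_grade_by(df, factor, grade="y"):
    """Count, mean, and standard deviation of `grade` within each level of `factor`."""
    return df.groupby(factor)[grade].agg(["count", "mean", "std"])
```

With `data` loaded as in the Dataset Preview cell, `mean_grade_by(data, "x")` tabulates final grades by weekday alcohol consumption level, and `mean_grade_by(data, "x17")` does the same for family relationship quality.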