# APEX STATS Dataset
Prepared by Michelle Baca Reinke

## Source Attribution

Author: [Applied Programming Experiences (APEX)](https://www.sjsu.edu/apex/)

Title: Class Survey

Source: [APEX Stats Datasets](https://github.com/vectrlab/apex-stats-datasets/)

License: CC0: Public Domain

Changes: Invalid responses were removed; range entries for continuous variables were replaced with the middle value of the range; range entries for discrete variables were replaced with the upper limit of the range.


## Description of the Original Data


The dataset contains responses from a survey administered to students in introductory statistics courses at several campuses across the state participating in the [APEX program](https://www.sjsu.edu/apex/). The survey gathers information on personal habits, academic behaviors, and social interactions. The primary purpose of collecting these data was to create a large, entertaining practice dataset for analyzing patterns and relationships among variables. Participation was voluntary, and students could select "prefer not to answer" or provide alternative responses as needed.  
<br>

This dataset is intended solely for educational purposes and practice in data analysis. It does not represent real research findings or a scientific study. The data were collected from students for illustrative use only and should not be interpreted as reflecting actual research outcomes or conclusions.

<br>
The student survey questions, raw data, and a cleaned dataset with variable names can be found [here](https://github.com/vectrlab/apex-stats-datasets/tree/main/class_survey).

## Description of the Example

Access this example using the file `example.csv`.  


<br>

The following variables are included:   

y: Participant ID    
x: Preference for dogs or cats   
x1: Number of pets owned   
x2: Favorite type of music   
x3: Restaurant choice for dinner with a friend     
x4: Listen to music while studying or doing homework    
x5: Average daily hours studying or doing homework     
x6: Prefer to study with others or alone   
x7: Drink coffee, tea, or other caffeinated drinks      
x8: Drink beer, wine, or other alcoholic drinks  
x9: Engage in vigorous physical activity   
x10: Average daily minutes vigorous physical activity   
x11: Night owl, early bird, other (fill in)    
x12: Average nightly hours sleep    
x13: Average daily hours watching TV or streaming shows     
x14: Frequency of media multitasking (1 to 5)    
x15: Level of organization (1 to 5)     
x16: Level of procrastination (1 to 5)    
x17: Personality type (extrovert, introvert, ambivert)   
x18: Number of close friends   
x19: Most used social media app   
x20: Average daily minutes using social media  
x21: Number of social media friends/followers  
x22: Academic discipline  
x23: Age in years  
x24: Gender identity  
x25: Political affiliation    
x26: Knowledge of statistics (1 to 5)   
x27: Knowledge of computer programming (1 to 5)
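
Because the column names are generic (`y`, `x`, `x1`, ...), it can help to map them to readable names before analysis. The sketch below is one possible approach, not part of the original notebook: the names in `READABLE_NAMES` are hypothetical and cover only a few columns, and the small `sample` frame copies values from the first rows of the dataset preview.

```python
import pandas as pd

# Hypothetical readable names for a subset of columns (an assumption, not
# part of the dataset); extend the mapping to the remaining variables as needed.
READABLE_NAMES = {
    "y": "participant_id",
    "x": "dog_or_cat",
    "x1": "pets_owned",
    "x5": "study_hours",
    "x22": "discipline",
    "x23": "age",
}

# A few columns from the first five rows shown in the dataset preview.
sample = pd.DataFrame({
    "y": [1, 2, 3, 4, 5],
    "x": ["Dog", "Cat", "Dog", "Cat", "Dog"],
    "x1": [0, 2, 0, 0, 1],
    "x5": [2.0, 1.0, 4.0, 3.0, 4.0],
    "x22": ["Health sciences", "Health sciences", "Sciences",
            "Engineering", "Social sciences"],
    "x23": [37, 28, 20, 44, 49],
})

# rename() leaves any column not listed in the mapping unchanged.
renamed = sample.rename(columns=READABLE_NAMES)
print(renamed.head())
```

With the full dataset loaded as `data`, the same call would be `data.rename(columns=READABLE_NAMES)`.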



## Discipline(s) Represented

* Sociology
* Psychology
* Education
* Data Science

## Dataset Preview

```python
#@title Setup Example Data: Class Survey

# Import library
import pandas as pd

# Read data file: Class Survey
data = pd.read_csv('https://raw.githubusercontent.com/vectrlab/apex-stats-datasets/refs/heads/main/class_survey/example.csv')

# Preview data
data.head()
```

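|  | y | x | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | ... | x18 | x19 | x20 | x21 | x22 | x23 | x24 | x25 | x26 | x27 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Dog | 0 | Pop | Sushi | No | 2.0 | Alone | Yes, regularly | Yes, infrequently | ... | 3 | Not applicable | 0.0 | 0.0 | Health sciences | 37 | Man | Democratic | 2 | 1 |
| 1 | 2 | Cat | 2 | Pop | Soul food | Yes | 1.0 | With others | Yes, infrequently | Yes, regularly | ... | 1 | Instagram | 90.0 | 900.0 | Health sciences | 28 | Woman | Democratic | 3 | 1 |
| 2 | 3 | Dog | 0 | Classical | Vietnamese | No | 4.0 | With others | No | No | ... | 1 | Instagram | 120.0 | 840.0 | Sciences | 20 | Man | Democratic | 2 | 3 |
| 3 | 4 | Cat | 0 | Electronic | Mexican | Yes | 3.0 | Alone | No | No | ... | 2 | Not applicable | 0.0 | 0.0 | Engineering | 44 | Woman | Democratic | 2 | 5 |
| 4 | 5 | Dog | 1 | Other | Sushi | Yes | 4.0 | Alone | Yes, regularly | Yes, infrequently | ... | 5 | Facebook | 60.0 | 1800.0 | Social sciences | 49 | Woman | Independent | 3 | 3 |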


## Potential Activity Starters
* Is there a relationship between caffeine intake and sleep hours? Does it differ depending on whether a person is a night owl or early bird?
* What factors are associated with a higher number of close friends?
* How does the academic discipline influence whether a person prefers to study alone or with others?
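
A cross-tabulation is one simple way to start on a question that relates two categorical variables, such as academic discipline (x22) and study preference (x6). The sketch below is only an illustration: it uses the five rows shown in the dataset preview, so the counts are far too small to support any conclusion.

```python
import pandas as pd

# x6 and x22 values copied from the five rows in the dataset preview.
preview = pd.DataFrame({
    "x6": ["Alone", "With others", "With others", "Alone", "Alone"],
    "x22": ["Health sciences", "Health sciences", "Sciences",
            "Engineering", "Social sciences"],
})

# Raw counts of study preference within each discipline.
counts = pd.crosstab(preview["x22"], preview["x6"])

# Row proportions: share of each discipline preferring each study mode.
row_props = pd.crosstab(preview["x22"], preview["x6"], normalize="index")

print(counts)
print(row_props)
```

With the full dataset loaded as `data`, replace `preview` with `data` to tabulate all responses.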

## Exploratory Analyses (untested)

The dataset mixes categorical, ordinal, and numeric variables, which supports a wide range of statistical analyses:


1. **Descriptive Statistics**:
* Calculate the mean, median, standard deviation, and range for:
   * x5: Average daily hours studying or doing homework
   * x10: Average daily minutes vigorous physical activity
   * x12: Average nightly hours sleep
   * x13: Average daily hours watching TV or streaming shows
   * x20: Average daily minutes using social media
   * x21: Number of social media friends/followers
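
The descriptive statistics listed above could be computed along these lines. This is a minimal sketch, not a tested analysis: `describe_numeric` is a hypothetical helper, and the demonstration uses only x5, x20, and x21 because those are the listed variables visible in the dataset preview.

```python
import pandas as pd

def describe_numeric(df, columns):
    """Return mean, median, sample standard deviation, and range per column."""
    # Coerce to numeric so stray text entries become NaN and are skipped.
    numeric = df[columns].apply(pd.to_numeric, errors="coerce")
    summary = numeric.agg(["mean", "median", "std", "min", "max"]).T
    summary["range"] = summary["max"] - summary["min"]
    return summary[["mean", "median", "std", "range"]]

# Values copied from the five rows in the dataset preview.
preview = pd.DataFrame({
    "x5": [2.0, 1.0, 4.0, 3.0, 4.0],
    "x20": [0.0, 90.0, 120.0, 0.0, 60.0],
    "x21": [0.0, 900.0, 840.0, 0.0, 1800.0],
})

summary = describe_numeric(preview, ["x5", "x20", "x21"])
print(summary)
```

With the full dataset loaded as `data`, the call would be `describe_numeric(data, ["x5", "x10", "x12", "x13", "x20", "x21"])`.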

2. **Distributions**:
* Examine the distributions of categorical and count variables using frequency tables and bar charts for:
    * x: Preference for dogs or cats
    * x1: Number of pets owned
    * x2: Favorite type of music
    * x3: Restaurant choice for dinner with a friend
    * x17: Personality type (extrovert, introvert, ambivert)
    * x19: Most used social media app
    * x22: Academic discipline
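
A frequency table for any of the variables above can be built with `value_counts`. The sketch below assumes a hypothetical helper, `frequency_table`, and demonstrates it on the x column from the dataset preview rows.

```python
import pandas as pd

def frequency_table(series):
    """Return counts and proportions for each category, including missing."""
    counts = series.value_counts(dropna=False)
    return pd.DataFrame({"count": counts, "proportion": counts / counts.sum()})

# x (dog or cat preference) copied from the five rows in the dataset preview.
pets = pd.Series(["Dog", "Cat", "Dog", "Cat", "Dog"], name="x")

table = frequency_table(pets)
print(table)

# For a bar chart (requires matplotlib to be installed):
# table["count"].plot.bar()
```

With the full dataset loaded as `data`, pass a column such as `data["x22"]` to tabulate academic discipline.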

3. **Correlation Analysis**:     
* Run a Pearson's correlation to test the relationship between:
    * x5: Average daily hours studying or doing homework and x12: Average nightly hours sleep
    * x10: Average daily minutes vigorous physical activity and x12: Average nightly hours sleep
    * x20: Average daily minutes using social media and x21: Number of social media friends/followers
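
The correlations above might be computed as follows. This is a sketch under stated assumptions: `pearson_r` is a hypothetical helper that uses pandas only, so it returns the coefficient and the number of complete pairs but no p-value (`scipy.stats.pearsonr` could supply one). The demonstration uses x20 and x21 from the preview rows.

```python
import pandas as pd

def pearson_r(df, col_a, col_b):
    """Return Pearson's r and the number of complete pairs used."""
    # Coerce to numeric and keep only rows where both values are present.
    pair = df[[col_a, col_b]].apply(pd.to_numeric, errors="coerce").dropna()
    r = pair[col_a].corr(pair[col_b], method="pearson")
    return r, len(pair)

# x20 and x21 copied from the five rows in the dataset preview.
preview = pd.DataFrame({
    "x20": [0.0, 90.0, 120.0, 0.0, 60.0],
    "x21": [0.0, 900.0, 840.0, 0.0, 1800.0],
})

r, n = pearson_r(preview, "x20", "x21")
print(f"r = {r:.3f} (n = {n})")
```

Five rows are far too few for inference; with the full dataset loaded as `data`, call `pearson_r(data, "x5", "x12")` and so on for each pair.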

4. **Independent Samples *t*-Test**:
* Perform an independent samples *t*-test to compare a numeric outcome between two groups:
    * x13: Average daily hours watching TV or streaming shows, between two levels of x17: Personality type (e.g., extrovert vs. introvert)
    * x12: Average nightly hours sleep, between night owls and early birds (x11)
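
A two-group comparison like those above could be sketched with `scipy.stats.ttest_ind`. Several things here are assumptions: `compare_two_groups` is a hypothetical helper, and it defaults to Welch's variant (`equal_var=False`), which does not assume equal group variances; pass `equal_var=True` for the classic pooled test. Because x11, x12, x13, and x17 do not appear in the dataset preview, the demonstration instead compares x5 (study hours) between the two levels of x4 (listens to music while studying) from the preview rows.

```python
import pandas as pd
from scipy import stats

def compare_two_groups(df, group_col, value_col, group_a, group_b,
                       equal_var=False):
    """Independent samples t-test of value_col between two levels of group_col."""
    values = pd.to_numeric(df[value_col], errors="coerce")
    a = values[df[group_col] == group_a].dropna()
    b = values[df[group_col] == group_b].dropna()
    return stats.ttest_ind(a, b, equal_var=equal_var)

# x4 and x5 copied from the five rows in the dataset preview.
preview = pd.DataFrame({
    "x4": ["No", "Yes", "No", "Yes", "Yes"],
    "x5": [2.0, 1.0, 4.0, 3.0, 4.0],
})

result = compare_two_groups(preview, "x4", "x5", "No", "Yes")
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

With the full dataset loaded as `data`, check the actual category labels first (for example with `data["x11"].unique()`), then pass the two labels of interest as `group_a` and `group_b`.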

5. **ANOVA**:
* Conduct a one-way ANOVA to test whether a numeric outcome differs across groups:
    * x5: Average daily hours studying or doing homework, across x22: Academic discipline
    * x20: Average daily minutes using social media, across x19: Most used social media app
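
A one-way ANOVA of this kind could be sketched with `scipy.stats.f_oneway`. The helper `one_way_anova` and its `min_group_size` parameter are hypothetical; the size filter is an assumption added so that categories with very few respondents are left out. The demonstration uses x19 and x20 from the preview rows, where it keeps only the two categories that have at least two respondents.

```python
import pandas as pd
from scipy import stats

def one_way_anova(df, group_col, value_col, min_group_size=2):
    """One-way ANOVA of value_col across the levels of group_col."""
    values = pd.to_numeric(df[value_col], errors="coerce")
    clean = pd.DataFrame({"group": df[group_col], "value": values}).dropna()
    # Keep only groups with enough observations to estimate variability.
    samples = [g["value"].to_numpy()
               for _, g in clean.groupby("group")
               if len(g) >= min_group_size]
    if len(samples) < 2:
        raise ValueError("Need at least two groups of sufficient size.")
    return stats.f_oneway(*samples)

# x19 and x20 copied from the five rows in the dataset preview.
preview = pd.DataFrame({
    "x19": ["Not applicable", "Instagram", "Instagram",
            "Not applicable", "Facebook"],
    "x20": [0.0, 90.0, 120.0, 0.0, 60.0],
})

result = one_way_anova(preview, "x19", "x20")
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

With the full dataset loaded as `data`, the calls would be `one_way_anova(data, "x22", "x5")` and `one_way_anova(data, "x19", "x20")`; whether to keep a category such as "Not applicable" is an analysis decision left to the reader.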