## 1. Provincial Differences in Big Five Inventory Personality Traits
### Research Question 
How do personality traits measured by the Big Five Inventory vary across different provinces in Canada?
### Variables
- Province (GEO_province)
- Big Five Inventory traits (PSYCH_big_five_inventory_*): 44 personality trait measurements, including:
    - Extraversion items (e.g., talks, energy, outgoing)
    - Agreeableness items (e.g., helpful, forgives, considerate)
    - Conscientiousness items (e.g., carefully, hard_worker, organized)
    - Neuroticism items (e.g., sad, tense, worries)
    - Openness items (e.g., original, imagination, artistic)
#### Exploration Plan
- Data Quality Assessment
    - Check for missing values in both province and personality items
    - Use `df.dropna()` to remove rows with missing values
    - Verify each province has a sufficient number of observations for analysis
    - Verify that each respondent has valid responses to the Big Five Inventory items
- Summary Statistics
    - Score each Big Five trait as the mean of its items (after recoding reverse-keyed items), then calculate means, standard deviations, and 95% confidence intervals for each trait by province.
- Frequency Distributions
    - Plot histograms for each trait by province to identify any skewness or outliers.
- Visualization
    1. Grouped Bar Charts
        - Show the average score for each personality trait across provinces
        - Allow for easy comparison of mean trait levels between provinces
    2. Radar Charts
        - Display personality profiles for each province
        - Facilitate comparison of overall personality patterns
    3. Heat Map
        - Create a color-coded matrix of provinces vs. personality traits
        - Highlight areas of high and low trait expressions across provinces
        - Useful for identifying clusters of similar provinces or traits
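
The data-quality and summary steps above could be sketched with pandas roughly as follows. This is a minimal illustration, not the final pipeline: the toy data, the item suffixes (`talks`, `energy`, and so on), and the item-to-trait mapping are assumptions, only two traits are shown, and reverse-keyed items are not recoded here.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real survey; item suffixes are hypothetical.
rng = np.random.default_rng(0)
n = 300
prefix = "PSYCH_big_five_inventory_"
trait_items = {
    "extraversion": ["talks", "energy", "outgoing"],
    "neuroticism": ["sad", "tense", "worries"],
}
df = pd.DataFrame({"GEO_province": rng.choice(["ON", "QC", "BC"], size=n)})
for items in trait_items.values():
    for item in items:
        df[prefix + item] = rng.integers(1, 6, size=n).astype(float)
df.loc[:9, prefix + "talks"] = np.nan  # inject ten missing responses

item_cols = [c for c in df.columns if c.startswith(prefix)]
analysis_cols = ["GEO_province"] + item_cols

# 1. Count missing values, then drop incomplete rows (listwise deletion).
missing_counts = df[analysis_cols].isna().sum()
clean = df.dropna(subset=analysis_cols).copy()

# 2. Check the number of observations per province.
province_n = clean["GEO_province"].value_counts()

# 3. Score each trait as the mean of its items (no reverse-keying in this toy).
for trait, items in trait_items.items():
    clean[trait] = clean[[prefix + i for i in items]].mean(axis=1)

# 4. Mean, SD, and a normal-approximation 95% CI for each trait by province.
summary = clean.groupby("GEO_province")[list(trait_items)].agg(["mean", "std", "count"])
for trait in trait_items:
    se = summary[(trait, "std")] / np.sqrt(summary[(trait, "count")])
    summary[(trait, "ci_low")] = summary[(trait, "mean")] - 1.96 * se
    summary[(trait, "ci_high")] = summary[(trait, "mean")] + 1.96 * se

print(missing_counts)
print(province_n)
print(summary.sort_index(axis=1).round(2))
```

Passing `subset=` to `dropna` keeps rows that are only missing values in columns unrelated to this question, which a bare `df.dropna()` would discard.
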
### Analysis Plan
- MANOVA (Multivariate Analysis of Variance)
1. Model setup
    - Independent Variable: Province
    - Dependent Variables: the five Big Five trait scores (Extraversion, Agreeableness, Conscientiousness, Neuroticism, Openness)
    - Assumptions Check:
        - Multivariate Normality: Screen each trait within each province using Q-Q plots and Shapiro-Wilk tests. Univariate normality is necessary but not sufficient for multivariate normality, so these checks are only a first pass.
        - Homogeneity of Covariance Matrices: Use Box's M test to check whether covariance matrices are equal across provinces.
        - Independence: Confirm that observations are independent of one another (e.g., one response per participant).
2. Hypotheses:
    - Null Hypothesis (H0): There are no differences in the combination of personality trait scores across the provinces of Canada.
    - Alternative Hypothesis (H1): The combination of personality trait scores differs between at least two provinces of Canada.
    - Interpretation:
        - If p-value < 0.05, reject the null hypothesis; this suggests that personality trait profiles differ across provinces.
        - If p-value ≥ 0.05, fail to reject the null hypothesis; there is insufficient evidence of provincial differences.

3. Follow-up Analysis (if MANOVA is significant)

    1. Univariate ANOVAs
        - Conduct a separate one-way ANOVA for each personality trait to identify which traits vary by province
        - Apply Bonferroni correction for multiple comparisons
        - Calculate effect sizes (partial η²) for significant traits
    2. Pairwise Comparisons
        - Use Tukey's HSD test for traits with significant ANOVA results to pinpoint which provinces differ
        - Use the Games-Howell test instead if variances are unequal
        - Create comparison matrices showing significant provincial differences
    3. Effect Size Analysis
        - Report partial η² (equal to η² in a one-way design) for each ANOVA to indicate the magnitude of any significant differences
        - Calculate Cohen's d for significant pairwise provincial differences
        - Compute confidence intervals for effect sizes
        - Create effect size plots for visualization
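
The MANOVA and its follow-ups could be sketched as below, assuming `statsmodels` and `scipy` are available. The data are synthetic, the short column names (`province`, `openness`, and so on) are placeholders for composite scores built from the BFI items, and a difference for one province is deliberately built in so the follow-up steps have something to detect.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.multivariate.manova import MANOVA
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic trait scores; the real analysis would use the survey composites.
rng = np.random.default_rng(1)
traits = ["extraversion", "agreeableness", "conscientiousness", "neuroticism", "openness"]
df = pd.DataFrame(rng.normal(3.0, 0.6, size=(400, 5)), columns=traits)
df["province"] = rng.choice(["ON", "QC", "BC", "AB"], size=400)
df.loc[df["province"] == "QC", "openness"] += 0.5  # built-in effect for the demo

# Omnibus MANOVA: the five traits jointly, with province as the factor.
manova = MANOVA.from_formula(" + ".join(traits) + " ~ C(province)", data=df)
mv = manova.mv_test()
stat_table = mv.results["C(province)"]["stat"]
wilks_p = float(stat_table.loc["Wilks' lambda", "Pr > F"])
print(f"Wilks' lambda p-value: {wilks_p:.4g}")

# Follow-up one-way ANOVAs with a Bonferroni-adjusted threshold.
alpha = 0.05 / len(traits)
rows = {}
for trait in traits:
    groups = [g[trait].to_numpy() for _, g in df.groupby("province")]
    f_stat, p = stats.f_oneway(*groups)
    # Eta squared = SS_between / SS_total (same as partial eta squared here).
    grand_mean = df[trait].mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((df[trait] - grand_mean) ** 2).sum()
    rows[trait] = {"F": f_stat, "p": p, "eta_sq": ss_between / ss_total,
                   "significant": p < alpha}
followups = pd.DataFrame(rows).T
print(followups)

# Tukey HSD only for traits that survive the Bonferroni correction.
for trait in followups.index[followups["significant"].astype(bool)]:
    print(trait)
    print(pairwise_tukeyhsd(df[trait], df["province"]))
```

Games-Howell is not included in this sketch; it would replace the Tukey step for traits where the equal-variance assumption fails.
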

### Limitations:

- Provinces with smaller populations may have less reliable estimates due to smaller sample sizes
- Comparing province-level means does not account for variation within provinces
- Multiple testing increases Type I error risk
- Unequal provincial sample sizes may affect results

## 2. Gender and Life Satisfaction
### Research Question
Is there a significant difference in life satisfaction between genders?
### Variables
- **Life Satisfaction Score (WELLNESS_life_satisfaction)**: Outcome variable.
- **Gender (DEMO_gender)**: Categorical predictor, indicating participants' gender.
#### Exploration Plan
- Data Quality Assessment
    - Check for missing values in both gender and life satisfaction variables
    - Examine response patterns for potential bias
    - Verify adequate sample sizes across gender categories
    - Identify potential outliers
- Descriptive Statistics
    - Calculate means, medians, standard deviations, and confidence intervals by gender
        - Mean life satisfaction for each gender: used to compare average scores and highlight any differences in central tendency.
    - Compute skewness and kurtosis for life satisfaction within each gender group
    - Create frequency tables showing distribution of responses
- Visualization
    - Histogram or Kernel Density Plot of life satisfaction by gender 
        - These plots will show the distribution of life satisfaction for each gender. The histogram is suitable for seeing frequency distributions, while a density plot shows a smoothed curve, making trends across gender groups easier to observe.
    - Box Plots
        - Displays median, quartiles, and potential outliers
        - Facilitates comparison of distributions between genders
        - Helps assess homogeneity of variance
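
A rough pandas sketch of the descriptive steps above follows. The gender labels, the 1 to 10 response scale, and the toy data are assumptions made for illustration; the real coding of `DEMO_gender` and `WELLNESS_life_satisfaction` may differ.

```python
import numpy as np
import pandas as pd

# Toy data; category labels and the 1-10 scale are hypothetical.
rng = np.random.default_rng(2)
col = "WELLNESS_life_satisfaction"
df = pd.DataFrame({
    "DEMO_gender": rng.choice(["Man", "Woman", "Non-binary"], size=500, p=[0.48, 0.48, 0.04]),
    col: rng.integers(1, 11, size=500).astype(float),
})
df.loc[:19, col] = np.nan  # inject twenty missing responses

# Missing values, then keep complete cases for the two variables of interest.
n_missing = df[["DEMO_gender", col]].isna().sum()
clean = df.dropna(subset=["DEMO_gender", col])

# Sample size, central tendency, spread, and shape by gender.
desc = clean.groupby("DEMO_gender")[col].agg(
    n="count",
    mean="mean",
    median="median",
    sd="std",
    skew="skew",
    kurtosis=lambda s: s.kurt(),  # excess kurtosis
)

# Flag potential outliers with the 1.5 * IQR rule within each gender group.
by_gender = clean.groupby("DEMO_gender")[col]
q1 = by_gender.transform(lambda s: s.quantile(0.25))
q3 = by_gender.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1
is_outlier = (clean[col] < q1 - 1.5 * iqr) | (clean[col] > q3 + 1.5 * iqr)

# Frequency table of responses by gender.
freq = pd.crosstab(clean["DEMO_gender"], clean[col])

print(n_missing)
print(desc.round(2))
print("Potential outliers:", int(is_outlier.sum()))
print(freq)
```
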
### Analysis Plan
An independent samples t-test will be conducted to determine if the difference in average life satisfaction scores between genders is statistically significant.
- Calculate t-statistic and degrees of freedom
- Determine p-value
- Compute effect size (e.g., Cohen's d)
- Calculate confidence intervals for mean difference

This test assumes:
   - The distribution of life satisfaction scores is approximately normal for each gender group.
   - The variances of life satisfaction scores across genders are similar (homogeneity of variance).


Null Hypothesis (H0): There is no difference in life satisfaction scores between genders.


Alternative Hypothesis (H1): There is a difference in life satisfaction scores between genders.


We anticipate that life satisfaction scores may differ between genders due to societal or psychological factors. If the p-value from the t-test is less than 0.05, we will reject the null hypothesis and conclude that mean life satisfaction differs significantly between genders. Because the data are observational, such a result indicates an association rather than a causal effect of gender.
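
The test steps listed above could look like the following with `scipy`. This is a sketch on synthetic scores for two groups; the group means and sizes are made up. Two choices here are assumptions rather than part of the plan: Levene's test is used to check the equal-variance assumption, and Welch's version of the t-test is used when that check fails.

```python
import numpy as np
from scipy import stats

# Synthetic life satisfaction scores for two gender groups (illustrative only).
rng = np.random.default_rng(3)
men = rng.normal(7.0, 1.8, size=220)
women = rng.normal(7.4, 1.8, size=240)

# Check homogeneity of variance; fall back to Welch's t-test if it is violated.
lev_stat, lev_p = stats.levene(men, women)
equal_var = bool(lev_p >= 0.05)
t_stat, p_value = stats.ttest_ind(men, women, equal_var=equal_var)

# Cohen's d using the pooled standard deviation.
n1, n2 = len(men), len(women)
pooled_sd = np.sqrt(((n1 - 1) * men.var(ddof=1) + (n2 - 1) * women.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (men.mean() - women.mean()) / pooled_sd

# 95% CI for the mean difference, using the Welch standard error and
# Welch-Satterthwaite degrees of freedom (valid with or without equal variances).
diff = men.mean() - women.mean()
v1, v2 = men.var(ddof=1) / n1, women.var(ddof=1) / n2
se = np.sqrt(v1 + v2)
dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
t_crit = stats.t.ppf(0.975, dof)
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"Levene p = {lev_p:.3f}, equal_var = {equal_var}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, d = {cohens_d:.3f}")
print(f"95% CI for mean difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

A t-test compares exactly two groups, so if the gender variable has more than two categories, the comparison would need to be restricted to two of them or replaced with a one-way ANOVA.
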

## 3. Number of Close Friends and Mental Health Status
### Research Question
Is there a significant relationship between the number of close friends a person has and their self-reported mental health status?
### Variables
- **Number of close friends (CONNECTION_social_num_close_friends)**: Number of close friends, predictor variable
- **Mental health status (WELLNESS_self_rated_mental_health)**: Self-rated mental health at the present time, outcome variable
#### Exploration Plan
- Visualization: Bar Plot
    - This type of plot can show the proportion of each mental health category for different ranges of friend numbers, allowing us to visualize any trends.
    - The bar plot will show whether the distribution of mental health ratings shifts across friend group categories
    - We expect groups with more friends to show larger proportions in the better mental health categories
- We'll also look at basic descriptive statistics for each friend group, including sample sizes, means, and standard deviations, to ensure we have sufficient data for meaningful analysis.
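
The grouping and the proportions behind the bar plot could be computed along these lines. This is a sketch under assumptions: the toy data are random, the three middle response labels ("Fair", "Good", "Very good") are guesses at the scale between Poor and Excellent, and the top bin is labelled `11+` so that the four friend groups do not overlap.

```python
import numpy as np
import pandas as pd

# Toy data; the intermediate response labels are hypothetical.
rng = np.random.default_rng(4)
levels = ["Poor", "Fair", "Good", "Very good", "Excellent"]
friends_col = "CONNECTION_social_num_close_friends"
health_col = "WELLNESS_self_rated_mental_health"
df = pd.DataFrame({
    friends_col: rng.integers(0, 20, size=600),
    health_col: rng.choice(levels, size=600),
})

# Bin the friend counts into four non-overlapping groups (right-closed bins).
bins = [-1, 2, 5, 10, np.inf]
labels = ["0-2", "3-5", "6-10", "11+"]
df["friend_group"] = pd.cut(df[friends_col], bins=bins, labels=labels)

# Map the ordinal ratings to 1-5 so means and SDs can be computed.
df["mh_score"] = df[health_col].map({lvl: i + 1 for i, lvl in enumerate(levels)})

# Sample size, mean, and SD of the numeric score for each friend group.
group_stats = df.groupby("friend_group", observed=True)["mh_score"].agg(["count", "mean", "std"])

# Row-normalised proportions of each mental health category per friend group.
props = pd.crosstab(df["friend_group"], df[health_col], normalize="index")[levels]

print(group_stats.round(2))
print(props.round(3))
# With matplotlib installed, props.plot(kind="bar", stacked=True) draws the chart.
```
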
### Analysis Plan
- Hypothesis Test: One-Way ANOVA
    - This choice is driven by the nature of our variables and research question. Our independent variable, the number of close friends, is categorized into four non-overlapping groups (0-2 friends, 3-5 friends, 6-10 friends, and 11+ friends), while our dependent variable is mental health scores on a scale from Poor to Excellent. While mental health scores are technically ordinal, we will treat them as continuous for this analysis.
    - ANOVA is suitable here because we are comparing means across more than two groups. Unlike a t-test, which compares only two groups, ANOVA examines differences across all four friend groups in a single test. This avoids the inflated Type I error rate that would result from running multiple pairwise t-tests. The test will tell us whether mean mental health scores differ significantly between any of our friend groups.
    
- Null Hypothesis (H0): There is no difference in mean mental health scores across the friend group categories.

- Alternative Hypothesis (H1): Mean mental health scores differ between at least two of the friend group categories.

#### Interpretation:

- If p-value < 0.05, we reject the null hypothesis and conclude that there are significant differences in mental health scores across the friend group categories.
- If p-value ≥ 0.05, we fail to reject the null hypothesis and conclude that there isn't enough evidence to say that mental health scores differ significantly across friend group categories.

Post-hoc Analysis: If the ANOVA result is significant, we will conduct post-hoc tests (such as Tukey's HSD) to determine which specific groups differ from each other.
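
A minimal sketch of the ANOVA and the post-hoc step using `scipy` is shown below. The scores are synthetic and continuous rather than real 1 to 5 ratings, the group means and sizes are made up, and the Levene check is an added assumption. `scipy.stats.tukey_hsd` requires SciPy 1.8 or later.

```python
import numpy as np
from scipy import stats

# Synthetic mental health scores per friend group (illustrative values only).
rng = np.random.default_rng(5)
groups = {
    "0-2": rng.normal(2.8, 1.0, size=150),
    "3-5": rng.normal(3.2, 1.0, size=200),
    "6-10": rng.normal(3.4, 1.0, size=120),
    "11+": rng.normal(3.5, 1.0, size=60),
}
names = list(groups)
samples = list(groups.values())

# Homogeneity of variance check before the ANOVA.
lev_stat, lev_p = stats.levene(*samples)

# Omnibus one-way ANOVA across the four friend groups.
f_stat, p_value = stats.f_oneway(*samples)
print(f"Levene p = {lev_p:.3f}")
print(f"F = {f_stat:.3f}, p = {p_value:.4g}")

# Tukey's HSD only if the omnibus test is significant.
if p_value < 0.05:
    tukey = stats.tukey_hsd(*samples)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            print(f"{names[i]} vs {names[j]}: "
                  f"mean diff = {tukey.statistic[i, j]:.3f}, p = {tukey.pvalue[i, j]:.4f}")
```
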

#### Limitations:

- ANOVA assumes normality of data within each group and homogeneity of variances. We will need to check these assumptions.
- We are treating the ordinal mental health scores as continuous, which assumes equal spacing between response categories and might not be entirely accurate.
- By grouping friend counts into categories, we lose some of the granularity in the data.

### Preference for group members:
- Yue Yu
- Lauren Chiu
- Yuxin Xu