# Analyzing Students' Mental Health in SQL

In this project, I'll perform exploratory data analysis on a dataset about the mental health of domestic and international students. I'll use SQL queries to examine how social connectedness and cultural factors, such as acculturative stress, relate to mental health. Finally, I'll visualize the results using the Python Plotly package.

## The Data

This survey was conducted in 2018 at an international Japanese university and the associated study was published in 2019. It was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties compared to the general population, and that social connectedness and acculturative stress are predictive of depression.

**Social connectedness:** a measure of belonging to a social group or network.

**Acculturative stress:** stress associated with learning about and integrating into a new culture.

[See paper for more info, including data description.](https://www.mdpi.com/2306-5729/4/3/124/htm)

[Link to the data.](https://www.mdpi.com/2306-5729/4/3/124/s1)

### Inspect the Data

Our data is in one table that includes all of the survey data. There are 50 fields and, according to the paper, 268 records. Each row is a student.


1. Check if the data has 268 records.

In [None]:
-- Count the number of records in the data
SELECT COUNT(*) AS total_numbers
FROM students

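| total_numbers |
|---|
| 286 |

The table holds 286 rows, 18 more than the 268 students reported in the paper. The next two steps track down the extra rows.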
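
A single query can report both figures at once. The sketch below uses Python's built-in `sqlite3` module and a hypothetical four-row `students` table (not the survey data): `COUNT(*)` counts every row, while `COUNT(NULLIF(inter_dom, ''))` skips rows whose `inter_dom` is NULL or an empty string. `NULLIF` and `COUNT(column)` are standard SQL, so the same SELECT should also run in the notebook's database, though that is an assumption worth checking.

```python
import sqlite3

# Hypothetical miniature students table (not the real survey data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (inter_dom TEXT, todep INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Inter", 3), ("Dom", 8), ("", None), (None, None)],
)

# COUNT(*) counts every row; COUNT(expr) skips NULLs, and NULLIF(inter_dom, '')
# turns empty strings into NULL so they are skipped as well
total_rows, student_rows = conn.execute(
    """
    SELECT COUNT(*) AS total_rows,
           COUNT(NULLIF(inter_dom, '')) AS student_rows
    FROM students
    """
).fetchone()

print(f"{total_rows} rows, {student_rows} with a student type, "
      f"{total_rows - student_rows} without")
```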


2. Inspect the dataset to see what the fields look like.

In [None]:
-- Inspect the data and limit the output to 5 records
SELECT *
FROM students 
LIMIT 5

Unnamed: 0,inter_dom,region,gender,academic,age,age_cate,stay,stay_cate,japanese,japanese_cate,english,english_cate,intimate,religion,suicide,dep,deptype,todep,depsev,tosc,apd,ahome,aph,afear,acs,aguilt,amiscell,toas,partner,friends,parents,relative,profess,phone,doctor,reli,alone,others,internet,partner_bi,friends_bi,parents_bi,relative_bi,professional_bi,phone_bi,doctor_bi,religion_bi,alone_bi,others_bi,internet_bi
0,Inter,SEA,Male,Grad,24,4,5,Long,3,Average,5,High,,Yes,No,No,No,0,Min,34,23,9,11,8,11,2,27,91,5,5,6,3,2,1,4,1,3,4,,Yes,Yes,Yes,No,No,No,No,No,No,No,No
1,Inter,SEA,Male,Grad,28,5,1,Short,4,High,4,High,,No,No,No,No,2,Min,48,8,7,5,4,3,2,10,39,7,7,7,4,4,4,4,1,1,1,,Yes,Yes,Yes,No,No,No,No,No,No,No,No
2,Inter,SEA,Male,Grad,25,4,6,Long,4,High,4,High,Yes,Yes,No,No,No,2,Min,41,13,4,7,6,4,3,14,51,3,3,3,1,1,2,1,1,1,1,,No,No,No,No,No,No,No,No,No,No,No
3,Inter,EA,Female,Grad,29,5,1,Short,2,Low,3,Average,No,No,No,No,No,3,Min,37,16,10,10,8,6,4,21,75,5,5,5,5,5,2,2,2,4,4,,Yes,Yes,Yes,Yes,Yes,No,No,No,No,No,No
4,Inter,EA,Female,Grad,28,5,1,Short,1,Low,3,Average,Yes,No,No,No,No,3,Min,37,15,12,5,8,7,4,31,82,5,5,5,2,5,2,5,5,4,4,,Yes,Yes,Yes,No,Yes,No,Yes,Yes,No,No,No


3. How many international and domestic students are in the data set?

In [None]:
-- Count the number of international and domestic students
SELECT inter_dom, COUNT(*)
FROM students 
GROUP BY inter_dom

| inter_dom | count |
|---|---|
| Dom | 67 |
|  | 18 |
| Inter | 201 |
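
The empty group can be given a readable label directly in the query. In this sketch (again `sqlite3` with hypothetical rows), `NULLIF` turns empty strings into NULL and `COALESCE` replaces NULL with `'Unassigned'`, a label chosen only for illustration, so both kinds of missing value are counted together.

```python
import sqlite3

# Hypothetical rows; '' and NULL both stand in for a missing student type
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (inter_dom TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?)",
    [("Inter",), ("Inter",), ("Dom",), ("",), (None,)],
)

# NULLIF maps '' to NULL, then COALESCE swaps NULL for a readable label,
# so both kinds of missing value land in one 'Unassigned' group
rows = conn.execute(
    """
    SELECT COALESCE(NULLIF(inter_dom, ''), 'Unassigned') AS student_type,
           COUNT(*) AS total_students
    FROM students
    GROUP BY student_type
    ORDER BY total_students DESC, student_type
    """
).fetchall()

for student_type, total in rows:
    print(student_type, total)
```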


4. Look into the 18 unassigned rows to understand what they could be.

In [None]:
-- Query the data to see all records where inter_dom is neither 'Dom' nor 'Inter'
SELECT *
FROM students 
WHERE inter_dom IS NULL OR inter_dom NOT IN ('Dom', 'Inter')

Unnamed: 0,inter_dom,region,gender,academic,age,age_cate,stay,stay_cate,japanese,japanese_cate,english,english_cate,intimate,religion,suicide,dep,deptype,todep,depsev,tosc,apd,ahome,aph,afear,acs,aguilt,amiscell,toas,partner,friends,parents,relative,profess,phone,doctor,reli,alone,others,internet,partner_bi,friends_bi,parents_bi,relative_bi,professional_bi,phone_bi,doctor_bi,religion_bi,alone_bi,others_bi,internet_bi
0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,46.0,222.0,,,,,,,,,
1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,19.0,249.0,,,,,,,,,
2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,65.0,203.0,,,,,,,,,
3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,21.0,247.0,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,45.0,223.0,,,,,,,,,
5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,,,,,,,,,,,,,,,,96.0,42.0,,65.0,,,,,,,,,,,,,,,,,,,,,145.0,128.0,137.0,66.0,61.0,30.0,46.0,19.0,65.0,21.0,45.0
7,,,,,,,,,,,,,,,,172.0,54.0,,107.0,,,,,,,,,,,,,,,,,,,,,123.0,140.0,131.0,202.0,207.0,238.0,222.0,249.0,203.0,247.0,223.0
8,,,,,,,,,,,,,,,,,172.0,,73.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,,,,,,,,,,,,,,,,,,,15.0,,,,,,,,,,,,,,,,,,,,,145.0,128.0,137.0,66.0,61.0,30.0,46.0,19.0,65.0,21.0,45.0
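
None of these rows describes a student. Every demographic field is empty, and the few remaining values appear to be tallies carried over from the original spreadsheet; for example, 46 + 222 = 268 in the first row, which matches the number of surveyed students. These rows can be excluded by filtering on inter_dom, as the group-level queries below do.
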
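
Filtering with NOT LIKE has a subtle gap: comparing NULL with anything yields unknown, so a row whose `inter_dom` is NULL (rather than an empty string) is silently dropped. The sketch below, on hypothetical rows in an in-memory `sqlite3` database, contrasts the LIKE-based filter with an explicit `IS NULL OR NOT IN` check.

```python
import sqlite3

# Hypothetical rows: one empty string and one NULL in inter_dom
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, inter_dom TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, "Inter"), (2, "Dom"), (3, ""), (4, None)],
)

# NOT LIKE on a NULL evaluates to NULL (unknown), so row 4 is dropped
like_filter = conn.execute(
    "SELECT id FROM students "
    "WHERE inter_dom NOT LIKE 'D%' AND inter_dom NOT LIKE 'I%' "
    "ORDER BY id"
).fetchall()

# Checking for NULL explicitly catches both kinds of missing value
explicit_filter = conn.execute(
    "SELECT id FROM students "
    "WHERE inter_dom IS NULL OR inter_dom NOT IN ('Dom', 'Inter') "
    "ORDER BY id"
).fetchall()

print(like_filter)      # [(3,)]
print(explicit_filter)  # [(3,), (4,)]
```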
5. Where are the international students from?

In [None]:
-- See what Region international students are from
SELECT region, COUNT(*) AS total_students
FROM students 
WHERE inter_dom = 'Inter'
GROUP BY region

| region | total_students |
|---|---|
| SA | 18 |
| EA | 48 |
| JAP | 2 |
| Others | 11 |
| SEA | 122 |
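
Raw counts are easier to compare as shares of the international group. This sketch, using `sqlite3` and hypothetical rows rather than the real counts, divides each region's count by a scalar subquery that counts all international students; multiplying by `100.0` first avoids integer division.

```python
import sqlite3

# Hypothetical students by region (not the real counts)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (inter_dom TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Inter", "SEA")] * 3 + [("Inter", "EA")] + [("Dom", "JAP")] * 2,
)

# Each region's count divided by the number of international students;
# 100.0 forces floating-point division instead of integer division
rows = conn.execute(
    """
    SELECT region,
           COUNT(*) AS total_students,
           ROUND(100.0 * COUNT(*) /
                 (SELECT COUNT(*) FROM students WHERE inter_dom = 'Inter'), 1) AS pct
    FROM students
    WHERE inter_dom = 'Inter'
    GROUP BY region
    ORDER BY total_students DESC
    """
).fetchall()
print(rows)  # [('SEA', 3, 75.0), ('EA', 1, 25.0)]
```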


### Understanding the Scores

6. Find the minimum, maximum, and average of each diagnostic test: PHQ-9 for depression (todep), SCS for social connectedness (tosc), and ASISS for acculturative stress (toas). These figures are reported in the paper, but it's good practice to compute them yourself during analysis.

In [None]:
-- Find out the basic summary statistics of the diagnostic tests for all students
SELECT MIN("todep") AS min_phq,
       MAX("todep") AS max_phq,
       ROUND(AVG("todep"), 2) AS avg_phq,
       MIN("tosc") AS min_scs,
       MAX("tosc") AS max_scs,
       ROUND(AVG("tosc"), 2) AS avg_scs,
       MIN("toas") AS min_as,
       MAX("toas") AS max_as,
       ROUND(AVG("toas"), 2) AS avg_as
FROM students;

| min_phq | max_phq | avg_phq | min_scs | max_scs | avg_scs | min_as | max_as | avg_as |
|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 8.19 | 8 | 48 | 37.47 | 36 | 145 | 72.38 |
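
The PHQ-9 total is easier to interpret in its commonly used severity bands: 0-4 minimal, 5-9 mild, 10-14 moderate, 15-19 moderately severe, and 20-27 severe. By these cut points, the overall average of 8.19 falls in the mild band. A CASE expression can attach the band to each score; the sketch below assumes those standard cut points and uses hypothetical scores, and its labels may not match the text stored in the depsev column.

```python
import sqlite3

# Hypothetical PHQ-9 totals (todep); valid totals range from 0 to 27
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (todep INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?)",
    [(0,), (4,), (7,), (12,), (18,), (25,)],
)

# Standard PHQ-9 severity cut points; the labels are illustrative and
# may differ from the values stored in the depsev column
rows = conn.execute(
    """
    SELECT todep,
           CASE
               WHEN todep <= 4 THEN 'Minimal'
               WHEN todep <= 9 THEN 'Mild'
               WHEN todep <= 14 THEN 'Moderate'
               WHEN todep <= 19 THEN 'Moderately severe'
               ELSE 'Severe'
           END AS phq_band
    FROM students
    ORDER BY todep
    """
).fetchall()

for score, band in rows:
    print(score, band)
```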


**On your own:**

7. What are the summary statistics for domestic students and international students?

In [None]:
-- Summary statistics for each student group in one table, excluding the empty rows
SELECT inter_dom,
       MIN("todep") AS min_phq,
       MAX("todep") AS max_phq,
       ROUND(AVG("todep"), 2) AS avg_phq,
       MIN("tosc") AS min_scs,
       MAX("tosc") AS max_scs,
       ROUND(AVG("tosc"), 2) AS avg_scs,
       MIN("toas") AS min_as,
       MAX("toas") AS max_as,
       ROUND(AVG("toas"), 2) AS avg_as
FROM students
WHERE inter_dom IN ('Inter', 'Dom')
GROUP BY inter_dom;

| inter_dom | min_phq | max_phq | avg_phq | min_scs | max_scs | avg_scs | min_as | max_as | avg_as |
|---|---|---|---|---|---|---|---|---|---|
| Inter | 0 | 25 | 8.04 | 11 | 48 | 37.42 | 36 | 145 | 75.56 |
| Dom | 0 | 23 | 8.61 | 8 | 48 | 37.64 | 36 | 112 | 62.84 |
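
Conditional aggregation puts both group averages, and their difference, in a single row. Because AVG ignores NULLs, a CASE expression without an ELSE branch limits each average to one group. In the real data, the gap in average acculturative stress is 75.56 - 62.84 = 12.72 points. The sketch uses hypothetical totals in an in-memory `sqlite3` table.

```python
import sqlite3

# Hypothetical acculturative stress totals (toas)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (inter_dom TEXT, toas INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Inter", 80), ("Inter", 70), ("Dom", 60), ("Dom", 64), ("", None)],
)

# AVG skips NULLs, so CASE ... END with no ELSE restricts each average to one group
row = conn.execute(
    """
    SELECT ROUND(AVG(CASE WHEN inter_dom = 'Inter' THEN toas END), 2) AS avg_as_inter,
           ROUND(AVG(CASE WHEN inter_dom = 'Dom' THEN toas END), 2) AS avg_as_dom,
           ROUND(AVG(CASE WHEN inter_dom = 'Inter' THEN toas END)
                 - AVG(CASE WHEN inter_dom = 'Dom' THEN toas END), 2) AS difference
    FROM students
    """
).fetchone()
print(row)  # (75.0, 62.0, 13.0)
```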


### International Focus

The study found that international students presented a higher risk of mental health difficulties. Recall also that the data is skewed toward international students (201 of the 268 students). Let's take a closer look at this group.

8. How does age relate to the scores of international students?

In [None]:
-- Find the average scores for each age group of international students, and view them in order
SELECT age, 
       ROUND(AVG("todep"), 2) AS avg_phq, 
       ROUND(AVG("tosc"), 2) AS avg_scs, 
       ROUND(AVG("toas"), 2) AS avg_as
FROM students
WHERE inter_dom = 'Inter'
GROUP BY age
ORDER BY age;

| age | avg_phq | avg_scs | avg_as |
|---|---|---|---|
| 17 | 4.67 | 37.33 | 70.67 |
| 18 | 8.75 | 34.11 | 80.61 |
| 19 | 8.44 | 37.9 | 74.1 |
| 20 | 7.35 | 38.21 | 73.26 |
| 21 | 9.23 | 37.74 | 75.23 |
| 22 | 8.36 | 38.14 | 70.43 |
| 23 | 9.67 | 32.0 | 81.25 |
| 24 | 4.67 | 42.33 | 74.33 |
| 25 | 6.11 | 37.33 | 80.78 |
| 27 | 10.0 | 35.0 | 42.0 |
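
Some of these averages, for example those at ages 17, 24, and 27, may rest on only a handful of students. Adding `COUNT(*)` to the query shows how many students sit behind each average, and a HAVING clause can drop groups below a minimum size. The threshold of 3 below is hypothetical, as are the rows.

```python
import sqlite3

# Hypothetical international students: age 17 has only one respondent
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (inter_dom TEXT, age INTEGER, todep INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Inter", 17, 2), ("Inter", 18, 9), ("Inter", 18, 7), ("Inter", 18, 11),
     ("Inter", 19, 8), ("Inter", 19, 10), ("Inter", 19, 6)],
)

MIN_GROUP_SIZE = 3  # hypothetical threshold; pick one that suits the analysis

# COUNT(*) exposes the group size; HAVING drops groups that are too small
rows = conn.execute(
    """
    SELECT age, COUNT(*) AS n_students, ROUND(AVG(todep), 2) AS avg_phq
    FROM students
    WHERE inter_dom = 'Inter'
    GROUP BY age
    HAVING COUNT(*) >= ?
    ORDER BY age
    """,
    (MIN_GROUP_SIZE,),
).fetchall()
print(rows)  # [(18, 3, 9.0), (19, 3, 8.0)]
```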


**On your own:**

9. See how another variable, such as length of stay, relates to the scores.

In [None]:
-- Find the average scores by length of stay for international students, and view them in order

SELECT stay,
       ROUND(AVG("todep"), 2) AS avg_phq, 
       ROUND(AVG("tosc"), 2) AS avg_scs, 
       ROUND(AVG("toas"), 2) AS avg_as
FROM students
WHERE inter_dom = 'Inter'
GROUP BY stay
ORDER BY stay;

| stay | avg_phq | avg_scs | avg_as |
|---|---|---|---|
| 1 | 7.48 | 38.11 | 72.8 |
| 2 | 8.28 | 37.08 | 77.67 |
| 3 | 9.09 | 37.13 | 78.0 |
| 4 | 8.57 | 33.93 | 87.71 |
| 5 | 0.0 | 34.0 | 91.0 |
| 6 | 6.0 | 38.0 | 58.67 |
| 7 | 4.0 | 48.0 | 45.0 |
| 8 | 10.0 | 44.0 | 65.0 |
| 10 | 13.0 | 32.0 | 50.0 |
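
The longer stays show erratic averages (an average PHQ-9 of 0.0 at stay = 5 and 13.0 at stay = 10), which suggests those rows may contain very few students. Rather than dropping them, one option is to pool the sparse values into a single bucket. The cutoff of stay >= 4 and the `'4+'` label in this `sqlite3` sketch are hypothetical; the existing stay_cate column (Short/Long) already offers a coarser grouping.

```python
import sqlite3

# Hypothetical lengths of stay for international students
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (inter_dom TEXT, stay INTEGER, toas INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Inter", 1, 70), ("Inter", 1, 74), ("Inter", 2, 80),
     ("Inter", 5, 91), ("Inter", 7, 45), ("Inter", 10, 50)],
)

# Pool the sparse long-stay values into one '4+' bucket before averaging
rows = conn.execute(
    """
    SELECT CASE WHEN stay >= 4 THEN '4+' ELSE CAST(stay AS TEXT) END AS stay_group,
           COUNT(*) AS n_students,
           ROUND(AVG(toas), 2) AS avg_as
    FROM students
    WHERE inter_dom = 'Inter'
    GROUP BY stay_group
    ORDER BY stay_group
    """
).fetchall()
print(rows)  # [('1', 2, 72.0), ('2', 1, 80.0), ('4+', 3, 62.0)]
```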


## Interactive Plots with Plotly
Check out the course *Introduction to Data Visualization with Plotly in Python* to learn more.

In [None]:
# Import plotly packages
import plotly.express as px
import plotly.graph_objects as go

In [None]:
-- Save the result of this query to a DataFrame named data; the plotting cells below use that variable
SELECT *
FROM students;

Unnamed: 0,inter_dom,region,gender,academic,age,age_cate,stay,stay_cate,japanese,japanese_cate,english,english_cate,intimate,religion,suicide,dep,deptype,todep,depsev,tosc,apd,ahome,aph,afear,acs,aguilt,amiscell,toas,partner,friends,parents,relative,profess,phone,doctor,reli,alone,others,internet,partner_bi,friends_bi,parents_bi,relative_bi,professional_bi,phone_bi,doctor_bi,religion_bi,alone_bi,others_bi,internet_bi
0,Inter,SEA,Male,Grad,24.0,4.0,5.0,Long,3.0,Average,5.0,High,,Yes,No,No,No,0.0,Min,34.0,23.0,9.0,11.0,8.0,11.0,2.0,27.0,91.0,5.0,5.0,6.0,3.0,2.0,1.0,4.0,1.0,3.0,4.0,,Yes,Yes,Yes,No,No,No,No,No,No,No,No
1,Inter,SEA,Male,Grad,28.0,5.0,1.0,Short,4.0,High,4.0,High,,No,No,No,No,2.0,Min,48.0,8.0,7.0,5.0,4.0,3.0,2.0,10.0,39.0,7.0,7.0,7.0,4.0,4.0,4.0,4.0,1.0,1.0,1.0,,Yes,Yes,Yes,No,No,No,No,No,No,No,No
2,Inter,SEA,Male,Grad,25.0,4.0,6.0,Long,4.0,High,4.0,High,Yes,Yes,No,No,No,2.0,Min,41.0,13.0,4.0,7.0,6.0,4.0,3.0,14.0,51.0,3.0,3.0,3.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,,No,No,No,No,No,No,No,No,No,No,No
3,Inter,EA,Female,Grad,29.0,5.0,1.0,Short,2.0,Low,3.0,Average,No,No,No,No,No,3.0,Min,37.0,16.0,10.0,10.0,8.0,6.0,4.0,21.0,75.0,5.0,5.0,5.0,5.0,5.0,2.0,2.0,2.0,4.0,4.0,,Yes,Yes,Yes,Yes,Yes,No,No,No,No,No,No
4,Inter,EA,Female,Grad,28.0,5.0,1.0,Short,1.0,Low,3.0,Average,Yes,No,No,No,No,3.0,Min,37.0,15.0,12.0,5.0,8.0,7.0,4.0,31.0,82.0,5.0,5.0,5.0,2.0,5.0,2.0,5.0,5.0,4.0,4.0,,Yes,Yes,Yes,No,Yes,No,Yes,Yes,No,No,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
281,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,128,140,,,,,,,,,
282,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,137,131,,,,,,,,,
283,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,66,202,,,,,,,,,
284,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,61,207,,,,,,,,,


### Histograms

In [None]:
# Create the histogram figure
fig = px.histogram(
    # Select the dataframe
    data_frame=data,
    # Select the column you want to visualize
    x="todep",
    # Split the data into groups (and colors) by student type
    color="inter_dom",
    # Clean up the title and labels; label keys must match the column names exactly
    title="Distribution of the total scores from the PHQ-9 test",
    labels={"inter_dom": "Type of Student", "todep": "Total PHQ-9 Scores"},
)

# Show the figure
fig.show()
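
Because the groups differ in size (201 international vs. 67 domestic students), raw counts make the two distributions hard to compare. `px.histogram` accepts `histnorm='percent'`, which rescales each colored group so its bars sum to 100%. The pure-Python sketch below shows the calculation behind that option on hypothetical scores; the bin width of 5 is an assumption, since Plotly chooses its own bins by default.

```python
from collections import Counter

# Hypothetical PHQ-9 totals for the two student types
scores = {"Inter": [0, 3, 4, 7, 12, 25, 2, 8], "Dom": [2, 5, 9, 10]}
BIN_WIDTH = 5  # hypothetical bin width


def percent_per_bin(values, width):
    """Share (in %) of a group's scores in each bin, keyed by the bin's left edge."""
    counts = Counter((v // width) * width for v in values)
    return {edge: 100 * n / len(values) for edge, n in sorted(counts.items())}


for stu_type, values in scores.items():
    print(stu_type, percent_per_bin(values, BIN_WIDTH))
```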

### Box Plots

In [None]:
# Create the box plot figure
fig = px.box(
    # Select the dataframe
    data_frame=data,
    # Select the column you want to visualize
    x="todep",
    # Split the data into groups (and colors) by student type
    color="inter_dom",
    # Show each student's age when hovering over a point
    hover_data=["age"],
    # Display every data point next to the boxes
    points="all",
    # Clean up the title and labels; label keys must match the column names exactly
    title="Box plot of the total scores from the PHQ-9 test",
    labels={"inter_dom": "Type of Student", "todep": "Total PHQ-9 Scores"},
)

# Show the figure
fig.show()
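
Each box spans the first to the third quartile, with a line at the median. The sketch below computes those values for hypothetical scores with Python's `statistics.quantiles`; `method="inclusive"` interpolates linearly between data points, which I expect, but have not verified, to match Plotly's default quartile method. The 1.5 * IQR fences are the conventional rule for flagging outliers; how far Plotly draws the whiskers depends on the trace settings.

```python
from statistics import quantiles

# Hypothetical PHQ-9 totals for one group
values = [0, 2, 3, 5, 6, 8, 9, 11, 14, 25]

# Quartiles by linear interpolation between sorted data points
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Points more than 1.5 * IQR beyond the box are conventionally flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, q2, q3, iqr, outliers)  # 3.5 7.0 10.5 7.0 [25]
```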

### Correlation Plot

In [None]:
# List the columns that are continuous variables
continuous_variables = ['age', 'stay', 'japanese', 'english', 'todep', 'tosc', 'apd', 'ahome', 'aph', 'afear', 'acs', 'aguilt', 'amiscell', 'toas', 'partner', 'friends', 'parents', 'relative', 'profess', 'phone', 'doctor', 'reli', 'alone', 'others', 'internet']

# Create a subset dataframe containing only the continuous variables
data_cont = data[continuous_variables]

# Compute the Pearson correlation matrix (missing values are excluded pairwise)
data_corr = data_cont.corr(method='pearson')

# Build the heatmap; a diverging color scale makes the sign of each correlation easy to read
fig = go.Figure(go.Heatmap(x=data_corr.columns, y=data_corr.columns, z=data_corr.values.tolist(), zmin=-1, zmax=1, colorscale="RdBu"))
                
# Adjust the plot size
fig.update_layout(width=800, height=800)

# Show the plot
fig.show()
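
`DataFrame.corr(method='pearson')` computes, for every pair of columns, r = cov(x, y) / (sd(x) * sd(y)). The plain-Python sketch below spells out that formula for one pair of hypothetical columns. One difference: pandas excludes missing values pairwise, whereas this sketch assumes complete data of equal length.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations (the n - 1 factors cancel)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical social connectedness (tosc) and PHQ-9 (todep) totals
tosc = [48, 41, 37, 30, 22]
todep = [2, 4, 6, 11, 15]
print(round(pearson(tosc, todep), 3))
```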

### Dropdown Interactivity

In [None]:
# Create the figure
fig = go.Figure()

# Add one histogram trace per student type (Inter first, then Dom)
for stu_type in ['Inter', 'Dom']:
    df = data[data.inter_dom == stu_type]
    fig.add_trace(go.Histogram(x=df["todep"], name=stu_type))

# Create the dropdown buttons; each 'visible' list has one entry per trace, in the order the traces were added
dropdown_buttons = [
    {'label':'All', 'method':'update', 'args':[{'visible': [True, True]}, {'title': 'All'}]},
    {'label':'International', 'method':'update', 'args':[{'visible': [True, False]}, {'title': 'International'}]},
    {'label':'Domestic', 'method':'update', 'args':[{'visible':[False, True]}, {'title':'Domestic'}]}
]

# Add the dropdown to the figure
fig.update_layout(
    {'updatemenus':[{'type':"dropdown",
         'x': 1.3,
         'y': 0.5,
         'showactive':True,
         'active':0,
         'buttons': dropdown_buttons}]
    }
)

# Show the figure
fig.show()
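
Writing each visible list by hand becomes error-prone as the number of traces grows. The helper below, `make_dropdown_buttons`, is hypothetical (not part of Plotly); it builds the same list of button dictionaries from the trace order, with one "All" button followed by one button per trace.

```python
def make_dropdown_buttons(groups, labels=None):
    """Build Plotly 'update' buttons: one 'All' button, then one per trace.

    groups: trace names in the order the traces were added to the figure.
    labels: optional display names for each group (defaults to the names).
    """
    labels = labels or groups
    buttons = [{"label": "All", "method": "update",
                "args": [{"visible": [True] * len(groups)}, {"title": "All"}]}]
    for i, label in enumerate(labels):
        # Only the i-th trace stays visible for this button
        visible = [j == i for j in range(len(groups))]
        buttons.append({"label": label, "method": "update",
                        "args": [{"visible": visible}, {"title": label}]})
    return buttons


buttons = make_dropdown_buttons(["Inter", "Dom"], labels=["International", "Domestic"])
for b in buttons:
    print(b["label"], b["args"][0]["visible"])
```

Passing the returned list as `'buttons'` in the same `update_layout` call should reproduce the menu built by hand above.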