<a href="https://colab.research.google.com/github/vickie005/SDG3-MentalHealth-NLP/blob/main/SDG3_MentalHealth_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%writefile README.md
# 🌍 SDG 3: Good Health & Well-being
### 🧠 Project: Predicting Mental Health Risk from Social Media Posts

---

## Overview
This project addresses **Sustainable Development Goal (SDG) 3: Good Health & Well-being** by using **Natural Language Processing (NLP)** to predict emotional states from social media posts.
Early detection of stress, anxiety, or other mental health risks can help raise awareness and support interventions.

---

## 🎯 Objectives
- Analyze social media posts to identify emotional indicators.
- Build a supervised learning model to classify emotions such as joy, sadness, anger, or fear.
- Support mental health monitoring and awareness programs through AI insights.

---

## 🧰 Tools & Libraries
- **Python** (Google Colab)
- `pandas` for data handling
- `scikit-learn` for machine learning models
- `nltk` for text preprocessing
- `matplotlib` & `seaborn` for visualization
- `datasets` from Hugging Face for ready-to-use social media data

---

## 📊 Machine Learning Approach
**Algorithm Used:** Logistic Regression (Naive Bayes is a comparable lightweight baseline; a fine-tuned BERT model is a heavier alternative that may improve accuracy)
1. **Data Loading:** Load the Hugging Face `emotion` dataset with `load_dataset` (fetched automatically on first use, no manual download).
2. **Preprocessing:** Lowercase the text and remove English stopwords.
3. **Feature Extraction:** Convert text to numeric vectors using TF-IDF.
4. **Model Training:** Train Logistic Regression classifier on labeled emotions.
5. **Evaluation:** Use classification report and confusion matrix to evaluate performance.
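
Step 3 can be illustrated with a minimal TF-IDF computation in plain Python. This is a sketch, not the notebook's implementation: the function name `tfidf` and the toy posts are hypothetical, tokenization is a simple whitespace split, and the weighting follows the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` with L2 normalization that scikit-learn documents as the `TfidfVectorizer` default.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        raw = {term: count * idf[term] for term, count in tf.items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({term: w / norm for term, w in raw.items()})
    return vectors


# Hypothetical toy posts, not taken from the dataset.
posts = ["feel happy today", "feel sad today", "feel angry"]
vectors = tfidf(posts)
print(vectors[0])
```

In the first post, `happy` appears in only one document, so it receives a higher weight than `today` (two documents) or `feel` (all three). This is why TF-IDF tends to highlight the emotion-bearing words that the classifier relies on.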

---

## 🌱 Ethical Considerations
- Ensure user privacy by using only anonymized or public posts.
- Be aware of potential bias in social media datasets.
- Avoid misinterpreting emotions or overgeneralizing mental health status.

---

## 💡 Expected Impact
By identifying emotional patterns in social media posts, this model can:
- Help mental health organizations detect trends in stress or depression.
- Support early interventions and awareness campaigns.
- Encourage responsible use of AI in mental health monitoring.

---

## 📎 Deliverables
- `SDG3_MentalHealth_NLP.ipynb` – Colab notebook with full workflow
- `README.md` – Project documentation
- Screenshots of visualizations and evaluation results
- Optional project pitch or short video demo

---

## Presentation
[SDG 3 Mental Health NLP Project slides](https://gamma.app/docs/Predicting-Mental-Health-Risk-from-Social-Media-Posts-7e4g5ubrri7941x)

---

## ✨ Quote
> “AI can be the bridge between innovation and sustainability.” — UN Tech Envoy


In [None]:
!pip install datasets scikit-learn nltk seaborn matplotlib

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from datasets import load_dataset
nltk.download('stopwords')

In [None]:
# "dair-ai/emotion" is the namespaced Hub ID of the `emotion` dataset
dataset = load_dataset("dair-ai/emotion")
train_data = dataset['train']
test_data = dataset['test']

# Convert to pandas DataFrames (for easier handling)
df_train = train_data.to_pandas()
df_test = test_data.to_pandas()

df_train.head()

In [None]:
dataset["train"].features["label"]

In [None]:
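# Explore and Visualize Data
# Map integer label ids to their emotion names so the x-axis is readable
label_names = dataset['train'].features['label'].names
sns.countplot(x=df_train['label'].map(lambda i: label_names[i]))
plt.title('Distribution of Emotion Labels')
plt.xlabel('Emotion')
plt.show()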

In [None]:
# Data Preprocessing
from nltk.corpus import stopwords
# A set gives constant-time membership checks (the raw list is scanned per word)
stop_words = set(stopwords.words('english'))

def preprocess(text):
    """Lowercase the text and drop English stopwords."""
    text = text.lower()
    return ' '.join(word for word in text.split() if word not in stop_words)

df_train['clean_text'] = df_train['text'].apply(preprocess)
df_test['clean_text'] = df_test['text'].apply(preprocess)


In [None]:
# Feature Extraction (TF-IDF)
vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(df_train['clean_text'])
X_test = vectorizer.transform(df_test['clean_text'])

y_train = df_train['label']
y_test = df_test['label']

In [None]:
# Train Model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

In [None]:
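# Evaluate Model
y_pred = model.predict(X_test)

# Use emotion names instead of integer ids in the report and on the axes
label_names = dataset['train'].features['label'].names

print(classification_report(y_test, y_pred, target_names=label_names))
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues',
            xticklabels=label_names, yticklabels=label_names)
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()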


**Project Report: SDG 3 - Mental Health NLP Project**

**SDG Problem Addressed:**  
Mental health is a growing concern worldwide. Social media posts contain valuable signals about users’ emotional states, which can indicate stress, anxiety, or other mental health risks.

**ML Approach Used:**  
I used **Natural Language Processing (NLP)** with supervised learning. The workflow includes:
- Text preprocessing (lowercasing and stopword removal)
- Feature extraction using TF-IDF
- Logistic Regression classification
- Evaluation using accuracy, F1-score, and confusion matrix

**Results:**  
The model achieved an overall **accuracy of 87%** on the test set, with a **weighted F1-score of 0.86** and a **macro F1-score of 0.80**. The confusion matrix indicates that the model can reasonably distinguish between emotions such as joy, sadness, anger, and fear.
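
The gap between the weighted and macro F1-scores is what one would expect when less frequent classes score lower than frequent ones, since the macro average counts every class equally. The sketch below illustrates the two averages in plain Python on hypothetical labels (not the project's predictions); the helper names are made up for illustration, and unlike scikit-learn's `f1_score`, it only considers labels that occur in `y_true`.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: (f1, support)} for every label that occurs in y_true."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, tp + fn)  # support = number of true instances
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Macro F1 averages classes equally; weighted F1 weights them by support."""
    scores = f1_per_class(y_true, y_pred)
    total = sum(support for _, support in scores.values())
    macro = sum(f1 for f1, _ in scores.values()) / len(scores)
    weighted = sum(f1 * support for f1, support in scores.values()) / total
    return macro, weighted


# Hypothetical labels: class 0 is common, class 1 is rare and half missed.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.2f}, weighted F1 = {weighted:.2f}")
```

Here the rare class has a lower F1 than the common class, which pulls the macro average below the weighted one. Checking the per-class rows of the classification report shows which emotions drive a similar gap in the actual results.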

**Ethical Considerations:**  
- Only anonymized/public posts were used to protect privacy.  
- Dataset bias may exist; results should not be overgeneralized.  
- The project promotes responsible use of AI in mental health monitoring.

**Conclusion:**  
This project demonstrates how AI can support mental health initiatives by detecting emotional signals in social media posts, helping organizations plan early interventions.


**View the SDG 3 Mental Health NLP Project presentation:**  
[Click here to open the slides](https://gamma.app/docs/Predicting-Mental-Health-Risk-from-Social-Media-Posts-7e4g5ubrri7941x)