CCMH: Cross-Condition Mental Health Intelligent System

This repository contains the code for CCMH (Cross-Condition Mental Health), an intelligent decision support system for mental health text analysis using Blind Source Separation (BSS) methods.

📄 Paper

Title: CCMH: An Intelligent System for Cross-Condition Mental Health Text Analysis via Semantic Dictionary Learning

Status: Submitted to Expert Systems With Applications

Authors: Muhammad Usman Khalid, Shafiq ur Rehman, Malik Muhammad Nauman, Hatoon S. AlSagri, Sheikh Naeem Shafqat

📊 Dataset

This project uses the Reddit Mental Health Dataset by Low et al. (2020).

Download: https://zenodo.org/records/3941387

Citation:

Low, D. M., Rumker, L., Talkar, T., Torous, J., Cecchi, G., & Ghosh, S. S. (2020). 
Natural Language Processing Reveals Vulnerable Mental Health Support Groups and 
Heightened Health Anxiety on Reddit During COVID-19: Observational Study. 
Journal of Medical Internet Research, 22(10), e22635.

🚀 Getting Started

Prerequisites

  • MATLAB R2020a or later
  • Python 3.8+ (for sentence embeddings)
  • sentence-transformers library (pip install sentence-transformers)
  • Required MATLAB toolboxes:
    • Statistics and Machine Learning Toolbox
    • Signal Processing Toolbox

Installation

  1. Clone this repository:

git clone https://github.com/usmankhalid06/CCMH.git
cd CCMH

  2. Download the dataset from Zenodo (https://zenodo.org/records/3941387)

  3. Extract the dataset to your working directory

  4. Install Python dependencies:

pip install sentence-transformers pandas numpy

📁 File Structure

CCMH/
├── script_Sentence_Transformer_preCovid.m  # Main analysis pipeline
├── clean_reddit_post.m                      # Text preprocessing
├── get_sentence_embeddings.m                # Sentence transformer interface
├── find_K_multiple_criteria.m               # Dictionary size selection (AIC/BIC)
├── my_KSVD.m                               # K-SVD algorithm
├── my_ODL.m                                # Online Dictionary Learning
├── my_ACSD.m                               # Adaptive Consistent Sequential DL
├── SDL.m                                   # Shared Dictionary Learning (proposed)
├── my_sparse_encode.m                      # Sparse coding implementation
└── README.md

💻 Usage

Step 1: Preprocess Data

% Clean and preprocess Reddit posts
cleaned_text = clean_reddit_post(raw_posts);
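
For readers who want to see what this kind of cleaning involves, here is a minimal Python sketch of the steps named for clean_reddit_post.m (remove HTML, URLs, formatting). The regular expressions and the function name clean_post are illustrative assumptions, not the rules implemented in the MATLAB file.

```python
import html
import re

# Hypothetical patterns; the rules in clean_reddit_post.m may differ.
TAG_RE = re.compile(r"<[^>]+>")                    # HTML tags
MD_LINK_RE = re.compile(r"\[([^\]]*)\]\([^)]*\)")  # [label](url)
URL_RE = re.compile(r"(?:https?://|www\.)\S+")     # bare URLs
MD_MARKS_RE = re.compile(r"[*_~`#>]+")             # emphasis, headings, quotes
SPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Strip HTML, URLs, and Markdown formatting from one post."""
    text = TAG_RE.sub(" ", text)        # drop tags before decoding entities
    text = html.unescape(text)          # &amp; -> &
    text = MD_LINK_RE.sub(r"\1", text)  # keep the link label, drop the target
    text = URL_RE.sub(" ", text)        # drop remaining bare URLs
    text = MD_MARKS_RE.sub(" ", text)   # drop formatting characters
    return SPACE_RE.sub(" ", text).strip()
```

Tags are removed before entities are decoded so that an escaped `&lt;` or `&gt;` in a post is not mistaken for markup.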

Step 2: Generate Sentence Embeddings

% Generate 384-dimensional embeddings using all-MiniLM-L6-v2
embeddings = get_sentence_embeddings(cleaned_text);
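
On the Python side, the embedding step can be sketched as below. This is an assumption about what the interface wraps, not the contents of get_sentence_embeddings.m: the helper name embed_posts, the batch size, and the dimension check are all illustrative. Only the model name (all-MiniLM-L6-v2) and the 384-dimensional output come from this README.

```python
from typing import Sequence

MODEL_NAME = "all-MiniLM-L6-v2"
EMBEDDING_DIM = 384


def embed_posts(texts: Sequence[str], model=None, batch_size: int = 64):
    """Return one 384-dimensional vector per input text.

    `model` is any object with an `encode(sentences, batch_size=...)` method;
    when omitted, the sentence-transformers model is loaded.
    """
    if model is None:
        # Imported lazily so this module loads without the dependency.
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer(MODEL_NAME)
    vectors = model.encode(list(texts), batch_size=batch_size)
    for row in vectors:
        if len(row) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(row)}")
    return vectors
```

Calling `embed_posts(cleaned_posts)` downloads the model on first use; passing a `model` explicitly makes it possible to reuse one loaded instance across calls.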

Step 3: Run Main Analysis

% Execute complete pipeline
script_Sentence_Transformer_preCovid

This will:

  1. Load preprocessed data
  2. Determine the dictionary size (K) with find_K_multiple_criteria.m (AIC, BIC, and variance explained)
  3. Learn dictionaries using all four algorithms
  4. Perform statistical validation
  5. Generate figures
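
To make the model-selection step concrete, the following Python sketch scores candidate dictionary sizes with a common Gaussian-residual form of the criteria: AIC = n ln(RSS/n) + 2p and BIC = n ln(RSS/n) + p ln(n). The likelihood and the parameter count p used by find_K_multiple_criteria.m are not stated in this README, so the function names, `params_per_atom`, and the toy numbers are assumptions for illustration only.

```python
import math
from typing import Dict, Tuple


def gaussian_aic_bic(rss: float, n_obs: int, n_params: int) -> Tuple[float, float]:
    """AIC and BIC for a least-squares fit with Gaussian residuals."""
    if rss <= 0 or n_obs <= 0:
        raise ValueError("rss and n_obs must be positive")
    fit = n_obs * math.log(rss / n_obs)
    return fit + 2 * n_params, fit + n_params * math.log(n_obs)


def best_k(rss_by_k: Dict[int, float], n_obs: int, params_per_atom: int) -> Tuple[int, int]:
    """Return the K that minimizes AIC and the K that minimizes BIC.

    Assumes (hypothetically) that each atom contributes `params_per_atom`
    free parameters.
    """
    scores = {
        k: gaussian_aic_bic(rss, n_obs, k * params_per_atom)
        for k, rss in rss_by_k.items()
    }
    k_aic = min(scores, key=lambda k: scores[k][0])
    k_bic = min(scores, key=lambda k: scores[k][1])
    return k_aic, k_bic
```

Because BIC penalizes each parameter by ln(n) rather than 2, it tends to prefer smaller dictionaries than AIC on large datasets.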

📖 Core Functions

Dictionary Learning Algorithms

  • my_KSVD.m - K-SVD dictionary learning
  • my_ODL.m - Online dictionary learning with LARS
  • my_ACSD.m - Adaptive consistent sequential dictionary learning
  • SDL.m - Shared dictionary learning (our proposed method)

K-SVD and ODL depend on the SPAMS toolbox (https://thoth.inrialpes.fr/people/mairal/spams/), which provides the mexOMP and mexLasso functions they call. Download it and add it to the MATLAB path before running these two methods.

Utilities

  • clean_reddit_post.m - Text preprocessing (remove HTML, URLs, formatting)
  • get_sentence_embeddings.m - Generate sentence transformer embeddings
  • find_K_multiple_criteria.m - Model selection (AIC, BIC, variance explained)
  • my_sparse_encode.m - Sparse coding with adaptive L1 regularization
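
As a reference point for the sparse coding utility, here is a minimal Python sketch of plain L1-regularized sparse coding solved with ISTA (iterative soft-thresholding). It minimizes 0.5·||y − Dx||² + λ·||x||₁ for a single signal. It is a generic textbook method, not the adaptive scheme in my_sparse_encode.m, and the function names and iteration count are assumptions.

```python
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink a value toward zero by `threshold`."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista_sparse_code(D: List[List[float]], y: List[float], lam: float,
                     n_iter: int = 200) -> List[float]:
    """Sparse code of y over dictionary D (rows = dimensions, columns = atoms)."""
    n_rows, n_atoms = len(D), len(D[0])
    # Step size 1/L, where the squared Frobenius norm upper-bounds the
    # largest eigenvalue of D^T D, so the iteration cannot diverge.
    step = 1.0 / sum(v * v for row in D for v in row)
    x = [0.0] * n_atoms
    for _ in range(n_iter):
        residual = [sum(D[i][k] * x[k] for k in range(n_atoms)) - y[i]
                    for i in range(n_rows)]
        grad = [sum(D[i][k] * residual[i] for i in range(n_rows))
                for k in range(n_atoms)]
        x = [soft_threshold(x[k] - step * grad[k], step * lam)
             for k in range(n_atoms)]
    return x
```

With an orthonormal dictionary the result reduces to soft-thresholding the signal itself, which gives a quick sanity check. Larger λ zeroes out more coefficients.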

🔧 Key Parameters

  • Dictionary size (K): Determined by the 70% variance-explained criterion
  • Sparsity (λ): 20 for cross-condition analysis; algorithm-specific values during training
  • Iterations: 30 for all dictionary learning methods
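
One way to read the 70% criterion is sketched in Python below: take the smallest K whose leading components account for at least 70% of the total variance. It assumes the variance is measured from the singular values of the embedding matrix. How the repository computes it is not described here, so the function name and input are illustrative.

```python
from typing import Sequence


def k_for_variance(singular_values: Sequence[float], target: float = 0.70) -> int:
    """Smallest K whose top-K squared singular values reach `target` of the total."""
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    if total <= 0:
        raise ValueError("singular values must not all be zero")
    running = 0.0
    for k, energy in enumerate(energies, start=1):
        running += energy
        if running / total >= target:
            return k
    return len(energies)
```

For singular values 4, 3, 2, 1 the squared values are 16, 9, 4, 1 (total 30). The first component covers 53% and the first two cover 83%, so the 70% threshold gives K = 2.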

📊 Outputs

The analysis generates:

  • Learned dictionary atoms for each algorithm
  • Activation matrices (11 conditions × K atoms)
  • Cross-algorithm validation metrics
  • Condition clustering visualizations
  • Discriminative atom analysis
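
The shape of the activation matrix (conditions × K atoms) suggests an aggregation of per-post sparse codes by condition. A minimal Python sketch of one such aggregation follows. The mean absolute coefficient is an assumed statistic chosen for illustration, and the name condition_activations is hypothetical; the pipeline may aggregate differently.

```python
from typing import Dict, List, Sequence, Tuple


def condition_activations(
    codes: Sequence[Sequence[float]], labels: Sequence[str]
) -> Tuple[List[str], List[List[float]]]:
    """Average |coefficient| per atom within each condition.

    Returns the sorted condition names and a matrix with one row per
    condition and one column per atom.
    """
    if len(codes) != len(labels):
        raise ValueError("codes and labels must have the same length")
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for code, label in zip(codes, labels):
        row = sums.setdefault(label, [0.0] * len(code))
        for k, value in enumerate(code):
            row[k] += abs(value)
        counts[label] = counts.get(label, 0) + 1
    conditions = sorted(sums)
    matrix = [[total / counts[c] for total in sums[c]] for c in conditions]
    return conditions, matrix
```

With 11 condition labels and codes of length K, the returned matrix has 11 rows and K columns.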

🧪 Reproducing Results

To reproduce paper results:

% Ensure dataset is in path
addpath('path/to/reddit/data');

% Run main script
script_Sentence_Transformer_preCovid

% Results are saved in the figures/ directory

📧 Contact

For questions or issues, please contact:

🙏 Acknowledgments

This work was supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2504).

📚 Citation

If you use this code, please cite:

@article{khalid2025ccmh,
  title={CCMH: An Intelligent System for Cross-Condition Mental Health Text Analysis via Semantic Dictionary Learning},
  author={Khalid, Muhammad Usman and Rehman, Shafiq ur and Nauman, Malik Muhammad and AlSagri, Hatoon S. and Shafqat, Sheikh Naeem},
  journal={Expert Systems With Applications},
  year={2025},
  note={Submitted}
}
