
# 03. DSM-5 Rule-based Multi-label Labeling (Sanitized)

This notebook documents the **rule-based DSM-5 multi-label labeling logic** used in the study.
To ensure ethical compliance and prevent data leakage, **only synthetic placeholders** are used.

**Key goals**
- Make the labeling logic fully transparent
- Preserve the exact algorithmic structure used in the experiments
- Avoid releasing any protected raw text



## 1. DSM-5 Criteria Index

We define the nine symptom criteria of DSM-5 major depressive disorder (Criterion A, items A1–A9).
Each criterion is associated with a curated lexicon used for rule-based matching.


```python
# Criterion A symptom IDs (A1–A9) mapped to short descriptions
DSM5_CRITERIA = {
    "A1": "Depressed mood",
    "A2": "Anhedonia",
    "A3": "Appetite or weight change",
    "A4": "Sleep disturbance",
    "A5": "Psychomotor agitation or retardation",
    "A6": "Fatigue or loss of energy",
    "A7": "Feelings of worthlessness or guilt",
    "A8": "Diminished ability to think or concentrate",
    "A9": "Recurrent thoughts of death or suicide"
}

# Last expression in the cell: Jupyter renders the mapping as output
DSM5_CRITERIA
```



## 2. DSM-5 Lexicon (Sanitized)

The original study used expert-curated Korean lexical expressions.
Here, we provide **structural placeholders** to demonstrate the labeling logic.


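```python
# Each criterion maps to a list of patterns. A pattern is a list of tokens that
# must ALL appear in the sentence (AND); a criterion fires if ANY of its
# patterns matches (OR). The English tokens are structural placeholders only.
DSM5_LEXICON = {
    "A1": [["sad"], ["hopeless"], ["depressed"]],
    "A2": [["no", "interest"], ["nothing", "fun"]],
    "A3": [["appetite", "loss"], ["overeating"]],
    "A4": [["cannot", "sleep"], ["insomnia"]],
    "A5": [["restless"], ["slowed"]],
    "A6": [["tired"], ["no", "energy"]],
    "A7": [["worthless"], ["guilty"]],
    "A8": [["cannot", "focus"], ["forgetful"]],
    "A9": [["want", "die"], ["suicide"]]
}
```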
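Because Python's `all()` returns `True` for an empty iterable, an empty pattern would make its criterion fire on every sentence under the matching rule in Section 3. A minimal consistency check can catch this, along with missing or unknown criterion IDs, before any labeling is run. The sketch below assumes the criteria index and lexicon are plain dicts shaped like the ones in this notebook; `validate_lexicon` is a hypothetical helper, not part of the original pipeline.

```python
def validate_lexicon(criteria, lexicon):
    """Return a list of problems found in a criterion -> patterns lexicon.

    Hypothetical helper: checks that lexicon keys match the criteria index,
    that every criterion has at least one pattern, and that no pattern is
    empty or contains non-string / blank tokens.
    """
    problems = []
    missing = sorted(set(criteria) - set(lexicon))
    unknown = sorted(set(lexicon) - set(criteria))
    if missing:
        problems.append(f"criteria without lexicon entries: {missing}")
    if unknown:
        problems.append(f"lexicon keys not in criteria index: {unknown}")
    for criterion, patterns in lexicon.items():
        if not patterns:
            problems.append(f"{criterion}: no patterns")
        for i, pattern in enumerate(patterns):
            # all([]) is True, so an empty pattern would match every input
            if not pattern:
                problems.append(f"{criterion}[{i}]: empty pattern")
            elif not all(isinstance(t, str) and t.strip() for t in pattern):
                problems.append(f"{criterion}[{i}]: non-string or blank token")
    return problems


# Usage with a small synthetic lexicon containing two deliberate problems:
# an empty pattern under A1 and an unknown key A10
criteria = {"A1": "Depressed mood", "A4": "Sleep disturbance"}
lexicon = {"A1": [["sad"], []], "A4": [["cannot", "sleep"]], "A10": [["x"]]}
for problem in validate_lexicon(criteria, lexicon):
    print(problem)
```

Called as `validate_lexicon(DSM5_CRITERIA, DSM5_LEXICON)`, the check should return an empty list for the placeholder lexicon above.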



## 3. Rule-based Matching Function

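A criterion is marked **True** if, for at least one of its lexicon patterns, *every token* of that pattern
appears in the tokenized sentence. Matching uses exact token equality and ignores token order and adjacency,
so the pattern `["cannot", "sleep"]` matches both "cannot sleep" and "sleep is something I cannot get".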


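```python
def rule_based_match(tokens, lexicon):
    """True if every token of at least one pattern in `lexicon` occurs in `tokens`."""
    for pattern in lexicon:
        # AND within a pattern; token order and adjacency are ignored
        if all(token in tokens for token in pattern):
            return True  # OR across patterns: the first full match suffices
    return False
```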
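The function above rescans the token list for every pattern token, which is linear in sentence length per lookup. For long texts or large lexicons, a behaviorally equivalent variant can convert the tokens to a set once and reuse it across patterns. The sketch below is an illustrative optimization, not the code used in the experiments, and `rule_based_match_set` is a hypothetical name. It keeps the same semantics: order and adjacency are ignored, a token repeated in a pattern needs only one occurrence, and an empty pattern matches everything.

```python
def rule_based_match_set(tokens, lexicon):
    """Same decision as rule_based_match, using set lookups."""
    token_set = set(tokens)  # built once per sentence
    # any(): OR across patterns; set(pattern) <= token_set: AND within a pattern
    return any(set(pattern) <= token_set for pattern in lexicon)


sleep_patterns = [["cannot", "sleep"], ["insomnia"]]

# Order and adjacency are ignored: both sentences match ["cannot", "sleep"]
print(rule_based_match_set(["i", "cannot", "sleep"], sleep_patterns))  # True
print(rule_based_match_set(["sleep", "is", "what", "i", "cannot", "get"],
                           sleep_patterns))  # True
# Only one of the two tokens is present, so no pattern is fully contained
print(rule_based_match_set(["i", "sleep", "well"], sleep_patterns))  # False
```

The trade-off is one set construction per sentence in exchange for constant-time average membership checks, which only pays off when patterns are numerous or texts are long.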



## 4. Multi-label Assignment

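All nine criteria are evaluated independently for each input text, so a single text can satisfy several
criteria at once. The result is a **multi-hot label vector**, represented here as a dict mapping each criterion ID to a boolean.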


```python
def assign_dsm5_labels(tokens):
    """Map each DSM-5 criterion ID (A1–A9) to True/False for one tokenized text."""
    labels = {}
    for criterion, patterns in DSM5_LEXICON.items():
        labels[criterion] = rule_based_match(tokens, patterns)
    return labels
```
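Downstream multi-label classifiers usually expect a fixed-order numeric vector rather than a dict. A minimal sketch follows that converts the boolean dict returned by `assign_dsm5_labels` into a 0/1 list and stacks several texts into a label matrix. The column order A1 to A9 and the integer encoding are assumptions, since the original training code is not shown; `CRITERIA_ORDER`, `to_multi_hot`, and `to_label_matrix` are hypothetical names.

```python
# Hypothetical fixed column order for the label vector (A1 ... A9)
CRITERIA_ORDER = [f"A{i}" for i in range(1, 10)]


def to_multi_hot(labels, order=CRITERIA_ORDER):
    """Convert a {criterion: bool} dict into a 0/1 list in `order`.

    Criteria missing from `labels` are treated as negative (0).
    """
    return [int(bool(labels.get(criterion, False))) for criterion in order]


def to_label_matrix(label_dicts, order=CRITERIA_ORDER):
    """Stack per-text label dicts into a list of multi-hot rows."""
    return [to_multi_hot(labels, order) for labels in label_dicts]


# Usage with label dicts shaped like the output of assign_dsm5_labels
example = {c: False for c in CRITERIA_ORDER}
example.update({"A4": True, "A6": True})
print(to_multi_hot(example))  # [0, 0, 0, 1, 0, 1, 0, 0, 0]
print(to_label_matrix([example, {"A1": True}]))
# [[0, 0, 0, 1, 0, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0]]
```

Fixing the column order explicitly, rather than relying on dict insertion order, keeps the vector layout stable if the lexicon is ever rebuilt in a different key order.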



## 5. Example with Synthetic Input


```python
synthetic_tokens = ["i", "feel", "tired", "and", "cannot", "sleep"]

# Expected: A4 ("cannot" + "sleep") and A6 ("tired") are True; all others False
assign_dsm5_labels(synthetic_tokens)
```
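To see how the rules behave across several inputs, the sketch below labels a few synthetic sentences and counts how often each criterion fires. It assumes a naive lowercase whitespace tokenizer (`simple_tokenize`, hypothetical); the tokenizer used for the Korean data is not described in this notebook. Compact but equivalent copies of the placeholder lexicon and functions are included so the block runs on its own. The last sentence shows a property of pure token matching: negation is not modeled, so "I am not sad" still triggers A1.

```python
from collections import Counter

# Copies of the placeholder lexicon and functions defined above
DSM5_LEXICON = {
    "A1": [["sad"], ["hopeless"], ["depressed"]],
    "A2": [["no", "interest"], ["nothing", "fun"]],
    "A3": [["appetite", "loss"], ["overeating"]],
    "A4": [["cannot", "sleep"], ["insomnia"]],
    "A5": [["restless"], ["slowed"]],
    "A6": [["tired"], ["no", "energy"]],
    "A7": [["worthless"], ["guilty"]],
    "A8": [["cannot", "focus"], ["forgetful"]],
    "A9": [["want", "die"], ["suicide"]],
}


def rule_based_match(tokens, lexicon):
    # OR across patterns, AND within a pattern (same logic as Section 3)
    return any(all(t in tokens for t in pattern) for pattern in lexicon)


def assign_dsm5_labels(tokens):
    return {c: rule_based_match(tokens, p) for c, p in DSM5_LEXICON.items()}


def simple_tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace
    return text.lower().split()


synthetic_texts = [
    "I feel tired and cannot sleep",
    "Nothing is fun and I feel worthless",
    "I am so forgetful and cannot focus at work",
    "I am not sad",  # negation is not modeled: A1 still fires
]

all_labels = [assign_dsm5_labels(simple_tokenize(t)) for t in synthetic_texts]
# How many texts trigger each criterion (criteria that never fire are absent)
prevalence = Counter(c for labels in all_labels for c, on in labels.items() if on)

for text, labels in zip(synthetic_texts, all_labels):
    print(text, "->", [c for c, on in labels.items() if on])
print(dict(prevalence))
```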



## 6. Notes on Reproducibility and Ethics

- The actual experiments used a **clinically curated Korean DSM-5 lexicon**
- Raw counseling text and social media data are not released
- This notebook exposes the complete matching and label-assignment logic, with lexicon content replaced
  by placeholders, in compliance with data protection laws and platform terms of service

This approach aligns with responsible reproducibility practices in mental health NLP research.
