# 💭🧠🤍 Custom NER model for mental health purposes 🖤🧠💭

## Installing all necessary libraries and packages

In [7]:
%pip install spacy -q

Note: you may need to restart the kernel to use updated packages.


In [8]:
!python -m spacy download en_core_web_sm -q
!python -m spacy download en_core_web_lg -q

[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [16]:
import spacy
from spacy.lang.en import English
from spacy.pipeline import EntityRuler
from spacy import displacy
import json
import os
import re
import random
from tqdm import tqdm
import subprocess

%reload_ext autoreload
%autoreload 2
import datahelper as dt

## Exploring performance of generalized spaCy models

In [28]:
test="""After a decades-long pause, psychedelics are again being intensely investigated for treating a wide range of 
        neuropsychiatric ailments including depression, anxiety, addiction, post-traumatic stress disorder, anorexia, 
        and chronic pain syndromes. The classic serotonergic psychedelics psilocybin and lysergic acid diethylamide and 
        nonclassic psychedelics 3,4-methylenedioxymethamphetamine and ketamine are increasingly appreciated as neuroplastogens 
        given their potential to fundamentally alter mood and behavior well beyond the time window of measurable exposure. 
        Imaging studies with psychedelics are also helping advance our understanding of neural networks and connectomics. 
        This resurgence in psychedelic science and psychedelic-assisted therapy has potential significance for the fields of 
        neurosurgery and neuro-oncology and their diverse and challenging patients, many of whom continue to have mental health 
        issues and poor quality of life despite receiving state-of-the-art care. In this study, we review recent and ongoing 
        clinical trials, the set and setting model of psychedelic-assisted therapy, potential risks and adverse events, proposed 
        mechanisms of action, and provide a perspective on how the safe and evidence-based use of psychedelics could potentially 
        benefit many patients, including those with brain tumors, pain syndromes, ruminative disorders, stroke, SAH, TBI, and 
        movement disorders. By leveraging psychedelics' neuroplastic potential to rehabilitate the mind and brain, novel 
        treatments may be possible for many of these patient populations, in some instances working synergistically with current 
        treatments and in some using subpsychedelic doses that do not require mind-altering effects for efficacy. This review aims 
        to encourage broader multidisciplinary collaboration across the neurosciences to explore and help realize the transdiagnostic 
        healing potential of psychedelics."""
test = re.sub("[\n ]+", " ", test)
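The `re.sub("[\n ]+", " ", test)` call collapses the line breaks and indentation of the triple-quoted string into single spaces, which is enough for this abstract. It does not touch tabs or other whitespace, and it leaves a single leading or trailing space if the string starts or ends with whitespace. A slightly more general variant could look like the sketch below; `normalize_whitespace` is a hypothetical helper, not part of the notebook or of `datahelper`.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    # \s matches spaces, tabs, newlines and other Unicode whitespace
    return re.sub(r"\s+", " ", text).strip()

sample = """  Imaging studies with psychedelics
\tare also helping advance our understanding.  """
print(normalize_whitespace(sample))
# Imaging studies with psychedelics are also helping advance our understanding.
```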

### Small English Model

In [18]:
nlp = spacy.load("en_core_web_sm")
doc = nlp(test)
displacy.render(doc, style="ent", jupyter=True)

### Large English Model

In [19]:
nlp = spacy.load("en_core_web_lg")
doc = nlp(test)
displacy.render(doc, style="ent", jupyter=True)

### Conclusion
We can clearly see that neither the small nor the large spaCy model recognized the entities in this abstract correctly. This is not surprising: both are general-purpose English models that were not trained on biomedical text, so they have most likely never seen anything like this abstract before. Moreover, as the output below shows, the pretrained NER component does not even have the labels we want. We will therefore explore other models and ways to customize a model for our needs.

In [27]:
# Print the entity labels of the pretrained NER component, if the pipeline has one
if nlp.has_pipe("ner"):
    print(nlp.get_pipe("ner").labels)

('CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART')
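The tuple printed above is the complete label set of the pretrained `ner` component, so any domain-specific label is simply out of reach for it. A quick set difference makes the gap explicit. The target labels `DISORDER`, `SUBSTANCE` and `RECEPTOR` are hypothetical examples suggested by the abstract, not labels defined anywhere in this notebook.

```python
# Labels printed by the pretrained NER component (copied from the output above)
pretrained_labels = ('CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC',
                     'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT',
                     'QUANTITY', 'TIME', 'WORK_OF_ART')

# Hypothetical domain labels a mental-health NER model might need
wanted_labels = {"DISORDER", "SUBSTANCE", "RECEPTOR"}

# Labels we want that the pretrained model can never predict
missing = sorted(wanted_labels - set(pretrained_labels))
print(missing)  # ['DISORDER', 'RECEPTOR', 'SUBSTANCE']
```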


## Creating a custom model
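The `spacy train` command below reads binary `.spacy` files (`data/train.spacy` and `data/test.spacy`), but the notebook does not show how they were produced. A common route is to keep annotations as character offsets, check that every span lies inside its text and does not overlap another span, and only then build `Doc` objects and save them with spaCy's `DocBin`. The stdlib-only sketch below covers the checking step and counts examples per label. The record layout `{"text": ..., "entities": [[start, end, label], ...]}` and the `DISORDER`/`SUBSTANCE` labels are assumptions for illustration, not the format of the author's data.

```python
from collections import Counter

# Hypothetical annotated records: character offsets [start, end) plus a label
records = [
    {"text": "Psilocybin may ease major depressive disorder.",
     "entities": [[0, 10, "SUBSTANCE"], [20, 45, "DISORDER"]]},
    {"text": "Ketamine is not a classic psychedelic.",
     "entities": [[0, 8, "SUBSTANCE"]]},
]

def check_record(text, entities):
    """Return a list of problems found in one record's entity offsets."""
    problems = []
    spans = sorted((start, end, label) for start, end, label in entities)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            problems.append(f"out of range: {(start, end, label)}")
    # After sorting by start offset, any overlap implies at least one
    # overlapping pair of neighbouring spans, so checking neighbours suffices
    for (s1, e1, l1), (s2, e2, l2) in zip(spans, spans[1:]):
        if s2 < e1:
            problems.append(f"overlap: {(s1, e1, l1)} / {(s2, e2, l2)}")
    return problems

label_counts = Counter()
for i, rec in enumerate(records):
    for problem in check_record(rec["text"], rec["entities"]):
        print(f"record {i}: {problem}")
    label_counts.update(label for _, _, label in rec["entities"])

print(label_counts)  # Counter({'SUBSTANCE': 2, 'DISORDER': 1})
```

Records that pass these checks can then be turned into `Doc` objects with `doc.char_span(start, end, label=label)`. That call returns `None` when the offsets do not fall on token boundaries, so it is worth counting how many spans survive that step as well; if many are dropped, the `.spacy` files can end up with few or no entities to learn from.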

In [13]:
!python -m spacy init fill-config model/base_config.cfg model/config.cfg

[+] Auto-filled config with all values
[+] Saved config
model\config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy


In [14]:
!python -m spacy train model/config.cfg --output ./model --paths.train data/train.spacy --paths.dev data/test.spacy

[i] Saving to output directory: model
[i] Using CPU
[+] Initialized pipeline
[i] Pipeline: ['tok2vec', 'ner']
[i] Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    200.50    0.00    0.00    0.00    0.00
200     200       3015.06   6329.03    0.00    0.00    0.00    0.00
400     400          0.00      0.00    0.00    0.00    0.00    0.00
600     600          0.00      0.00    0.00    0.00    0.00    0.00
800     800          0.00      0.00    0.00    0.00    0.00    0.00
1000    1000          0.00      0.00    0.00    0.00    0.00    0.00
1200    1200          0.00      0.00    0.00    0.00    0.00    0.00
1400    1400          0.00      0.00    0.00    0.00    0.00    0.00
1600    1600          0.00      0.00    0.00    0.00    0.00    0.00
[+] Saved pipeline to output directory
model\model-last


[2023-03-26 13:59:57,185] [INFO] Set up nlp object from config
[2023-03-26 13:59:57,202] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-03-26 13:59:57,213] [INFO] Created vocabulary
[2023-03-26 13:59:57,213] [INFO] Finished initializing nlp object
[2023-03-26 13:59:57,395] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
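In this log `ENTS_P`, `ENTS_R` and `ENTS_F` are entity-level precision, recall and F-score on the dev set (`data/test.spacy`). An F-score of 0.00 means there were no true positives: no predicted entity matched a gold entity. The sketch below shows how such exact-match span metrics are usually computed; the gold and predicted spans are made-up examples, and `span_prf` is a hypothetical helper, not spaCy's scorer.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # spans with identical offsets and label
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [(0, 10, "SUBSTANCE"), (20, 45, "DISORDER")]
pred = [(0, 10, "SUBSTANCE"), (20, 30, "DISORDER")]  # second span has wrong boundaries
print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)
print(span_prf(gold, []))    # (0.0, 0.0, 0.0): no predictions at all gives zeros
```

Because matching is exact, a span with slightly wrong boundaries counts against both precision and recall, which is why partially correct predictions can still produce low scores.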


In [20]:
nlp = spacy.load("./model/model-best")

In [31]:
test = """The PAWS Study used an online platform to deliver a cross-sectional survey instrument designed to assess participants' 
        retrospective perspectives on the mental health effects of classic psychedelic use, as well as predictors of positive and 
        negative outcomes from this use. Because of our focus on classic psychedelic agents (i.e., tryptamines and phenethylamines 
        with a primary mechanism of action believed to be agonism of the serotonin 5HT2A receptor), we did not query 
        3,4-Methylenedioxymethamphetamine (MDMA), which has a different mechanism of action and tends to produce different acute 
        effects than classic psychedelics. Given the widespread emerging use of ketamine as a “psychedelic-like” agent for the 
        treatment of major depressive disorder, we included this agent in our survey, although we recognize that it is not a 
        classic psychedelic."""
test = re.sub("[\n ]+", " ", test)

In [32]:
doc = nlp(test)

In [33]:
options = dt.pretty_colors()
spacy.displacy.render(doc, style="ent", jupyter=True, options=options)

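☹ The result is disappointing, which matches the training log above: `ENTS_F`, `ENTS_P` and `ENTS_R` stay at 0.00 at every evaluation step, so the trained model never produced a correct entity on the dev set.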
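`dt.pretty_colors()` comes from the local `datahelper` module, whose source is not shown here. displaCy's entity renderer accepts an `options` dict with an `ents` list (which labels to highlight) and a `colors` dict (label to CSS color), so a helper like it plausibly returns something along the lines of the sketch below. The function body, label names and colors are assumptions.

```python
def pretty_colors(labels=("DISORDER", "SUBSTANCE", "RECEPTOR")):
    """Hypothetical stand-in for dt.pretty_colors(): build displaCy 'ent' options."""
    palette = ["#f4a6a6", "#a6d8f4", "#c6f4a6", "#f4e1a6", "#d5a6f4"]
    # Cycle through the palette so any number of labels gets a color
    colors = {label: palette[i % len(palette)] for i, label in enumerate(labels)}
    return {"ents": list(labels), "colors": colors}

options = pretty_colors()
print(options["colors"]["SUBSTANCE"])  # #a6d8f4
```

Passing `options=options` to `displacy.render(doc, style="ent", jupyter=True, options=options)` restricts highlighting to the listed labels and colors each one accordingly.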