# PsyNer project

## Installing the required libraries and models

In [8]:
pip install spacy

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [11]:
!python -m spacy download en_core_web_sm

Defaulting to user installation because normal site-packages is not writeable
Collecting en-core-web-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)
     ---------------------------------------- 12.8/12.8 MB 3.5 MB/s eta 0:00:00
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.5.0
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [12]:
!python -m spacy download en_core_web_lg

Defaulting to user installation because normal site-packages is not writeable
Collecting en-core-web-lg==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.5.0/en_core_web_lg-3.5.0-py3-none-any.whl (587.7 MB)
     ---------------------------------------- 587.7/587.7 MB ? eta 0:00:00
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-3.5.0
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')


## Generalized spaCy models

In [14]:
import spacy

In [15]:
test="After a decades-long pause, psychedelics are again being intensely investigated for treating a wide range of neuropsychiatric ailments including depression, anxiety, addiction, post-traumatic stress disorder, anorexia, and chronic pain syndromes. The classic serotonergic psychedelics psilocybin and lysergic acid diethylamide and nonclassic psychedelics 3,4-methylenedioxymethamphetamine and ketamine are increasingly appreciated as neuroplastogens given their potential to fundamentally alter mood and behavior well beyond the time window of measurable exposure. Imaging studies with psychedelics are also helping advance our understanding of neural networks and connectomics. This resurgence in psychedelic science and psychedelic-assisted therapy has potential significance for the fields of neurosurgery and neuro-oncology and their diverse and challenging patients, many of whom continue to have mental health issues and poor quality of life despite receiving state-of-the-art care. In this study, we review recent and ongoing clinical trials, the set and setting model of psychedelic-assisted therapy, potential risks and adverse events, proposed mechanisms of action, and provide a perspective on how the safe and evidence-based use of psychedelics could potentially benefit many patients, including those with brain tumors, pain syndromes, ruminative disorders, stroke, SAH, TBI, and movement disorders. By leveraging psychedelics' neuroplastic potential to rehabilitate the mind and brain, novel treatments may be possible for many of these patient populations, in some instances working synergistically with current treatments and in some using subpsychedelic doses that do not require mind-altering effects for efficacy. This review aims to encourage broader multidisciplinary collaboration across the neurosciences to explore and help realize the transdiagnostic healing potential of psychedelics."

In [16]:
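# Run the small general-purpose English pipeline on the abstract
nlp = spacy.load("en_core_web_sm")
doc = nlp(test)
# Print every detected entity with its label
for ent in doc.ents:
    print(ent.text, ent.label_)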

decades DATE
3,4-methylenedioxymethamphetamine QUANTITY
SAH ORG
TBI ORG


In [17]:
# Repeat with the large general-purpose English pipeline
nlp = spacy.load("en_core_web_lg")
doc = nlp(test)
for ent in doc.ents:
    print(ent.text, ent.label_)

decades DATE
3,4 CARDINAL
neuroplastogens PERSON
connectomics WORK_OF_ART
SAH ORG


Neither the small nor the large general-purpose spaCy model labels the entities in this abstract correctly: drug names are tagged as QUANTITY or CARDINAL, "neuroplastogens" as PERSON, and the abbreviations SAH and TBI as ORG. This is not surprising, since both are general-purpose English models that were not trained on biomedical text and have most likely never seen anything like this abstract. Moreover, as the label list below shows, the model does not even have the labels we want. We will therefore explore other models and ways to customize a model to our needs.

In [27]:
for pipe_name in nlp.pipe_names:
    if pipe_name == 'ner':
        component = nlp.get_pipe(pipe_name)
        if hasattr(component, "labels"):
            print(component.labels)

('CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART')


CARDINAL: Numerals that do not fall under another type
DATE: Absolute or relative dates or periods
EVENT: Named hurricanes, battles, wars, sports events, etc.
FAC: Buildings, airports, highways, bridges, etc.
GPE: Countries, cities, states
LANGUAGE: Any named language
LAW: Named documents made into laws
LOC: Non-GPE locations, mountain ranges, bodies of water
MONEY: Monetary values, including unit
NORP: Nationalities or religious or political groups
ORDINAL: "first", "second", etc.
ORG: Companies, agencies, institutions, etc.
PERCENT: Percentage, including "%"
PERSON: People, including fictional
PRODUCT: Objects, vehicles, foods, etc. (not services)
QUANTITY: Measurements, as of weight or distance
TIME: Times smaller than a day
WORK_OF_ART: Titles of books, songs, etc.
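
To make the gap concrete, the label set printed above can be compared with the categories this project targets. This is only a minimal sketch: the `wanted` list is an assumption that borrows a few of the label names defined later in this notebook.

```python
# Labels reported by the general-purpose NER component (copied from the output above).
general_labels = {
    "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW", "LOC",
    "MONEY", "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON", "PRODUCT",
    "QUANTITY", "TIME", "WORK_OF_ART",
}

# Assumption: a few of the labels this project wants to recognize.
wanted = ["PSYCHEDELIC DRUGS", "ANXIETY DISORDERS", "DEPRESSIVE DISORDERS"]

# Every wanted label that the general-purpose model cannot predict.
missing = [label for label in wanted if label not in general_labels]
print(missing)
```

None of the wanted labels appear in the general-purpose set, which is why a custom model is needed rather than a different choice of pretrained pipeline size.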

## Creating a custom model

In [39]:
from spacy.lang.en import English
from spacy.pipeline import EntityRuler
import json
import os
import random
from tqdm import tqdm

%reload_ext autoreload
%autoreload 2
import datahelper as dt

### Creating a training dataset by applying a rule-based model to a set of texts

In [None]:
# Folder with one .txt term list per category, one term per line
dir_path = os.path.join(os.getcwd(), "data_txt")
txt_files = [f for f in os.listdir(dir_path) if f.endswith(".txt")]

for file in txt_files:
    txt_path = os.path.join("data_txt", file)
    dt.upgrade_data(txt_path)
    with open(txt_path, "r", encoding="utf-8") as f:
        # Keep only unique, non-empty lines
        names = list({line.strip() for line in f if line.strip()})
    # e.g. data_txt/drugs.txt -> data_json/drugs.json
    stem = os.path.splitext(file)[0]
    dt.save_data(os.path.join("data_json", f"{stem}.json"), names)
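
The `datahelper` module is not shown in this notebook, so the following is only a sketch of what the text-to-JSON step could look like with the standard library. The function names `read_terms`, `save_data`, and `load_data` are hypothetical stand-ins, not the project's actual helpers.

```python
import json
import os
import tempfile


def read_terms(path):
    """Return the unique, non-empty lines of a text file in sorted order."""
    with open(path, "r", encoding="utf-8") as f:
        return sorted({line.strip() for line in f if line.strip()})


def save_data(path, data):
    """Write data to path as UTF-8 JSON (hypothetical stand-in for dt.save_data)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4, ensure_ascii=False)


def load_data(path):
    """Read JSON back from path."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Round trip on a throwaway file with a blank line and a duplicate term.
with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "drugs.txt")
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("psilocybin\nketamine\n\npsilocybin\n")
    terms = read_terms(txt_path)
    json_path = os.path.join(tmp, "drugs.json")
    save_data(json_path, terms)
    restored = load_data(json_path)

print(restored)
```

Sorting the deduplicated terms keeps the JSON files stable between runs, whereas `list(set(...))` can return the terms in a different order each time.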

In [None]:
# Maps each term-list file name (without extension) to its entity label
labels = {"anxiety": "ANXIETY DISORDERS",
          "bipolar": "BIPOLAR DISORDERS",
          "depressive": "DEPRESSIVE DISORDERS",
          "dissociative": "DISSOCIATIVE DISORDERS",
          "drugs": "PSYCHEDELIC DRUGS",
          "eating": "EATING DISORDERS",
          "neurocog": "NEURO-COGNITIVE DISORDERS",
          "neurodev": "NEURO-DEVELOPMENTAL DISORDERS",
          "nonsubstance": "NON-SUBSTANCE RELATED DISORDERS",
          "ocd": "OBSESSIVE-COMPULSIVE AND RELATED DISORDERS",
          "other": "OTHER DISORDERS",
          "paraphilias": "PARAPHILIAS",
          "parasomnias": "PARASOMNIAS",
          "personality": "PERSONALITY DISORDERS",
          "schizophrenia": "SCHIZOPHRENIA SPECTRUM AND OTHER PSYCHOTIC DISORDERS",
          "sexual": "SEXUAL DYSFUNCTIONS",
          "sleep": "SLEEP-WAKE DISORDERS",
          "somatic": "SOMATIC SYMPTOM RELATED DISORDERS",
          "substance": "SUBSTANCE-RELATED DISORDERS",
          "trauma": "TRAUMA AND STRESS RELATED DISORDERS",
          "elimination": "ELIMINATION DISORDERS",
          "disruptive": "DISRUPTIVE IMPULSE-CONTROL, AND CONDUCT DISORDERS"}

In [None]:
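# Start from an empty pattern list
# (assumption: the helper extends the list it is given and returns it)
patterns = []
for file in txt_files:
    stem = file[:-4]  # file name without the ".txt" extension
    patterns = dt.create_training_data(patterns, f"data_json/{stem}.json", labels[stem])
dt.generate_rules(patterns)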
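
The helpers `dt.create_training_data` and `dt.generate_rules` are not shown here. spaCy's `EntityRuler` accepts phrase patterns as dictionaries with a `"label"` and a `"pattern"` key, so one plausible way to turn term lists into rules is sketched below. The function `build_patterns` and the sample terms are hypothetical and only illustrate the data shape.

```python
def build_patterns(patterns, terms, label):
    """Append one EntityRuler-style phrase pattern per term and return the list."""
    for term in terms:
        patterns.append({"label": label, "pattern": term})
    return patterns


# Assumption: tiny illustrative term lists, not the project's real data.
term_lists = {
    "PSYCHEDELIC DRUGS": ["psilocybin", "ketamine"],
    "ANXIETY DISORDERS": ["anxiety"],
}

demo_patterns = []
for label, terms in term_lists.items():
    demo_patterns = build_patterns(demo_patterns, terms, label)

print(demo_patterns[0])
```

A list in this shape can be passed to an entity ruler's `add_patterns` method to obtain a rule-based pipeline.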

In [None]:
nlp = spacy.load("psy_ner")

In [None]:
dt.test_model(nlp, test)
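
Since `dt.test_model` and the `psy_ner` model are not shown in this notebook, the sketch below illustrates the general idea with a blank English pipeline and spaCy's built-in `entity_ruler` component. It also shows how rule-based matches could be collected as character-offset annotations for a training set. The patterns, the sample sentence, and the `annotate` helper are assumptions made for illustration, not the project's implementation.

```python
import spacy

# Blank English pipeline with a rule-based entity ruler (no trained weights needed).
nlp_rules = spacy.blank("en")
ruler = nlp_rules.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PSYCHEDELIC DRUGS", "pattern": "psilocybin"},
    {"label": "PSYCHEDELIC DRUGS", "pattern": "ketamine"},
    {"label": "DEPRESSIVE DISORDERS", "pattern": "depression"},
])


def annotate(nlp, text):
    """Return (text, {"entities": [(start_char, end_char, label), ...]})."""
    doc = nlp(text)
    entities = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
    return (text, {"entities": entities})


sample = "Both psilocybin and ketamine are studied for depression."
text, annotations = annotate(nlp_rules, sample)

# Print each matched span with its label.
for start, end, label in annotations["entities"]:
    print(sample[start:end], label)
```

String patterns are matched on the exact token text by default, so capitalized or inflected variants of a term would need their own patterns or a different matching attribute.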