## Sample text: a snippet from the case report "Artifactual Hypoglycemia: A Condition That Should Not Be Forgotten", published in [Frontiers in Endocrinology](https://www.frontiersin.org/articles/10.3389/fendo.2022.951377/full)

## scispaCy Python [library](https://allenai.github.io/scispacy/)

In [5]:
%%capture
!pip install scispacy

In [6]:
%%capture
!pip install 'https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_scibert-0.5.0.tar.gz'
# ~785k vocabulary, with allenai/scibert-base as the transformer

In [3]:
%%capture
!pip install 'https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_jnlpba_md-0.5.0.tar.gz'
# Trained on the JNLPBA corpus for entities such as DNA, cell type, cell line, RNA, and protein

In [4]:
%%capture
!pip install 'https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_bc5cdr_md-0.5.0.tar.gz'
# Trained on the BC5CDR corpus for Disease and Chemical entities

In [7]:
%%capture
!pip install 'https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_bionlp13cg_md-0.5.0.tar.gz'
# Trained on the BIONLP13CG corpus for 16 entity types such as Organ, Organism, Cell, Cancer, Cellular Component, and Pathological Formation

In [8]:
import scispacy
import spacy
from spacy import displacy
import en_core_sci_scibert
import en_ner_jnlpba_md
import en_ner_bc5cdr_md
import en_ner_bionlp13cg_md
from pprint import pprint
import pandas as pd

In [9]:
case_report = """Background: Hypoglycemia is uncommon in people who are not being treated for diabetes mellitus and, when present, the differential diagnosis is broad. Artifactual hypoglycemia describes discrepancy between low capillary and normal plasma glucose levels regardless of symptoms and should be considered in patients with Raynaud’s phenomenon.

Case Presentation: A 46-year-old female patient with a history of a sleeve gastrectomy started complaining about episodes of lipothymias preceded by sweating, nausea, and dizziness. During one of these episodes, a capillary blood glucose was obtained with a value of 24 mg/dl. She had multiple emergency admissions with low-capillary glycemia. An exhaustive investigation for possible causes of hypoglycemia was made for 18 months. The 72h fasting test was negative for hypoglycemia. A Raynaud’s phenomenon was identified during one appointment.

Conclusion: Artifactual hypoglycemia has been described in various conditions including Raynaud’s phenomenon, peripheral arterial disease, Eisenmenger syndrome, acrocyanosis, or hypothermia. With this case report, we want to reinforce the importance of being aware of this diagnosis to prevent anxiety, unnecessary treatment, and diagnostic tests."""

In [10]:
def display_entities(model, document, colour_options):
    """
    Render the entities found in a document and print them with their labels.

    Parameters:
        model (module): A pretrained model package from scispaCy (https://allenai.github.io/scispacy/)
        document (str): Document to be processed
        colour_options (dict): displacy options giving the entity labels to render and their colours

    Returns:
        tuple: (None, None). With jupyter=True, displacy.render displays the markup
        instead of returning it, and pprint prints the entity set and returns None.
    """
    nlp = model.load()
    doc = nlp(document)
    # Rendered inline in the notebook as a side effect
    displacy_image = displacy.render(doc, jupyter=True, style='ent', options=colour_options)
    # A set removes repeated (text, label) pairs before printing
    entity_and_label = pprint({(ent.text, ent.label_) for ent in doc.ents})
    return displacy_image, entity_and_label

In [11]:
# Keys are upper-case to match the 'ENTITY' label emitted by the model
colors = {"ENTITY": "yellow"}
colour_options = {"ents": ["ENTITY"], "colors": colors}

In [12]:
display_entities(en_core_sci_scibert,case_report,colour_options)

{('Artifactual', 'ENTITY'),
 ('Eisenmenger syndrome', 'ENTITY'),
 ('Hypoglycemia', 'ENTITY'),
 ('Raynaud’s', 'ENTITY'),
 ('acrocyanosis', 'ENTITY'),
 ('anxiety', 'ENTITY'),
 ('appointment', 'ENTITY'),
 ('capillary blood glucose', 'ENTITY'),
 ('case report', 'ENTITY'),
 ('complaining', 'ENTITY'),
 ('conditions', 'ENTITY'),
 ('diabetes mellitus', 'ENTITY'),
 ('diagnosis', 'ENTITY'),
 ('diagnostic tests', 'ENTITY'),
 ('differential diagnosis', 'ENTITY'),
 ('discrepancy', 'ENTITY'),
 ('dizziness', 'ENTITY'),
 ('emergency admissions', 'ENTITY'),
 ('episodes', 'ENTITY'),
 ('fasting test', 'ENTITY'),
 ('female', 'ENTITY'),
 ('glycemia', 'ENTITY'),
 ('history', 'ENTITY'),
 ('hypoglycemia', 'ENTITY'),
 ('hypothermia', 'ENTITY'),
 ('identified', 'ENTITY'),
 ('investigation', 'ENTITY'),
 ('lipothymias', 'ENTITY'),
 ('low capillary', 'ENTITY'),
 ('low-capillary', 'ENTITY'),
 ('months', 'ENTITY'),
 ('multiple', 'ENTITY'),
 ('nausea', 'ENTITY'),
 ('negative', 'ENTITY'),
 ('normal plasma glucose leve

(None, None)
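
The `(None, None)` shown after the entity set is the return value of `display_entities`: `pprint` prints its argument and returns `None`, and `displacy.render` called with `jupyter=True` displays the markup inline rather than returning it. A minimal standard-library sketch of the `pprint` half follows. The two-item entity set is made up for illustration, and `pformat` is shown as one way to get the formatted text back as a string.

```python
from pprint import pformat, pprint

# Illustrative stand-in for the (text, label) pairs; not produced by a model here.
entities = {("nausea", "ENTITY"), ("dizziness", "ENTITY")}

printed = pprint(entities)      # prints the set and returns None
formatted = pformat(entities)   # returns the same text as a string

print(printed is None)
print(formatted)
```

If the pairs are needed later, returning the set itself (as `entities_and_label_extractor` does further down) is the simpler option.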

In [13]:
colors = {"DNA": "yellow", "CELL_TYPE": "green", "CELL_LINE": "red", "RNA": "brown", "PROTEIN": "pink"}
colour_options = {"ents": ["DNA", "CELL_TYPE", "CELL_LINE", "RNA", "PROTEIN"], "colors": colors}

In [14]:
display_entities(en_ner_jnlpba_md, case_report, colour_options)



set()


(None, None)

In [15]:
colors = {"DISEASE": "yellow", "CHEMICAL": "red"}
colour_options = {"ents": ["DISEASE", "CHEMICAL"], "colors": colors}

In [16]:
display_entities(en_ner_bc5cdr_md, case_report, colour_options)

{('Eisenmenger syndrome', 'DISEASE'),
 ('Hypoglycemia', 'DISEASE'),
 ('Raynaud’s phenomenon', 'DISEASE'),
 ('acrocyanosis', 'DISEASE'),
 ('anxiety', 'DISEASE'),
 ('capillary blood glucose', 'DISEASE'),
 ('diabetes mellitus', 'DISEASE'),
 ('dizziness', 'DISEASE'),
 ('glucose', 'CHEMICAL'),
 ('hypoglycemia', 'DISEASE'),
 ('hypothermia', 'DISEASE'),
 ('lipothymias', 'DISEASE'),
 ('nausea', 'DISEASE'),
 ('peripheral arterial disease', 'DISEASE')}


(None, None)

In [17]:
# Label keys must match the model's labels exactly (underscores, not hyphens); colour values are CSS colour names
colors = {"AMINO_ACID": "yellow", "ANATOMICAL_SYSTEM": "green", "CANCER": "red", "CELL": "brown", "CELLULAR_COMPONENT": "pink", "DEVELOPING_ANATOMICAL_STRUCTURE": "blue", "GENE_OR_GENE_PRODUCT": "orange",
          "IMMATERIAL_ANATOMICAL_ENTITY": "lightblue", "MULTI_TISSUE_STRUCTURE": "lightgreen", "ORGAN": "purple", "ORGANISM": "grey", "ORGANISM_SUBDIVISION": "cyan", "ORGANISM_SUBSTANCE": "magenta", "PATHOLOGICAL_FORMATION": "plum",
          "SIMPLE_CHEMICAL": "maroon", "TISSUE": "khaki"}
colour_options = {"ents": ["AMINO_ACID", "ANATOMICAL_SYSTEM", "CANCER", "CELL", "CELLULAR_COMPONENT", "DEVELOPING_ANATOMICAL_STRUCTURE", "GENE_OR_GENE_PRODUCT", "IMMATERIAL_ANATOMICAL_ENTITY", "MULTI_TISSUE_STRUCTURE",
                           "ORGAN", "ORGANISM", "ORGANISM_SUBDIVISION", "ORGANISM_SUBSTANCE", "PATHOLOGICAL_FORMATION", "SIMPLE_CHEMICAL", "TISSUE"], "colors": colors}

In [18]:
display_entities(en_ner_bionlp13cg_md,case_report,colour_options)

{('capillary', 'TISSUE'),
 ('patient', 'ORGANISM'),
 ('patients', 'ORGANISM'),
 ('people', 'ORGANISM'),
 ('peripheral arterial', 'MULTI_TISSUE_STRUCTURE'),
 ('plasma glucose', 'ORGANISM_SUBSTANCE')}


(None, None)
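
Each colour-options cell above writes its label list twice: once as the keys of `colors` and once under `"ents"`. With 16 labels it is easy for the two to drift apart. The sketch below derives both from a single mapping. `make_colour_options` is a hypothetical helper name, not part of spaCy or scispaCy; the `"ents"` and `"colors"` keys are the ones already used in this notebook.

```python
def make_colour_options(colors):
    """Build a displacy-style options dict from a label -> colour mapping.

    Hypothetical helper for this notebook; not a spaCy or scispaCy function.
    """
    # Upper-case the labels so they match the labels the models emit.
    normalised = {label.upper(): colour for label, colour in colors.items()}
    # The label list is taken from the mapping, so the two cannot disagree.
    return {"ents": list(normalised), "colors": normalised}


colour_options = make_colour_options({"disease": "yellow", "chemical": "red"})
print(colour_options)
```

The returned dictionary can be passed as the `colour_options` argument of `display_entities` in the same way as the hand-written ones.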

In [19]:
def entities_and_label_extractor(model, document):
    """
    Return the entities found in a document together with their labels.

    Parameters:
        model (module): A pretrained model package from scispaCy (https://allenai.github.io/scispacy/)
        document (str): Document to be processed

    Returns:
        set: Unique (entity text, entity label) tuples
    """
    nlp = model.load()
    doc = nlp(document)
    # A set removes repeated (text, label) pairs
    entity_and_label = {(ent.text, ent.label_) for ent in doc.ents}
    return entity_and_label
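
Both helper functions call `model.load()` on every invocation, so the same pipeline is loaded again each time a model is reused. One possible way to avoid that, sketched here under the assumption that the model package object is hashable (Python modules are), is to cache the loaded pipeline with `functools.lru_cache`. `load_pipeline` and `FakeModel` are hypothetical names used only for this illustration; `FakeModel` stands in for a package such as `en_ner_bc5cdr_md` so the sketch runs without any model installed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_pipeline(model):
    """Load a model package once and reuse the pipeline on later calls."""
    return model.load()


class FakeModel:
    """Stand-in for a scispaCy model package (illustration only)."""

    def __init__(self):
        self.load_calls = 0

    def load(self):
        self.load_calls += 1
        # Placeholder "pipeline": splits text on whitespace.
        return lambda text: text.split()


fake = FakeModel()
first = load_pipeline(fake)
second = load_pipeline(fake)   # served from the cache, load() is not called again
print(fake.load_calls)
```

Inside the helpers, `nlp = model.load()` would then become `nlp = load_pipeline(model)`.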

In [20]:
bionlp_ner = entities_and_label_extractor(en_ner_bionlp13cg_md,case_report)

In [21]:
type(bionlp_ner)

set

In [22]:
bionlp_ner

{('capillary', 'TISSUE'),
 ('patient', 'ORGANISM'),
 ('patients', 'ORGANISM'),
 ('people', 'ORGANISM'),
 ('peripheral arterial', 'MULTI_TISSUE_STRUCTURE'),
 ('plasma glucose', 'ORGANISM_SUBSTANCE')}

In [23]:
# Store the (entity, label) pairs in a pandas DataFrame
bionlp_entities_dataframe = pd.DataFrame(bionlp_ner, columns=['Entity', 'Label'])
# Add a constant column recording which NER model produced the rows
bionlp_entities_dataframe['Ner_model'] = 'bionlp13cg'
bionlp_entities_dataframe

|   | Entity | Label | Ner_model |
|---|---|---|---|
| 0 | peripheral arterial | MULTI_TISSUE_STRUCTURE | bionlp13cg |
| 1 | plasma glucose | ORGANISM_SUBSTANCE | bionlp13cg |
| 2 | patients | ORGANISM | bionlp13cg |
| 3 | capillary | TISSUE | bionlp13cg |
| 4 | patient | ORGANISM | bionlp13cg |
| 5 | people | ORGANISM | bionlp13cg |
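
The rows of the table are in a different order from the printed set because a Python set has no defined order; the notebook display sorts the set when printing it, while the DataFrame takes the items in whatever order the set yields them. If a reproducible row order matters, one option is to sort the pairs before building the table. The sketch below uses only the standard library and the entity pairs copied from the output above.

```python
# Entity/label pairs copied from the notebook output.
bionlp_ner = {
    ("capillary", "TISSUE"),
    ("patient", "ORGANISM"),
    ("patients", "ORGANISM"),
    ("people", "ORGANISM"),
    ("peripheral arterial", "MULTI_TISSUE_STRUCTURE"),
    ("plasma glucose", "ORGANISM_SUBSTANCE"),
}

# Sort by label first, then by entity text, so the order is reproducible.
rows = sorted(bionlp_ner, key=lambda pair: (pair[1], pair[0]))
for entity, label in rows:
    print(f"{label:<24}{entity}")
```

The sorted list `rows` can be passed to `pd.DataFrame(rows, columns=['Entity', 'Label'])` in place of the set.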


In [24]:
bionlp_entities_dataframe.to_csv('bionlp_entities.csv',index=False)

In [25]:
pd.read_csv('bionlp_entities.csv')  # same relative path the file was written to

|   | Entity | Label | Ner_model |
|---|---|---|---|
| 0 | peripheral arterial | MULTI_TISSUE_STRUCTURE | bionlp13cg |
| 1 | plasma glucose | ORGANISM_SUBSTANCE | bionlp13cg |
| 2 | patients | ORGANISM | bionlp13cg |
| 3 | capillary | TISSUE | bionlp13cg |
| 4 | patient | ORGANISM | bionlp13cg |
| 5 | people | ORGANISM | bionlp13cg |
