## Download NDC drug names and related PMC articles

### Training set: PMC articles from the European Bioinformatics Institute (EMBL-EBI)
#### Product names: 30
#### Text contexts: 1,700

#### Example training set

```
{"keyword": "olmesartan medoxomil", "texts": "A rapid, simple and sensitive high-performance liquid chromatography (HPLC) method has been developed for quantification of olmesartan medoxomil (OLM) and amlodipine besylate (AM) in plasma. The assay enables the measurement of OLM and AM for therapeutic drug monitoring with a minimum detectable limit of 2 ng mL. The method involves a simple, one-step extraction procedure and analytical recovery was above 50%. The separation was performed on an analytical 250 \u00d7 4.6 mm Eurospher 100(-5) C18 column. The wavelength was set at 239 nm. The mobile phase was a mixture of acetonitrile:0.05 M ammonium acetate buffer: 0.1 mL triethylamine at pH 6.8 was selected at a flow rate of 1.0 mL min. The calibration curve for the determination of OLM and AM in plasma was linear over the range 2-2500 and 8-10,000 ng mL AM and OLM. The coefficients of variation for interday and intraday assay were found to be <15%. The method can be applied to a pharmacokinetic and pharmacodynamic study of OLM and AM in a combined dosage form."}
{"keyword": "Carvedilol", "texts": "Meta-Analysis Comparing Metoprolol and Carvedilol on Mortality Benefits in Patients With Acute Myocardial Infarction."}
{"keyword": "Carvedilol", "texts": "Carvedilol for the treatment of red scrotum syndrome."}
...

```
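
The records above can be read back with a few lines of Python. This is a minimal sketch, assuming the corpus file stores one JSON object per line with the `keyword` and `texts` fields shown in the example; `load_corpus` and `contexts_per_keyword` are hypothetical helper names, not part of `FetchNDC_PMC`.

```python
import json
from collections import Counter


def load_corpus(path):
    """Read a JSON Lines file: one {"keyword": ..., "texts": ...} object per line."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            records.append(json.loads(line))
    return records


def contexts_per_keyword(records):
    """Count how many text contexts were collected for each drug name."""
    return Counter(record["keyword"] for record in records)
```

Calling `contexts_per_keyword(load_corpus("pmc_ner_corps_train.json"))` would give a quick view of which drug names dominate the training set.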

In [None]:
from FetchNDC_PMC import downloadFile, unzip, readProduct, splitDataSet, download

# download the ndc drug list 
filename = "ndctext.zip"
url = "https://www.accessdata.fda.gov/cder/ndctext.zip"
unzipdir = "ndc"
productfile = unzipdir + "/product.txt"

downloadFile(url, filename)
unzip(filename, unzipdir)

# read product csv file
df = readProduct(productfile)
fieldname = 'PROPRIETARYNAME'
train_x, test_x = splitDataSet(df, fieldname)

# download PMC articles related to drug names
test_corps_name = "pmc_ner_corps_test.json"
training_corps_name = "pmc_ner_corps_train.json"
label = 'MEDICINE'


download(test_x, test_corps_name)
download(train_x, training_corps_name)

ndctext.zip
699 train_x
175 test_x
 keyword Nausea Vomiting load...
0 found
 keyword Pear load...
49 found
 keyword Lithium Carbonate load...
29 found
 keyword Buprenorphine load...
138 found
 keyword Myoview load...
34 found
 keyword GRAPHITES load...
0 found
 keyword Tizanidine load...
64 found
 keyword Nitrofurantoin Macrocrystals load...
0 found
 keyword Rabofen DM load...
0 found
 keyword Clindamycin hydrochloride load...
12 found
 keyword Artificial Tears Lubricant Eye Drops load...
0 found
 keyword Oxygen load...
97 found
 keyword Regular Strength Aspirin EC load...
0 found
 keyword Alert load...
103 found
 keyword Raloxifene Hydrochloride load...
3 found
 keyword CEPHALEXIN load...
1 found
 keyword Prefest load...
2 found
 keyword Nature Knows Allergy Relief For Kids load...
0 found
 keyword NeutraCaine load...
0 found
 keyword Leukotrap - AS-3 Solution load...
0 found
 keyword Doxazosin mesylate load...
13 found
 keyword Nitrofurantion load...
7 found
 keyword 4 Kids Cold and 
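
The output above reports 699 training names and 175 test names, which is consistent with roughly an 80/20 split of 874 names. The sketch below shows one way such a split could be done. It is not the actual `splitDataSet` implementation: `split_names`, the de-duplication step, the fixed seed, and the 0.2 test fraction are all assumptions made for illustration.

```python
import random


def split_names(names, test_fraction=0.2, seed=42):
    """Shuffle the distinct names and split them into (train, test) lists."""
    unique = sorted(set(names))      # de-duplicate; sorting keeps the result reproducible
    rng = random.Random(seed)        # local RNG so the global random state is untouched
    rng.shuffle(unique)
    n_test = round(len(unique) * test_fraction)
    return unique[n_test:], unique[:n_test]


# Usage with 874 placeholder names (hypothetical data, not the NDC list)
train_names, test_names = split_names([f"drug-{i}" for i in range(874)])
print(len(train_names), len(test_names))  # 699 175
```

Splitting on distinct names rather than on rows keeps the same product name from appearing in both sets.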

## Convert the JSON PMC corpus to CoNLL-03 format and combine it with the base CoNLL-03 corpus

In [None]:
from jsonCorps2conll03 import mkdir, convertJson2Conll_03, combineFiles

conll_03_dir = 'corps/conll_03/' #internal
pmc_conll_03_output_folder = 'pmc_conll_03' #internal
final_output_dir = "final_corps/conll_03/" #internal
corps = [(test_corps_name,'eng.testa',200), (test_corps_name, 'eng.testb',200), (training_corps_name,'eng.train', 1000)]

mkdir(pmc_conll_03_output_folder)
mkdir(final_output_dir)
for corp in corps:
    json_corps_name, output_corps_name, maxrow = corp
    pmc_corps = convertJson2Conll_03(json_corps_name, pmc_conll_03_output_folder, output_corps_name, label, max=maxrow )
    combineFiles([pmc_corps, conll_03_dir+output_corps_name], final_output_dir + output_corps_name)
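
To make the conversion step concrete, here is a minimal sketch of turning one `{"keyword", "texts"}` record into CoNLL-03 style lines with BIO tags. It is an illustration under stated assumptions, not the code in `jsonCorps2conll03`: `to_conll03` is a hypothetical name, tokens are split on whitespace, the part-of-speech and chunk columns are filled with a `-X-` placeholder, and a keyword is matched case-insensitively after stripping surrounding punctuation.

```python
import re


def to_conll03(text, keyword, label="MEDICINE"):
    """Return one 'token POS chunk NER' line per whitespace token."""
    tokens = text.split()
    key = [part.lower() for part in keyword.split()]
    # Compare on a normalized form so "Carvedilol." still matches "Carvedilol"
    norm = [re.sub(r"^\W+|\W+$", "", tok).lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i <= len(tokens) - len(key):
        if key and norm[i:i + len(key)] == key:
            tags[i] = "B-" + label                 # first token of the entity
            for j in range(i + 1, i + len(key)):
                tags[j] = "I-" + label             # continuation tokens
            i += len(key)
        else:
            i += 1
    return [f"{tok} -X- -X- {tag}" for tok, tag in zip(tokens, tags)]


for line in to_conll03("Carvedilol for the treatment of red scrotum syndrome.", "Carvedilol"):
    print(line)
```

In a CoNLL-03 file, sentences are separated by a blank line, so a converter would write an empty line after each record's tokens.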

## Train the NER model

In [None]:
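from jsonCorps2conll03 import rmdir

# this is the folder in which the train, test and dev files reside
conll_03_corps_folder = 'final_corps'
model_output_folder = 'medicine-ner'
# NOTE: rmdir() is called on the corpus folder right before training;
# check that it does not delete the files written by the previous cell.
rmdir(conll_03_corps_folder)
# NOTE: train() is not imported in this cell, so it must already be defined
# (for example in an earlier cell or a helper module) for this call to run.
train(conll_03_corps_folder, model_output_folder)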

## Test the NER model

In [10]:
from flair.data import Sentence
from flair.models import SequenceTagger

conll_03_corps_folder = 'final_corps'
model_output_folder = 'medicine-ner'

# make a sentence
sentence = Sentence("""
Previous studies have demonstrated that glucocorticoid hormones, including dexamethasone, induced alterations in intracellular calcium homeostasis in acute lymphoblastic leukemia (ALL) cells. However, the mechanism by which intracellular calcium homeostasis participates in dexamethasone sensitivity and resistance on ALL cells remains elusive. Here, we found that treatment of cells with dexamethasone resulted in increased intracellular calcium concentrations through store-operated calcium entry stimulation, which was curtailed by store-operated calcium channel blockers. We show that BAPTA-AM, an intracellular Ca2+ chelator, synergistically enhances dexamethasone lethality in two human ALL cell lines and in three primary specimens. This effect correlated with the inhibition of the prosurvival kinase ERK1/2 signaling pathway. Chelating intracellular calcium with Bapta-AM or inhibiting ERK1/2 with PD98059 significantly potentiated dexamethasone-induced mitochondrial membrane potential collapse, reactive oxygen species production, cytochrome c release, caspase-3 activity, and cell death. Moreover, we show that thapsigargin elevates intracellular free calcium ion level, and activates ERK1/2 signaling, resulting in the inhibition of dexamethasone-induced ALL cells apoptosis. Together, these results indicate that calcium-related ERK1/2 signaling pathway contributes to protect cells from dexamethasone sensitivity by limiting mitochondrial apoptotic pathway. This report provides a novel resistance pathway underlying the regulatory effect of dexamethasone on ALL cells.
""")
# load the NER tagger
tagger = SequenceTagger.load_from_file(model_output_folder + '/final-model.pt')

# run NER over sentence
tagger.predict(sentence)
print(sentence)
print('The following NER tags are found:')

# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)

2019-03-15 15:58:23,845 loading file medicine-ner/final-model.pt
Sentence: "
Previous studies have demonstrated that glucocorticoid hormones, including dexamethasone, induced alterations in intracellular calcium homeostasis in acute lymphoblastic leukemia (ALL) cells. However, the mechanism by which intracellular calcium homeostasis participates in dexamethasone sensitivity and resistance on ALL cells remains elusive. Here, we found that treatment of cells with dexamethasone resulted in increased intracellular calcium concentrations through store-operated calcium entry stimulation, which was curtailed by store-operated calcium channel blockers. We show that BAPTA-AM, an intracellular Ca2+ chelator, synergistically enhances dexamethasone lethality in two human ALL cell lines and in three primary specimens. This effect correlated with the inhibition of the prosurvival kinase ERK1/2 signaling pathway. Chelating intracellular calcium with Bapta-AM or inhibiting ERK1/2 with PD98059 signific

In [11]:
from flair.data import Sentence
from flair.models import SequenceTagger

conll_03_corps_folder = 'final_corps'
model_output_folder = 'medicine-ner'


# load the NER tagger
tagger = SequenceTagger.load_from_file(model_output_folder + '/final-model.pt')

def detect(tagger, text):
    print('===============================================')
    sentence = Sentence(text)
    tagger.predict(sentence)
    print(sentence)
    print('--------------------------------')

    # iterate over entities and print
    for entity in sentence.get_spans('ner'):
        print(entity)


2019-03-15 15:58:33,137 loading file medicine-ner/final-model.pt


## Example: Fluoxetine

### PMC article : Fluoxetine

![title](fluoxetine_article.png)

In [18]:
detect(tagger, """Chronic Fluoxetine Induces the Enlargement of Perforant Path-Granule Cell Synapses in the Mouse Dentate Gyrus""")

Sentence: "Chronic Fluoxetine Induces the Enlargement of Perforant Path-Granule Cell Synapses in the Mouse Dentate Gyrus" - 15 Tokens
--------------------------------
MEDICINE-span [2]: "Fluoxetine"


### Twitter : Fluoxetine

![title](fluoxetine_user.png)

In [19]:
# Recognized: lowercase drug name in an informal tweet
detect(tagger, """sleepless nights, feeling worthless, lifes trash, this shit aint worth it man, fluoxetine isnt doing nothing. ive come to conclusion life is fucking worthless and i wish everyone the best, fuck this, i cant handle this shit mentally anymore, fuck life,  im done with life.  bye""")

Sentence: "sleepless nights, feeling worthless, lifes trash, this shit aint worth it man, fluoxetine isnt doing nothing. ive come to conclusion life is fucking worthless and i wish everyone the best, fuck this, i cant handle this shit mentally anymore, fuck life, im done with life. bye" - 46 Tokens
--------------------------------
MEDICINE-span [13]: "fluoxetine"


### Twitter : Fluoxetine as a hashtag

![title](fluoxetine_pmc.png)

In [20]:
# Twitter: drug name written as a hashtag
detect(tagger, """Publication alert: combining CBT with #fluoxetine might be superior to either therapy for adolescents with #depression. Model-based random forest method applied in a study by @HeidiBaya Seibold, T.Hothorn, S.Foster, M.Mohler-Kuo, """)

Sentence: "Publication alert: combining CBT with #fluoxetine might be superior to either therapy for adolescents with #depression. Model-based random forest method applied in a study by @HeidiBaya Seibold, T.Hothorn, S.Foster, M.Mohler-Kuo," - 30 Tokens
--------------------------------
ORG-span [4]: "CBT"
MEDICINE-span [6]: "#fluoxetine"
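
The printed spans follow the pattern `LABEL-span [token positions]: "text"`, where the positions are 1-based indices over whitespace-separated tokens (for example, "Fluoxetine" is token 2 of the 15-token article title). The sketch below parses such lines so that only `MEDICINE` hits are kept. `parse_span` is a hypothetical helper, and the pattern is an assumption based on the output shown in this notebook; other flair versions may print spans differently.

```python
import re

# Matches lines such as: MEDICINE-span [6]: "#fluoxetine"
SPAN_RE = re.compile(r'^(?P<label>[A-Z]+)-span \[(?P<idx>[\d,]+)\]: "(?P<text>.*)"$')


def parse_span(line):
    """Parse one printed span line into a dict, or return None if it does not match."""
    match = SPAN_RE.match(line.strip())
    if match is None:
        return None
    positions = [int(p) for p in match.group("idx").split(",")]
    return {"label": match.group("label"), "tokens": positions, "text": match.group("text")}


output = ['ORG-span [4]: "CBT"', 'MEDICINE-span [6]: "#fluoxetine"']
spans = [parse_span(line) for line in output]
medicines = [s["text"] for s in spans if s and s["label"] == "MEDICINE"]
print(medicines)  # ['#fluoxetine']
```

Filtering on the label matters here because the combined corpus also carries the base CoNLL-03 entity types, such as `ORG`.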


### Not detected

In [21]:
# Problem: "fluoxetine" is not tagged, and "I’ve" is mislabeled as ORG and PER
detect(tagger, """Starting back on fluoxetine tonight, I did pick up the prescription a few days ago but in the past when I’ve started/upped meds I’ve had to call in sick to work due to side effects. So I waited. Now I have 3 days off to adjust""")

Sentence: "Starting back on fluoxetine tonight, I did pick up the prescription a few days ago but in the past when I’ve started/upped meds I’ve had to call in sick to work due to side effects. So I waited. Now I have 3 days off to adjust" - 46 Tokens
--------------------------------
ORG-span [21]: "I’ve"
PER-span [24]: "I’ve"
DATE-span [42,43]: "3 days"
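
The token counts in the outputs suggest that sentences are split on whitespace, so `#fluoxetine` and `I’ve` each reach the tagger as a single token. One preprocessing step that could be tried before calling `detect` is to normalize tweet text, as in the sketch below. `normalize_tweet` is a hypothetical helper, and whether it changes the predictions for the example above has not been verified here.

```python
import re

# Map typographic quotes to their ASCII equivalents
_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def normalize_tweet(text):
    """Replace curly quotes with ASCII quotes and drop the leading '#' of hashtags."""
    text = text.translate(_QUOTES)
    # Remove '#' only when it starts a word, so "C#" and similar are left alone
    return re.sub(r"(?<!\w)#(\w+)", r"\1", text)


print(normalize_tweet("combining CBT with #fluoxetine when I\u2019ve started"))
# combining CBT with fluoxetine when I've started
```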
