## Co-Occurrence of Terms Analysis

Co-occurrence of terms analysis: check how often pre-selected cognitive terms appear in abstracts alongside ERP terms.

This analysis searches PubMed for papers that contain specified ERP and cognitive (COG) terms. The data extracted is the number of papers in which both terms appear. These counts are used to infer which cognitive terms each ERP is associated with.
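As a rough illustration of the search described above, the sketch below builds a PubMed E-utilities `esearch` URL for one ERP/COG pair and reads the paper count out of the XML response. It is not the `erpsc` implementation: the helper names `build_cooc_url` and `parse_count` are hypothetical, the query parameters mirror the test URLs further down in this notebook, and the response string is a made-up placeholder rather than a real result.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_cooc_url(erp, term):
    """Build an esearch URL asking for records that mention both labels.

    Hypothetical helper: the exact query used by erpsc is not shown here.
    """
    query = '"{}" AND "{}"'.format(erp, term)
    params = {"db": "pubmed", "field": "word", "term": query, "rettype": "count"}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_count(xml_text):
    """Read the <Count> element from an esearch XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


url = build_cooc_url("N400", "language")

# Placeholder response for illustration only; not a real PubMed result
example_response = "<eSearchResult><Count>1234</Count></eSearchResult>"
n_papers = parse_count(example_response)
```

Fetching `url` (for example with `requests.get(url).content`) and passing the body to `parse_count` would give the co-occurrence count for that pair.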

NOTE:
- The COG terms used here are a somewhat arbitrary selection; a better, more principled set of terms is needed.

In [8]:
# Import custom code
from erpsc.count import *

In [9]:
# Initialize object for term co-occurrence counts
counts = ERPSCCount()

In [10]:
# Load ERPs and terms from file
counts.set_erps_file('erps.txt')
counts.set_terms_file('cognitive_terms.txt')

In [3]:
# OR: Set small set of ERPs and terms for tests

# Small test set of words
erps = ['N400', 'P600']
cog_terms = ['language', 'memory'] 

# Add ERPs and terms
counts.set_erps(erps)
counts.set_terms(cog_terms)

In [11]:
# Scrape the term co-occurrence data
counts.scrape_data()

In [12]:
# Check the most commonly associated COG term for each ERP
counts.check_cooc_erps()

For the  P50   the most common association is 	 cognitive  with 	 %06.65
For the  P100  the most common association is 	 vision     with 	 %10.50
For the  P200  the most common association is 	 auditory   with 	 %32.91
For the  P300  the most common association is 	 auditory   with 	 %20.94
For the  P3a   the most common association is 	 auditory   with 	 %51.16
For the  P3b   the most common association is 	 attention  with 	 %41.61
For the  P400  the most common association is 	 cognitive  with 	 %12.64
For the  P600  the most common association is 	 language   with 	 %58.72
For the  N50   the most common association is 	 somatosensory with 	 %09.29
For the  N100  the most common association is 	 auditory   with 	 %63.55
For the  N170  the most common association is 	 face       with 	 %73.54
For the  N200  the most common association is 	 cognitive  with 	 %34.24
For the  MMN   the most common association is 	 auditory   with 	 %67.71
For the  N2b   the most common association is 	 

In [13]:
# Check the most commonly associated ERP for each term
counts.check_cooc_terms()

For  language             the strongest associated ERP is 	 P600  with 	 %58.72
For  memory               the strongest associated ERP is 	 N400  with 	 %22.05
For  attention            the strongest associated ERP is 	 N2pc  with 	 %89.10
For  motor                the strongest associated ERP is 	 MMN   with 	 %14.58
For  decision making      the strongest associated ERP is 	 ERN   with 	 %04.68
For  vision               the strongest associated ERP is 	 CNV   with 	 %11.02
For  pain                 the strongest associated ERP is 	 LDN   with 	 %12.98
For  auditory             the strongest associated ERP is 	 MMN   with 	 %67.71
For  emotion              the strongest associated ERP is 	 N170  with 	 %16.61
For  categorization       the strongest associated ERP is 	 N170  with 	 %07.41
For  reward               the strongest associated ERP is 	 ERN   with 	 %05.99
For  spatial              the strongest associated ERP is 	 N2pc  with 	 %37.07
For  somatosensory        the strongest 
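The notebook does not say how the percentages above are computed. Since the language/P600 pair shows the same value (58.72) in both listings, one plausible reading is that each co-occurrence count is divided by the ERP's total paper count. The sketch below works under that assumption; `top_associations` is a hypothetical helper and the numbers are toy values, not scraped data.

```python
def top_associations(cooc, erp_totals):
    """Return {erp: (best_term, percent)} for each ERP.

    cooc:       {erp: {term: papers mentioning both}}
    erp_totals: {erp: papers mentioning the ERP at all}
    Assumes the percentage is 100 * co-occurrence count / ERP total.
    """
    results = {}
    for erp, term_counts in cooc.items():
        total = erp_totals[erp]
        if total == 0 or not term_counts:
            continue  # nothing to normalize for this ERP
        best_term = max(term_counts, key=term_counts.get)
        results[erp] = (best_term, 100.0 * term_counts[best_term] / total)
    return results


# Toy numbers for illustration only
toy_cooc = {"ERP_A": {"language": 50, "memory": 20},
            "ERP_B": {"language": 5, "memory": 30}}
toy_totals = {"ERP_A": 200, "ERP_B": 60}

best = top_associations(toy_cooc, toy_totals)
```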

In [14]:
# Check the terms with the most papers
counts.check_top()

The most studied ERP is  ['SEP', 'somatosensory evoked potential']  with  2210240 papers
The most studied term is  social  with   691910  papers


In [15]:
# Check how many papers were found for each term - ERPs
counts.check_counts('erp')

['P50'] -    70880
['P100', 'P1'] -     2218
['P200', 'P2'] -      869
['P300', 'P3'] -    10283
['P3a'] -      819
['P3b'] -      805
['P400'] -      269
['P600'] -      545
['N50'] -      603
['N100', 'N1'] -      919
['N170'] -      945
['N200', 'N2'] -      628
['MMN', 'mismatch negativity', 'N2a'] -     2332
['N2b'] -      496
['N2c'] -      188
['N270'] -       56
['N2pc'] -      321
['N300'] -      165
['N400'] -     1891
['N600'] -       12
['C1'] -    30161
['LPC', 'late positive component'] -     2732
['MRPC', 'movement-related cortical potential'] -       45
['CNV', 'contingent negative variation'] -     6490
['PINV', 'post-imperative negative variation'] -       93
['ERN', 'Ne', 'error related negativity'] -      918
['ELAN', 'early left anterior negativity'] -      521
['CPS', 'closure positive shift'] -     6649
['LRP', 'lateralized readiness potential'] -     3371
['LDN', 'late difference negativity'] -      578
['ORN', 'object related negativity'] -     2537
['SEP', 'so
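Several ERPs in this listing carry synonyms (for example `['MMN', 'mismatch negativity', 'N2a']`). One way such a list could be turned into a single search term is to quote each label and join them with `OR`. This is an assumption about the approach, not the confirmed `erpsc` behaviour, and `join_synonyms` and `cooc_query` are hypothetical names.

```python
def join_synonyms(labels):
    """Quote each label and OR them together inside parentheses."""
    quoted = ['"{}"'.format(label) for label in labels]
    return "(" + " OR ".join(quoted) + ")"


def cooc_query(erp_labels, term_labels):
    """Combine an ERP synonym group and a term synonym group with AND."""
    return join_synonyms(erp_labels) + " AND " + join_synonyms(term_labels)


query = cooc_query(["MMN", "mismatch negativity", "N2a"], ["auditory"])
```

Short labels that double as ordinary abbreviations can match unrelated papers under a scheme like this, so counts for such labels deserve a second look.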

In [16]:
# Check how many papers were found for each term - COGs
counts.check_counts('term')

language           -     144340
memory             -     222760
attention          -     345367
motor              -     364520
decision making    -     154464
vision             -     159812
pain               -     592870
auditory           -     116744
emotion            -      29502
categorization     -      11713
reward             -      32465
spatial            -     230700
somatosensory      -      34872
face               -     169128
cognitive          -     289037
awareness          -     113428
tactile            -      14359
pain               -     592870
learning           -     290473
reasoning          -      16814
social             -     691910


In [17]:
# Save pickle file of results
counts.save_pickle('test')

In [18]:
# Load from pickle file
counts = load_pickle_counts('test')
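For reference, a save/load round trip of this kind can be written with the standard `pickle` module. The sketch below is a generic illustration; `dump_counts` and `load_counts` are hypothetical names and are not the `erpsc` functions used above, whose file naming and location are not shown in this notebook.

```python
import os
import pickle
import tempfile


def dump_counts(obj, path):
    """Write any picklable object to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_counts(path):
    """Read a pickled object back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy stand-in for a results object
toy = {"erps": ["N400", "P600"], "terms": ["language", "memory"]}

path = os.path.join(tempfile.mkdtemp(), "test.p")
dump_counts(toy, path)
restored = load_counts(path)
```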

## Test Code

In [None]:
# Make a wordcloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
wordcloud = WordCloud().generate_from_frequencies(words_analysis.results[0].freqs)

In [None]:
type(words_analysis.results[0].freqs)

words_analysis.results[0].freqs.plot(500)

In [None]:
words_analysis.results[0].freqs

## Test Code

In [None]:
# Test imports (requests and BeautifulSoup are used in the next cell)
import requests
#import nltk
from bs4 import BeautifulSoup

In [None]:

#page = requests.get('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&field=word&term="N270"AND"Language"')
#page = requests.get('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&field=word&term="P300"&retmax=10')
page = requests.get('http://www.ncbi.nlm.nih.gov/pubmed/27354714')

# Name the parser explicitly so BeautifulSoup does not have to guess one
page_soup = BeautifulSoup(page.content, 'html.parser')

#counts = page_soup.find_all('count')

#for count in counts:
#    print(int(count.text))

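# NOTE: id_strs is not defined in this notebook; it must hold the PubMed ID(s) to fetch
# as a string (efetch accepts a comma-separated list of IDs)
art_page = requests.get('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=' + id_strs)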

art_page_soup = BeautifulSoup(art_page.content, "xml")
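Once the `efetch` XML is in hand, the title and abstract can be pulled out of each `PubmedArticle` element. The sketch below uses the standard library's `xml.etree.ElementTree` in place of BeautifulSoup so that it runs without extra dependencies. `extract_articles` is a hypothetical helper, and the XML string is a hand-written placeholder that follows the general layout of PubMed records, not a real article.

```python
import xml.etree.ElementTree as ET

# Placeholder record for illustration only; not a real PubMed article
EXAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000000</PMID>
      <Article>
        <ArticleTitle>Placeholder title</ArticleTitle>
        <Abstract>
          <AbstractText>Placeholder abstract text.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_articles(xml_text):
    """Return a list of dicts with pmid, title and abstract per article."""
    root = ET.fromstring(xml_text)
    articles = []
    for art in root.iter("PubmedArticle"):
        pmid = art.findtext("MedlineCitation/PMID")
        title = art.findtext("MedlineCitation/Article/ArticleTitle")
        # Structured abstracts have several AbstractText parts; join them
        parts = [el.text or "" for el in art.iter("AbstractText")]
        articles.append({"pmid": pmid, "title": title,
                         "abstract": " ".join(parts)})
    return articles


articles = extract_articles(EXAMPLE_XML)
```

With a live response, `extract_articles(art_page.content)` would be the equivalent call.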

In [None]:
aa = ERPC()
aa.set_path('Users')