## ERP-SCANR

This notebook gives an overview of the ERP-SCANR project (erpsc).

ERPSC uses automated web scraping and text mining to summarize research on event-related potentials (ERPs). The goal is for the project to serve as a kind of automated meta-analysis, and as a way to pull out patterns in ERP research.

Logic:
- At its core, the erpsc code provides a base for setting terms (both ERP terms and other terms, such as cognitive and/or disease terms) and mining papers for these terms using the NCBI E-utils. You can also specify a list of 'exclusion words' that will be explicitly excluded from ERP searches.
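
As an illustration of the logic above, the following is a minimal sketch of how synonyms and exclusion words could be combined into a single search term. The function `build_term` is hypothetical and is not part of erpsc; the quoting and the `[TIAB]` field tag are assumptions about how such a query might be written, not a description of how erpsc actually builds its queries.

```python
def build_term(synonyms, exclusions=(), field="TIAB"):
    """Combine synonyms with OR and attach exclusion words with NOT.

    Hypothetical helper for illustration only; not the erpsc implementation.
    """
    # Any of the synonyms may match
    include = " OR ".join(f'"{s}"[{field}]' for s in synonyms)
    term = f"({include})"

    # Papers mentioning any exclusion word are filtered out
    if exclusions:
        exclude = " OR ".join(f'"{e}"[{field}]' for e in exclusions)
        term += f" NOT ({exclude})"
    return term


# P300 and P3 are treated as synonyms; 'protein' is an exclusion word for P300
term = build_term(["P300", "P3"], exclusions=["protein"])
print(term)
```

An ERP with no exclusion words simply yields the parenthesized OR group on its own.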

Current analysis takes two forms:
- Co-occurrence analysis: searches for co-occurrences of terms (ERPs and others) and analyzes how often these terms occur together, in both absolute and relative terms.
- Word analysis:
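
For the co-occurrence analysis, the step from absolute to relative counts can be sketched as below. This is an assumption about one reasonable normalization (joint count divided by the total number of papers found for the ERP), not necessarily the one erpsc uses. The function name and all counts are made-up placeholders, not real search results.

```python
# Placeholder counts for illustration only; NOT real search results
erp_counts = {"P300": 200, "N400": 100}
joint_counts = {
    ("P300", "attention"): 50,
    ("P300", "language"): 10,
    ("N400", "attention"): 5,
    ("N400", "language"): 60,
}


def relative_cooccurrence(joint_counts, erp_counts):
    """Divide each joint count by the total paper count for that ERP."""
    return {
        (erp, term): n_both / erp_counts[erp]
        for (erp, term), n_both in joint_counts.items()
        if erp_counts.get(erp, 0) > 0  # skip ERPs with no papers
    }


rel = relative_cooccurrence(joint_counts, erp_counts)
for (erp, term), frac in sorted(rel.items()):
    print(f"{erp:5s} & {term:10s}: {frac:.2f}")
```

Normalizing by the ERP count makes ERPs with very different literature sizes comparable; the absolute counts remain useful for judging how much evidence is behind each fraction.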

Settings:
- Several settings are configurable. For example:
    - Database: E-utils can search different databases; the most relevant are 'pubmed' and 'pmc', which differ in some respects.
    - Search Area: although it depends somewhat on the database, the search area can be set to, for example, 'TIAB' (title and abstract), 'ARTI' (all available article text), or 'WORD' (all searchable words).
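
To make the settings above concrete, here is a minimal sketch of how the database and search area could map onto an E-utilities ESearch request. The helper `esearch_url` is hypothetical and not part of erpsc; it only builds the URL and makes no network request. `db`, `term`, `field`, and `retmax` are ESearch parameters; whether a given field name is valid depends on the database chosen.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="pubmed", field="TIAB", retmax=0):
    """Build an ESearch URL for a term, database, and search field.

    Hypothetical helper for illustration only; not the erpsc implementation.
    """
    params = {"db": db, "term": term, "field": field, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


# Same term, two databases
url_pubmed = esearch_url("N400", db="pubmed")
url_pmc = esearch_url("N400", db="pmc")
print(url_pubmed)
print(url_pmc)
```

Setting `retmax` to 0 asks for no paper IDs in the response, which is enough when only the number of matching papers is of interest.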

Future Work:
- 




The code in this notebook simply uses the base class to load and check the terms.

NOTE:
- Known issue: Some ERP terms often return papers in which the same name is used for something else. Some form of quality control will be needed to check that the scraped papers actually refer to the intended ERP.

These notebooks just run the code and display results. The actual code is in the 'erpsc' custom module. 

In [None]:
# Import custom code
from erpsc.base import Base

In [2]:
# Load a test object to test all terms
check = Base()

In [3]:
# Load erps and cognitive terms from file
check.set_erps_file()
check.set_terms_file('cognitive')

In [4]:
# Check the ERPs that are being used. Entries on the same line are used as synonyms.
check.check_erps()

List of ERPs used: 

P40
P50
P60
P100, P1
P200, P2
P300, P3
P3a
P3b
P340
P350
P400
P600, syntactice positive shift,  SPS
N50
N60
N80
N100,  N1
N140
N170, Vertex Positive Potential, VPP
N200, N2
MMN, mismatch negativity, N2a
N2a
N2b
N2c
N240
N270
N2pc
N300
N400
N450
N600
C1
LPC, late positive component
LPP,  late Positive Potential
MRPC, movement-related cortical potential
CNV, contingent negative variation
PINV, post-imperative negative variation
ERN, Ne, error related negativity
ELAN, early left anterior negativity
CPS, closure positive shift
LRP, lateralized readiness potential
LDN, late difference negativity
ORN, object related negativity
SEP, somatosensory evoked potential
VsEP, vestibular evoked potential
BP, pre-motor potential, readiness potential, Bereitschaftspotential
Pe, error-related positivity, error positivity, post-error positivity
CRN, correct-related negativity, correct response negativity
MFN, Medial frontal negativity
SPN, Stimulus-Preceding Negativity
ADAN, Anterior

In [5]:
# Check the cognitive terms used
check.check_terms()

List of terms used: 

attention
auditory, audition
awareness
categorization
cognitive
decision making
emotion, emotional
face, facial
language
learning
memory
motor
pain
phonology, phonological
reasoning
reward
semantic, semantics
spatial
speech
social
somatosensory
tactile
vision, visual


In [6]:
# Load the disease terms to check
check.set_terms_file('disease')

Unloading previous terms words.


In [7]:
# Check the disease terms
check.check_terms()

List of terms used: 

ADHD
alchohol abuse
alzheimer
anorexia
anxiety
aphasia
autism
bipolar
dementia
depression
dyslexia
epilepsy, seizure
insomnia
migraine
parkinson
PTSD
psychosis
schizophrenia
stroke


In [8]:
# Add exclusion words
check.set_exclusions_file()

In [9]:
# Check the exclusion terms used
check.check_exclusions()

List of exclusion words used: 

P40	 : 
P50	 : protein
P60	 : 
P100	 : gene
P200	 : 
P300	 : protein
P3a	 : 
P3b	 : 
P340	 : 
P350	 : 
P400	 : protein
P600	 : 
N50	 : genome
N60	 : 
N80	 : 
N100	 : 
N140	 : 
N170	 : 
N200	 : 
MMN	 : 
N2a	 : 
N2b	 : ttn
N2c	 : cell
N240	 : 
N270	 : 
N2pc	 : 
N300	 : 
N400	 : 
N450	 : 
N600	 : 
C1	 : istamycin
LPC	 : liver
LPP	 : 
MRPC	 : cell
CNV	 : number
PINV	 : eiec
ERN	 : 
ELAN	 : purkinje
CPS	 : antioxidant
LRP	 : 
LDN	 : cell
ORN	 : strain
SEP	 : 
VsEP	 : genetic
BP	 : protein
Pe	 : 
CRN	 : 
MFN	 : 
SPN	 : 
ADAN	 : 
FRN	 : 
Nc	 : 
Pd	 : 
EDAN	 : 
ADAN	 : 
NSW	 : 
PSW	 : 
EPN	 : 
SN	 : 
