Confidence Scanner

General

The Confidence Scanner is a project for the automated collection and analysis of scientific text samples. Its functionality falls into two primary steps: data collection and analysis. In the data collection phase, paper abstracts are gathered from PubMed, a repository of scientific papers in medicine and the life sciences, and press releases are scraped from EurekAlert, a public database of scientific press releases from a variety of research institutions and fields. The analysis phase consists of four tests: Readability, Sentiment, Subjectivity, and Confidence. A paper written for the CogSci 2018 conference further detailing the analysis can be found here and a conference poster can be found here.

Organization and Usage

consc

Contains all data collection functionality: retrieving paper abstracts through the PubMed EUtils API and scraping press releases through EurekAlert's Advanced Search feature.
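
As a rough illustration of the PubMed side, the sketch below builds an ESearch request URL for a search term and publication-date range. It is not taken from the consc package: the function name `build_esearch_url` and its arguments are hypothetical, and only the endpoint and query parameters (`db`, `term`, `datetype`, `mindate`, `maxdate`, `retmax`, `retmode`) come from the public E-utilities interface. It assumes Python 3 for `urllib.parse`.

```python
from urllib.parse import urlencode  # Python 3 only; Python 2 would use urllib.urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_url(term, mindate, maxdate, retmax=20):
    """Build an ESearch URL for PubMed IDs matching `term`.

    Hypothetical helper, not part of consc. Dates use the YYYY/MM/DD
    format and are matched against the publication date (`pdat`).
    """
    params = [
        ("db", "pubmed"),
        ("term", term),
        ("datetype", "pdat"),
        ("mindate", mindate),
        ("maxdate", maxdate),
        ("retmax", retmax),
        ("retmode", "json"),
    ]
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


url = build_esearch_url("autism", "2017/01/01", "2017/12/31")
print(url)
```

The IDs returned by such a query would then be passed to a second E-utilities call (EFetch) to retrieve the abstracts themselves; that step is omitted here.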

analysis

Readability is based primarily on the Flesch-Kincaid Reading Ease score, although other measures of textual difficulty, such as the SMOG Index, are also calculated. Sentiment is measured using both the VADER and Liu Hu lexicons, each independently shown to indicate the polarity of text samples. Subjectivity is measured with an SVM trained on NLTK's built-in subjectivity lexicon. Confidence analysis uses a Linguistic Inquiry and Word Count (LIWC) method with a lexicon curated by the researchers.
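
To make two of these measures concrete, here is a minimal sketch of a reading ease score and a LIWC-style lexicon proportion. It is an illustration only, not the project's implementation: the syllable counter is a crude vowel-run heuristic (the project lists textstat as a dependency, which would typically be more careful), the reading ease formula is the standard published Flesch one, and `HEDGE_WORDS` is a made-up stand-in for the researchers' curated confidence lexicon.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")


def count_syllables(word):
    """Crude heuristic: one syllable per run of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = float(len(words)) / len(sentences)
    syllables_per_word = float(syllables) / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def lexicon_proportion(text, lexicon):
    """LIWC-style score: fraction of words in `text` that appear in `lexicon`."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in lexicon)
    return float(hits) / len(words)


# Hypothetical lexicon for illustration; the project's curated lexicon differs.
HEDGE_WORDS = {"may", "might", "possibly", "suggest", "suggests"}

sample = "The results suggest that sleep may improve memory. More work is needed."
print(flesch_reading_ease(sample))
print(lexicon_proportion(sample, HEDGE_WORDS))
```

Scoring by word proportion rather than raw counts keeps short abstracts and longer press releases comparable.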

scripts

Scripts are separated into analysis and collection. The collection script can be run to retrieve press releases and paper abstracts for given search terms published in a desired date range. The analysis scripts can be run on this data.
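
One way such a collection script could accept its inputs is sketched below. This is purely an assumption for illustration: the argument names (`terms`, `--start`, `--end`) and the date format are hypothetical, and the repository's actual scripts may be configured differently, for example by editing variables at the top of the file.

```python
import argparse
from datetime import datetime


def parse_date(value):
    """Parse a YYYY-MM-DD string into a date (format chosen for this sketch)."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def build_parser():
    """Hypothetical command-line interface for a collection run."""
    parser = argparse.ArgumentParser(
        description="Collect paper abstracts and press releases for search terms."
    )
    parser.add_argument("terms", nargs="+", help="one or more search terms")
    parser.add_argument("--start", type=parse_date, required=True,
                        help="first publication date to include (YYYY-MM-DD)")
    parser.add_argument("--end", type=parse_date, required=True,
                        help="last publication date to include (YYYY-MM-DD)")
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["autism", "stroke", "--start", "2017-01-01", "--end", "2017-12-31"]
)
print(args.terms, args.start, args.end)
```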

data

The data used in the writing of the paper is available in the Data folder on GitHub.

Development

Functions were developed and tested in Python 2.7 and Python 3.4, and may not be fully compatible with other versions of Python.

This software was developed in collaboration with Tom Donoghue and the lab of Professor Voytek at UC San Diego, and takes some inspiration from ERP_SCANR: https://github.com/TomDonoghue/ERP_SCANR.

Constructive criticism, corrections, and potential improvements are welcome.

E-mail: wdfox3{at}gmail{dot}com

Citation

If this software was useful for your work, please credit it by citing our paper:

W. FOX, T. DONOGHUE. Confidence Levels in Scientific Writing: Automated Mining of Primary Literature and Press Releases. Cognitive Science Society. Madison, WI, USA, 2018.

Dependencies

NumPy
NLTK
BeautifulSoup
Selenium
textstat (Python 2.7 Only)
