Natural Language Processing of Zooniverse Talk Data

[using Python+NLTK]

Basics

Python script that trains a Naive Bayes classifier with NLTK, based on https://github.com/abromberg/sentiment_analysis_python

The classifier is trained on 1.6M Tweets pre-processed at Stanford and available at http://help.sentiment140.com/for-students. Other training data can also be used, but it is not stored in the repo's training-data folder because it is too large.
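As a rough illustration of the training step, here is a minimal sketch using NLTK's `NaiveBayesClassifier`. The four training sentences, the `word_features` extractor and the `score` helper are hypothetical stand-ins, not the repo's actual code or the Sentiment140 data.

```python
from nltk.classify import NaiveBayesClassifier


def word_features(text):
    """Bag-of-words features: each lower-cased token maps to True.

    Assumption: a simple whitespace tokenizer; the real script may differ.
    """
    return {word: True for word in text.lower().split()}


# Hypothetical toy examples standing in for the 1.6M labelled Tweets.
train_examples = [
    ("i love this beautiful galaxy", "pos"),
    ("what a great amazing image", "pos"),
    ("this is a terrible blurry picture", "neg"),
    ("i hate this awful artifact", "neg"),
]

train_set = [(word_features(text), label) for text, label in train_examples]
classifier = NaiveBayesClassifier.train(train_set)


def score(text):
    """Return (positive, negative) probabilities for one comment."""
    dist = classifier.prob_classify(word_features(text))
    return dist.prob("pos"), dist.prob("neg")


pos, neg = score("i love this great galaxy")
print(f"pos={pos:.3f} neg={neg:.3f}")
```

With real data, the labelled Tweets would replace `train_examples`; the rest of the flow stays the same.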

The script and HTML template are designed for specific Zooniverse data, extracted from the Zooniverse discussion platform 'Talk' - please contact rob@zooniverse.org for more information.

I/O

Inputs are a CSV dump of text comments, plus NLTK and the training data. Outputs are a CSV file of sentiment scores and HTML files showing the positive and negative comments.

Run the script with the input filename as its only parameter, e.g. `python process_comments.py example_input_file.csv`
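The input/output flow described above might look roughly like the sketch below. The column names (`comment_id`, `body`), the `_scores.csv` output suffix and the `toy_scorer` placeholder are all assumptions for illustration; the real CSV layout comes from the Talk export.

```python
import csv
import os


def toy_scorer(text):
    """Hypothetical stand-in for the trained classifier: returns (pos, neg)."""
    positive_words = {"beautiful", "great", "love"}
    words = text.lower().split()
    if not words:
        return 0.0, 0.0
    pos = sum(word in positive_words for word in words) / len(words)
    return pos, 1.0 - pos


def score_comments(in_file, out_file, scorer=toy_scorer):
    """Read comments from in_file, write one score row per comment to out_file."""
    reader = csv.DictReader(in_file)
    writer = csv.writer(out_file)
    writer.writerow(["comment_id", "pos_score", "neg_score"])
    count = 0
    for row in reader:
        pos, neg = scorer(row["body"])  # "body" is an assumed column name
        writer.writerow([row["comment_id"], f"{pos:.3f}", f"{neg:.3f}"])
        count += 1
    return count


def main(argv):
    """Take the input filename as the first parameter, as the script does."""
    input_path = argv[1]
    base, _ext = os.path.splitext(input_path)
    output_path = base + "_scores.csv"  # assumed naming scheme
    with open(input_path, newline="") as src, \
            open(output_path, "w", newline="") as dst:
        return score_comments(src, dst)
```

Calling `main(sys.argv)` under an `if __name__ == "__main__":` guard would reproduce the command-line usage.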

Example Results

The most positive sentiment images from Galaxy Zoo, based on Talk threads with 5 or more comments.

The most positive sentiment images from Snapshot Serengeti, based on Talk threads with 5 or more comments.
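One way such a ranking could be produced is sketched below. It assumes each image's score is the mean positive score across its comments, which the README does not state; `most_positive` and its input shape are hypothetical.

```python
from collections import defaultdict
from statistics import mean

MIN_COMMENTS = 5  # threshold from the text: threads with 5 or more comments


def most_positive(scored_comments, top_n=10):
    """Rank images by mean positive score (an assumed aggregation).

    scored_comments: iterable of (zooniverse_id, pos_score) pairs.
    Returns a list of (zooniverse_id, mean_pos_score, n_comments) tuples.
    """
    by_image = defaultdict(list)
    for zooniverse_id, pos_score in scored_comments:
        by_image[zooniverse_id].append(pos_score)

    ranked = [
        (zid, mean(scores), len(scores))
        for zid, scores in by_image.items()
        if len(scores) >= MIN_COMMENTS
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

Images with fewer than five comments are dropped before ranking, so a single enthusiastic comment cannot push an image to the top.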

Example Image

Each image links to its Talk page and is shown with:

  • Zooniverse ID in the top-left
  • Number of comments in the top-right
  • Positive and Negative scores in the bottom-left (colour-coded)
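The tile layout listed above could be generated along these lines. This is only a sketch: the CSS class names, the URLs and the `render_tile` function are hypothetical and do not reflect the repo's actual HTML template.

```python
from html import escape


def render_tile(zooniverse_id, talk_url, image_url, n_comments, pos, neg):
    """Build one linked image tile with ID, comment count and scores.

    Class names are placeholders; positioning and colour-coding
    would be handled by a stylesheet.
    """
    return (
        f'<a class="tile" href="{escape(talk_url)}">'
        f'<img src="{escape(image_url)}" alt="{escape(zooniverse_id)}">'
        f'<span class="top-left">{escape(zooniverse_id)}</span>'
        f'<span class="top-right">{n_comments}</span>'
        f'<span class="bottom-left">'
        f'<span class="pos">{pos:.2f}</span> '
        f'<span class="neg">{neg:.2f}</span>'
        f'</span></a>'
    )
```

Escaping the ID and URLs keeps stray characters in the Talk data from breaking the generated page.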
