# Introduction

- Author: Yunho Maeng (Yonsei University) yunhomaeng@yonsei.ac.kr
- Paper link: TBD
- This notebook shows how to extract sentiment features from reviews.

## Sentiment Score Extraction
### NLTK 
- Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
https://www.nltk.org/api/nltk.sentiment.html
- Reference code: Grishma Jena (2019) https://colab.research.google.com/drive/1Q4yorsQT6eVHmd8H07h7diinQEhCkyng#scrollTo=iFkqRxTo3iVc
- The paper uses the VADER `compound` score as the sentiment feature. For a discussion of how this score is calculated, see https://stackoverflow.com/questions/40325980/how-is-the-vader-compound-polarity-score-calculated-in-python-nltk
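
How the `compound` value ends up between -1 and 1 can be sketched as follows. This is a minimal illustration, assuming the normalization described in the linked discussion: the summed valence `x` of a sentence is divided by `sqrt(x^2 + alpha)` with `alpha = 15`. The function name `normalize_compound` is hypothetical and is not part of the NLTK public API.

```python
import math

def normalize_compound(raw_sum, alpha=15.0):
    """Map an unbounded summed valence onto the [-1, 1] range.

    `raw_sum` stands for the sum of the adjusted word valences of one
    sentence; `alpha` controls how quickly the score saturates.
    """
    score = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clip as a safeguard against floating-point overshoot.
    return max(-1.0, min(1.0, score))

# Larger absolute sums move the score closer to -1 or 1.
for raw in [-8.0, -1.0, 0.0, 1.0, 8.0]:
    print(raw, round(normalize_compound(raw), 4))
```

Because of the square root in the denominator, a sentence with no scored words yields exactly 0.0, and a few strongly valenced words are enough to push the score close to the bounds.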

In [1]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer 

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/yunho/nltk_data...


In [3]:
restaurant_reviews = ["Great place to visit in Nashville. I like this movie but the theater is not comfortable",
"The food took too long to come, the service was slow.",
"Everything was amazing.",
"Place closed down a month ago.",
"Had to wait in line for an hour, but the food was worth the wait.",
"I like this movie but the theater is not comfortable",
"The hero is not good man but story is so great",
]

sentiment_analyzer = SentimentIntensityAnalyzer()
for sentence in restaurant_reviews:
    print(sentence)
    # NLTK sentiment score (VADER compound value)
    print(sentiment_analyzer.polarity_scores(sentence)['compound'])

Great place to visit in Nashville. I like this movie but the theater is not comfortable
-0.0652
The food took too long to come, the service was slow.
0.0
Everything was amazing.
0.5859
Place closed down a month ago.
0.0
Had to wait in line for an hour, but the food was worth the wait.
0.3291
I like this movie but the theater is not comfortable
-0.422
The hero is not good man but story is so great
0.8738
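
If a categorical feature is needed in addition to the raw score, the compound values printed above can be bucketed. The sketch below assumes the commonly cited VADER convention of treating scores at or above 0.05 as positive and at or below -0.05 as negative. The function name `label_from_compound` and the `threshold` parameter are hypothetical, and this step is not claimed to be part of the paper's pipeline.

```python
def label_from_compound(compound, threshold=0.05):
    """Bucket a VADER compound score into a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# A few of the compound scores from the output above.
for score in [-0.0652, 0.0, 0.5859, -0.422]:
    print(score, label_from_compound(score))
```

The threshold is a tunable assumption. A wider neutral band may suit short reviews where most words carry no valence.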


### Watson NLU version (if needed)
- This is my initial version. To make the approach more generally usable, I added the code as open source.
- GitHub link (Gist): https://gist.github.com/yunho0130/1d6edb1ac9b002480fa4a72de3f8de89 (This link is currently `private`. It will be made public after the paper is published.)


In [0]:
from google.colab import drive
drive.mount('/content/drive')

In [0]:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Tue Jun  6 15:58:06 2017
@author: Yunho
"""

# Import Package

import sys
import os
import pickle 
import json
import csv
sys.path.append(os.path.join(os.getcwd(),'..'))
import watson_developer_cloud
import watson_developer_cloud.natural_language_understanding.features.v1 as features
print "--sys.version--"
print sys.version

# Parameter management
csv_file = "VR_HMD_DATA_ALL_old.csv"
csv_output_file ="test_data_w.csv"
pickle_csv_senti = 'temp_NLU_csv_dict.csv'

# Watson API credentials
wcs_username='your-own-watson-username'
wcs_password='your-watson-password'

# Sample review text for a quick manual test
input_test =  u"Submerge yourself into the Galaxy VR world. Very fun and easy to work, love the controller, however not all games in oculus store are compatible. Also haven't figured out how to use as normal vr without oculus. but deffinetly worth every penny invested. Face your fears is a must download, feels so REAL!!!"

def watson_nlu_setiment(raw_text_input,nlu_name=wcs_username,nlu_pass=wcs_password):
    try:
        nlu = watson_developer_cloud.NaturalLanguageUnderstandingV1(version='2017-02-27',
        username=nlu_name,password=nlu_pass)
        raw_text = raw_text_input
    # API implementation Sentiment
        
        print '=== NLU running ==='
        # Call the API once and keep the response
        sen_result = nlu.analyze(text=raw_text, features=[features.Sentiment()])
        # Serialize the response to a JSON string
        json_sen_result = json.dumps(sen_result)
        # Parse the JSON string back into a dict
        dict_sen_result = json.loads(json_sen_result)
        # Read the document-level sentiment score
        sentiment_score = dict_sen_result['sentiment']['document']['score']
    #    print sentiment_score
    except Exception:
        # Fall back to a sentinel value when the API call or the parsing fails
        sentiment_score = 'NULL'
    return sentiment_score

# Load the input CSV file and write one sentiment score per row
def csv_to_sent_score(output_file):
    fieldnames = ['no', 'senti_score', 'comments']
    with open(output_file, 'w') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        with open(csv_file, "rU") as infile:
            reader = csv.DictReader(infile)
            # i is a 1-based progress counter
            for i, row in enumerate(reader, 1):
                writer.writerow({'no': row['no'],
                                 'senti_score': watson_nlu_setiment(row['comments']),
                                 'comments': row['comments']})
                print i
    return 1

## Main
print '=== main==='
csv_to_sent_score(pickle_csv_senti)
print '=== main end ==='
#print watson_nlu_setiment(input_test)
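
The score lookup in `watson_nlu_setiment` can also be isolated into a small helper that does not hide unrelated errors. This is a sketch under the assumption that the response is a dict shaped like the one indexed above (`['sentiment']['document']['score']`). The helper name `extract_document_score`, the `default` parameter, and the sample dicts are hypothetical, and the score value is made up for illustration.

```python
def extract_document_score(response, default=None):
    """Return the document-level sentiment score from an NLU-style response.

    Falls back to `default` when the expected keys are missing or the
    value cannot be converted to a float.
    """
    try:
        return float(response['sentiment']['document']['score'])
    except (KeyError, TypeError, ValueError):
        return default

# Hypothetical responses, for illustration only.
ok_response = {'sentiment': {'document': {'score': 0.73}}}
bad_response = {'error': 'no sentiment returned'}

print(extract_document_score(ok_response))
print(extract_document_score(bad_response))
```

Returning `None` (or another explicit default) instead of the string `'NULL'` keeps the score column numeric when the CSV is loaded later. Which sentinel fits best depends on the downstream tooling.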

## Sentence segmentation
- Reference code: Grishma Jena (2019) https://colab.research.google.com/drive/1Q4yorsQT6eVHmd8H07h7diinQEhCkyng#scrollTo=iFkqRxTo3iVc

In [0]:
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize

In [0]:
sample_data = ["Today is a cold Sunday morning. I am at the Nashville School of Law. \
               I am here for PyTennessee where I can learn more about Python."]

In [0]:
def get_sent_tokens(data):
    """Sentence tokenization"""
    sentences = []
    for sent in data:
        sentences.extend(sent_tokenize(sent))
    print(sentences)
    return sentences

In [0]:
sample_sentences = get_sent_tokens(sample_data)

['Today is a cold Sunday morning.', 'I am at the Nashville School of Law.', 'I am here for PyTennessee where I can learn more about Python.']
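
Sentence segmentation and sentiment scoring can be combined to get one score per sentence instead of one per review. The sketch below is only an illustration and is not claimed to be the paper's pipeline. The function name `score_sentences` is hypothetical, and the tokenizer and scorer are passed in as arguments so the helper does not depend on a specific library.

```python
def score_sentences(reviews, tokenize, score):
    """Return one (review_index, sentence, score) row per sentence.

    `tokenize` maps a review string to a list of sentences;
    `score` maps a sentence string to a number.
    """
    rows = []
    for idx, review in enumerate(reviews):
        for sentence in tokenize(review):
            rows.append((idx, sentence, score(sentence)))
    return rows
```

With the objects defined earlier in this notebook, a call could look like `score_sentences(restaurant_reviews, sent_tokenize, lambda s: sentiment_analyzer.polarity_scores(s)['compound'])`. Keeping the review index in each row makes it possible to aggregate the sentence scores back to the review level.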
