## creating a dict-based sentiment analyser
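A dictionary-based analyser, at its simplest, looks each token up in a lexicon of word polarities and adds up the scores. The sketch below assumes a tiny made-up lexicon (`LEXICON`) and a naive negation rule (`NEGATORS`). Both are hypothetical placeholders for illustration only, not a published lexicon or the scoring used later in this notebook.

```python
# Hypothetical toy lexicon: word -> polarity score (illustration only, not a published lexicon).
LEXICON = {
    "awesome": 2.0,
    "good": 1.0,
    "fun": 1.0,
    "boring": -1.0,
    "waste": -1.5,
    "sucks": -2.0,
}

# Hypothetical negation words that flip the polarity of the token that follows them.
NEGATORS = {"not", "no", "never"}


def score_tokens(tokens):
    """Sum lexicon scores over tokens; a preceding negator flips a word's sign."""
    total = 0.0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0.0)  # words missing from the lexicon contribute 0
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        total += polarity
    return total


print(score_tokens("a very awesome game".split()))   # 2.0
print(score_tokens("this game is not fun".split()))  # -1.0
```

Both example sentences are taken from the cleaned reviews shown further down. The negation rule is also why stop-word removal is avoided: it depends on "not" surviving preprocessing.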

In [26]:
import pandas as pd 
import nltk
from IPython.display import display
pd.set_option("display.max_columns", None)

# load the small_corpus.csv file that was created in dataset_prep.ipynb

In [27]:
reviews = pd.read_csv("small_corpus.csv")
reviews.head()

Unnamed: 0.1,Unnamed: 0,overall,verified,reviewTime,reviewerID,asin,reviewerName,reviewText,summary,unixReviewTime,vote,style,image
0,52054,1.0,False,"07 5, 2004",A14NLQE68Z29I,B0000TSR4C,B. Convery,"No, it wasn't. I remember being very excited when Driver 1 was supposed to come out. You were supposed to be a gangster, running from the law, a game built oin the same engine (same physics) as the destruction derby games.\nBUT, whoever made Driver 1 thought it way too controversial to make a villian as the lead, so they decided to make it an ""on the run"" police officer. that was weak and i lost interest in it, don't care how great the game was. do i want a watered down version of an artist's vision? please, why don't you put that fig leaf on ""David""'s unit in Florence.\nSo, here we go, now we have driv3r, a couple years after the GTA wave breaks. yay! in driv3r they finally let you get out of the car and do foot missions, fight, shoot, run over people, now that the coast is clear. there is nothing innovative about what they are trying to do here. driv3r is not the original. any company that forfeits it's edginess to be politically correct will not get my money. might as well have tipper gore in the passenger seat the whole time, telling tanner what he could and could not do.","""GTA 3D was spawned by the Driver series!!!!""",1088985600,2.0,{'Platform:': ' PlayStation2'},
1,400332,1.0,True,"03 9, 2017",A3R2NG8D67YRQG,B00XWE60HS,leopoldo Ramirez,:( lol,One Star,1489017600,,{'Format:': ' Video Game'},
2,400537,1.0,False,"06 24, 2016",A3SXA8CICSY67W,B00XWQZPQ8,MackJackWaggon,"Here's all you need to know about this game. It all happens at night in the rain. It's incredibly repetitive and it feels like you are playing the same missions every time, it literally gets boring after race 5.\n\nThe story line cut scenes are RIDICULOUS! they are all live action, it's basically people in their 40s spouting bad dialogue in Ebonics for some reason, doesn't add to the game or draw you in.\n\nNo manual transmission. It's dumbed down game play that gets boring instantly.\n\nNo wheel support, I'm serious! NO WHEEL SUPPORT IN A TRIPLE A RACING GAME IN 2015!! WTF!!!!!\n\nPlease, for the love of God, do not buy this title, send a message to these people. Nobody wants this Dog crap game.",This game is a spectacular failure.,1466726400,,{'Format:': ' Video Game'},
3,492564,1.0,True,"03 7, 2017",A5IALBY2QXSS,B00O2D9PBQ,NotTakinItAnymore,Game Sucked.\nNuff Said.,Sucks.,1488844800,,,
4,424209,1.0,False,"12 9, 2015",A3HDE6BKZVGKTG,B016NZFGP4,shwiftyone,"The game is incredibly limited. The graphics are pretty solid, there just isn't enough in the game to keep it interesting for more than a day or two. All of the weapons you grind to level up for have major drawbacks and you just end up going back to the guns you got early on. The game gets super boring super fast. I played for about 3 days and haven't touched it since. Total waste of money.",Welcome to Call of Duty Star Wars without any option for a story mode or zombies,1449619200,10.0,"{'Edition:': ' Exclusive SteelBook', 'Platform:': ' PlayStation 4'}",
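The `Unnamed: 0` columns above are what pandas produces when a CSV was written with its index (the `DataFrame.to_csv` default) and read back without `index_col`. Assuming that is how `small_corpus.csv` was saved in `dataset_prep.ipynb` (not shown here), either option below avoids the extra column. The frame is a made-up stand-in for the real corpus.

```python
import io

import pandas as pd

# Made-up stand-in for the corpus.
df = pd.DataFrame({"overall": [1.0, 5.0], "summary": ["Sucks.", "Great"]})

# to_csv writes the index by default; reading it back creates an "Unnamed: 0" column.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
print(pd.read_csv(buf).columns.tolist())  # ['Unnamed: 0', 'overall', 'summary']

# Option 1: treat the first column as the index when reading.
buf.seek(0)
print(pd.read_csv(buf, index_col=0).columns.tolist())  # ['overall', 'summary']

# Option 2: don't write the index in the first place.
buf2 = io.StringIO()
df.to_csv(buf2, index=False)
buf2.seek(0)
print(pd.read_csv(buf2).columns.tolist())  # ['overall', 'summary']
```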


## tokenising words and sentences in the reviews
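which word tokeniser, and which normalisation step, is best for this case? (stop-word removal is not suitable for sentiment analysis, because stop-word lists include negations such as "not" and "no")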
1. treebank word tokeniser
2. casual tokeniser
3. stemming
4. lemmatisation
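The problem with stop-word removal is easy to see on a short review. Standard lists include negations (NLTK's English list contains "not", "no" and "nor"), so dropping them can flip the apparent sentiment. The stop-word set below is a small hand-picked assumption, not NLTK's actual list.

```python
# Small hand-picked stop-word set for illustration; real lists are longer
# but likewise contain negations such as "not" and "no".
STOP_WORDS = {"this", "is", "a", "the", "not", "no"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word set."""
    return [tok for tok in tokens if tok not in stop_words]


tokens = "this game is not fun".split()
print(remove_stop_words(tokens))  # ['game', 'fun']: the negation is gone
```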

## treebank word tokeniser

In [28]:
from nltk.tokenize import TreebankWordTokenizer
from string import punctuation
import string

In [29]:
tb_tokeniser = TreebankWordTokenizer()

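# clean text: replace html line breaks, strip punctuation, lowercase
def clean_text(text):
    # coerce to str so missing reviews (NaN) don't raise; note they become "nan"
    text = str(text)
    
    # replace html line breaks with a space first: stripping punctuation beforehand
    # would turn "<br />" into "br " and this replacement would never match
    text = text.replace("<br />", " ")
    
    # str.maketrans(from, to, delete): every character in the third argument maps to None (deleted)
    # note: this also removes apostrophes, so "don't" becomes "dont"
    translator = str.maketrans("", "", punctuation)
    text = text.translate(translator)
    
    text = text.lower()
    
    return text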

reviews["rev_text_lower"] = reviews["reviewText"].apply(clean_text)
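The order of the `<br />` replacement and the punctuation stripping matters. A quick check on a made-up string shows why the replacement has to run before the punctuation is stripped. The two helper names are hypothetical and exist only for this comparison.

```python
from string import punctuation


def strip_punct_first(text):
    """Hypothetical helper: strip punctuation, then try to replace <br />."""
    text = text.translate(str.maketrans("", "", punctuation))
    return text.replace("<br />", " ").lower()


def replace_breaks_first(text):
    """Hypothetical helper: replace <br /> first, then strip punctuation."""
    text = text.replace("<br />", " ")
    return text.translate(str.maketrans("", "", punctuation)).lower()


sample = "Great game!<br />Would buy again."
print(strip_punct_first(sample))     # great gamebr would buy again
print(replace_breaks_first(sample))  # great game would buy again
```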

In [30]:
reviews[["reviewText","rev_text_lower"]].sample(4)

Unnamed: 0,reviewText,rev_text_lower
1631,What I don't liked about the game was that you're in an unreal city. I like Gran Turismo cause you compete on real locations. The other thing I didn't like is that you've to use many unrealistic things like the turbo function in order to win and not your real skill as a pilot.,what i dont liked about the game was that youre in an unreal city i like gran turismo cause you compete on real locations the other thing i didnt like is that youve to use many unrealistic things like the turbo function in order to win and not your real skill as a pilot
267,"very silly retarded game , childish boring story .",very silly retarded game childish boring story
3914,A very awesome game,a very awesome game
372,"This game is not fun, and I have trouble believing that anyone who actually played it would give it a good review. The 'gameplay' consists of a bug named Issun telling you, and often showing you, exactly what to do. Then there is a great deal of dialogue, and then the\nbug tells/shows you what to do next, and it all repeats. The graphics and sound are decent, but nothing you can't find in any number of graphic novels or games. I think that there is a consensus that one is supposed to say good things about this game because it is very artsy looking and off the beaten track. The 'gameplay' itself is garbage though; I wish I could get my money back, and I got it used for $9.99.",this game is not fun and i have trouble believing that anyone who actually played it would give it a good review the gameplay consists of a bug named issun telling you and often showing you exactly what to do then there is a great deal of dialogue and then the\nbug tellsshows you what to do next and it all repeats the graphics and sound are decent but nothing you cant find in any number of graphic novels or games i think that there is a consensus that one is supposed to say good things about this game because it is very artsy looking and off the beaten track the gameplay itself is garbage though i wish i could get my money back and i got it used for 999
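`DataFrame.sample` draws a fresh random subset on every run, so the rows shown above will change when the cell is re-executed. Passing `random_state` fixes the seed. This helps when comparing raw and cleaned text, or two tokenisers, on the same rows. Here is a minimal check on a made-up frame:

```python
import pandas as pd

# Made-up frame standing in for `reviews`.
df = pd.DataFrame({"reviewText": [f"review {i}" for i in range(10)]})

# With a fixed seed the same rows come back on every call.
first = df.sample(4, random_state=42)
second = df.sample(4, random_state=42)
print(first.index.equals(second.index))  # True
```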


In [33]:
# tokenisation
reviews["tb_tokens"] = reviews["rev_text_lower"].apply(lambda rev: tb_tokeniser.tokenize(str(rev)))
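Because the text is cleaned before tokenising, the Treebank tokeniser never sees the contractions and punctuation its rules are designed to split. The comparison below uses a made-up sentence and its cleaned form. Whether keeping `n't` as a separate token helps a dictionary lookup is a design choice; this notebook has not measured it.

```python
from nltk.tokenize import TreebankWordTokenizer

tb = TreebankWordTokenizer()

raw = "I don't like it."
cleaned = "i dont like it"  # what the punctuation-stripping/lowercasing step yields for `raw`

# On raw text the Treebank rules split the contraction and the final period.
print(tb.tokenize(raw))      # ['I', 'do', "n't", 'like', 'it', '.']
# On cleaned text there is nothing left to split.
print(tb.tokenize(cleaned))  # ['i', 'dont', 'like', 'it']
```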

In [34]:
pd.set_option("display.max_colwidth", None)
reviews[["reviewText","tb_tokens"]].sample(3)

Unnamed: 0,reviewText,tb_tokens
1569,"I have played the game but haven't finished it. The lack of a proper joystick controller makes this game a *needs improvement*.\nNot a big fan of games that actually remove game pad controller functionality to force one to use the keyboard.\nyes, there's way of working around it, however why remove it to begin with ?\n\nI have placed this in the closet and will be taking a look at it when I'm finished playing all my other games that don't for to have to be looking constantly at the keyboard and missing the action that's on the BIG screen..","[i, have, played, the, game, but, havent, finished, it, the, lack, of, a, proper, joystick, controller, makes, this, game, a, needs, improvement, not, a, big, fan, of, games, that, actually, remove, game, pad, controller, functionality, to, force, one, to, use, the, keyboard, yes, theres, way, of, working, around, it, however, why, remove, it, to, begin, with, i, have, placed, this, in, the, closet, and, will, be, taking, a, look, at, it, when, im, finished, playing, all, my, other, games, that, dont, for, to, have, to, be, looking, constantly, at, the, keyboard, and, missing, the, action, thats, on, the, big, screen]"
1218,Doesn't fit in any of my controllers. Very stiff movement.,"[doesnt, fit, in, any, of, my, controllers, very, stiff, movement]"
4362,"From the minute I met Maya, my new personal trainer I was hooked. Maya is a virtual fitness ""person"" who will help you meet your fitness goals.\n\nThe program begins with Maya greeting you and giving you a fitness assessment. Here you will input your height, weight and activity level and you will do a series of exercises; crunches, sit-ups, push-ups, and jumping jacks. Then you check your heart rate and input the information. Based on your level of fitness Maya will recommend an exercise program for you. She will tell you what you need to do and how many days a week you need to do it, to help you achieve your goals.\n\nMaya will give you a selection of spaces to exercise in and ask what equipment you have. If you own a stepper, stability ball, weights and heart rate monitor, she will incorporate these into the workout. I would highly recommend adding all the equipment - especially the weights and heart rate monitor as these can really help you achieve your fitness goals.\n\nAfter selecting your space (different virtual rooms with a different feel to each one) Maya begins your personalized program. During the workout she will ask you how you are feeling or you can at any time increase or decrease the intensity of the workout. At the bottom of the screen a scroll tells you what is coming up next in the workout ie squats, sidesteps, a stepper etc. You can also change the music selection - over 70 songs are available including hip hop, latin and 80's. The bottom also counts down the time of your workout - a fabulous feature.\n\nOver time as you exercise, you will ""unlock"" treats from Maya like new workout spaces. Because she has so many ""moves"" your execise time with Maya will vary from one day to the next and keep your interest.\n\nMaya also will hold you accountable. If you miss a day she will ask (nicely) where you were. As you exercise she will encourage you. Maya's voice is a beautiful, easy to listen to voice and you can lower or raise the volume of her voice and/or the music.\nMaya herself is a buff looking brunette, it would have been nice if one could personalize their trainer's looks, and I will bet that's not to far down the road!\n\nMaya also has a peaceful room where she will lead you in a nice yoga interlude. As you progress through the program Maya will help you adjust your goals and set new ones.\n\nShe will also set meal plans for you if desired.\n\nI have to admit Maya has amazed me, besides kicking my booty in our exercise sessions. And the price point of $35 is so reasonable. Others love it too and there is an online community at [...] devoted to discussions on this program.\n\nHighly recommended!\n\nUPDATE: I had a question about the program and was very pleased with the Yourself! Fitness representative Caroline's fast response and helpfulness. Double delight - a great program with super customer service!","[from, the, minute, i, met, maya, my, new, personal, trainer, i, was, hooked, maya, is, a, virtual, fitness, person, who, will, help, you, meet, your, fitness, goals, the, program, begins, with, maya, greeting, you, and, giving, you, a, fitness, assessment, here, you, will, input, your, height, weight, and, activity, level, and, you, will, do, a, series, of, exercises, crunches, situps, pushups, and, jumping, jacks, then, you, check, your, heart, rate, and, input, the, information, based, on, your, level, of, fitness, maya, will, recommend, an, exercise, program, for, you, she, will, tell, you, what, you, need, to, do, and, how, many, ...]"


## casual tokeniser

In [35]:
from nltk.tokenize.casual import casual_tokenize 

In [39]:
reviews["casual_tokens"] = reviews["rev_text_lower"].apply(lambda rev: casual_tokenize(str(rev)))

reviews[["reviewText", "casual_tokens", "tb_tokens"]].sample(4)

Unnamed: 0,reviewText,casual_tokens,tb_tokens
3560,"If you're a beatles fan of any form, this is a must have. 'nuff said\n(some good songs are missing from basic tracklist, but are available as payed downloads)","[if, youre, a, beatles, fan, of, any, form, this, is, a, must, have, nuff, said, some, good, songs, are, missing, from, basic, tracklist, but, are, available, as, payed, downloads]","[if, youre, a, beatles, fan, of, any, form, this, is, a, must, have, nuff, said, some, good, songs, are, missing, from, basic, tracklist, but, are, available, as, payed, downloads]"
2279,"The description says it is made for DS and 3DS games but here's the problem, once you get the games in it is harder than ever to get them OUT. So I have to spend time modifying the case with a pair of scissors just so I don't break my games. My advice is to think before buying this product because you run the risk of breaking your games.","[the, description, says, it, is, made, for, ds, and, 3ds, games, but, heres, the, problem, once, you, get, the, games, in, it, is, harder, than, ever, to, get, them, out, so, i, have, to, spend, time, modifying, the, case, with, a, pair, of, scissors, just, so, i, dont, break, my, games, my, advice, is, to, think, before, buying, this, product, because, you, run, the, risk, of, breaking, your, games]","[the, description, says, it, is, made, for, ds, and, 3ds, games, but, heres, the, problem, once, you, get, the, games, in, it, is, harder, than, ever, to, get, them, out, so, i, have, to, spend, time, modifying, the, case, with, a, pair, of, scissors, just, so, i, dont, break, my, games, my, advice, is, to, think, before, buying, this, product, because, you, run, the, risk, of, breaking, your, games]"
1520,"After days of playing this game, here is my review. Campaign story line, and graphics are grate. Now Multipleyer graphics are ok. The reason why I give only two stars is because, I have to agree with other people about hackers. There is a difference between good players and hackers, good players die when you shoot them, Hackers you can unloads a hole gun and don't die. Also I have good Internet connection but my games freezes when playing multiplayer slot. I spoken close to enemy's, and o yea this game signs me out of PSN all the time is really frustrating... All I can say is Amazon did grate in there part. Also I suggest to everyone just to rent the game from Redbox then if you like it buy it. Again Amazon did grate in there part.","[after, days, of, playing, this, game, here, is, my, review, campaign, story, line, and, graphics, are, grate, now, multipleyer, graphics, are, ok, the, reason, why, i, give, only, two, stars, is, because, i, have, to, agree, with, other, people, about, hackers, there, is, a, difference, between, good, players, and, hackers, good, players, die, when, you, shoot, them, hackers, you, can, unloads, a, hole, gun, and, dont, die, also, i, have, good, internet, connection, but, my, games, freezes, when, playing, multiplayer, slot, i, spoken, close, to, enemys, and, o, yea, this, game, signs, me, out, of, psn, all, the, time, is, ...]","[after, days, of, playing, this, game, here, is, my, review, campaign, story, line, and, graphics, are, grate, now, multipleyer, graphics, are, ok, the, reason, why, i, give, only, two, stars, is, because, i, have, to, agree, with, other, people, about, hackers, there, is, a, difference, between, good, players, and, hackers, good, players, die, when, you, shoot, them, hackers, you, can, unloads, a, hole, gun, and, dont, die, also, i, have, good, internet, connection, but, my, games, freezes, when, playing, multiplayer, slot, i, spoken, close, to, enemys, and, o, yea, this, game, signs, me, out, of, psn, all, the, time, is, ...]"
3553,origina brand new,"[origina, brand, new]","[origina, brand, new]"
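The samples above show identical output from both tokenisers. That is expected, because both run on `rev_text_lower`, which has already lost its punctuation. The casual tokeniser's emoticon handling therefore never comes into play. Row 1 of the corpus (`:( lol`) shows what is lost. The snippet below re-applies the same `str.maketrans` punctuation stripping inline, so it runs on its own.

```python
from string import punctuation

from nltk.tokenize import TreebankWordTokenizer
from nltk.tokenize.casual import casual_tokenize

raw = ":( lol"  # reviewText of row 1 in the corpus

# Same punctuation stripping and lowercasing as the cleaning step.
cleaned = raw.translate(str.maketrans("", "", punctuation)).lower()

print(casual_tokenize(raw))                   # [':(', 'lol'] (emoticon kept as one token)
print(TreebankWordTokenizer().tokenize(raw))  # [':', '(', 'lol'] (emoticon split apart)
print(casual_tokenize(cleaned))               # ['lol'] (emoticon gone entirely)
```

If emoticons are meant to feed the sentiment dictionary, one option is to tokenise the raw (or only lowercased) text with the casual tokeniser. The trade-off is that punctuation tokens then need handling in the lookup.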
