# Women's E-Commerce Clothing Reviews

## Joseph O'Malley & Sara Tuncten
### 10/6/2018


## Overview

https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/home

This is a Women’s Clothing E-Commerce dataset revolving around the reviews written by customers. Its nine supportive features offer a great environment to parse out the text through its multiple dimensions. Because this is real commercial data, it has been anonymized, and references to the company in the review text and body have been replaced with “retailer”.


In [3]:
# import module(s) into namespace
import pandas as pd #we almost always need pandas because we like data frames
import numpy as np
pd.set_option('display.max_colwidth', 150) #important for getting all the text



In [4]:
## load in dataset from csv

filename = "/Users/ultrajosef/Documents/JosephOMalley_CodePortfolio/Python ~ Text Analysis/data/womens_clothing_data.csv"
review_df = pd.read_csv(filename, index_col = 0) 

##check to make sure it's in dataframe and dimensions
print(type(review_df))
print(review_df.shape)
#23486 rows, 10 columns
review_df.head()


<class 'pandas.core.frame.DataFrame'>
(23486, 10)


Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name
0,767,33,,Absolutely wonderful - silky and sexy and comfortable,4,1,0,Initmates,Intimate,Intimates
1,1080,34,,"Love this dress! it's sooo pretty. i happened to find it in a store, and i'm glad i did bc i never would have ordered it online bc it's petite. ...",5,1,4,General,Dresses,Dresses
2,1077,60,Some major design flaws,I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my usual size) but i found this to...,3,0,0,General,Dresses,Dresses
3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get nothing but great compliments!",5,1,0,General Petite,Bottoms,Pants
4,847,47,Flattering shirt,This shirt is very flattering to all due to the adjustable front tie. it is the perfect length to wear with leggings and it is sleeveless so it pa...,5,1,6,General,Tops,Blouses


In [5]:
pd.set_option('display.max_colwidth', 15000)
print(review_df['Review Text'][44:45])
print(type(review_df['Review Text']))

44    Tried this on today at my local retailer and had to have it. it is so comfortable and flattering. it's too bad the picture online has the model tucking it into the skirt because you can't see the ruching across the front. a little dressier alternative to a plain tee and reasonably priced for retailer. 5'8"" and i generally wear a 6, the small fit well. will probably be back for the black!
Name: Review Text, dtype: object
<class 'pandas.core.series.Series'>


### T1 - Text processing


Count vector with min/max frequency, N-grams, and stopwords.
This method combines several others and provides the best output of those tried.
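
To make the three ingredients concrete, here is a minimal pure-Python sketch of the idea behind a document-frequency filter combined with stopword removal and unigram/bigram extraction. It is not scikit-learn's implementation; the toy corpus, the stop list, and the helper names (`tokenize`, `ngrams`, `build_vocab`, `count_matrix`) are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Toy corpus and stop list (both hypothetical, for illustration only)
docs = [
    "love this dress",
    "love the fit of this dress",
    "the fabric is soft",
    "runs small but love the fabric",
]
stop_words = {"this", "the", "of", "is", "but"}

def tokenize(doc, stop_words):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in doc.lower().split() if w not in stop_words]

def ngrams(tokens, n_min=1, n_max=2):
    """Return every n-gram with n_min <= n <= n_max as a space-joined string."""
    out = []
    for n in range(n_min, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out

def build_vocab(docs, stop_words, min_df=0.0, max_df=1.0):
    """Keep terms whose share of documents lies in [min_df, max_df]."""
    doc_terms = [ngrams(tokenize(d, stop_words)) for d in docs]
    # document frequency: count each term once per document
    df = Counter(term for terms in doc_terms for term in set(terms))
    n_docs = len(docs)
    vocab = sorted(t for t, c in df.items() if min_df <= c / n_docs <= max_df)
    return vocab, doc_terms

def count_matrix(vocab, doc_terms):
    """One row per document, one column per vocabulary term, raw counts."""
    rows = []
    for terms in doc_terms:
        counts = Counter(terms)
        rows.append([counts[t] for t in vocab])
    return rows

vocab, doc_terms = build_vocab(docs, stop_words, min_df=0.5)
matrix = count_matrix(vocab, doc_terms)
print(vocab)   # ['dress', 'fabric', 'love']
print(matrix)  # [[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 1]]
```

Because stopwords are dropped before the n-grams are formed, a bigram can span a removed word ("love the fabric" yields "love fabric"), which is why features such as "fell love" and "fit true" show up in the real output further down.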

In [6]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction import text
import math

#our corpus is the 'Review Text' column loaded above
#the vectorizers below turn it into a document-term matrix (a scipy sparse matrix)

In [7]:
# max_df is not set here, so it stays at its default (1.0) and no upper cutoff is applied;
# pass max_df=0.5 to exclude anything occurring in over 50 percent of responses
# min_df=.01 includes only terms found in at least 1 percent of responses
# set binary to False - we want counts
# set n-gram range 1-2 (unigrams and bigrams)
cv9 = CountVectorizer(binary=False, min_df= .01, ngram_range = (1, 2), stop_words='english') #define the transformation
# the vocabulary is filtered by document frequency and by the built-in English stopword list

## .values.astype('U') needed to transform data to unicode (missing reviews become the string 'nan')
cv9_chat = cv9.fit_transform(review_df['Review Text'].values.astype('U')) #apply the transformation

print(type(cv9_chat))
print(cv9_chat.shape)
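# note: scikit-learn 1.2 removed get_feature_names(); on newer versions use get_feature_names_out()
pd.DataFrame(cv9_chat.toarray(),columns = cv9.get_feature_names()).head()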

#23486 rows, 465 columns

<class 'scipy.sparse.csr.csr_matrix'>
(23486, 465)


Unnamed: 0,10,115,12,26,27,able,absolutely,absolutely love,actually,add,...,worn,worth,wouldn,xl,xs,xxs,year,years,yellow,zipper
0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [8]:
##check effects of stopwords on feature space

In [9]:
names = cv9.get_feature_names()   #create list of feature names
print(type(names), len(names))

count = np.sum(cv9_chat.toarray(), axis = 0) # convert the sparse matrix to a dense array and sum each column to get total feature counts
count2 = count.tolist()  # convert numpy array to list

count_df = pd.DataFrame(count2, index = names, columns = ['count']) # create a dataframe from the list
sorted_count = count_df.sort_values(['count'], ascending = False)

print(sorted_count.head(20))
print(sorted_count.tail(10))


<class 'list'> 465
             count
dress        10567
love          8951
size          8772
fit           7325
like          7149
wear          6439
great         6117
just          5608
fabric        4798
small         4729
color         4605
look          4039
really        3925
ordered       3850
little        3775
perfect       3774
flattering    3519
soft          3343
comfortable   3060
cute          3042
                      count
good quality            246
totally                 245
fits great              245
sandals                 244
booties                 243
shows                   242
texture                 240
received compliments    240
fell love               239
fit true                237


## Edit stopwords

In [10]:
#NLTK Stopwords
from nltk.corpus import stopwords 
nltkstopwords = stopwords.words("english") #pull out the words within the default nltk stopwords list

#print(type(nltkstopwords))
#print(len(nltkstopwords))
#print(nltkstopwords)

#179 words 

In [11]:
#Sci kit learn stopwords
from sklearn.feature_extraction import text #import package

sklstopwords = text.ENGLISH_STOP_WORDS #pull out words in sklearn stopwords list.  Note the different syntax
print(type(sklstopwords))
print(len(sklstopwords))
#print(sklstopwords)

#318 words

<class 'frozenset'>
318


We will use the scikit-learn stopword list since it includes more words (318 vs. 179 in NLTK). We will add and remove a few words, then reapply the new stopword list to the count vectorizer.
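
The add/remove step is plain set arithmetic. The sketch below uses a hypothetical six-word stop list as a stand-in for the scikit-learn list, and the variable names are illustrative only. It shows why the final length changes only by the words actually added or actually present.

```python
# Hypothetical miniature stop list standing in for text.ENGLISH_STOP_WORDS
base_stopwords = frozenset({"the", "and", "not", "but", "very", "it"})

to_add = {"item", "thing", "got"}         # domain words that carry no sentiment
to_remove = {"not", "but", "very", "no"}  # negations/intensifiers we want to keep as features

custom = set(base_stopwords)  # a frozenset is immutable, so copy it into a set first
custom.update(to_add)         # in-place union: 6 + 3 = 9 words
custom -= to_remove           # in-place difference; "no" was never in the list, so it is ignored

print(sorted(custom))  # ['and', 'got', 'it', 'item', 'the', 'thing']
print(len(base_stopwords), "->", len(custom))  # 6 -> 6
```

The same bookkeeping explains the counts printed in the cells that follow: adding 14 new words grows the list, and removing words shrinks it only by the number that were actually in it.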

### Add and Remove words from stopword list

In [12]:
#look at the feature names we have left after parameters in cv9 - what do we need to remove? what should be included?
#print(names)

In [13]:
# we are starting with the most exhaustive list (scikit-learn stopwords) and will adjust from there
# add to stoplist:"arrived", "area", "end", "ended", "got", "know", "went", "try", "said", "item", "probably?","thing","things","sides"
our_stopwords = set(sklstopwords)
print(len(our_stopwords))
add_set = set(['arrived','area','end','ended','got','know','went','try','said','item','probably','thing','things','sides'])
our_stopwords.update(add_set)
print(len(our_stopwords))

318
332


In [14]:
# remove from stoplist: 'never', 'least', 'less', 'hasnt', 'couldnt', 'cant', 'cannot', 'not', 'no', 'un', 'but', 'keep', 'well', 'top', 'move', 'detail', 'beyond', 'thick', 'anyone', 'very'
remove_set = set(['never', 'least','less', 'hasnt', 'couldnt', 'cant', 'cannot', 'not', 'no', 'un', 'but', 'keep', 'well','top','move','detail','beyond','thick','anyone','very'])
our_stopwords = our_stopwords.difference(remove_set) #keeps only the stopwords that are NOT in remove_set
print(len(our_stopwords))
print(type(our_stopwords))

312
<class 'set'>


In [15]:
#now we will see the length of our feature space using cv9 parameters with our_stopwords
cv10 = CountVectorizer(binary=False, min_df= .01, ngram_range = (1, 2), stop_words = our_stopwords) #define the transformation

## .values.astype('U') needed to transform data to unicode
cv10_chat = cv10.fit_transform(review_df['Review Text'].values.astype('U')) #apply the transformation

print(type(cv10_chat))
print(cv10_chat.shape)
pd.DataFrame(cv10_chat.toarray(),columns = cv10.get_feature_names()).head()

names10 = cv10.get_feature_names()   #create list of feature names
count10 = np.sum(cv10_chat.toarray(), axis = 0) # add up feature counts 
count10_2 = count10.tolist()  # convert numpy array to list
count_df10 = pd.DataFrame(count10_2, index = names10, columns = ['count']) # create a dataframe from the list
count_df10.sort_values(['count'], ascending = False)[0:10]  #arrange by count instead of alphabetical (top 10)
#483 columns

<class 'scipy.sparse.csr.csr_matrix'>
(23486, 483)


Unnamed: 0,count
but,16556
dress,10567
not,9799
love,8951
size,8772
very,8217
top,7418
fit,7325
like,7149
wear,6439





### T2 - Custom Sentiment Dictionary

Create a sentiment dictionary from one of the sources in class or find/create your own (potential bonus points for appropriate creativity). Using your dictionary, create sentiment labels for the text entries in your corpus.


In [16]:
pathname2 = "/Users/ultrajosef/Downloads/Dictionaries/"

In [17]:
### Trying the HL (Hu and Liu) sentiment dictionary

#This dictionary was created for social media, blogs, and reviews.  It is good for comparison (of opinions), which should work well
#with the clothing dataset (as most people compare clothes and sizes relative to their personal baselines)

In [18]:
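review_df = review_df[review_df['Review Text'].notnull()].copy() #drop rows with no review text; .copy() avoids SettingWithCopyWarning
#duplicate the review text under a one-word column name so it can be used with attribute access (review_df.reviewtext)
review_df['reviewtext'] = review_df['Review Text']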


In [19]:
##convert to lowercase
review_df['newtext'] = review_df['reviewtext'].str.lower()

In [20]:
# replace contractions
# code borrowed from http://stackoverflow.com/questions/27845796/replacing-words-matching-regular-expressions-in-python
import re

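# order matters: specific contractions first, generic suffix rules next, punctuation removal last
# replacement strings are raw strings so that \g<1> is not treated as an invalid escape sequence
replacement_patterns = [
(r'won\'t', 'will not'),
(r'can\'t', 'cannot'),
(r'i\'m', 'i am'),
(r'ain\'t', 'is not'),
(r'(\w+)\'ll', r'\g<1> will'),
(r'(\w+)n\'t', r'\g<1> not'),
(r'(\w+)\'ve', r'\g<1> have'),
(r'(\w+)\'s', r'\g<1> is'),   # note: this also rewrites possessives ("daughter's" -> "daughter is")
(r'(\w+)\'re', r'\g<1> are'),
(r'(\w+)\'d', r'\g<1> would'),
(r'\W+', ' ')   # collapse every run of non-word characters into a single space
]

class RegexpReplacer(object):
    def __init__(self, patterns=replacement_patterns):
        # compile each pattern once so it can be reused for every review
        self.patterns = [(re.compile(regex), repl) for (regex, repl) in patterns]
    def replace(self, text):
        s = text
        for (pattern, repl) in self.patterns:
            s = pattern.sub(repl, s)
        return s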

In [21]:
##apply custom replacement
replacer = RegexpReplacer()
review_df['newtext'] = review_df['newtext'].map(lambda x: replacer.replace(x))
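
As a quick check of what the replacement pass does to a single review, the sketch below applies a shortened, illustrative subset of the patterns to two made-up sentences. The `expand` helper and the sample sentences are assumptions for illustration, and the `.strip()` at the end is an addition that the notebook's replacer does not perform.

```python
import re

# Ordered (pattern, replacement) pairs: a shortened, illustrative subset
PATTERNS = [
    (r"won't", "will not"),
    (r"can't", "cannot"),
    (r"i'm", "i am"),
    (r"(\w+)n't", r"\g<1> not"),
    (r"(\w+)'s", r"\g<1> is"),
    (r"\W+", " "),  # must run last: it also removes the apostrophes the rules above rely on
]
COMPILED = [(re.compile(pattern), repl) for pattern, repl in PATTERNS]

def expand(text):
    """Apply each replacement in order, then trim leading/trailing spaces."""
    for pattern, repl in COMPILED:
        text = pattern.sub(repl, text)
    return text.strip()

print(expand("i'm glad i didn't skip it. it's petite, but it won't fit!"))
# i am glad i did not skip it it is petite but it will not fit

# The 's rule cannot tell a contraction from a possessive:
print(expand("my daughter's dress"))
# my daughter is dress

# If punctuation were stripped first, the contraction rules would have nothing to match:
print(re.sub(r"\W+", " ", "it's"))
# it s
```

The last case is also what happens to typos such as "it;s" in the data: no contraction rule matches, so only the punctuation rule fires and the review ends up with a stray "s".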


In [22]:
#trying HL
#the lexicon sorts words into two buckets: positive (HLpos) and negative (HLneg)
HLpos = [line.strip() for line in  open(pathname2+'HLpos.txt','r')]
HLneg = [line.strip() for line in  open(pathname2 +'HLneg.txt','r',encoding = 'latin-1')]
print("HL pos  size: " + str(len(HLpos)))
print(HLpos[0:10])
print("HL neg  size: " + str(len(HLneg)))
print(HLneg[0:10])

HL pos  size: 2006
['a+', 'abound', 'abounds', 'abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed', 'acclamation']
HL neg  size: 4783
['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted']


In [23]:
def hl_sent(inputstring):
    """Label a review Positive/Negative/Neutral from its counts of HL lexicon words."""
    poscount = 0
    negcount = 0

    for word in inputstring.split():
        token = word.rstrip('?:!.,;')  # strip trailing punctuation only
        if token in HLpos:
            poscount += 1
        elif token in HLneg:
            negcount += 1

    # polarity in [-1, 1]: (positive - negative) matches over all matches
    if poscount + negcount > 0:
        t = (poscount - negcount) / (poscount + negcount)
    else:
        t = 0

    if t > 0:
        tone = "Positive"
    elif t < 0:
        tone = "Negative"
    else:
        tone = "Neutral"

    return tone
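
The scoring rule in `hl_sent` can be traced by hand on a toy lexicon. The sketch below is a condensed restatement under stated assumptions: the two mini-lexicons and the sample sentences are made up, and `polarity` and `label` are hypothetical helper names, not part of the notebook.

```python
# Hypothetical mini-lexicons standing in for HLpos and HLneg
POS = {"love", "pretty", "great"}
NEG = {"sadly", "small", "disappointed"}

def polarity(text, pos=POS, neg=NEG):
    """(positive - negative) / (positive + negative), or 0.0 when nothing matches."""
    words = [w.rstrip("?:!.,;") for w in text.split()]
    p = sum(w in pos for w in words)
    n = sum(w in neg for w in words)
    return (p - n) / (p + n) if p + n else 0.0

def label(score):
    """Map a polarity score to the three tone labels."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

# Two positive hits (love, pretty) outweigh one negative hit (sadly): score = 1/3
print(label(polarity("i love it, so pretty. sadly going back.")))  # Positive

# Matching is case-sensitive, so a capitalized word is not counted
print(label(polarity("Love this dress!")))  # Neutral
print(label(polarity("love this dress!")))  # Positive
```

The first example shows how a review that ends in a return can still be labelled Positive, because "going back" is not in the lexicon while the praise words are. The second shows why lowercasing the text before scoring matters.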

In [24]:
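#will take a while to run
#note: this first pass scores the raw reviewtext column, so capitalized words do not match the lowercase lexicon entries;
#the second pass (hlsent2, further down) scores the cleaned newtext column
review_df['hlsent'] = review_df.reviewtext.apply(lambda x: hl_sent(x))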

In [25]:
##using HL
review_df.iloc[400:500][['newtext','reviewtext','hlsent']]

Unnamed: 0,newtext,reviewtext,hlsent
416,i really love this lace up shirt but i only liked it in black on me i like it open like the model is wearing it but i had to have it a little more closed because the lace part does go down a ways and i felt like i was revealing a little too much i would likely wear it open but then have to pair it with a cami underneath to feel comfortable i absolutely love the whole outfit as pictured and also reviewed the polka dot pants i wish the laces were a little longer than they are if you have i,"I really love this lace-up shirt, but i only liked it in black on me. i like it open like the model is wearing it, but i had to have it a little more closed because the lace part does go down a ways-and i felt like i was revealing a little too much. i would likely wear it open, but then have to pair it with a cami underneath to feel comfortable. i absolutely love the whole outfit as-pictured, and also reviewed the polka dot pants. i wish the laces were a little longer than they are-if you have i",Positive
417,i love byron lars dresses and this design is on point the ruffle at the neckline is so pretty and the dress fits like a dream however the fabric i would have loved it if this dress had a heavier feel this is sadly going back today,"I love byron lars dresses, and this design is on-point. the ruffle at the neckline is so pretty, and the dress fits like a dream. however -- the fabric!!! i would have loved it if this dress had a heavier feel. this is, sadly, going back today.",Positive
418,love this blouse it s super comfy looks awesome with jeans this blouse runs true to size i purchased in my normal size small,"Love this blouse, it;s super comfy, looks awesome with jeans. this blouse runs true to size i purchased in my normal size small.",Positive
419,i fell in love with this dress when i saw it online and due to the slim fit i ordered a size up a 2 petite up from my normal 0 petite when i received it i was surprised about two things 1 the material was kind of puffy not bad just weird and 2 it was too big on top rare for a petite size even though it fit everywhere else i wanted to love it but had to return would be gorgeous for someone else,"I fell in love with this dress when i saw it online and due to the ""slim fit,"" i ordered a size up -- a 2 petite up from my normal 0 petite. when i received it, i was surprised about two things: 1) the material was kind of puffy (not bad, just weird), and 2) it was too big on top - rare for a petite size - even though it fit everywhere else. i wanted to love it, but had to return. would be gorgeous for someone else!",Positive
420,i was hesitant based on the reviews but i am glad i ordered this dress in blue the material is like a french dot texture that is soft but still a bit structured i had no issues with the fit it is appropriately just a little oversized the styling is very mod,"I was hesitant based on the reviews, but i'm glad i ordered this dress (in blue). the material is like a french dot texture that is soft but still a bit structured. i had no issues with the fit. it's appropriately just a little oversized. the styling is very mod!",Positive
421,great feature perfect lacing do not need to worry that it is too low since there is material behind the bottom of the lacing like 3 4 length sleeve gives top the right proportion,Great feature...perfect lacing...do not need to worry that it is too low since there is material behind the bottom of the lacing. like 3/4 length sleeve. gives top the right proportion.,Positive
422,i purchased this jacket in green x small a while back and wasn t 100 sure about it due to the size i m 5 3 117 lbs and a 33a and thought it was a little snug so i tried on the small and that was way too big so i kept the xs i have worn it a few times and it does look great however i can only wear thin tops with it recently i purchased the black x small and this i love it s looser then the green so i can get away with thicker tops i do recommend both jackets they are pricey for what,"I purchased this jacket in green, x-small a while back and wasn?t 100% sure about it due to the size. i?m 5?3?, 117 lbs and a 33a and thought it was a little snug so i tried on the small and that was way too big so i kept the xs. i have worn it a few times and it does look great however i can only wear thin tops with it. recently i purchased the black, x-small and this i love. it?s looser then the green so i can get away with thicker tops. \r\ni do recommend both jackets. they are pricey for what",Positive
423,i love this dress i mean it si really pretty in person however the breast area is just too small i cannot wear a bra with it and my older breasts just droop not flattering they are barely covered i am a bit disappointed at that but if you are smaller up there i say give it a try i am 115 lbs 26 5 ion waist 30dd and xs petite was great everywhere but chest colors and fabric are great i love that the different colors are different types of fabric too bad,"I love this dress, i mean it si really pretty in person, however, the breast area is just too small... i can't wear a bra with it, and my ""older"" breasts just droop, not flattering. they are barely covered... i am a bit disappointed at that, but if you are smaller up there, i say give it a try... i am 115 lbs, 26.5 ion waist, 30dd and xs petite was great everywhere but chest.\r\n\r\ncolors and fabric are great, i love that the different colors are different types of fabric... too bad.",Positive
424,i love the lace up design and bought the red xsp fabric is a bit thin and mediocre quality but over all happy with purchase wish this top came in navy and white as well even a navy white stripe would be a fun option too thanks you for offering this top in a petite size,"I love the lace up design and bought the red xsp, fabric is a bit thin and mediocre quality, but over all happy with purchase. wish this top came in navy and white as well. even a navy/white stripe would be a fun option too. thanks you for offering this top in a petite size. :)",Positive
426,i got this shirt in the mail today and was really excited to try it on other reviewers said that it ran large so i ordered a size down and it fit perfect i looked in the mirror and noticed the ruffles were misaligned and obviously so i want to exchange it in the store but seeing that the size is not longer available online i am not really sure i will find another one in my size,"I got this shirt in the mail today and was really excited to try it on. other reviewers said that it ran large so i ordered a size down and it fit perfect. i looked in the mirror and noticed the ruffles were misaligned and obviously so. i want to exchange it in the store but seeing that the size is not longer available online, i'm not really sure i'll find another one in my size.",Positive


### T3 - Adjustments

Consider one of the entries in your corpus that had a surprising label.  How would you change your analysis to get the “right” label? Show specific results. 

There are many examples where customers wanted to return the product, but the review was assigned a positive sentiment - we need to change this.


### Custom Replacement

In [26]:
#custom replace: map phrases alluding to returning the product to the single word "return"
#add "return" to HLneg - returning an item is negative in our context
#add "too" to HLneg - "too" usually described something negative (too big, too short)

#phrases to combine: return, return it, bring back, bring it back, going back
#example rows: 10, 96, 99 - going back; 88 - returning; 57 - returning it; 68 - return

# using dictionaries
#replace versions of return with return
return_dict = {'returning':'return', 'going back':'return', 'bring back':'return','bring it back':'return', 'retuning':'return'}
print(type(return_dict))

<class 'dict'>


In [27]:
#apply each replacement to the newtext column of the data frame
for origword, newword in return_dict.items():
    review_df['newtext'] = review_df['newtext'].str.replace(origword.lower(), newword, regex=False) #literal substring match
    
review_df[88:99]

Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,reviewtext,newtext,hlsent
88,845,38,Huge,"Really cute piece, but it's huge. i ordered an xxs petite and it was unfortunately extremely wide and not flattering. returning.",2,0,4,General Petite,Tops,Blouses,"Really cute piece, but it's huge. i ordered an xxs petite and it was unfortunately extremely wide and not flattering. returning.",really cute piece but it is huge i ordered an xxs petite and it was unfortunately extremely wide and not flattering return,Positive
89,836,24,"Pretty, but not for me...","I bought this top online in the burnt orange color and was so excited to get it. when i tried it on, the fit was fine but it just lacked...something. the back was a little bit too long, the front was a little bit too short and it lacked the overall tailored look that i was after. gorgeous fabric and top, but not for me. i wanted something more for $150! bought the velvet tunic instead ;)",4,1,3,General,Tops,Blouses,"I bought this top online in the burnt orange color and was so excited to get it. when i tried it on, the fit was fine but it just lacked...something. the back was a little bit too long, the front was a little bit too short and it lacked the overall tailored look that i was after. gorgeous fabric and top, but not for me. i wanted something more for $150! bought the velvet tunic instead ;)",i bought this top online in the burnt orange color and was so excited to get it when i tried it on the fit was fine but it just lacked something the back was a little bit too long the front was a little bit too short and it lacked the overall tailored look that i was after gorgeous fabric and top but not for me i wanted something more for 150 bought the velvet tunic instead,Positive
90,1078,51,Sweet flattering dress,"I love cute summer dresses and this one, especially because it is made out of linen, is unique. it is very well-made with a design that is quite flattering. i am 5 foot 6 and a little curvy with a 38 c bust and i got a size 10. it fits well although it is difficult to zip up because the material has no give. the perfect dress to wear to italy or france! now i just have to book my tickets!",4,1,0,General Petite,Dresses,Dresses,"I love cute summer dresses and this one, especially because it is made out of linen, is unique. it is very well-made with a design that is quite flattering. i am 5 foot 6 and a little curvy with a 38 c bust and i got a size 10. it fits well although it is difficult to zip up because the material has no give. the perfect dress to wear to italy or france! now i just have to book my tickets!",i love cute summer dresses and this one especially because it is made out of linen is unique it is very well made with a design that is quite flattering i am 5 foot 6 and a little curvy with a 38 c bust and i got a size 10 it fits well although it is difficult to zip up because the material has no give the perfect dress to wear to italy or france now i just have to book my tickets,Positive
91,850,29,,"This top is so much prettier in real life than it is on the model. the pattern and texture are both lovely, and the peplum is surprisingly flattering. it is definitely on the short side, but i think that gives it a modern look. the fabric does not stretch at all, but i still think it fits tts. if you have a very large chest you may want to go up a size, but otherwise i would order your normal size.",5,1,5,General Petite,Tops,Blouses,"This top is so much prettier in real life than it is on the model. the pattern and texture are both lovely, and the peplum is surprisingly flattering. it is definitely on the short side, but i think that gives it a modern look. the fabric does not stretch at all, but i still think it fits tts. if you have a very large chest you may want to go up a size, but otherwise i would order your normal size.",this top is so much prettier in real life than it is on the model the pattern and texture are both lovely and the peplum is surprisingly flattering it is definitely on the short side but i think that gives it a modern look the fabric does not stretch at all but i still think it fits tts if you have a very large chest you may want to go up a size but otherwise i would order your normal size,Positive
94,850,23,"Beautifully made, but not versatile","This shirt caught my eye because of how beautiful it was. i love the shape, design, and and the color. it's perfect for spring and summer with some white pants. unfortunately, i don't see any possibilities for this shirt to be worn any other way. so far, it doesn't work with any of my jeans, skirts, or shorts. i usually prefer items with more versatility for outfits, so i'm still on the fence if i'm going to keep it or not. with that aside, it seriously is a great quality shirt with a beautiful",4,1,3,General Petite,Tops,Blouses,"This shirt caught my eye because of how beautiful it was. i love the shape, design, and and the color. it's perfect for spring and summer with some white pants. unfortunately, i don't see any possibilities for this shirt to be worn any other way. so far, it doesn't work with any of my jeans, skirts, or shorts. i usually prefer items with more versatility for outfits, so i'm still on the fence if i'm going to keep it or not. with that aside, it seriously is a great quality shirt with a beautiful",this shirt caught my eye because of how beautiful it was i love the shape design and and the color it is perfect for spring and summer with some white pants unfortunately i do not see any possibilities for this shirt to be worn any other way so far it does not work with any of my jeans skirts or shorts i usually prefer items with more versatility for outfits so i am still on the fence if i am going to keep it or not with that aside it seriously is a great quality shirt with a beautiful,Positive
95,863,83,Casual elegance!,"Purchased this top online, and when i received it was very pleased.\r\nit has and elegant cut and yet is a casual fabric.\r\nlove that the sleeves run longer......ads to the overall look.\r\nalso loved the v neckline.........enhances the feel of the overall style.\r\nwith various necklaces this top has limitless options!\r\nthe color states moss....which i usually think of as greenish brown......i found it to be more of a taupe.\r\nwould have liked it to have a green tones, however it is still a fantastic f",5,1,14,General,Tops,Knits,"Purchased this top online, and when i received it was very pleased.\r\nit has and elegant cut and yet is a casual fabric.\r\nlove that the sleeves run longer......ads to the overall look.\r\nalso loved the v neckline.........enhances the feel of the overall style.\r\nwith various necklaces this top has limitless options!\r\nthe color states moss....which i usually think of as greenish brown......i found it to be more of a taupe.\r\nwould have liked it to have a green tones, however it is still a fantastic f",purchased this top online and when i received it was very pleased it has and elegant cut and yet is a casual fabric love that the sleeves run longer ads to the overall look also loved the v neckline enhances the feel of the overall style with various necklaces this top has limitless options the color states moss which i usually think of as greenish brown i found it to be more of a taupe would have liked it to have a green tones however it is still a fantastic f,Positive
96,845,44,,"I usually wear a medium and bought a small. it fit ok, but had no shape and was not flattering. i love baby doll dresses and tops, but this was a tent. my daughter saw me try it on and said ""that's a piece of tablecloth."" it's going back.",1,0,0,General Petite,Tops,Blouses,"I usually wear a medium and bought a small. it fit ok, but had no shape and was not flattering. i love baby doll dresses and tops, but this was a tent. my daughter saw me try it on and said ""that's a piece of tablecloth."" it's going back.",i usually wear a medium and bought a small it fit ok but had no shape and was not flattering i love baby doll dresses and tops but this was a tent my daughter saw me try it on and said that is a piece of tablecloth it is return,Positive
97,861,44,Huge,"I was very excited to order this top in red xs. so cute, but it was huge, shapeless and support thin! it had to go back. i should've looked at other reviews.",1,0,0,General Petite,Tops,Knits,"I was very excited to order this top in red xs. so cute, but it was huge, shapeless and support thin! it had to go back. i should've looked at other reviews.",i was very excited to order this top in red xs so cute but it was huge shapeless and support thin it had to go back i should have looked at other reviews,Positive
99,861,33,Pernette henley,"I am in need of easy comfortable tops for everyday wear. i bought this top mostly because of the cute buttons. when i received it, it looked exactly as it does in the picture online, however, the buttons kept slipping out of their homes because the holes were slightly too big. the shirt fit but was just a tad snug near the upper arms, which would stretch and loosen up throughout the day. it's definitely a comfortable shirt, but it felt more like a pajama top. it's going back.",3,0,17,General Petite,Tops,Knits,"I am in need of easy comfortable tops for everyday wear. i bought this top mostly because of the cute buttons. when i received it, it looked exactly as it does in the picture online, however, the buttons kept slipping out of their homes because the holes were slightly too big. the shirt fit but was just a tad snug near the upper arms, which would stretch and loosen up throughout the day. it's definitely a comfortable shirt, but it felt more like a pajama top. it's going back.",i am in need of easy comfortable tops for everyday wear i bought this top mostly because of the cute buttons when i received it it looked exactly as it does in the picture online however the buttons kept slipping out of their homes because the holes were slightly too big the shirt fit but was just a tad snug near the upper arms which would stretch and loosen up throughout the day it is definitely a comfortable shirt but it felt more like a pajama top it is return,Positive
100,861,39,Comfy,"At first i wasn't sure about it. the neckline is much lower and wavy than i thought. but after wearing it, it really is comfortable. it stretches a lot, so i wear a cami underneath so when i lean forward i'm not showing the world my torso.",4,1,0,General Petite,Tops,Knits,"At first i wasn't sure about it. the neckline is much lower and wavy than i thought. but after wearing it, it really is comfortable. it stretches a lot, so i wear a cami underneath so when i lean forward i'm not showing the world my torso.",at first i was not sure about it the neckline is much lower and wavy than i thought but after wearing it it really is comfortable it stretches a lot so i wear a cami underneath so when i lean forward i am not showing the world my torso,Positive


In [28]:
#add return to negative dictionary
type(HLneg)

#convert to set
HLneg = set(HLneg)
another_set = set(['return', 'too', 'unflattering'])
HLneg.update(another_set) #adds the words in another_set to HLneg

#convert back to list
HLneg = list(HLneg)

In [29]:
#apply dictionary again and review how changes affected results
review_df['hlsent2'] = review_df.newtext.apply(lambda x: hl_sent(x))
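
The effect of the two adjustments (collapsing return phrases into "return" and extending the negative list) can be seen on a single made-up review. This is a minimal sketch: the mini-lexicons, the phrase list, the sample review, and the helpers `normalise` and `tone` are all hypothetical.

```python
# Hypothetical mini-lexicons and phrase list, for illustration only
POS = {"love", "cute"}
NEG = {"huge"}
RETURN_PHRASES = ["bring it back", "bring back", "going back", "returning"]

def normalise(text):
    """Replace each return-related phrase with the single token 'return'."""
    for phrase in RETURN_PHRASES:
        text = text.replace(phrase, "return")
    return text

def tone(text, pos, neg):
    """Label text from its counts of positive and negative lexicon words."""
    words = text.split()
    p = sum(w in pos for w in words)
    n = sum(w in neg for w in words)
    score = (p - n) / (p + n) if p + n else 0
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

review = "i love it but it is too big it is going back"

before = tone(review, POS, NEG)                                  # only "love" matches
after = tone(normalise(review), POS, NEG | {"return", "too"})    # "too" and "return" now count as negative

print(normalise(review))  # i love it but it is too big it is return
print(before, "->", after)  # Positive -> Negative
```

A single positive word is enough to label the original review Positive. After the changes, two negative hits outweigh it, which is the kind of flip to look for when comparing `hlsent` with `hlsent2`.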

In [30]:
review_df.iloc[300:315] #312 unflattering changed from neutral to negative

Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,reviewtext,newtext,hlsent,hlsent2
312,844,34,Very unflattering,True to size on the neckline and arms but extremely large and puffy in the torso. very unflattering cut!,3,0,0,General Petite,Tops,Blouses,True to size on the neckline and arms but extremely large and puffy in the torso. very unflattering cut!,true to size on the neckline and arms but extremely large and puffy in the torso very unflattering cut,Neutral,Negative
313,836,32,Adorable,"I purchased the floral patterned version and get complimented every time i wear it. i found it to be pretty true to size, even after washing. it's a little sheer, so you'd definitely want to wear a camisole underneath for work. it's a great top for spring/summer!",4,1,4,General,Tops,Blouses,"I purchased the floral patterned version and get complimented every time i wear it. i found it to be pretty true to size, even after washing. it's a little sheer, so you'd definitely want to wear a camisole underneath for work. it's a great top for spring/summer!",i purchased the floral patterned version and get complimented every time i wear it i found it to be pretty true to size even after washing it is a little sheer so you would definitely want to wear a camisole underneath for work it is a great top for spring summer,Positive,Positive
314,836,60,Didn't work for me,"I thought this top was adorable in the store and online. it just didn't work for me. although it fit, it flares out too much in the front and just wasn't flattering on me. i am 5' 5"" and 128 lbs. and ordered the small.",3,0,3,General,Tops,Blouses,"I thought this top was adorable in the store and online. it just didn't work for me. although it fit, it flares out too much in the front and just wasn't flattering on me. i am 5' 5"" and 128 lbs. and ordered the small.",i thought this top was adorable in the store and online it just did not work for me although it fit it flares out too much in the front and just was not flattering on me i am 5 5 and 128 lbs and ordered the small,Positive,Positive
315,836,21,Beautiful top!,"Love this top! made with 100% cotton, a vintage look, and flattering details this top is a winner for me. i think it fits true to size (got my regular size 0) and i did not need the petite and i am fairly short (5'3""). it is somewhat see through, but with wearing a nude bra and not wearing it to work, i think it can be worn without a cami. the perfect lightweight, comfortable, standout piece for the summer time :)",5,1,8,General,Tops,Blouses,"Love this top! made with 100% cotton, a vintage look, and flattering details this top is a winner for me. i think it fits true to size (got my regular size 0) and i did not need the petite and i am fairly short (5'3""). it is somewhat see through, but with wearing a nude bra and not wearing it to work, i think it can be worn without a cami. the perfect lightweight, comfortable, standout piece for the summer time :)",love this top made with 100 cotton a vintage look and flattering details this top is a winner for me i think it fits true to size got my regular size 0 and i did not need the petite and i am fairly short 5 3 it is somewhat see through but with wearing a nude bra and not wearing it to work i think it can be worn without a cami the perfect lightweight comfortable standout piece for the summer time,Positive,Positive
316,836,59,Love this blouse,"I really like this blouse a lot. very very easy to wear!! i wore with pencil skirt to work and with skort as shown similar on model with sandals on weekend. very flattering and great blue color!!! very happy with this purchase. highly recommend. i am 5'6"" short torso and my usual 6 worked.",5,1,0,General,Tops,Blouses,"I really like this blouse a lot. very very easy to wear!! i wore with pencil skirt to work and with skort as shown similar on model with sandals on weekend. very flattering and great blue color!!! very happy with this purchase. highly recommend. i am 5'6"" short torso and my usual 6 worked.",i really like this blouse a lot very very easy to wear i wore with pencil skirt to work and with skort as shown similar on model with sandals on weekend very flattering and great blue color very happy with this purchase highly recommend i am 5 6 short torso and my usual 6 worked,Positive,Positive
317,844,53,Lovely printed blouse,I just purchased this beautiful printed blouse in the pink color and love it! i almost always wear a size small at retailer (34d-27-35) and the fit and length are both perfect on me. if you are smaller chested you can easily go down a size. i absolutely had to have this whe i first saw it at the store and noticed how popular it was as i had to order it due to it selling out like hot cakes there. what i like about it is the texture and the ruffles at the front plus the length of the sleeves stop ri,5,1,4,General Petite,Tops,Blouses,I just purchased this beautiful printed blouse in the pink color and love it! i almost always wear a size small at retailer (34d-27-35) and the fit and length are both perfect on me. if you are smaller chested you can easily go down a size. i absolutely had to have this whe i first saw it at the store and noticed how popular it was as i had to order it due to it selling out like hot cakes there. what i like about it is the texture and the ruffles at the front plus the length of the sleeves stop ri,i just purchased this beautiful printed blouse in the pink color and love it i almost always wear a size small at retailer 34d 27 35 and the fit and length are both perfect on me if you are smaller chested you can easily go down a size i absolutely had to have this whe i first saw it at the store and noticed how popular it was as i had to order it due to it selling out like hot cakes there what i like about it is the texture and the ruffles at the front plus the length of the sleeves stop ri,Positive,Positive
318,936,27,"Love, love, love!",Bought this on a whim and it exceeded my expectations. i didn't know what to expect with the quality of the fabric but this is incredibly soft and warm. haven't worn it outside yet but i can see this already as one of my favorite items. i'm usually an extra-small but the xxs also fits. it's a great buy especially since it's on sale now.,5,1,2,General Petite,Tops,Sweaters,Bought this on a whim and it exceeded my expectations. i didn't know what to expect with the quality of the fabric but this is incredibly soft and warm. haven't worn it outside yet but i can see this already as one of my favorite items. i'm usually an extra-small but the xxs also fits. it's a great buy especially since it's on sale now.,bought this on a whim and it exceeded my expectations i did not know what to expect with the quality of the fabric but this is incredibly soft and warm have not worn it outside yet but i can see this already as one of my favorite items i am usually an extra small but the xxs also fits it is a great buy especially since it is on sale now,Positive,Positive
319,895,62,Loveee,"This is an awesome vest - so soft, cozy, and i cannot wait to wear it through fall and winter. for sake of not repeating all the positive aspects that the previous reviewers did, i'll mention the one flaw...no pockets :( still totally worth full price in my mind though.",5,1,7,General,Tops,Fine gauge,"This is an awesome vest - so soft, cozy, and i cannot wait to wear it through fall and winter. for sake of not repeating all the positive aspects that the previous reviewers did, i'll mention the one flaw...no pockets :( still totally worth full price in my mind though.",this is an awesome vest so soft cozy and i cannot wait to wear it through fall and winter for sake of not repeating all the positive aspects that the previous reviewers did i will mention the one flaw no pockets still totally worth full price in my mind though,Positive,Positive
320,836,41,,"I find that maeve shirts tend to run a little small. i'm usually an 8 but needed this in a 10. this shirt is reallly just perfect. great sleeve length. just the right amount of v neck. beautiful pattern with a vintage feel. i love the combo of stripes, polka dots and sweet flowers.",5,1,23,General,Tops,Blouses,"I find that maeve shirts tend to run a little small. i'm usually an 8 but needed this in a 10. this shirt is reallly just perfect. great sleeve length. just the right amount of v neck. beautiful pattern with a vintage feel. i love the combo of stripes, polka dots and sweet flowers.",i find that maeve shirts tend to run a little small i am usually an 8 but needed this in a 10 this shirt is reallly just perfect great sleeve length just the right amount of v neck beautiful pattern with a vintage feel i love the combo of stripes polka dots and sweet flowers,Positive,Positive
321,831,50,My favorite new blouse,"This blouse is so pretty. i love the long tie. the pattern is very unique. it is a thin, light weight fabric so you can easily wear it underneath a leather jacket.",5,1,3,General,Tops,Blouses,"This blouse is so pretty. i love the long tie. the pattern is very unique. it is a thin, light weight fabric so you can easily wear it underneath a leather jacket.",this blouse is so pretty i love the long tie the pattern is very unique it is a thin light weight fabric so you can easily wear it underneath a leather jacket,Positive,Positive


#### export wordcount for visualizations

In [32]:

## .values.astype('U') needed to transform data to unicode
cv10_chat = cv10.fit_transform(review_df['newtext'].values.astype('U')) #apply the transformation

df_counts = count_df10.reset_index()
df_counts.to_csv("BIA6304_final_wordcounts.csv", encoding='utf-8', index=False)
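
A word-count table like the one exported above can also be built without converting the document-term matrix to a dense array. The following is a sketch on a toy corpus, assuming a plain `CountVectorizer`; the names `docs`, `cv`, and `word_counts` are illustrative and not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

docs = ["love this dress", "love the color", "dress runs small"]  # toy stand-in corpus
cv = CountVectorizer()
counts = cv.fit_transform(docs)  # sparse document-term matrix

# Newer scikit-learn releases expose get_feature_names_out();
# older ones only have get_feature_names()
if hasattr(cv, "get_feature_names_out"):
    names = cv.get_feature_names_out()
else:
    names = cv.get_feature_names()

# Column sums on the sparse matrix avoid building a dense array
totals = np.asarray(counts.sum(axis=0)).ravel()
word_counts = (pd.DataFrame({"term": names, "count": totals})
               .sort_values("count", ascending=False)
               .reset_index(drop=True))
print(word_counts.head())
```

On a corpus of tens of thousands of reviews with bigrams, summing the sparse matrix directly is likely to use far less memory than `toarray()`.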


## Topic Modeling

In [29]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import euclidean_distances, cosine_similarity
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 
# Suppress DeprecationWarning messages

import nltk
pd.set_option('display.max_colwidth', 150000) #important for getting all the text


#now we will see the length of our feature space using cv9 parameters with our_stopwords
#cv10 = CountVectorizer(binary=False, min_df= .01, ngram_range = (1, 2), stop_words = our_stopwords) #define the transformation
cv10_chat = cv10.fit_transform(review_df['newtext'].values.astype('U')) #apply the transformation
#pd.DataFrame(cv10_chat.toarray(),columns = cv10.get_feature_names()).head()

#names10 = cv10.get_feature_names()   #create list of feature names
#count10 = np.sum(cv10_chat.toarray(), axis = 0) # add up feature counts 
#count10_2 = count10.tolist()  # convert numpy array to list
#count_df10 = pd.DataFrame(count10_2, index = names10, columns = ['count']) # create a dataframe from the list
#count_df10.sort_values(['count'], ascending = False)[0:10]  #arrange by count instead of alphabetical (top 10)

In [30]:
from sklearn.decomposition import NMF

n_topics = 10
n_top_words = 5

# Fit the NMF model
nmf_10_5 = NMF(n_components=n_topics, random_state=1).fit(cv10_chat)
review_components_10_5 = nmf_10_5.components_


names_reviews = cv10.get_feature_names()

#print(names_reviews)
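
NMF factorizes the document-term matrix into a document-topic matrix and a topic-term matrix. The sketch below shows both on a toy corpus; the corpus and the `init` and `max_iter` settings are assumptions chosen for the illustration, not the notebook's configuration.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "love this dress beautiful dress",
    "beautiful dress love it",
    "size runs small ordered medium",
    "ordered small size fits",
]  # toy stand-in for review_df['newtext']

X = CountVectorizer().fit_transform(docs)
model = NMF(n_components=2, init="nndsvd", random_state=1, max_iter=500)
W = model.fit_transform(X)   # document-topic weights, shape (n_documents, n_topics)
H = model.components_        # topic-term weights, shape (n_topics, n_terms)

# The largest weight in each row of W gives a document's dominant topic
dominant_topic = W.argmax(axis=1)
print(W.shape, H.shape, dominant_topic)
```

Taking the `argmax` over `W` is one possible way to label documents directly from the fitted model, as an alternative to comparing raw counts against each topic.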

In [31]:
# nice function for printing topic information - 
# https://stackoverflow.com/questions/34429635/topic-modelling-assign-a-document-with-top-2-topics-as-category-label-sklear

def print_top_words(model, feature_names, n_top_words):
    for topic_idx, topic in enumerate(model.components_):
        print("Topic #%d:" % topic_idx)
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
    print()
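
When the top words are needed as data rather than printed text, a variant that returns them can be handy. This is a sketch; `top_words_per_topic` and the demo arrays are hypothetical and only mirror the slicing logic of `print_top_words`.

```python
import numpy as np

def top_words_per_topic(components, feature_names, n_top_words):
    """Return a list of top-word lists, one per topic row in `components`."""
    feature_names = np.asarray(feature_names)
    # argsort is ascending, so reverse each row and keep the first n_top_words
    top_idx = np.argsort(components, axis=1)[:, ::-1][:, :n_top_words]
    return [feature_names[row].tolist() for row in top_idx]

# Hypothetical 2-topic weight matrix over a 4-term vocabulary
demo_components = np.array([[0.1, 0.9, 0.0, 0.5],
                            [0.7, 0.0, 0.8, 0.2]])
demo_names = ["color", "dress", "size", "love"]
print(top_words_per_topic(demo_components, demo_names, 2))
# [['dress', 'love'], ['size', 'color']]
```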

In [32]:
print("\nTopics for reviews in NMF model:")
print_top_words(nmf_10_5, names_reviews, n_top_words)


Topics for reviews in NMF model:
Topic #0:
love color perfect soft colors
Topic #1:
dress beautiful love dress fabric dresses
Topic #2:
not did did not does does not
Topic #3:
but little but not really bit
Topic #4:
size true true size ordered fits
Topic #5:
top love top cute tops fabric
Topic #6:
very flattering very flattering soft comfortable
Topic #7:
like look really looks just
Topic #8:
fit great jeans perfect color
Topic #9:
wear small medium usually large



In [40]:
from sklearn.decomposition import NMF

n_topics15 = 20
n_top_words15 = 15

# Fit the NMF model
nmf_20_15 = NMF(n_components=n_topics15, random_state=1).fit(cv10_chat)
review_components_20_15 = nmf_20_15.components_


names_reviews15 = cv10.get_feature_names()

#print(names_reviews)

In [41]:
print("\nTopics for reviews in NMF model:")
print_top_words(nmf_20_15, names_reviews15, n_top_words15)


Topics for reviews in NMF model:
Topic #0:
love love dress colors love top absolutely fits comfortable absolutely love style flattering soft super fell wanted love fell love
Topic #1:
dress beautiful love dress dresses slip flattering dress but dress very summer material bust fits gorgeous online wedding
Topic #2:
not does does not but not sure not sure flattering tight material think not flattering good not think price retailer
Topic #3:
but little but not bit cute large big think usually thought sale return price tried wanted
Topic #4:
size true true size usual fits usual size ordered smaller large runs ordered size normal size but size small wear size
Topic #5:
top love top tops cute beautiful top but pretty large white lace looks bra tank colors shoulders
Topic #6:
very flattering very flattering comfortable well soft very comfortable material very soft pretty nice very pretty very cute very nice cute
Topic #7:
like looks feel looks like model looked feel like not like look like l

In [36]:
#try with 5 topics and 10 words
n_topics2 = 5
n_top_words2 = 10

# Fit the NMF model
nmf_5_10 = NMF(n_components=n_topics2, random_state=1).fit(cv10_chat)
review_components_5_10 = nmf_5_10.components_

names_reviews2 = cv10.get_feature_names()

print("\nTopics for reviews in NMF model:")
print_top_words(nmf_5_10, names_reviews2, n_top_words2)


Topics for reviews in NMF model:
Topic #0:
top love very great wear color like soft fabric flattering
Topic #1:
dress very love wear beautiful love dress fabric perfect flattering great
Topic #2:
not did did not like does just does not look but not fabric
Topic #3:
but little like really but not small bit just fit wear
Topic #4:
size fit small ordered true true size large medium petite wear



In [37]:
#try with 20 topics and 3 words (same n_components and random_state as nmf_20_15, so only the number of printed words changes)
n_topics3 = 20
n_top_words3 = 3

# Fit the NMF model
nmf_20_3 = NMF(n_components=n_topics3, random_state=1).fit(cv10_chat)
review_components_20_3 = nmf_20_3.components_

names_reviews3 = cv10.get_feature_names()

print("\nTopics for reviews in NMF model:")
print_top_words(nmf_20_3, names_reviews3, n_top_words3)


Topics for reviews in NMF model:
Topic #0:
love love dress colors
Topic #1:
dress beautiful love dress
Topic #2:
not does does not
Topic #3:
but little but not
Topic #4:
size true true size
Topic #5:
top love top tops
Topic #6:
very flattering very flattering
Topic #7:
like looks feel
Topic #8:
fit perfectly fit perfectly
Topic #9:
wear usually wear usually
Topic #10:
small medium large
Topic #11:
great looks looks great
Topic #12:
just right just right
Topic #13:
color sweater beautiful
Topic #14:
did did not not
Topic #15:
fabric soft nice
Topic #16:
shirt cute little
Topic #17:
ordered petite xs
Topic #18:
look really well
Topic #19:
perfect length little



In [42]:
#label each review with topic assignment

import math

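#define a function for cosine similarity on 1-D vectors - sklearn's cosine_similarity expects 2-D arrays
#note: this name shadows the cosine_similarity imported from sklearn.metrics.pairwise above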
def cosine_similarity(a, b):
    return sum([i*j for i,j in zip(a, b)])/(math.sqrt(sum([i*i for i in a]))* math.sqrt(sum([i*i for i in b])))
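
The pure-Python version divides by zero when a vector has no nonzero entries, for example a review that contains none of the vocabulary terms. A NumPy sketch with an explicit guard is shown below; `safe_cosine` is a hypothetical name, and returning 0.0 for zero vectors is an assumed convention.

```python
import numpy as np

def safe_cosine(a, b):
    """Cosine similarity of two 1-D vectors; returns 0.0 if either has zero length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)

print(safe_cosine([1, 0, 1], [1, 0, 1]))  # identical direction
print(safe_cosine([1, 0], [0, 1]))        # orthogonal vectors
print(safe_cosine([0, 0], [1, 2]))        # zero vector handled by the guard
```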



In [43]:
# define a function for determining the most similar topic
# credit Leo Ji

def topic_sim(arr, feature_names, n_top_words, topics):
    """
    @type  arr: array of number
    @param arr: term counts for one document (one row of the document-term matrix).
    @type  feature_names: array of string
    @param feature_names: The array of feature names.
    @type  n_top_words: number
    @param n_top_words: The number of top words to return for the chosen topic.
    @type  topics: 2-D array of number
    @param topics: topic-term weights from topic extraction (e.g. model.components_).
    
    @rtype:   string
    @return:  top words of the most similar topic, separated by spaces.
    """
    top_sim = 0
    top_topic = np.array([])
    # iterate over topics
    for idx, topic in enumerate(topics):
        # calculate cosine similarity - substitute euclidean distance if that is your preferred metric
        # could switch to euclidean_distances
        sim = cosine_similarity(arr, topic)
        if sim > top_sim:
            top_sim = sim
            top_topic = topic
    
    # argsort is in ascending order, so pick the last n_top_words in reverse
    selected_topic_index = top_topic.argsort()[:-n_top_words-1:-1]
    # return the text feature names by indexing back into feature_names (assigned earlier)   
    return " ".join([feature_names[i] for i in selected_topic_index])



In [47]:
# create a vector of topic labels that can be appended to the original dataframe

import time

t0 = time.time()

#apply most similar topic to each document
review_df['topics_20_15'] = np.ma.apply_along_axis(topic_sim, axis=1, 
        arr=cv10_chat.toarray(), feature_names=names_reviews15, n_top_words=15, topics=review_components_20_15)
t1 = time.time()

t1-t0



1783.0239443778992
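
The row-by-row loop above took roughly half an hour. A vectorized sketch that computes every document-topic similarity in one call is shown below. It assumes scikit-learn's pairwise `cosine_similarity` (imported under the alias `pairwise_cosine` to avoid the custom function of the same name); `assign_topics` and the demo arrays are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity as pairwise_cosine

def assign_topics(doc_term, components, feature_names, n_top_words):
    """Label every document with the top words of its most cosine-similar topic."""
    feature_names = np.asarray(feature_names)
    sims = pairwise_cosine(doc_term, components)   # shape (n_docs, n_topics)
    best = sims.argmax(axis=1)
    # Build each topic's label once instead of once per document
    labels = []
    for topic_row in components:
        top_idx = topic_row.argsort()[:-n_top_words - 1:-1]
        labels.append(" ".join(feature_names[top_idx]))
    # Documents with no positive similarity get an empty label, as in topic_sim
    return [labels[i] if sims[r, i] > 0 else "" for r, i in enumerate(best)]

demo_docs = np.array([[2, 0, 1, 0],
                      [0, 3, 0, 1],
                      [0, 0, 0, 0]])
demo_topics = np.array([[0.9, 0.0, 0.4, 0.0],
                        [0.0, 0.8, 0.1, 0.5]])
demo_names = ["dress", "size", "love", "small"]
print(assign_topics(demo_docs, demo_topics, demo_names, 2))
# ['dress love', 'size small', '']
```

Because the pairwise function accepts sparse input, the document-term matrix would not need to be densified first, which should reduce both runtime and memory.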

In [50]:
#count topics and view dataframe
review_df['topics_20_15'].value_counts()

not does does not but not sure not sure flattering tight material think not flattering good not think price retailer                          2405
but little but not bit cute large big think usually thought sale return price tried wanted                                                    2360
dress beautiful love dress dresses slip flattering dress but dress very summer material bust fits gorgeous online wedding                     2294
very flattering very flattering comfortable well soft very comfortable material very soft pretty nice very pretty very cute very nice cute    1637
size true true size usual fits usual size ordered smaller large runs ordered size normal size but size small wear size                        1507
top love top tops cute beautiful top but pretty large white lace looks bra tank colors shoulders                                              1291
love love dress colors love top absolutely fits comfortable absolutely love style flattering soft super fell wanted lo

In [51]:
review_df[['newtext','topics_20_15']].head(10)

Unnamed: 0,newtext,topics_20_15
0,absolutely wonderful silky and sexy and comfortable,perfect length little comfortable jeans summer pants bought soft long colors fall length perfect fit perfect flattering
1,love this dress it is sooo pretty i happened to find it in a store and i am glad i did bc i never would have ordered it online bc it is petite i bought a petite and am 5 8 i love the length on me hits just a little below the knee would definitely be a true midi on someone who is truly petite,ordered petite xs length regular long short lbs xxs tried store sleeves big waist nice
2,i had such high hopes for this dress and really wanted it to work for me i initially ordered the petite small my usual size but i found this to be outrageously small so small in fact that i could not zip it up i reordered it in petite medium which was just ok overall the top half was comfortable and fit nicely but the bottom half had a very tight under layer and several somewhat cheap net over layers imo a major design flaw was the net over layer sewn directly into the zipper it c,small medium large runs usually size small ordered extra retailer small but little run ordered small big runs small
3,i love love love this jumpsuit it is fun flirty and fabulous every time i wear it i get nothing but great compliments,love love dress colors love top absolutely fits comfortable absolutely love style flattering soft super fell wanted love fell love
4,this shirt is very flattering to all due to the adjustable front tie it is the perfect length to wear with leggings and it is sleeveless so it pairs well with any cardigan love this shirt,shirt cute little white material soft bought black shirts looks bit tee sleeves super large
5,i love tracy reese dresses but this one is not for the very petite i am just under 5 feet tall and usually wear a 0p in this brand this dress was very pretty out of the package but its a lot of dress the skirt is long and very full so it overwhelmed my small frame not a stranger to alterations shortening and narrowing the skirt would take away from the embellishment of the garment i love the color and the idea of the style but it just did not work on me i returned this dress,but little but not bit cute large big think usually thought sale return price tried wanted
6,i aded this in my basket at hte last mintue to see what it would look like in person store pick up i went with teh darkler color only because i am so pale hte color is really gorgeous and turns out it mathced everythiing i was trying on with it prefectly it is a little baggy on me and hte xs is hte msallet size bummer no petite i decided to jkeep it though because as i said it matvehd everything my ejans pants and the 3 skirts i waas trying on of which i kept all oops,color sweater beautiful soft blue bought green nice red pink online purchased sleeves fall light
7,i ordered this in carbon for store pick up and had a ton of stuff as always to try on and used this top to pair skirts and pants everything went with it the color is really nice charcoal with shimmer and went well with pencil skirts flare pants etc my only compaint is it is a bit big sleeves are long and it does not go in petite also a bit loose for me but no xxs so i kept it and wil ldecide later since the light color is already sold out in hte smallest size,color sweater beautiful soft blue bought green nice red pink online purchased sleeves fall light
8,i love this dress i usually get an xs but it runs a little snug in bust so i ordered up a size very flattering and feminine with the usual retailer flair for style,very flattering very flattering comfortable well soft very comfortable material very soft pretty nice very pretty very cute very nice cute
9,i am 5 5 and 125 lbs i ordered the s petite to make sure the length was not too long i typically wear an xs regular in retailer dresses if you are less busty 34b cup or smaller a s petite will fit you perfectly snug but not tight i love that i could dress it up for a party or down for work i love that the tulle is longer then the fabric underneath,ordered petite xs length regular long short lbs xxs tried store sleeves big waist nice


In [None]:
#try LDA
#from sklearn.decomposition import LatentDirichletAllocation
#n_topics4 = 10
#n_top_words4 = 5

# Fit the LDA model
#lda_10_5 = LatentDirichletAllocation(n_components=n_topics4, max_iter=5,
#                                learning_method='online', learning_offset=50.,
#                                random_state=0)

#lda_10_5.fit(cv10_chat)

#feature_names4 = cv10.get_feature_names()

#what are the topics for this corpus?
#print("\nTopics for texts in LDA model:")
#print_top_words(lda_10_5, feature_names4, n_top_words4)
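
For reference, a runnable sketch of the LDA step on a toy corpus follows. It assumes a scikit-learn release where the topic count is passed as `n_components`; the corpus and variable names are illustrative only.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "love this dress beautiful dress",
    "beautiful dress love it",
    "size runs small ordered medium",
    "ordered small size fits",
]  # toy stand-in for review_df['newtext']

cv = CountVectorizer()
X = cv.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, max_iter=5,
                                learning_method="online", learning_offset=50.0,
                                random_state=0)
# Each row of doc_topic is a distribution over topics for one document
doc_topic = lda.fit_transform(X)
print(doc_topic.round(3))
print(lda.components_.shape)  # (n_topics, n_terms)
```

Unlike NMF weights, the LDA document-topic rows sum to 1, so they can be read as topic proportions for each review.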

In [None]:
#apply most similar topic to each document
#t5 = time.time()
#review_df['lda_10_5_topics'] = np.ma.apply_along_axis(topic_sim, axis=1, 
 #       arr=cv10_chat.toarray(), feature_names=cv10.get_feature_names(), n_top_words=4, topics=lda_10_5.components_)

#t6 = time.time()

#t6-t5

In [None]:
#review_df['lda_10_5_topics'].value_counts()

In [None]:
#review_df[['Review Text','topics_20_15','lda_10_5_topics']].head(10)


### END