Incorrect polarity calculation #21

Closed
swsankar opened this issue Jul 27, 2016 · 8 comments

@swsankar

I'm finding this strange. The sentence "Crashing tv isn't showing" yields a sentiment score of 0.5.

Sentiment for "Crashing TV" yields -0.70.
Sentiment for "isn't showing" yields 0.
Sentiment for "isn't" yields 0 - this is surprising because I have "isn't" as a negator in my valence table.

There were only a couple of additions to the valence table and the polarity table, and none of them should have any impact in the context of this sentence.

Any idea what is wrong?

sentiment_by("Crashing tv isn't showing", by = NULL, polarity_dt = pk_table,

  •          valence_shifters_dt = vs_table)
    
    element_id word_count sd ave_sentiment
    1: 1 4 NA 0.5

    sentiment_by("Crashing tv", by = NULL, polarity_dt = pk_table,

  •          valence_shifters_dt = vs_table)
    
    element_id word_count sd ave_sentiment
    1: 1 2 NA -0.7071068

    sentiment_by("isn't showing", by = NULL, polarity_dt = pk_table,

  •          valence_shifters_dt = vs_table)
    
    element_id word_count sd ave_sentiment
    1: 1 2 NA 0
    sentiment_by("isn't", by = NULL, polarity_dt = pk_table,
  •          valence_shifters_dt = vs_table)
    
    element_id word_count sd ave_sentiment
    1: 1 1 NA 0
@trinker
Owner

trinker commented Jul 27, 2016

Thanks for trying sentimentr.

It's hard to discuss this without a reproducible example. I believe I know where you are getting tripped up, but I will wait until you post a reproducible example so I can see your process. Please use markdown formatting to display inline and block code so that it's easy to read and grab.

@swsankar
Author

swsankar commented Jul 28, 2016

I'm not sure I understand the ask.
I am trying to evaluate/debug the anomalies I am getting in the sentiment scores and eventually improve my dictionary. One such example is the sentence "Crashing tv isn't showing".

All I am doing is running the sentiment_by() function for the above sentence in RStudio, as it appears above:

sentiment_by("Crashing tv isn't showing", by = NULL, polarity_dt = pk_table, valence_shifters_dt = vs_table)

Below is what I use to update the polarity table and Valence table.

vs_table <- sentimentr::valence_shifters_table
vs_table <- update_key(vs_table, drop = NULL,
    x = data.frame(x = c("especially", "most", "more", "bigger"), y = c(2, 4),
        stringsAsFactors = FALSE),
    comparison = sentimentr::polarity_table, sentiment = FALSE)

pk_table <- sentimentr::polarity_table
pk_table <- update_key(pk_table,
    x = data.frame(x = c("used to", "outdated", "restarts", "reboot", "i wish"),
        y = c(rep(-2, 5))))

@swsankar
Author

Here are more examples where I am getting a positive score instead of a negative one.

Horrible can't even watch the game and it's football season, this app needs a face lift.

Looks like half the channels from basic cable lineup are missing! (I tried adding "looks like" as a de-amplifier, but I get a duplication error even though neither the polarity nor the valence table contains it.)

It crashes every time I use it. They marketed it like it was as good or better then the Netflix app... Please. Don't even bother with this.

@trinker
Owner

trinker commented Jul 28, 2016

Let's start with this:

x = data.frame(x = c("especially", "most", "more", "bigger"), y = c(2,4)

These two vectors are not equal in length, so R invokes the recycling rule to make the data.frame. Is that really what you want? Also, what is the 4 for? Its use isn't documented, so I'm wondering what you are using it for.
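
For illustration, this is what base R's recycling rule produces with those lengths (plain R behavior, nothing specific to sentimentr):

data.frame(x = c("especially", "most", "more", "bigger"), y = c(2, 4))

gives:

           x y
1 especially 2
2       most 4
3       more 2
4     bigger 4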

Realize that negators ("isn't" in this case) before or after a polarized word can flip its polarity. The default is that a negator up to 2 words after a polarized word flips the sign. You can tone this down, but that may affect other statements the opposite way.

sentiment_by("Crashing tv isn't showing", by = NULL, polarity_dt = pk_table, valence_shifters_dt = vs_table, n.after = 1)

gives:

   element_id word_count sd ave_sentiment
1:          1          4 NA          -0.5

In your original post you wrote:

Sentiment for "isn't showing" yields 0
Sentiment for "isn't " yields 0 - This is surprising coz I have "isn't" as negator in my valence table

In "isn't showing", "showing" is not a polarized word, so it's not surprising that this is considered neutral. Your second statement has me believing you don't understand the difference between a negative word and a negator. Negative words make polarity negative. Negators flip the sign of the polarity. A negator has no polarity of its own; it can only affect polarized words.
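
A minimal sketch of the distinction, using the default tables rather than your custom pk_table/vs_table (the exact scores depend on the dictionary, but the signs illustrate the point):

sentiment("showing")     # no polarized word, so the score is 0
sentiment("bad")         # a negative word: the score is negative
sentiment("isn't bad")   # a negator flips the polarized word: the score turns positive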

@trinker
Copy link
Owner

trinker commented Jul 28, 2016

The sentences you are showing are not surprising to me. Here are a few things to note:

  1. There are no claims that sentimentr is 100% accurate. Even the best taggers, such as Stanford's, do not come close to 100% accuracy. See the comparison between a few taggers here: https://github.com/trinker/sentimentr#comparing-sentimentr-syuzhet-and-stanford
  2. The sentiment_by function averages the sentiments for each sentence using a simple mean. So if you have a combination of negative and positive sentences, the mean smooths that out and may not be what you want. Use sentiment and figure out how to handle the differences between sentences yourself (see the sketch after this list).
  3. The tagger requires properly formatted sentences, as the tagger is based on a model of how English works. The sentence "Horrible can't even watch the game and it's football season, this app needs a face lift." in particular breaks this model. This is actually 2 sentences, not one. There should be a period after the word "Horrible". Instead, the word "can't" negates "Horrible", which is not what you want.
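
A minimal sketch of point 2, using the default dictionaries rather than your custom tables:

sentiment("I love this app. It crashes every time I use it.")      # one row per sentence
sentiment_by("I love this app. It crashes every time I use it.")   # one averaged row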

Also realize that update_key protects you from adding words to a key when they are found in the other key. In this case it won't let you add "isn't" to the sentiment key because it's in the valence key. You'll need to update the valence key first, using the drop argument. This is why you're getting warnings. The keys are data.table objects, so you can see if your added words made it in by looking at the key.
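
A minimal sketch of that drop-first workflow (assuming, purely for illustration, that you wanted "isn't" as a polarized word; normally you'd leave it as a negator):

## 1. Drop "isn't" from the valence key first
vs2 <- update_key(sentimentr::valence_shifters_table, drop = "isn't",
    comparison = sentimentr::polarity_table, sentiment = FALSE)

## 2. Now it can be added to the polarity key without tripping the comparison check
pk2 <- update_key(sentimentr::polarity_table,
    x = data.frame(x = "isn't", y = -1, stringsAsFactors = FALSE),
    comparison = vs2)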

The act of making dictionaries is important, and the format in sentimentr was designed to be mutable, but it requires attention to detail. As you go through this process, if you have ideas to make the UX of dictionary updating smoother, please share.

@swsankar
Author

Thank you for the detailed insights.

  1. x = data.frame(x = c("especially", "most", "more", "bigger"), y = c(2,4))
    My bad - I intended to use c(rep(2, 4)). I have now corrected that.
  2. I now get why "isn't" and "isn't showing" yield 0, and it makes perfect sense. For my domain, I am going to try n.after = 1 to see how it does overall.
  3. I completely understand that 100% accuracy is impossible, and I have seen your comparison as well. It's just that sentences like "Crashing TV isn't showing" and "Don't even bother with this" kept making me wonder whether I was doing something wrong. Now I recall how the valence shifter context results in this.
  4. Lastly, on update_key: I understand the comparison check before adding a new word, but I am curious about my specific example. When I tried to add "looks like" to the valence table, it did not allow me. I verified that it does not exist in either the polarity or the valence table, yet I kept getting the error. One thing to note: the word "like" already exists in the polarity table. I was wondering whether the logic checks every individual word of an n-gram for duplication when an n-gram is added to a table?

And once again, thanks a lot for building an excellent sentiment analysis tool.

@trinker
Owner

trinker commented Nov 24, 2016

I will check into this.

@trinker trinker reopened this Nov 24, 2016
@trinker
Owner

trinker commented Nov 25, 2016

Can you show me the code you tried? It works for me. Here's my code and output:

update_key(
    valence_shifters_table, 
    x = data.frame(x = c("Looks like"), y = c(3)), 
    comparison = sentimentr::polarity_table
)

Output:

                x y
 1:         acute 2
 2:       acutely 2
 3:         ain't 1
 4:      although 4
 5:        aren't 1
 6:        barely 3
 7:           but 4
 8:         can't 1
 9:        cannot 1
10:       certain 2
11:     certainly 2
12:      colossal 2
13:    colossally 2
14:      couldn't 1
15:          deep 2
16:        deeply 2
17:      definite 2
18:    definitely 2
19:        didn't 1
20:       doesn't 1
21:         don't 1
22:      enormous 2
23:    enormously 2
24:       extreme 2
25:     extremely 2
26:       faintly 3
27:           few 3
28:       greatly 2
29:        hardly 3
30:        hasn't 1
31:       haven't 1
32:       heavily 2
33:         heavy 2
34:          high 2
35:        highly 2
36:       however 4
37:          huge 2
38:        hugely 2
39:       immense 2
40:     immensely 2
41:  incalculable 2
42:  incalculably 2
43:         isn't 1
44:         least 3
45:        little 3
46:    looks like 3
47:       massive 2
48:     massively 2
49:      mightn't 1
50:          more 2
51:          much 2
52:       mustn't 1
53:       neither 1
54:         never 1
55:            no 1
56:        nobody 1
57:          none 1
58:           nor 1
59:           not 1
60:          only 3
61:    particular 2
62:  particularly 2
63:       purpose 2
64:     purposely 2
65:         quite 2
66:        rarely 3
67:          real 2
68:        really 2
69:        seldom 3
70:       serious 2
71:     seriously 2
72:        severe 2
73:      severely 2
74:        shan't 1
75:     shouldn't 1
76:   significant 2
77: significantly 2
78:      slightly 3
79:      sparesly 3
80:  sporadically 3
81:          sure 2
82:        surely 2
83:       totally 2
84:          true 2
85:         truly 2
86:          vast 2
87:        vastly 2
88:          very 2
89:      very few 3
90:   very little 3
91:        wasn't 1
92:       weren't 1
93:         won't 1
94:      wouldn't 1
                x y
Warning message:
In update_key(valence_shifters_table, x = data.frame(x = c("Looks like"),  :
  One or more terms in the first column contain capital letters. Capitals are ignored.
  I found the following suspects:

   * Looks like

These terms have been lower cased.
