Added documentation for AFINN, returned objects and tokenization. #124

Closed
wants to merge 6 commits into from

Conversation

rishpandey
Contributor

No description provided.

@rishpandey rishpandey mentioned this pull request Oct 11, 2017
3 tasks
Owner

@thisandagain thisandagain left a comment

Looks great! A few really minor comments.

@@ -79,7 +79,62 @@ Yelp: 0.69 (+2%)
```

---
### How it works
#### AFINN
AFINN is a list of words rated for valence with an integer between minus five (negative) and plus five (positive). Sentiment analysis is performed by cross-checking the string tokens( words, emojis) with the AFINN list and getting their respective scores. The comparative score is simply: sum of each token / number of tokens. So for example let's take the following:
Owner

Couple minor nitpicks here:

  • Fix spacing / formatting in parenthetical "tokens (e.g. words and emojis)"
  • Format "sum of each token / number of tokens" as inline code

(5 * 200) / 200 = 5

#### Tokenization
Tokenization works by splitting the lines of input string, then removing the special characters and finally splitting it using spaces. This is used to get list of words in the string.
Owner

Another really minor nitpick:

  • Please use an Oxford comma for "characters, and finally"
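
As an illustration of the three steps the Tokenization paragraph describes (split the input into lines, remove special characters, split on spaces), here is a minimal Python sketch. It is not the library's tokenizer: the function name `tokenize` and the regular expression are assumptions made for this example, and this simple pattern would also strip emojis, which the documentation under review counts as tokens.

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer following the three described steps."""
    # Step 1: split the input into lines and rejoin them with spaces.
    joined = " ".join(text.splitlines())
    # Step 2: remove special characters. Assumption: anything that is
    # neither a word character nor whitespace counts as "special".
    cleaned = re.sub(r"[^\w\s]", "", joined)
    # Step 3: split on whitespace to get the list of words.
    return cleaned.split()


print(tokenize("I love cats,\nbut I am allergic to them."))
```

The sample sentence is made up for the sketch; it yields nine tokens with the punctuation removed.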

@@ -79,7 +79,62 @@ Yelp: 0.69 (+2%)
```

---
### How it works
Owner

I think this whole section should probably be before "Benchmarks". Also, please match the spacing between the H3 and the divider line as shown in the other sections:

... lorem ipsum dolor sit amet.

---

### Some title
Lorem ipsum dolor sit amet...


This approach leaves you with a mid-point of 0 and the upper and lower bounds are constrained to positive and negative 5 respectively (the same as each token! 😸). For example, let's imagine an incredibly "positive" string with 200 tokens and where each token has an AFINN score of 5. Our resulting comparative score would look like this:

(max positive score * number of tokens) / number of tokens
Owner

Please wrap this and the line below it in a code block
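
The bounds claim in this hunk can be checked numerically. The sketch below is an illustration in Python with a hypothetical helper, `comparative_from_scores`; it assumes per-token scores are integers from -5 to 5, as the AFINN description states.

```python
import random


def comparative_from_scores(scores: list[int]) -> float:
    """Sum of the token scores divided by the number of tokens."""
    return sum(scores) / len(scores)


# 200 tokens that all carry the maximum score: (5 * 200) / 200
print(comparative_from_scores([5] * 200))   # 5.0
# 200 tokens that all carry the minimum score: (-5 * 200) / 200
print(comparative_from_scores([-5] * 200))  # -5.0

# Any mix of valid scores stays within the same bounds.
mixed = [random.randint(-5, 5) for _ in range(200)]
assert -5 <= comparative_from_scores(mixed) <= 5
```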

* __Negative__: List of negative words in input string that were found in AFINN list.

In this case, love has a value of 3, allergic has a value of -2, and the remaining tokens are neutral with a value of 0. Because the string has 9 tokens the resulting comparative score looks like:
(3 + -2) / 9 = 0.111111111
Owner

Please wrap this line in a code block
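
The arithmetic in this hunk can be reproduced with a short Python sketch. The two-entry lexicon holds only the scores quoted in the hunk (love = 3, allergic = -2); the token list is a hypothetical nine-token sentence chosen to match the example, and `comparative` is an illustrative helper, not the library's API.

```python
# Only the two scores quoted in the hunk; every other token is treated as 0.
AFINN_SAMPLE = {"love": 3, "allergic": -2}


def comparative(tokens: list[str], lexicon: dict[str, int]) -> float:
    """Sum of the token scores divided by the number of tokens."""
    if not tokens:
        return 0.0  # assumption: an empty input scores 0
    score = sum(lexicon.get(token, 0) for token in tokens)
    return score / len(tokens)


# Hypothetical nine-token input.
tokens = ["i", "love", "cats", "but", "i", "am", "allergic", "to", "them"]
print(round(comparative(tokens, AFINN_SAMPLE), 9))  # 0.111111111
```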

@rishpandey
Contributor Author

I fixed these in #125. Please take a look.

@rishpandey rishpandey closed this Oct 15, 2017