
Fix for replacing punctuation in tokenize.js #149

Merged


jalners (Contributor) commented Jun 24, 2018

It would be better to replace punctuation in the tokenizer with a space instead of removing it. Otherwise, for a sentence like "If you are Razr owner...you must have this!", the tokenizer returns the following incorrect array:
['if', 'you', 'are', 'razr', 'owneryou', 'must', 'have', 'this']
The error is in 'owneryou': because the ellipsis is stripped without leaving a separator, "owner" and "you" are merged into a single token.
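A minimal JavaScript sketch of the difference, assuming a simple regex-based tokenizer. The function names `tokenizeRemove` and `tokenizeReplace` and the punctuation character class are hypothetical illustrations, not the project's actual tokenize.js code. Deleting punctuation glues together the words around the ellipsis, while replacing it with a space keeps them apart.

```javascript
// Hypothetical punctuation class; the real tokenize.js regex may differ.
const PUNCTUATION = /[.,\/#!$%\^&\*;:{}=_`"~()?]/g;

// Hypothetical "before" behaviour: punctuation is deleted outright,
// so "owner...you" collapses into the single token "owneryou".
function tokenizeRemove(input) {
  return input
    .toLowerCase()
    .replace(PUNCTUATION, '')
    .split(/\s+/)
    .filter(Boolean); // drop empty strings left by leading/trailing spaces
}

// Hypothetical "after" behaviour: punctuation becomes a space, so the
// words on either side of it remain separate tokens.
function tokenizeReplace(input) {
  return input
    .toLowerCase()
    .replace(PUNCTUATION, ' ')
    .split(/\s+/)
    .filter(Boolean); // drop empty strings left by leading/trailing spaces
}

const sentence = 'If you are Razr owner...you must have this!';
console.log(tokenizeRemove(sentence));
console.log(tokenizeReplace(sentence));
```

Under these assumptions, `tokenizeRemove(sentence)` returns the wrong array quoted above, while `tokenizeReplace(sentence)` returns `['if', 'you', 'are', 'razr', 'owner', 'you', 'must', 'have', 'this']`. Splitting on `/\s+/` and filtering empty strings avoids empty tokens when several punctuation marks in a row turn into consecutive spaces.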

thisandagain self-requested a review on August 30, 2018 15:28
thisandagain self-assigned this on Aug 30, 2018
elyas-bhy self-assigned this on Sep 13, 2018
elyas-bhy (Collaborator) commented:

LGTM. @thisandagain ?

elyas-bhy removed their assignment on Sep 13, 2018
elyas-bhy self-requested a review on September 13, 2018 10:33
thisandagain (Owner) commented:

Excellent! The validation accuracy improvements with this are certainly worthwhile:

Before

Amazon accuracy: 0.7202797202797203
IMDB accuracy: 0.7642357642357642
Yelp accuracy: 0.6943056943056943

After

Amazon accuracy: 0.7252747252747253
IMDB accuracy: 0.7652347652347652
Yelp accuracy: 0.6963036963036963
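To put the before/after numbers in perspective, here is a short JavaScript sketch that computes the absolute change in percentage points. The `results` object and the `accuracyDeltas` helper are illustrative names; the inputs are just the figures reported above.

```javascript
// Accuracy figures copied from the Before/After results above.
const results = {
  Amazon: { before: 0.7202797202797203, after: 0.7252747252747253 },
  IMDB: { before: 0.7642357642357642, after: 0.7652347652347652 },
  Yelp: { before: 0.6943056943056943, after: 0.6963036963036963 },
};

// Absolute change in percentage points per validation set,
// rounded to two decimal places.
function accuracyDeltas(res) {
  const deltas = {};
  for (const [name, { before, after }] of Object.entries(res)) {
    deltas[name] = Number(((after - before) * 100).toFixed(2));
  }
  return deltas;
}

console.log(accuracyDeltas(results));
```

Rounded to two decimals, this gives gains of about 0.5 points on Amazon, 0.1 on IMDB and 0.2 on Yelp.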

thisandagain merged commit 222d52e into thisandagain:develop on Aug 22, 2019