New Operation: Word Count #193

Closed
neocotic opened this Issue May 14, 2013 · 2 comments

neocotic commented May 14, 2013

Add a new operation that counts the words within its rendered contents.

@ghost ghost assigned neocotic May 14, 2013

neocotic commented May 18, 2013

A few things to note on the new wordCount operation:

  • It simply counts all character blocks that are separated by whitespace (i.e. spaces, tabs, new lines, or any mix of the three)
  • It does not differentiate between a word and an HTML block etc., so it should be used on plain text for best results
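
To illustrate the two notes above, here is a minimal TypeScript sketch. It is not the extension's code: `countBlocks` is a hypothetical helper that uses a plain split on whitespace, and the sample strings are made up. It shows that markup is counted like any other block, so a tag with an attribute inflates the count.

```typescript
// Hypothetical helper: counts blocks of non-whitespace characters.
function countBlocks(text: string): number {
  const trimmed = text.trim();
  // An empty or all-whitespace string contains no blocks.
  if (trimmed === "") return 0;
  // Split on runs of spaces, tabs, and new lines.
  return trimmed.split(/\s+/).length;
}

// Mixed whitespace between plain words.
console.log(countBlocks("one two\tthree\n four")); // 4
// The visible text is a single word, but the space inside the tag splits it.
console.log(countBlocks("<a href=\"#\">link</a>")); // 2
```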

Finally, I've been testing various methods for counting words and trying to find the best performing solution as this could become a popular tool (e.g. easily count words in the selected text).

I came up with 4 different approaches and tested them first to ensure they worked as expected: http://jsfiddle.net/alasdair/d3KGy/. Then I created a simple performance test to find the best solution and found some surprising results: http://jsperf.com/word-count-test.

On Chrome at least, the approach with the most code performs best. This is the best translation (I think) into CoffeeScript:

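count   = 0
text    = text.trim()
# Each run of whitespace separates two character blocks
matches = text.match /\s+/g

# Non-empty text holds one block, plus one more per separator
if text
  count++
  count += matches.length if matches

count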
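
For readers who do not use CoffeeScript, the snippet above could be written in TypeScript roughly as follows. This is a standalone sketch rather than the code that ships in the extension; the function name `wordCountSketch` and the parameter name `input` are assumptions made for the example.

```typescript
// Standalone port of the CoffeeScript snippet; the name is hypothetical.
function wordCountSketch(input: string): number {
  const text = input.trim();
  // With the g flag, match returns every whitespace run, or null if there is none.
  const matches = text.match(/\s+/g);
  let count = 0;

  if (text) {
    // Non-empty trimmed text always contains at least one block.
    count++;
    // Every whitespace run separates one more block from the previous one.
    if (matches) count += matches.length;
  }

  return count;
}

console.log(wordCountSketch(""));            // 0
console.log(wordCountSketch("hello"));       // 1
console.log(wordCountSketch("  a  b\n\tc ")); // 3
```

Trimming first matters: without it, leading or trailing whitespace would be matched as a separator and add a phantom block to the total.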

neocotic commented May 19, 2013

This has been implemented by PR neocotic/template-chrome#1. One main caveat that I'll possibly fix in the future: special characters are currently considered as "words". For example, foo - bar would be counted as having 3 words, as the hyphen would be counted. This is due to the optimized way in which we count words, which is based on the whitespace between blocks of text rather than on the words themselves. Also, how am I to know what should be considered a word (e.g. !@~# could be considered a word, possibly a censored one, by some)?
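
The caveat can be reproduced with a short TypeScript sketch. Both functions are hypothetical and not part of the PR. The second one shows one possible stricter rule, assuming a "word" must contain at least one ASCII letter or digit. That assumption is exactly the kind of judgment call questioned above, and it would reject non-ASCII words as well as !@~#.

```typescript
// Behaviour as described in the thread: every whitespace-separated block counts.
function countAllBlocks(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Hypothetical stricter variant: a block counts only if it contains
// an ASCII letter or digit (an assumption, not a decision from the thread).
function countWordLikeBlocks(text: string): number {
  return text
    .split(/\s+/)
    .filter((block) => /[A-Za-z0-9]/.test(block)).length;
}

console.log(countAllBlocks("foo - bar"));      // 3
console.log(countWordLikeBlocks("foo - bar")); // 2
console.log(countAllBlocks("!@~#"));           // 1
console.log(countWordLikeBlocks("!@~#"));      // 0
```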

@neocotic neocotic closed this May 19, 2013

@neocotic neocotic added the accepted label Nov 15, 2017
