New Operation: Word Count #193

Closed
neocotic opened this Issue May 14, 2013 · 2 comments

neocotic commented May 14, 2013

Add a new operation that counts the words within its rendered contents.

@ghost ghost assigned neocotic May 14, 2013

neocotic May 18, 2013

A few things to note on the new wordCount operation:

  • It will simply count all character blocks that are separated by white space (i.e. spaces, tabs, new lines, or a mix of all three)
  • It will not differentiate between what is a word or an HTML block etc. so it should be used on plain text for best results
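
To make the two points above concrete, here is a minimal JavaScript sketch. It is not the actual implementation, and `countBlocks` is a hypothetical name used only for illustration; it simply counts runs of non-whitespace characters, so markup is counted along with the visible text.

```javascript
// Hypothetical helper for illustration only: counts runs of non-whitespace
// characters, which is the behaviour described in the list above.
function countBlocks(text) {
  const blocks = text.match(/\S+/g); // null when there are no blocks
  return blocks ? blocks.length : 0;
}

// Spaces, tabs and new lines (or a mix) all act as separators.
console.log(countBlocks("one two\tthree\n four"));       // 4

// HTML is not understood: tags and attributes form part of the blocks.
console.log(countBlocks("<p>Hello <b>world</b></p>"));    // 2
console.log(countBlocks('<a href="#">Hello world</a>'));  // 3
```

The last call shows why plain text gives the best results: two visible words are reported as three blocks because the attribute adds white space inside the tag.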

Finally, I've been testing various methods for counting words and trying to find the best performing solution as this could become a popular tool (e.g. easily count words in the selected text).

I came up with 4 different approaches and tested them first to ensure they worked as expected: http://jsfiddle.net/alasdair/d3KGy/. Then I created a simple performance test to find the best solution and found some surprising results: http://jsperf.com/word-count-test.

Surprisingly, on Chrome at least, the approach with the most code performs best. This is the best translation (I think) into CoffeeScript:

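count   = 0
text    = text.trim()
# Each run of white space separates two character blocks
matches = text.match /\s+/g

if text
  # Non-empty text holds at least one block, plus one per separator
  count++
  count += matches.length if matches

count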
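
For anyone reading this without CoffeeScript to hand, the snippet above corresponds roughly to the following JavaScript. This is a sketch for illustration; the function name `wordCount` and the sample inputs are my assumptions, not code taken from the extension.

```javascript
// A JavaScript rendering of the snippet above; `wordCount` is an assumed name.
function wordCount(text) {
  let count = 0;
  text = text.trim();
  const matches = text.match(/\s+/g); // runs of white space between blocks

  if (text) {
    count++;                              // non-empty text has at least one block
    if (matches) count += matches.length; // plus one block per separator
  }
  return count;
}

console.log(wordCount(""));                   // 0
console.log(wordCount("   "));                // 0
console.log(wordCount("hello"));              // 1
console.log(wordCount("  foo \t bar\nbaz ")); // 3
```

Trimming first matters: without it, leading or trailing white space would be matched as a separator and inflate the count.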

neocotic May 19, 2013

This has been implemented by PR neocotic/template-chrome#1. One main caveat that I'll possibly fix in the future: special characters are currently considered as "words". For example, foo - bar would be counted as having 3 words, as the hyphen would be counted. This is due to the optimized way in which we're counting words, which is based on the white space between blocks of text rather than on the words themselves. Also, how am I to know what should be considered a word (e.g. !@~# could be considered a - possibly censored - word by some)?
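
The caveat is easy to reproduce. The JavaScript sketch below also shows one conceivable refinement, which is purely an assumption for illustration and not what the PR does: only count blocks that contain at least one letter or digit. Both `countBlocks` and `countWordLike` are hypothetical names.

```javascript
// Hypothetical: counts every run of non-whitespace characters,
// mirroring the current whitespace-based behaviour.
function countBlocks(text) {
  const blocks = text.match(/\S+/g);
  return blocks ? blocks.length : 0;
}

// Hypothetical refinement: ignore blocks with no letter or digit in them.
function countWordLike(text) {
  const blocks = text.match(/\S+/g) || [];
  return blocks.filter((block) => /[\p{L}\p{N}]/u.test(block)).length;
}

console.log(countBlocks("foo - bar"));   // 3 (the hyphen is counted)
console.log(countWordLike("foo - bar")); // 2
console.log(countWordLike("!@~#"));      // 0
```

The last line illustrates the trade-off: a filter like this would also drop !@~#, which some would still want counted, and the extra pass over the blocks would likely cost some of the performance gained.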

@neocotic neocotic closed this May 19, 2013

@neocotic neocotic added the accepted label Nov 15, 2017
