diff --git a/docs-toc.yml b/docs-toc.yml
index c32804c..ff72b34 100644
--- a/docs-toc.yml
+++ b/docs-toc.yml
@@ -19,6 +19,8 @@ toc:
  - string.song
  - string.splitElisions
  - string.stem
+ - string.tokenize
+ - string.tokenize0
  - string.trim
  - string.upperCase
  - name: helper
diff --git a/docs/index.html b/docs/index.html
index c85a15c..a98fb2d 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -224,6 +224,26 @@
The function uses the following set of rules to tokenize:
+ - "..." sequences are converted to ellipses.
+ - "Not" elisions are amplified and then split on elisions, so words with elisions are tokenized as well.
+ - "cannot" is split into "can not".
+ - The punctuation marks ". , -", which are commonly embedded in numbers, are left intact.
+ - An underscore ("_") embedded in a word is preserved.
+
+ (string): the input string.
+
+ Returns Array<string>: an array of tokens.
+
+ tokenize( "someone's wallet, isn't it? I'll return!" );
+// -> [ 'someone\'s', 'wallet', ',', 'is', 'not', 'it',
+// '?', 'i', '\'ll', 'return', '!' ]
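
A few of the rules listed above can be illustrated with a minimal sketch. This is not the implementation added in this change: `sketchTokenize` is a hypothetical name, and it covers only the "not" elision, "cannot", number punctuation, and underscore rules. It assumes a JavaScript runtime that supports regular-expression lookbehind.

```javascript
// Hypothetical illustration only; NOT the library's tokenize().
function sketchTokenize( input ) {
  return input
    // Amplify "not" elisions: "isn't" -> "is not".
    // (Simplification: irregular forms such as "won't" are not handled.)
    .replace( /(\w)n't\b/gi, '$1 not' )
    // Split "cannot" into "can not".
    .replace( /\bcannot\b/gi, 'can not' )
    // Pad ". , -" with spaces unless the mark sits between two digits,
    // so "3.14" and "1,000" stay intact.
    .replace( /(?<!\d)[.,-]|[.,-](?!\d)/g, ' $& ' )
    // Pad a few other punctuation marks so they become separate tokens.
    .replace( /[?!;:"()]/g, ' $& ' )
    // Underscores are never touched, so "my_file" survives as one token.
    .trim()
    .split( /\s+/ )
    .filter( Boolean );
}

console.log( sketchTokenize( "It isn't 3.14, is it?" ) );
// -> [ 'It', 'is', 'not', '3.14', ',', 'is', 'it', '?' ]
```

The order of the replacements matters in this sketch: negations are expanded before any punctuation is padded, so the apostrophe in "isn't" is consumed as part of the elision and never treated as a separator.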
+
+Tokenizes by splitting the input string on non-word characters. This means tokens
+consist only of letters, numerals, and underscores; all other characters are
+stripped because they are treated as separators. It also removes all elisions;
+however, negations are retained and amplified.
+
+ (string): the input string.
+
+ Returns Array<string>: an array of tokens.
+
+ tokenize0( "someone's wallet, isn't it?" );
+// -> [ 'someone', 's', 'wallet', 'is', 'not', 'it' ]
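
The splitting behaviour described for `tokenize0` can be approximated in a few lines. The sketch below is an assumption about how such a splitter might look, not the code from this change; `sketchTokenize0` is a hypothetical name, and the negation handling is deliberately simplified.

```javascript
// Hypothetical illustration only; NOT the library's tokenize0().
function sketchTokenize0( input ) {
  return input
    // Retain and amplify negations before splitting: "isn't" -> "is not".
    .replace( /(\w)n't\b/gi, '$1 not' )
    // Split on runs of non-word characters; \W excludes letters,
    // digits and underscores, so those are the only characters kept.
    .split( /\W+/ )
    // Drop empty strings left by leading or trailing separators.
    .filter( Boolean );
}

console.log( sketchTokenize0( "someone's wallet, isn't it?" ) );
// -> [ 'someone', 's', 'wallet', 'is', 'not', 'it' ]
```

Because the apostrophe is itself a non-word character, "someone's" falls apart into `'someone'` and `'s'`, which matches the documented output above.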
+