

Tokenise a line of text or an article into its smaller components (e.g. words, punctuation, numbers).


$ ./
No parameter has been passed. Please see usage below:

       Usage: ./ --method [ simple | learnable ]
                 --text [text]
                 --file [path/to/filename]

       --method       [ simple | learnable ]
                        simple        use the simple tokenisation method
                        learnable     use the learned model to perform tokenisation
       --text         text to tokenise, enclosed in quotes
       --file         path to a file containing the text to tokenise
       --help         show this usage help text


$ ./ --method simple --text "this-is-worth,tokenising.and,this,is,another,one"
this - is - worth , tokenising . and , this , is , another , one

Average: 111.1 sent/s
Total: 1 sent
Runtime: 0.009s
Execution time: 0.233 seconds
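The simple output above splits the input wherever the kind of character changes (letters, digits, whitespace, punctuation). The Python sketch below shows one way such a character-class tokeniser can work. It is an illustration under that assumption, not the script's own implementation, which may rely on a different library. The names `char_class` and `simple_tokenise` are hypothetical.

```python
def char_class(ch: str) -> str:
    """Classify a character into a coarse class used to decide token boundaries."""
    if ch.isspace():
        return "space"
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    return "other"


def simple_tokenise(text: str) -> list[str]:
    """Hypothetical character-class tokeniser (assumed behaviour, not the script's code).

    Whitespace separates tokens and is dropped; a new token starts whenever the
    character class changes, and every punctuation/symbol character is its own token.
    """
    tokens: list[str] = []
    current = ""
    current_class = None
    for ch in text:
        cls = char_class(ch)
        if cls == "space":
            # Whitespace ends the current token and is not kept.
            if current:
                tokens.append(current)
            current, current_class = "", None
            continue
        # Start a new token on a class change, and never merge two "other" chars.
        if current and (cls != current_class or cls == "other"):
            tokens.append(current)
            current = ""
        current += ch
        current_class = cls
    if current:
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    sample = "this-is-worth,tokenising.and,this,is,another,one"
    print(" ".join(simple_tokenise(sample)))
```

Running it on the same input as the simple example above prints the same token sequence: this - is - worth , tokenising . and , this , is , another , one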
$ ./ --method learnable --text "this-is-worth,tokenising.and,this,is,another,one"
Checking if model en-token.bin (en) exists...
Downloading model en-token.bin (en)...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  429k  100  429k    0     0   391k      0  0:00:01  0:00:01 --:--:--  391k
Loading Tokenizer model ... done (0.281s)
this-is-worth ,tokenising .and ,this ,is ,another ,one

Average: 76.9 sent/s
Total: 1 sent
Runtime: 0.013s
Execution time: 0.600 seconds
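In both runs the reported Average appears to be the sentence count divided by the Runtime figure: 1 / 0.009 s ≈ 111.1 sent/s and 1 / 0.013 s ≈ 76.9 sent/s. The small Python check below reproduces those numbers under that assumption. The function name `sentences_per_second` is hypothetical.

```python
def sentences_per_second(total_sentences: int, runtime_seconds: float) -> float:
    """Throughput as sentences divided by runtime, rounded to one decimal place.

    Assumes "Average" in the output above is Total / Runtime.
    """
    if runtime_seconds <= 0:
        raise ValueError("runtime must be positive")
    return round(total_sentences / runtime_seconds, 1)


if __name__ == "__main__":
    # Figures copied from the two runs shown above.
    print(sentences_per_second(1, 0.009))  # simple method: 111.1
    print(sentences_per_second(1, 0.013))  # learnable method: 76.9
```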

Similarly, a file containing text can be passed in with --file:

$ ./ --method simple    --file article.txt
$ ./ --method learnable --file article.txt


Contributions are very welcome; please share them back with the wider community (and get credited for it)!

Please have a look at the CONTRIBUTING guidelines and read about our licensing policy.

