Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
90 lines (68 sloc) 3.11 KB

Tag Parts of Speech

Tag parts of speech of each token in a line of text or an article (see Penn Treebank tag set for legend of token types), also see https://nlp.stanford.edu/software/tagger.shtml.

Usage:

$ ./posTagger.sh
No parameter has been passed. Please see usage below:

       Usage: ./posTagger.sh --method [ maxent | perceptron ]
                 --text [text]
                 --file [path/to/filename]
                 --help

       --method       [ maxent | perceptron ]
                        maxent        use the model trained using the Max Entropy algorithm
                        perceptron    use the model trained using the Perceptron algorithm to train
       --text         plain text surrounded by quotes
       --file         name of the file containing text to pass as command arg
       --help         shows the script usage help text

Examples:

./posTagger.sh --method maxent --text "This is a simple text to tag"
Checking if model en-pos-maxent.bin (en) exists...
Downloading model en-pos-maxent.bin (en)...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5562k  100 5562k    0     0   602k      0  0:00:09  0:00:09 --:--:--  561k
Loading POS Tagger model ... done (1.837s)
This_DT is_VBZ a_DT simple_JJ text_NN to_TO tag_NN


Average: 38.5 sent/s
Total: 1 sent
Runtime: 0.026s
Execution time: 1.969 seconds

Check out https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
  to find out what each of the tags mean
$ ./posTagger.sh --method perceptron --text "This is a simple text to tag"
Checking if model en-pos-perceptron.bin (en) exists...
Downloading model en-pos-perceptron.bin (en)...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3882k  100 3882k    0     0   634k      0  0:00:06  0:00:06 --:--:--  523k
Loading POS Tagger model ... done (1.492s)
This_DT is_VBZ a_DT simple_JJ text_NN to_TO tag_VB


Average: 58.8 sent/s
Total: 1 sent
Runtime: 0.017s
Execution time: 1.593 seconds

Check out https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
  to find out what each of the tags mean

Parts of speech are prefixed with _ followed by a two or three letter suffix, see https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html to find out what each of the Parts of Speech mean.

Similarly a file containing text can be passed in as:

$ ./posTagger.sh --method maxent     --file article.txt
or
$ ./posTagger.sh --method perceptron --file article.txt

Contributing

Contributions are very welcome, please share back with the wider community (and get credited for it)!

Please have a look at the CONTRIBUTING guidelines, also have a read about our licensing policy.


Back to main page (table of contents)

You can’t perform that action at this time.