TokenizerTaggerUtilityApp

A simple NLP (Natural Language Processing) GUI app for tokenizing and tagging text, written in Python.

img1

Installation:

  1. Install Python, if you have not already done so (click here for instructions).
  2. Install Tkinter, which provides the Python GUI. The instructions linked in step 1 also cover its installation.
  3. Install NLTK (Natural Language Toolkit) and the Stanford NER (Named Entity Recognition) Tagger by following this link.
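
Once the steps above are done, a quick check can confirm that the Python-side dependencies are visible to the interpreter. The snippet below is only a sketch: the helper name `check_dependencies` is hypothetical, and the module name `tkinter` assumes Python 3 (under Python 2 the module is spelled `Tkinter`). It does not verify the Stanford NER Tagger itself, which is a separate Java download.

```python
import importlib.util


def check_dependencies(modules=("tkinter", "nltk")):
    """Return {module_name: bool}, True when the module can be located."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any module reported as missing should be installed before launching the app.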

Tokenization

Tokenization is the process of breaking a text down into tokens, either words or sentences, depending on the option you select. The app can also display the number of tokens in the text.

img2

The image above shows the text tokenized into words, along with the token count.

img2

The image above shows the text tokenized into sentences.
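
To make the two tokenization modes concrete, here is a minimal sketch that uses only the standard library. It is a simplified stand-in, not the app's actual code: the function names `word_tokens` and `sentence_tokens` are hypothetical, and the regular expressions are much cruder than NLTK's own tokenizers (`nltk.word_tokenize` and `nltk.sent_tokenize`), which the app presumably relies on.

```python
import re


def word_tokens(text):
    """Split text into words (keeping internal apostrophes) and punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


def sentence_tokens(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


if __name__ == "__main__":
    text = "Hello there. This app splits text into tokens!"
    words = word_tokens(text)
    sentences = sentence_tokens(text)
    print(len(words), words)          # token count, then the word tokens
    print(len(sentences), sentences)  # token count, then the sentence tokens
```

A naive splitter like this breaks on abbreviations such as "Dr." or "e.g.", which is one reason to prefer a trained sentence tokenizer in practice.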

Tagging

Tagging first tokenizes the text into words behind the scenes, then assigns a tag to each token using one of the following taggers:

  • Stanford NER - Stanford's Named Entity Recognition (named entities can be thought of as "brands") tags each token as 'O' (i.e. not a named entity), 'PERSON' (name of a person), 'LOCATION' (name of a location), 'ORGANIZATION' (name of an organization), etc. It is not perfect: it can miss a named entity, or produce a false positive by tagging a token that is not one, but it gets it right most of the time.
  • NLTK POS - Part-of-Speech tagging is part of the Natural Language Toolkit. While Stanford's NER algorithm specializes in named entities among nouns, NLTK's POS tagger labels every tokenized word with its part of speech (see here for the complete list of tags).

img2

The image above is a screenshot of tagging with Stanford's NER algorithm.

img2

The image above is a screenshot of tagging with NLTK's POS algorithm.
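
Both taggers produce a list of (token, tag) pairs, with Stanford NER marking non-entities as 'O'. The sketch below shows one way such output could be post-processed, by merging adjacent tokens that share an entity tag into a single entity. It is an illustration only: `group_entities` is a hypothetical helper that is not part of the app, and the sample list is hand-written in the expected shape rather than real tagger output.

```python
from itertools import groupby


def group_entities(tagged_tokens, outside_tag="O"):
    """Merge runs of adjacent tokens that share the same non-'O' tag."""
    entities = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != outside_tag:
            entities.append((" ".join(token for token, _ in run), tag))
    return entities


# Hand-written sample in the (token, tag) shape; not real tagger output.
sample = [
    ("Barack", "PERSON"), ("Obama", "PERSON"), ("visited", "O"),
    ("Stanford", "ORGANIZATION"), ("University", "ORGANIZATION"),
    ("in", "O"), ("California", "LOCATION"), (".", "O"),
]

if __name__ == "__main__":
    for entity, tag in group_entities(sample):
        print(f"{tag}: {entity}")
```

One limitation of this simple grouping is that two different entities of the same type standing directly next to each other would be merged into one.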
