===============
Generic POS tagger for text documents. The folders used for reading and writing should be at the same level as the 'default' package.
python3 postagger.py --input <input-folder-location> --output <output-folder-location>
- Requires the following libraries:
nltk, stop_words, inflection
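The command line above can be handled with the standard `argparse` module. This is a minimal sketch, assuming `--input` and `--output` are the only flags; the helper name `parse_args` and the help strings are illustrative and not taken from postagger.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the --input and --output folder arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Generic POS tagger for text documents."
    )
    parser.add_argument("--input", required=True,
                        help="folder with the text files to tag")
    parser.add_argument("--output", required=True,
                        help="folder where the tagged files are written")
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)


# Example call with an explicit argument list (folder names are made up).
args = parse_args(["--input", "docs", "--output", "tagged"])
print(args.input, args.output)
```

Passing an explicit list keeps the helper easy to exercise without touching the real command line.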
[2018-12-22]
- Stopwords taken from NLTK. The list from PyPI is still an option; no clear difference between the two
- Concatenation of NOUNS working
  - Has to be done before ADJ chains for more sound results
- removeLastRepWord(word) in case this is part of a previous concatenation
  - Important to removePunctuation first
- Concatenation of ADJ*NOUN+ is not working for consecutive adjectives - To Do
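One possible way to handle ADJ*NOUN+ chains, including consecutive adjectives, is a single left-to-right pass over the (word, pos_tag) pairs. The sketch below is not the project's implementation. It assumes Penn Treebank tags as returned by nltk.pos_tag() (JJ* for adjectives, NN* for nouns), an underscore as the joining character, and "NN" as the tag of the merged token; the function name is hypothetical.

```python
def concat_adj_noun_chains(tagged, sep="_"):
    """Merge each maximal ADJ* NOUN+ run of (word, tag) pairs into one token.

    Assumptions (not from the project): Penn Treebank tags, words joined
    with `sep`, merged token tagged "NN".
    """
    result = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # Zero or more leading adjectives (JJ, JJR, JJS).
        while j < n and tagged[j][1].startswith("JJ"):
            j += 1
        k = j
        # One or more nouns (NN, NNS, NNP, NNPS).
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:
            if k - i > 1:
                # Chain of at least two words ending in nouns: concatenate.
                words = [word for word, _ in tagged[i:k]]
                result.append((sep.join(words), "NN"))
            else:
                # A lone noun stays as it is.
                result.append(tagged[i])
            i = k
        elif j > i:
            # Adjectives not followed by a noun are kept unchanged.
            result.extend(tagged[i:j])
            i = j
        else:
            # Any other token is copied through.
            result.append(tagged[i])
            i += 1
    return result


sample = [("the", "DT"), ("big", "JJ"), ("red", "JJ"),
          ("fire", "NN"), ("truck", "NN"), ("is", "VBZ"), ("fast", "JJ")]
print(concat_adj_noun_chains(sample))
```

Working on already tagged pairs keeps the sketch independent of the NLTK model downloads.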
[2018-12-20]
- Implemented NOUNS+ concatenation
- Better punctuation removal
- Holy Grail math-<?>
[2018-12-17]
- Fixed all filters to ignore math-<?> tokens. It also does not concatenate math-<?> tokens
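A filter that skips math-<?> tokens could look like the sketch below. It assumes such placeholder tokens can be recognised by the literal prefix "math-"; the helper names and the example token "math-X12" are hypothetical and not taken from the project.

```python
MATH_PREFIX = "math-"  # assumed prefix of placeholder tokens such as math-<?>


def is_math_token(word):
    """Return True for placeholder tokens that filters should leave alone."""
    return word.startswith(MATH_PREFIX)


def lowercase_words(tagged):
    """Lowercase the word of each (word, tag) pair, skipping math tokens."""
    return [
        (word if is_math_token(word) else word.lower(), tag)
        for word, tag in tagged
    ]


print(lowercase_words([("The", "DT"), ("math-X12", "NN"), ("Holds", "VBZ")]))
```

The same `is_math_token` check could guard any other filter, or exclude a token from concatenation.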
[2018-12-14]
- Added one extra filter option to just lowercase words without any tag
- cleanText calls all the filter functions; comment and/or uncomment the ones desired
[2018-12-07]
- removePlurals() using inflection
- Included all POS tags from NLTK at the end of file_manipulation.py
- Simple refactoring
- Fixed relative imports from the default package
[2018-12-04]
- Major changes in the code to work with (word, pos_tag) tuples
- Major refactor of all functions
- Concatenate words with NOUNS and ADJECTIVE tags in nltk.pos_tag()
[2018-11-29]
- makeLowerCase and applyStemmer ignore words beginning with
[2018-11-28]
- Several pre-processing functions implemented
- NLTK tagger working
- General refactorings
[2018-11-14]
- cleanString implemented to get rid of specific chars
[2018-10-29]
- Reading/writing files line by line
- Text cleaning and stopword removal
- Prototype POS tagger using NLTK
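Line-by-line reading and writing between an input and an output folder can be sketched as follows. The function name `process_folder` and the `transform` callback are assumptions made for illustration; the project's own file handling may differ.

```python
from pathlib import Path


def process_folder(input_dir, output_dir, transform):
    """Apply `transform` to every line of every file in `input_dir`.

    Results are written to a file of the same name in `output_dir`
    (hypothetical helper, not the project's API).
    """
    in_path, out_path = Path(input_dir), Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_path.iterdir()):
        if not src.is_file():
            continue  # skip sub-folders
        dst = out_path / src.name
        with src.open(encoding="utf-8") as fin, \
                dst.open("w", encoding="utf-8") as fout:
            for line in fin:
                # Strip the newline, transform the text, write it back.
                fout.write(transform(line.rstrip("\n")) + "\n")
```

For example, `process_folder("docs", "tagged", str.lower)` would write a lowercased copy of each file, one line at a time, so large documents never need to be held in memory.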
[2018-10-26]
- Project creation
- File/Folder structure implementation
- Command line arguments added