Vietnamese Named Entity Recognition Tool
Install using npm:
npm install -g @vntk/tagger
Two main features:
- Train a new NER model from raw data such as news, QA, comments, or chat logs.
- Tag new input text with entities: PER, ORG, LOC, DATE, TIME, ...
Run the following command to tag the text in a file:
vntk-tagger predict [your_file_name.txt]
The output is written to a new file named your_file_name.txt.tags
Prepare your data from news, QA, comments, or chat logs.
Convert the raw data to enriched data by:
- removing junk characters
- removing comments
- deleting empty lines
Command: vntk-tagger clean [your_data_file.txt]
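The three cleaning steps can be pictured with a small Node.js sketch. This is not the code that vntk-tagger clean runs: the function name cleanRawText, the choice of control characters as "junk", and the rule that a comment is a line starting with # are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the cleaning steps; NOT the actual vntk-tagger implementation.
// Assumptions: "junk characters" are ASCII control characters (tab excluded),
// and a "comment" is a line whose first non-space character is '#'.
function cleanRawText(text) {
  return text
    .split(/\r?\n/)
    // remove junk characters, then trim surrounding whitespace
    .map((line) => line.replace(/[\u0000-\u0008\u000B-\u001F\u007F]/g, '').trim())
    // remove comments
    .filter((line) => !line.startsWith('#'))
    // delete empty lines
    .filter((line) => line.length > 0)
    .join('\n');
}
```

Keeping each step as its own map or filter call makes it easy to swap in a different definition of junk or comments for your own data.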
Run node preprocess.js to clean the raw data and convert it to IOB format. The result can be fed to the trainer!
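For readers new to IOB (inside, outside, beginning) tagging, the sketch below shows the general scheme: the first token of an entity gets B-TYPE, the following tokens of the same entity get I-TYPE, and every other token gets O. The function toIOB, its input shape, and the tab-separated one-token-per-line layout are assumptions for illustration; the exact layout that preprocess.js produces is not specified here.

```javascript
// Illustrative only: the input shape and output layout are assumptions,
// not the format produced by preprocess.js.
function toIOB(tokens, entities) {
  // tokens: array of strings
  // entities: array of { start, end, type } token indices, `end` exclusive
  const tags = tokens.map(() => 'O');
  for (const { start, end, type } of entities) {
    for (let i = start; i < end; i++) {
      tags[i] = (i === start ? 'B-' : 'I-') + type;
    }
  }
  // One "token<TAB>tag" pair per line
  return tokens.map((token, i) => `${token}\t${tags[i]}`).join('\n');
}

const tokens = ['Ông', 'Nguyễn', 'Văn', 'A', 'đến', 'Hà', 'Nội'];
const entities = [
  { start: 1, end: 4, type: 'PER' },
  { start: 5, end: 7, type: 'LOC' },
];
console.log(toIOB(tokens, entities));
```

In this example "Nguyễn Văn A" is tagged B-PER, I-PER, I-PER and "Hà Nội" is tagged B-LOC, I-LOC, while the remaining tokens are tagged O.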
- Convert XML to IOB format
- Tag raw input to
Run the following command to train a new NER model on your data:
vntk-tagger train [your_training_file.txt]