A basic template for punctuation prediction implemented in PyTorch
data
- raw_data: the directory that stores the raw data for training, validation, and testing, containing train.txt, valid.txt, and test.txt, respectively.
- processed_data: the directory that stores the data produced from raw_data by preprocessing.py, containing train.npy, valid.npy, and test.npy, respectively.
- preprocessing.py: Python script that splits inputs and labels, converts input sentences to indexes, pads the index lists to the same length, and saves the processed data in .npy format.
- dataset.py: Python script that builds the PyTorch Dataset and DataLoader.
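The preprocessing steps listed above (split inputs and labels, convert to indexes, pad to the same length) can be sketched roughly as follows. This is only an illustration, not the actual preprocessing.py: the "word/LABEL" line format, the label names, and the helper names split_inputs_and_labels, build_vocab, and encode_and_pad are all assumptions made for the example.

```python
PAD_TOKEN = "<pad>"  # assumed padding symbol, mapped to index 0
UNK_TOKEN = "<unk>"  # assumed symbol for out-of-vocabulary words


def split_inputs_and_labels(line):
    """Split one 'word/LABEL word/LABEL ...' line (assumed format)."""
    words, labels = [], []
    for pair in line.split():
        word, _, label = pair.rpartition("/")
        words.append(word)
        labels.append(label)
    return words, labels


def build_vocab(sequences):
    """Assign an integer index to every distinct token, in order of appearance."""
    vocab = {PAD_TOKEN: 0, UNK_TOKEN: 1}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode_and_pad(seq, vocab, max_len):
    """Convert tokens to indexes, truncate to max_len, and right-pad."""
    ids = [vocab.get(tok, vocab[UNK_TOKEN]) for tok in seq][:max_len]
    return ids + [vocab[PAD_TOKEN]] * (max_len - len(ids))


# Two made-up lines standing in for the contents of train.txt.
lines = ["hello/O world/PERIOD", "how/O are/O you/QUESTION"]
pairs = [split_inputs_and_labels(line) for line in lines]
word_vocab = build_vocab(words for words, _ in pairs)
max_len = max(len(words) for words, _ in pairs)
padded_inputs = [encode_and_pad(words, word_vocab, max_len) for words, _ in pairs]
print(padded_inputs)
```

The padded lists all have the same length, so they can be turned into a NumPy array and written with numpy.save to obtain a .npy file.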
models
- BaseModel.py: the parent class that implements network building, input setup, forward computation, backpropagation, network saving and loading, learning rate schedulers, and visualization of losses and metrics.
- **Model.py: the implementations of specific models (methods) that extend BaseModel, such as LstmModel, Seq2SeqModel, etc.
- **Net.py: the code for the network architectures, such as LstmNet.py, Seq2SeqNet.py, etc.
run
- trainer.py: a basic template Python file for training from scratch, resuming training, and validating the **Model.
- tester.py: a basic template Python file for testing the **Model.
utils
- configs.py: the Python file used to store and modify the hyper-parameters for the training, validation, and testing processes.
- help_functions.py: the Python file used to store and modify the model initialization strategies and the optimizer scheduler settings.
- metrics.py: the Python file used to store and modify the evaluation metrics, such as Precision, Recall, F1-score, etc.
- visualizer.py: the Python file used to visualize the losses and images.
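As an illustration of the kind of metric that could live in metrics.py, here is a minimal sketch of per-label Precision, Recall, and F1-score. The function name precision_recall_f1 and the label names are assumptions for the example; the actual metrics.py may compute these differently (for instance, averaged over all punctuation classes).

```python
def precision_recall_f1(y_true, y_pred, positive_label):
    """Compute precision, recall, and F1 for one punctuation label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    # Guard against division by zero when a label is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up gold and predicted labels for five tokens.
y_true = ["O", "COMMA", "O", "PERIOD", "COMMA"]
y_pred = ["O", "COMMA", "COMMA", "PERIOD", "O"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, "COMMA")
print(precision, recall, f1)  # 0.5 0.5 0.5
```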
main.py: the script for running the code.
- If you want to train the model from scratch, run
  python main.py --mode train --start_epoch 1
  in the command line of your Python environment.
- If you want to resume the training process, run
  python main.py --mode train --start_epoch epoch_you_want_to_resume
  in the command line of your Python environment.
- If you want to test the model, run
  python main.py --mode test --load_epoch parameters_epoch_you_want_to_test
  in the command line of your Python environment.
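The three commands above suggest a command-line interface along the following lines. This is a sketch only: the flag names --mode, --start_epoch, and --load_epoch come from the commands above, while the defaults, the build_parser and describe_run helpers, and the rule that a start epoch greater than 1 means "resume" are assumptions rather than the actual contents of main.py.

```python
import argparse


def build_parser():
    """Build a parser for the three flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Punctuation prediction template")
    parser.add_argument("--mode", choices=["train", "test"], required=True)
    parser.add_argument("--start_epoch", type=int, default=1)    # assumed default
    parser.add_argument("--load_epoch", type=int, default=None)  # assumed default
    return parser


def describe_run(args):
    """Return a short description of what the given arguments would trigger."""
    if args.mode == "train":
        if args.start_epoch == 1:
            return "train from scratch"
        return f"resume training at epoch {args.start_epoch}"
    return f"test with parameters from epoch {args.load_epoch}"


# Parse explicit argument lists instead of sys.argv so the sketch runs anywhere.
scratch = build_parser().parse_args(["--mode", "train", "--start_epoch", "1"])
resume = build_parser().parse_args(["--mode", "train", "--start_epoch", "5"])
test = build_parser().parse_args(["--mode", "test", "--load_epoch", "20"])
for parsed in (scratch, resume, test):
    print(describe_run(parsed))
```

In a real main.py, describe_run would be replaced by calls into the trainer and tester code.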
If you want to train and test your own model using this template, you just need to:
- create a YourOwnNet.py file in the models directory and implement every detail of your own network.
- create a YourOwnModel.py that extends the parent class BaseModel and implements only the forward() and backward() functions.
- import YourOwnModel in trainer.py and tester.py, and change the MODEL NAME to 'YourOwnModel'.
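The steps above can be sketched as follows. Everything here is illustrative: the BaseModel shown is a heavily reduced stand-in written for this example (the real one also handles saving, loading, schedulers, and visualization), and the method names set_input and optimize_parameters, the layer sizes, and the choice of loss and optimizer are assumptions, not the template's actual code.

```python
import torch
from torch import nn


class BaseModel:
    """Reduced, hypothetical stand-in for the template's BaseModel."""

    def set_input(self, inputs, labels):
        self.inputs, self.labels = inputs, labels

    def optimize_parameters(self):
        # One training step: subclasses supply forward() and backward().
        self.forward()
        self.optimizer.zero_grad()
        self.backward()
        self.optimizer.step()


class YourOwnNet(nn.Module):
    """Tiny LSTM tagger that predicts one punctuation label per token."""

    def __init__(self, vocab_size, num_labels, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embedding(token_ids))
        return self.classifier(hidden)  # shape: (batch, seq_len, num_labels)


class YourOwnModel(BaseModel):
    """Only forward() and backward() are model-specific."""

    def __init__(self, vocab_size, num_labels):
        self.net = YourOwnNet(vocab_size, num_labels)
        self.criterion = nn.CrossEntropyLoss()
        self.optimizer = torch.optim.Adam(self.net.parameters(), lr=1e-3)

    def forward(self):
        self.logits = self.net(self.inputs)

    def backward(self):
        # Flatten batch and sequence dimensions so each token is one sample.
        self.loss = self.criterion(self.logits.flatten(0, 1), self.labels.flatten())
        self.loss.backward()


# Random token indexes and labels stand in for one batch from the DataLoader.
model = YourOwnModel(vocab_size=10, num_labels=4)
model.set_input(torch.randint(1, 10, (2, 5)), torch.randint(0, 4, (2, 5)))
model.optimize_parameters()
print(model.logits.shape, model.loss.item())
```

Keeping the training step in the parent class means a new model only has to define how predictions and the loss are computed.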
- implement more **Models for punctuation prediction.
- try to do punctuation restoration.
- try different kinds of language materials, such as Chinese, etc.
If you have any questions, please email tsmotlp or sakura.