xiaoanshi/iclassifier

Description

Reference PyTorch code for intent (sentence) classification.


Requirements


Snips data

experiments summary

| Model | Accuracy (%) | Elapsed time/example in ms: GPU/CPU, ONNX, CONDA, CONDA+je, INTEL, INTEL+je, Dynamic, Dynamic+je, Inference, Inference+Dynamic, Inference+ONNX (see legend below) | Etc |
|---|---|---|---|
| GloVe, GNB | 80.43 | 1.2929 / - - / - / - - / - - - | |
| GloVe, CNN | 97.86 | 1.7939 / - 7.5656 / 1.8689 / 1.7735 4.6868 / 2.7592 1.9398 0.3848 / - / FAILED | threads=14 |
| GloVe, Densenet-CNN | 97.57 | 3.6094 / - 19.1212 / 3.0717 / 3.0917 7.6969 / 6.5887 4.9481 0.8658 / - / FAILED | threads=14 |
| GloVe, Densenet-DSA | 97.43 | 7.5007 / - - / 4.4936 / 4.9337 / 9.7873 7.2086 1.5420 / - / FAILED | threads=14 |
| DistilBERT, CLS | 97.71 | 9.3075 / - - / 32.4263 / 31.1101 - / 37.7777 - / 29.3939 14.9494 10.4040 8.9942 / 10.1848 / 4.8818 | threads=14 |
| BERT-base(uncased), CNN | 97.57 | 12.1273 / - - / 81.8787 - / 52.4949 34.7878 30.5454 | threads=14 |
| BERT-base(uncased), CLS | 97.43 | 12.7714 / 100.929 174.2222 / 46.4263 / 43.5078 69.4343 / 62.5959 66.1212 / 63.0707 68.9191 66 66.9494 / 49.4747 60.7777 / 50.4040 30.7979 24.5353 16.9756 | threads=14 |
| BERT-base(uncased), CLS | 97.00 | 9.2660 / 73.1010 113.2424 / 31.5400 / 26.9472 47.2323 / 42.8950 45.0000 / 43.2020 48.5050 45.2727 44.8080 / 33.4623 40.8888 / 34.0606 16.7419 13.5703 11.7487 | del 8,9,10,11, threads=14 |
| BERT-large, CNN | 98.00 | 24.277 / - | |
| BERT-large, CLS | 97.86 | 23.542 / - | |
* GPU / CPU : elapsed time per example (ms), GPU / CPU (pip pytorch==1.2.0) [Tesla V100, 1 GPU; Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz, 2 CPUs, 14 cores per CPU, HyperThreading]
* ONNX : onnxruntime 1.2.0, pip pytorch==1.2.0 
         / onnxruntime 1.3.0, conda pytorch=1.5.0 
         / onnxruntime 1.3.0, conda pytorch=1.5.0, onnxruntime_tools.optimizer_cli
* CONDA : conda pytorch=1.2.0
          / pytorch=1.5.0
* CONDA+je : pytorch=1.2.0, etc/jemalloc_omp_kmp.sh
             / pytorch=1.5.0, etc/jemalloc_omp_kmp.sh
* INTEL : conda pytorch=1.2.0, [intel optimized transformers](https://github.com/mingfeima/transformers/tree/kakao/gpt2)
* INTEL+je : conda pytorch=1.2.0, [intel optimized transformers](https://github.com/mingfeima/transformers/tree/kakao/gpt2), etc/jemalloc_omp_kmp.sh
* Dynamic : conda pytorch=1.4.0, --enable_dqm
            / pytorch=1.5.0, --enable_dqm
* Dynamic+je : conda pytorch=1.4.0, --enable_dqm, etc/jemalloc_omp_kmp.sh
               / pytorch=1.5.0, --enable_dqm, etc/jemalloc_omp_kmp.sh
* Inference : conda pytorch=1.5.0, --enable_inference
* Inference+Dynamic : conda pytorch=1.5.0, --enable_dqm, --enable_inference
* Inference+ONNX : conda pytorch=1.5.0, --enable_ort(onnxruntime 1.3.0), --enable_inference
                   / conda pytorch=1.6.0, --enable_ort(onnxruntime 1.4.0), --enable_inference 
                   / conda pytorch=1.6.0, --enable_ort(onnxruntime 1.4.0), --enable_inference, --quantize_onnx
* defaults : batch_size=128, lr=2e-4, n_ctx(max_seq_length)=100
* tokens per sentence : mean 9.08, max 24, min 3, median 9
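
--enable_dqm in the legend above refers to PyTorch dynamic quantization. A minimal sketch of the underlying call, with a stand-in model (the repo quantizes its own BERT-based classifiers; this only shows the API):

import torch

# stand-in for a trained float32 classifier
model = torch.nn.Sequential(torch.nn.Linear(768, 256), torch.nn.ReLU(), torch.nn.Linear(256, 7))
model.eval()

# Linear weights become int8 ahead of time; activations are quantized on the fly,
# so this is CPU-only and needs no calibration data
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    logits = quantized(torch.randn(1, 768))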
| # threads | Model | Inference+ONNX / Inference+QuantizedONNX (ms/example) | Etc |
|---|---|---|---|
| 1 | GloVe, Densenet-DSA | 3.33 / - | |
| 1 | DistilBERT, CLS | 51.77 / 15.66 | |
| 2 | DistilBERT, CLS | 28.38 / 9.71 | |
| 3 | DistilBERT, CLS | 21.47 / 7.79 | |
| 4 | DistilBERT, CLS | 18.75 / 6.69 | |
| 5 | DistilBERT, CLS | 15.23 / 6.09 | |
| 6 | DistilBERT, CLS | 14.22 / 5.69 | |
| 7 | DistilBERT, CLS | 12.52 / 5.44 | |
| 8 | DistilBERT, CLS | 10.46 / 5.21 | good enough |
| 9 | DistilBERT, CLS | 10.93 / 5.17 | |
| 10 | DistilBERT, CLS | 9.75 / 4.99 | |
| 11 | DistilBERT, CLS | 9.22 / 4.98 | |
| 12 | DistilBERT, CLS | 10.11 / 4.91 | |
| 13 | DistilBERT, CLS | 9.45 / 4.81 | |
| 14 | DistilBERT, CLS | 9.31 / 4.74 | |
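
The sweep above varies onnxruntime's intra-op thread count; latency flattens out around 8 threads. A minimal sketch of how such a session is configured (the model path and input names are illustrative and depend on how the model was exported):

import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 8  # cf. the "good enough" row above

sess = ort.InferenceSession('model.onnx', sess_options=opts)  # illustrative path
# outputs = sess.run(None, {'input_ids': ids, 'attention_mask': mask})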
emb_class=glove, enc_class=gnb

  • train
* token_emb_dim in configs/config-glove-gnb.json == 300 (e.g., glove.6B.300d.txt)
$ python preprocess.py --config=configs/config-glove-gnb.json
$ python train.py --config=configs/config-glove-gnb.json --lr_decay_rate=0.9
  • evaluation
$ python evaluate.py --config=configs/config-glove-gnb.json
INFO:__main__:[Accuracy] : 0.8043,   563/  700
INFO:__main__:[Elapsed Time] : 980.9308052062988ms, 1.292972264542259ms on average
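
Conceptually, the gnb encoder fits a Gaussian Naive Bayes classifier on pooled GloVe sentence vectors. A rough sketch of the idea with scikit-learn and random stand-in data (this is not the repo's implementation, just the shape of the computation):

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# stand-ins for 300-d pooled GloVe sentence vectors and intent label ids
X_train, y_train = rng.normal(size=(2100, 300)), rng.integers(0, 7, 2100)
X_test, y_test = rng.normal(size=(700, 300)), rng.integers(0, 7, 700)

clf = GaussianNB().fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy; cf. 0.8043 on Snips above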

emb_class=glove, enc_class=cnn

  • train
* token_emb_dim in configs/config-glove-cnn.json == 300 (e.g., glove.6B.300d.txt)
$ python preprocess.py --config=configs/config-glove-cnn.json
$ python train.py --config=configs/config-glove-cnn.json --lr_decay_rate=0.9 --embedding_trainable

* tensorboardX
$ rm -rf runs
$ tensorboard --logdir runs/ --port port-number --bind_all
  • evaluation
$ python evaluate.py --config=configs/config-glove-cnn.json
INFO:__main__:[Accuracy] : 0.9786,   685/  700
INFO:__main__:[Elapsed Time] : 1351ms, 1.793991416309013ms on average
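
The cnn encoder is a standard convolutional sentence classifier over GloVe embeddings: convolutions of several widths, max-over-time pooling, then a linear layer. A minimal PyTorch sketch; the filter counts and kernel sizes here are illustrative rather than the repo's exact configuration:

import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, num_labels=7,
                 num_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        # in the repo the weights come from glove.6B.300d.txt; cf. --embedding_trainable
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)      # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))     # (batch, num_labels)

logits = TextCNN(vocab_size=30000)(torch.randint(0, 30000, (2, 20)))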

emb_class=glove, enc_class=densenet-cnn

  • train
* token_emb_dim in configs/config-densenet-cnn.json == 300 (e.g., glove.6B.300d.txt)
$ python preprocess.py --config=configs/config-densenet-cnn.json
$ python train.py --config=configs/config-densenet-cnn.json --lr_decay_rate=0.9 --embedding_trainable
  • evaluation
$ python evaluate.py --config=configs/config-densenet-cnn.json

INFO:__main__:[Accuracy] : 0.9757,   683/  700
INFO:__main__:[Elapsed Time] : 2633ms, 3.609442060085837ms on average

emb_class=glove, enc_class=densenet-dsa

  • train
* token_emb_dim in configs/config-densenet-dsa.json == 300 (e.g., glove.6B.300d.txt)
$ python preprocess.py --config=configs/config-densenet-dsa.json
$ python train.py --config=configs/config-densenet-dsa.json --lr_decay_rate=0.9
  • evaluation
$ python evaluate.py --config=configs/config-densenet-dsa.json

INFO:__main__:[Accuracy] : 0.9743,   682/  700
INFO:__main__:[Elapsed Time] : 5367ms, 7.500715307582261ms on average
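
All GloVe configurations above set token_emb_dim to 300 to match glove.6B.300d.txt. A minimal sketch of loading such a file into an embedding matrix (the vocab handling is illustrative; the repo does this in preprocess.py in its own way):

import numpy as np
import torch

def load_glove(path, vocab, dim=300):
    # vocab maps token -> row index; tokens missing from GloVe keep the random init
    emb = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype('float32')
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            if parts[0] in vocab:
                emb[vocab[parts[0]]] = np.asarray(parts[1:], dtype='float32')
    return torch.from_numpy(emb)

# weights = load_glove('embeddings/glove.6B.300d.txt', vocab)
# layer = torch.nn.Embedding.from_pretrained(weights, freeze=False)  # cf. --embedding_trainable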

emb_class=bert, enc_class=cnn | cls

  • train
* n_ctx (max_seq_length) should not exceed 512

* enc_class=cnn

$ python preprocess.py --config=configs/config-bert-cnn.json --bert_model_name_or_path=./embeddings/bert-base-uncased --bert_do_lower_case
$ python train.py --config=configs/config-bert-cnn.json --bert_model_name_or_path=./embeddings/bert-base-uncased --bert_do_lower_case --bert_output_dir=bert-checkpoint --lr=5e-5 --epoch=3 --batch_size=64

* enc_class=cls

$ python preprocess.py --config=configs/config-bert-cls.json --bert_model_name_or_path=./embeddings/bert-base-uncased --bert_do_lower_case
$ python train.py --config=configs/config-bert-cls.json --bert_model_name_or_path=./embeddings/bert-base-uncased --bert_do_lower_case --bert_output_dir=bert-checkpoint --lr=5e-5 --epoch=3 --batch_size=64

* add --bert_use_feature_based for the feature-based approach (BERT weights frozen)
  • evaluation
* enc_class=cnn

$ python evaluate.py --config=configs/config-bert-cnn.json --bert_output_dir=bert-checkpoint

INFO:__main__:[Accuracy] : 0.9757,   683/  700
INFO:__main__:[Elapsed Time] : 10624ms, 12.127324749642346ms on average
  
** --bert_model_name_or_path=bert-large-uncased --lr=2e-5, without --bert_do_lower_case
INFO:__main__:[Accuracy] : 0.9800,   686/  700
INFO:__main__:[Elapsed Time] : 16994ms, 24.277142857142856ms on average

* enc_class=cls
$ python evaluate.py --config=configs/config-bert-cls.json --bert_output_dir=bert-checkpoint

INFO:__main__:[Accuracy] : 0.9743,   682/  700
INFO:__main__:[Elapsed Time] : 8940ms, 12.771428571428572ms on average
  
** --bert_model_name_or_path=bert-large-uncased --lr=2e-5, without --bert_do_lower_case
INFO:__main__:[Accuracy] : 0.9786,   685/  700
INFO:__main__:[Elapsed Time] : 16480ms, 23.542857142857144ms on average

** --bert_remove_layers=8,9,10,11 
INFO:__main__:[Accuracy] : 0.9700,   679/  700
INFO:__main__:[Elapsed Time] : 6911ms, 9.266094420600858ms on average

** --config=configs/config-distilbert-cls.json --bert_model_name_or_path=./embeddings/distilbert-base-uncased
INFO:__main__:[Accuracy] : 0.9771,   684/  700
INFO:__main__:[Elapsed Time] : 6607ms, 9.30758226037196ms on average
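
The --bert_remove_layers runs above drop the listed (0-indexed) top encoder layers before fine-tuning, trading a little accuracy for speed. A rough sketch of the idea on a Hugging Face BertModel (the repo's own pruning code may differ):

import torch.nn as nn
from transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased')
remove = {8, 9, 10, 11}  # cf. --bert_remove_layers=8,9,10,11

model.encoder.layer = nn.ModuleList(
    layer for i, layer in enumerate(model.encoder.layer) if i not in remove)
model.config.num_hidden_layers = len(model.encoder.layer)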


SST-2 data

experiments summary

  • iclassifier
| Model | Accuracy (%) | Elapsed time/example in ms: GPU/CPU, CONDA, CONDA+je, Dynamic, Dynamic+je (see legend in the Snips section) | Etc |
|---|---|---|---|
| GloVe, GNB | 72.27 | 1.2253 / - / - - / - - / - - / - - / - - | |
| GloVe, CNN | 82.81 | 1.7670 / 3.9191 / 4.5757 - / 4.3131 - / 4.4040 - / 4.8686 - / 4.4848 | threads=14 |
| GloVe, DenseNet-CNN | 86.38 | 3.6203 / 7.1414 | threads=14 |
| GloVe, DenseNet-DSA | 85.34 | 6.2450 / - | |
| DistilFromBERT, GloVe, CNN | 86.16 | 1.7900 / - | from large |
| DistilFromBERT, GloVe, DenseNet-CNN | 88.52 | 3.6788 / - | from large |
| DistilFromBERT, GloVe, DenseNet-DSA | 88.14 | 8.4647 / - | from large |
| DistilFromRoBERTa, GloVe, CNN | 86.55 | 1.8483 / - | from large |
| DistilFromRoBERTa, GloVe, DenseNet-CNN | 88.80 | 3.9580 / - | from large |
| DistilFromRoBERTa, GloVe, DenseNet-DSA | 88.25 | 8.5627 / - | from large |
| DistilFromELECTRA, GloVe, CNN | 86.55 | 1.7466 / - | from large |
| DistilFromELECTRA, GloVe, DenseNet-CNN | 89.79 | 3.6406 / - | from large |
| DistilFromELECTRA, GloVe, DenseNet-DSA | 88.58 | 8.3708 / - | from large |
| DistilFromELECTRA, DistilBERT, CLS | 93.52 | 7.4879 / - | from large |
| BERT-tiny, CNN | 79.08 | 4.8604 / - | |
| BERT-tiny, CLS | 80.83 | 3.8461 / - | |
| BERT-mini, CNN | 83.36 | 7.0983 / - | |
| BERT-mini, CLS | 83.69 | 5.5521 / - | |
| BERT-small, CNN | 87.53 | 7.2010 / - | |
| BERT-small, CLS | 87.86 | 6.0450 / - | |
| BERT-medium, CNN | 88.58 | 11.9082 / - | |
| BERT-medium, CLS | 89.24 | 9.5857 / - | |
| DistilBERT, CNN | 89.90 | 9.9362 / - - / 44.1111 - / 35.7070 | threads=14 |
| DistilBERT, CLS | 91.10 | 8.9719 / - - / 37.2626 - / 29.4646 | threads=14 |
| BERT-base(uncased), CNN | 92.04 | 14.1576 / - | |
| BERT-base(uncased), CLS | 92.42 | 12.7549 / 100.555 / 62.5050 68.5757 / 66.1818 65.1616 / 63.1616 66.4545(92.42) / 50.8080 60.5656 / 50.4343 | threads=14 |
| BERT-base(uncased), CNN | 90.55 | 10.6824 / - | del 8,9,10,11 |
| BERT-base(uncased), CLS | 91.49 | 8.7747 / 66.6363 / 42.8989 46.6262 / 45.6060 45.1313 / 45.5050 44.7676(90.61) / 34.3131 41.3535 / 34.8686 | del 8,9,10,11, threads=14 |
| BERT-base(uncased), CLS | 90.23 | 7.0241 / 51.7676 43.5959 | del 6,7,8,9,10,11, threads=14 |
| BERT-base(uncased), CLS | 86.66 | 5.8868 / 36.2121 26.5555 | del 4,5,6,7,8,9,10,11, threads=14 |
| BERT-large, CNN | 93.08 | 28.6490 / - | |
| BERT-large, CLS | 94.12 | 22.3767 / - | |
| BERT-large, CNN | 88.47 | 14.7813 / - | del 12~23 |
| BERT-large, CLS | 86.71 | 12.1560 / - | del 12~23 |
| SpanBERT-base, CNN | 91.82 | 15.2098 / - | |
| SpanBERT-base, CLS | 91.49 | 13.1516 / - | |
| SpanBERT-large, CNN | 93.90 | 26.8609 / - | |
| SpanBERT-large, CLS | 93.96 | 26.0445 / - | |
| ALBERT-base, CNN | 92.04 | 16.0554 / - | |
| ALBERT-base, CLS | 90.01 | 14.6725 / - | |
| ALBERT-xxlarge, CNN | 95.77 | 57.4631 / - | |
| ALBERT-xxlarge, CLS | 94.45 | 51.8027 / - | |
| RoBERTa-base, CNN | 92.92 | 15.1016 / - | |
| RoBERTa-base, CLS | 93.03 | 14.6736 / - | |
| RoBERTa-base, CNN | 92.26 | 11.5241 / - | del 8,9,10,11 |
| RoBERTa-base, CLS | 91.76 | 10.0296 / - | del 8,9,10,11 |
| RoBERTa-large, CNN | 95.55 | 26.9807 / - | |
| RoBERTa-large, CLS | 95.66 | 23.7395 / - | |
| BART-large, CNN | 94.45 | 35.1708 / - | |
| BART-large, CLS | 94.89 | 33.3862 / - | |
| ELECTRA-base, CNN | 95.39 | 14.9802 / - | |
| ELECTRA-base, CLS | 95.22 | 14.0087 / - | |
| ELECTRA-large, CNN | 96.05 | 27.2868 / - | |
| ELECTRA-large, CLS | 96.43 | 25.6857 / - | |

| Model | Accuracy (%) |
|---|---|
| T5-3B | 97.4 |
| ALBERT | 97.1 |
| RoBERTa | 96.7 |
| MT-DNN | 95.6 |
| DistilBERT | 92.7 |
emb_class=glove, enc_class=gnb

  • train
* token_emb_dim in configs/config-glove-gnb.json == 300 (e.g., glove.6B.300d.txt)
$ python preprocess.py --config=configs/config-glove-gnb.json --data_dir=data/sst2
$ python train.py --config=configs/config-glove-gnb.json --data_dir=data/sst2 --lr=1e-3 --lr_decay_rate=0.9
  • evaluation
$ python evaluate.py --config=configs/config-glove-gnb.json --data_dir=data/sst2
INFO:__main__:[Accuracy] : 0.7227,  1316/ 1821
INFO:__main__:[Elapsed Time] : 2310.748338699341ms, 1.2253080095563615ms on average

emb_class=glove, enc_class=cnn

  • train
* token_emb_dim in configs/config-glove-cnn.json == 300 (e.g., glove.6B.300d.txt)
$ python preprocess.py --data_dir=data/sst2
$ python train.py --data_dir=data/sst2 --lr=1e-3 --lr_decay_rate=0.9
  • evaluation
$ python evaluate.py --data_dir=data/sst2

INFO:__main__:[Accuracy] : 0.8281,  1508/ 1821
INFO:__main__:[Elapsed Time] : 3300ms, 1.767032967032967ms on average

emb_class=glove, enc_class=densenet-cnn

  • train
* token_emb_dim in configs/config-densenet-cnn.json == 300 (e.g., glove.6B.300d.txt)
$ python preprocess.py --config=configs/config-densenet-cnn.json --data_dir=data/sst2
$ python train.py --config=configs/config-densenet-cnn.json --data_dir=data/sst2 --lr_decay_rate=0.9
  • evaluation
$ python evaluate.py --config=configs/config-densenet-cnn.json --data_dir=data/sst2

INFO:__main__:[Accuracy] : 0.8638,  1573/ 1821
INFO:__main__:[Elapsed Time] : 6678ms, 3.6203296703296703ms on average

emb_class=glove, enc_class=densenet-dsa

  • train
* token_emb_dim in configs/config-densenet-dsa.json == 300 (e.g., glove.6B.300d.txt)
$ python preprocess.py --config=configs/config-densenet-dsa.json --data_dir=data/sst2
$ python train.py --config=configs/config-densenet-dsa.json --data_dir=data/sst2 --lr_decay_rate=0.9
  • evaluation
$ python evaluate.py --config=configs/config-densenet-dsa.json --data_dir=data/sst2

INFO:__main__:[Accuracy] : 0.8534,  1554/ 1821
INFO:__main__:[Elapsed Time] : 11459ms, 6.245054945054945ms on average

* rerun with the same settings
INFO:__main__:[Accuracy] : 0.8506,  1549/ 1821
INFO:__main__:[Elapsed Time] : 21745ms, 11.885714285714286ms on average

* softmax masking
INFO:__main__:[Accuracy] : 0.8473,  1543/ 1821
INFO:__main__:[Elapsed Time] : 19214ms, 10.477472527472527ms on average

emb_class=bert, enc_class=cnn | cls

  • train
* n_ctx (max_seq_length) should not exceed 512

* enc_class=cnn

$ python preprocess.py --config=configs/config-bert-cnn.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/bert-base-uncased --bert_do_lower_case
$ python train.py --config=configs/config-bert-cnn.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/bert-base-uncased --bert_do_lower_case --bert_output_dir=bert-checkpoint --lr=1e-5 --epoch=3 --batch_size=64

* enc_class=cls

$ python preprocess.py --config=configs/config-bert-cls.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/bert-base-uncased --bert_do_lower_case
$ python train.py --config=configs/config-bert-cls.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/bert-base-uncased --bert_do_lower_case --bert_output_dir=bert-checkpoint --lr=1e-5 --epoch=3 --batch_size=64
  • evaluation
* enc_class=cnn

$ python evaluate.py --config=configs/config-bert-cnn.json --data_dir=data/sst2 --bert_output_dir=bert-checkpoint

INFO:__main__:[Accuracy] : 0.9204,  1676/ 1821
INFO:__main__:[Elapsed Time] : 25878ms, 14.157692307692308ms on average

** --bert_model_name_or_path=bert-large-uncased
INFO:__main__:[Accuracy] : 0.9308,  1695/ 1821
INFO:__main__:[Elapsed Time] : 52170ms, 28.649093904448105ms on average

** --bert_model_name_or_path=embeddings/pytorch.uncased_L-8_H-512_A-8
INFO:__main__:[Accuracy] : 0.8858,  1613/ 1821
INFO:__main__:[Elapsed Time] : 21791ms, 11.908241758241758ms on average

** --bert_model_name_or_path=embeddings/pytorch.uncased_L-4_H-512_A-8
INFO:__main__:[Accuracy] : 0.8753,  1594/ 1821
INFO:__main__:[Elapsed Time] : 13206ms, 7.201098901098901ms on average

** --bert_model_name_or_path=embeddings/pytorch.uncased_L-4_H-256_A-4
INFO:__main__:[Accuracy] : 0.8336,  1518/ 1821
INFO:__main__:[Elapsed Time] : 13021ms, 7.098351648351648ms on average

** --bert_model_name_or_path=embeddings/pytorch.uncased_L-2_H-128_A-2
INFO:__main__:[Accuracy] : 0.7908,  1440/ 1821
INFO:__main__:[Elapsed Time] : 8951ms, 4.86043956043956ms on average

** --config=configs/config-distilbert-cnn.json --bert_model_name_or_path=embeddings/distilbert-base-uncased
INFO:__main__:[Accuracy] : 0.8990,  1637/ 1821
INFO:__main__:[Elapsed Time] : 18193ms, 9.936263736263736ms on average

** to use SpanBERT embeddings, simply replace the pretrained BERT model with SpanBERT.
** --bert_model_name_or_path=embeddings/spanbert_hf_large , without --bert_do_lower_case
INFO:__main__:[Accuracy] : 0.9390,  1710/ 1821
INFO:__main__:[Elapsed Time] : 49042ms, 26.860989010989012ms on average

** --bert_model_name_or_path=embeddings/spanbert_hf_base , without --bert_do_lower_case
INFO:__main__:[Accuracy] : 0.9182,  1672/ 1821
INFO:__main__:[Elapsed Time] : 27796ms, 15.20989010989011ms on average

** --bert_remove_layers=8,9,10,11
INFO:__main__:[Accuracy] : 0.9055,  1649/ 1821
INFO:__main__:[Elapsed Time] : 19541ms, 10.682417582417582ms on average

** --bert_model_name_or_path=bert-large-uncased --bert_remove_layers=12,13,14,15,16,17,18,19,20,21,22,23 
INFO:__main__:[Accuracy] : 0.8847,  1611/ 1821
INFO:__main__:[Elapsed Time] : 27017ms, 14.781318681318682ms on average

* enc_class=cls

$ python evaluate.py --config=configs/config-bert-cls.json --data_dir=data/sst2 --bert_output_dir=bert-checkpoint

INFO:__main__:[Accuracy] : 0.9242,  1683/ 1821
INFO:__main__:[Elapsed Time] : 23314ms, 12.754945054945056ms on average

** n_ctx=64
INFO:__main__:[Accuracy] : 0.9259,  1686/ 1821
INFO:__main__:[Elapsed Time] : 23765.23184776306ms, 13.007715246179602ms on average

** n_ctx=64, --lr=2e-5 --epoch=3 --batch_size=64 --use_transformers_optimizer --warmup_epoch=1 --weight_decay=0.0 --seed=0
INFO:__main__:[Accuracy] : 0.9281,  1690/ 1821
INFO:__main__:[Elapsed Time] : 21707.942724227905ms, 11.878120244204343ms on average

** --bert_model_name_or_path=bert-large-uncased --lr=2e-5
INFO:__main__:[Accuracy] : 0.9412,  1714/ 1821
INFO:__main__:[Elapsed Time] : 40847.62740135193ms, 22.37672412788475ms on average

** --bert_model_name_or_path=embeddings/pytorch.uncased_L-8_H-512_A-8
INFO:__main__:[Accuracy] : 0.8924,  1625/ 1821
INFO:__main__:[Elapsed Time] : 17558ms, 9.585714285714285ms on average

** --bert_model_name_or_path=embeddings/pytorch.uncased_L-4_H-512_A-8
INFO:__main__:[Accuracy] : 0.8786,  1600/ 1821
INFO:__main__:[Elapsed Time] : 11104ms, 6.045054945054945ms on average

** --bert_model_name_or_path=embeddings/pytorch.uncased_L-4_H-256_A-4
INFO:__main__:[Accuracy] : 0.8369,  1524/ 1821
INFO:__main__:[Elapsed Time] : 10196ms, 5.552197802197802ms on average

** --bert_model_name_or_path=embeddings/pytorch.uncased_L-2_H-128_A-2
INFO:__main__:[Accuracy] : 0.8083,  1472/ 1821
INFO:__main__:[Elapsed Time] : 7124ms, 3.8461538461538463ms on average

** --config=configs/config-distilbert-cls.json --bert_model_name_or_path=embeddings/distilbert-base-uncased
INFO:__main__:[Accuracy] : 0.9110,  1659/ 1821
INFO:__main__:[Elapsed Time] : 16431ms, 8.971978021978021ms on average

** to use SpanBERT embeddings, simply replace the pretrained BERT model with SpanBERT.
** --bert_model_name_or_path=embeddings/spanbert_hf_large , without --bert_do_lower_case
INFO:__main__:[Accuracy] : 0.9396,  1711/ 1821
INFO:__main__:[Elapsed Time] : 47570ms, 26.044505494505493ms on average

** --bert_model_name_or_path=embeddings/spanbert_hf_base , without --bert_do_lower_case
INFO:__main__:[Accuracy] : 0.9149,  1666/ 1821
INFO:__main__:[Elapsed Time] : 24049ms, 13.151648351648351ms on average

** --bert_remove_layers=8,9,10,11
INFO:__main__:[Accuracy] : 0.9149,  1666/ 1821
INFO:__main__:[Elapsed Time] : 16082ms, 8.774725274725276ms on average

** --bert_remove_layers=6,7,8,9,10,11
INFO:__main__:[Accuracy] : 0.9023,  1643/ 1821
INFO:__main__:[Elapsed Time] : 12865ms, 7.024175824175824ms on average

** --bert_remove_layers=4,5,6,7,8,9,10,11
INFO:__main__:[Accuracy] : 0.8666,  1578/ 1821
INFO:__main__:[Elapsed Time] : 10800ms, 5.886813186813187ms on average

** --bert_model_name_or_path=bert-large-uncased --bert_remove_layers=12,13,14,15,16,17,18,19,20,21,22,23 
INFO:__main__:[Accuracy] : 0.8671,  1579/ 1821
INFO:__main__:[Elapsed Time] : 22229ms, 12.156043956043955ms on average
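
A few of the runs above pass --use_transformers_optimizer with --warmup_epoch=1 and --weight_decay=0.0. A minimal sketch of the AdamW plus linear-warmup setup such flags usually correspond to (the stand-in model and step counts are illustrative; the exact wiring is in the repo's training code):

import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(768, 2)               # stand-in for the classifier
steps_per_epoch, epochs = 1000, 3             # illustrative
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.0)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1 * steps_per_epoch,     # --warmup_epoch=1
    num_training_steps=epochs * steps_per_epoch)

# per training step: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()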

emb_class=albert, enc_class=cnn | cls

  • train
* n_ctx (max_seq_length) should not exceed 512

* enc_class=cnn

$ python preprocess.py --config=configs/config-albert-cnn.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/albert-base-v2 --bert_do_lower_case
$ python train.py --config=configs/config-albert-cnn.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/albert-base-v2 --bert_output_dir=bert-checkpoint --lr=1e-5 --epoch=10 --batch_size=64 --bert_do_lower_case
  • evaluation
* enc_class=cnn

$ python evaluate.py --config=configs/config-albert-cnn.json --data_dir=data/sst2 --bert_output_dir=bert-checkpoint 

INFO:__main__:[Accuracy] : 0.9204,  1676/ 1821
INFO:__main__:[Elapsed Time] : 29321ms, 16.055494505494504ms on average

** --bert_model_name_or_path=./embeddings/albert-xxlarge-v2 --batch_size=32
INFO:__main__:[Accuracy] : 0.9577,  1744/ 1821
INFO:__main__:[Elapsed Time] : 104769ms, 57.463186813186816ms on average

* enc_class=cls

$ python evaluate.py --config=configs/config-albert-cls.json --data_dir=data/sst2 --bert_output_dir=bert-checkpoint

INFO:__main__:[Accuracy] : 0.9001,  1639/ 1821
INFO:__main__:[Elapsed Time] : 26819ms, 14.672527472527472ms on average

** --bert_model_name_or_path=./embeddings/albert-xxlarge-v2 --batch_size=32
INFO:__main__:[Accuracy] : 0.9445,  1720/ 1821
INFO:__main__:[Elapsed Time] : 94456ms, 51.80274725274725ms on average

emb_class=roberta, enc_class=cnn | cls

  • train
* n_ctx (max_seq_length) should not exceed 512

* enc_class=cnn

$ python preprocess.py --config=configs/config-roberta-cnn.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/roberta-large
$ python train.py --config=configs/config-roberta-cnn.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/roberta-large --bert_output_dir=bert-checkpoint --lr=1e-5 --epoch=10 --lr_decay_rate=0.9 --batch_size=64

* enc_class=cls

$ python preprocess.py --config=configs/config-roberta-cls.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/roberta-large 
$ python train.py --config=configs/config-roberta-cls.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/roberta-large --bert_output_dir=bert-checkpoint --lr=1e-5 --epoch=10 --lr_decay_rate=0.9 --batch_size=64
  • evaluation
* enc_class=cnn

$ python evaluate.py --config=configs/config-roberta-cnn.json --data_dir=data/sst2 --bert_output_dir=bert-checkpoint

INFO:__main__:[Accuracy] : 0.9555,  1740/ 1821
INFO:__main__:[Elapsed Time] : 49297ms, 26.98076923076923ms on average

** --bert_model_name_or_path=./embeddings/roberta-base
INFO:__main__:[Accuracy] : 0.9292,  1692/ 1821
INFO:__main__:[Elapsed Time] : 27615ms, 15.101648351648352ms on average

** --bert_model_name_or_path=./embeddings/roberta-base --bert_remove_layers=8,9,10,11
INFO:__main__:[Accuracy] : 0.9226,  1680/ 1821
INFO:__main__:[Elapsed Time] : 21127ms, 11.524175824175824ms on average

* enc_class=cls

$ python evaluate.py --config=configs/config-roberta-cls.json --data_dir=data/sst2 --bert_output_dir=bert-checkpoint

INFO:__main__:[Accuracy] : 0.9566,  1742/ 1821
INFO:__main__:[Elapsed Time] : 43363ms, 23.73956043956044ms on average

** --bert_model_name_or_path=./embeddings/roberta-base
INFO:__main__:[Accuracy] : 0.9303,  1694/ 1821
INFO:__main__:[Elapsed Time] : 26822ms, 14.673626373626373ms on average

** --bert_model_name_or_path=./embeddings/roberta-base --bert_remove_layers=8,9,10,11
INFO:__main__:[Accuracy] : 0.9176,  1671/ 1821
INFO:__main__:[Elapsed Time] : 18344ms, 10.02967032967033ms on average

emb_class=bart, enc_class=cnn | cls

  • train
* n_ctx (max_seq_length) should not exceed 512

* enc_class=cnn

$ python preprocess.py --config=configs/config-bart-cnn.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/bart-large
$ python train.py --config=configs/config-bart-cnn.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/bart-large --bert_output_dir=bert-checkpoint --lr=1e-5 --epoch=10 --lr_decay_rate=0.9 --batch_size=64

* enc_class=cls

$ python preprocess.py --config=configs/config-bart-cls.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/bart-large 
$ python train.py --config=configs/config-bart-cls.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/bart-large --bert_output_dir=bert-checkpoint --lr=1e-5 --epoch=10 --lr_decay_rate=0.9 --batch_size=64
  • evaluation
* enc_class=cnn

$ python evaluate.py --config=configs/config-bart-cnn.json --data_dir=data/sst2 --bert_output_dir=bert-checkpoint

INFO:__main__:[Accuracy] : 0.9445,  1720/ 1821
INFO:__main__:[Elapsed Time] : 64224ms, 35.17087912087912ms on average

* enc_class=cls

$ python evaluate.py --config=configs/config-bart-cls.json --data_dir=data/sst2 --bert_output_dir=bert-checkpoint

INFO:__main__:[Accuracy] : 0.9489,  1728/ 1821
INFO:__main__:[Elapsed Time] : 61015ms, 33.386263736263736ms on average

emb_class=electra, enc_class=cnn | cls

  • train
* n_ctx (max_seq_length) should not exceed 512

* enc_class=cnn

$ python preprocess.py --config=configs/config-electra-cnn.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/electra-base-discriminator --bert_do_lower_case
$ python train.py --config=configs/config-electra-cnn.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/electra-base-discriminator --bert_output_dir=bert-checkpoint --lr=1e-5 --epoch=10 --lr_decay_rate=0.9 --batch_size=64 --bert_do_lower_case

* enc_class=cls

$ python preprocess.py --config=configs/config-electra-cls.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/electra-base-discriminator --bert_do_lower_case
$ python train.py --config=configs/config-electra-cls.json --data_dir=data/sst2 --bert_model_name_or_path=./embeddings/electra-base-discriminator --bert_output_dir=bert-checkpoint --lr=1e-5 --epoch=10 --lr_decay_rate=0.9 --batch_size=64 --bert_do_lower_case
  • evaluation
* enc_class=cnn

$ python evaluate.py --config=configs/config-electra-cnn.json --data_dir=data/sst2 --bert_output_dir=bert-checkpoint 

INFO:__main__:[Accuracy] : 0.9539,  1737/ 1821
INFO:__main__:[Elapsed Time] : 29602ms, 14.98021978021978ms on average

** --bert_model_name_or_path=./embeddings/electra-large-discriminator --lr=1e-6
INFO:__main__:[Accuracy] : 0.9566,  1742/ 1821
INFO:__main__:[Elapsed Time] : 54157ms, 28.356593406593408ms on average

** --bert_model_name_or_path=./embeddings/electra-large-discriminator --lr=1e-6 --epoch=15
INFO:__main__:[Accuracy] : 0.9605,  1749/ 1821
INFO:__main__:[Elapsed Time] : 52163ms, 27.286813186813188ms on average

* enc_class=cls

$ python evaluate.py --config=configs/config-electra-cls.json --data_dir=data/sst2 --bert_output_dir=bert-checkpoint

INFO:__main__:[Accuracy] : 0.9522,  1734/ 1821
INFO:__main__:[Elapsed Time] : 25956ms, 14.008791208791209ms on average

** --bert_model_name_or_path=./embeddings/electra-large-discriminator --lr=1e-6 --epoch=15
INFO:__main__:[Accuracy] : 0.9643,  1756/ 1821
INFO:__main__:[Elapsed Time] : 47163ms, 25.685714285714287ms on average


Experiments for Korean


Optimization


Distillation

  • DISTILLATION.md
    • BERT-large, RoBERTa-large, ELECTRA-large, CLS -> GloVe, DenseNet-CNN, DenseNet-DSA
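
The core of such a teacher-to-student setup is a soft-target loss that mixes the teacher's temperature-scaled distribution with the usual cross-entropy. A generic sketch (the temperature and weighting here are illustrative; the exact recipe is in DISTILLATION.md):

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T*T to keep gradient magnitudes comparable
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction='batchmean') * (T * T)
    hard = F.cross_entropy(student_logits, labels)  # hard targets: gold labels
    return alpha * soft + (1.0 - alpha) * hard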

TorchServe

  • archive the model and start TorchServe
$ cd torchserve
* modify the scripts for the emb_class you want to use.
$ ./archiver.sh -v -v
$ ./start-torchserve.sh -v -v
  • request
* health check
$ curl http://localhost:16543/ping
{
  "status": "Healthy"
}

* management api
$ curl http://localhost:16544/models
{
  "models": [
    {
      "modelName": "electra",
      "modelUrl": "electra.mar"
    }
  ]
}

* view all inference apis
$ curl -X OPTIONS http://localhost:16543

* view all management apis
$ curl -X OPTIONS http://localhost:16544

* classify
$ curl -X POST http://localhost:16543/predictions/electra --form data='이 영화는 재미가 있다' | jq
{
  "text": "이 영화는 재미가 있다",
  "results": "1"
}
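
The same prediction endpoint can be called from Python as well; a minimal sketch with requests, mirroring the curl call above (port and model name as configured by the scripts):

import requests

# multipart form field 'data', like curl's --form; the sample sentence means "This movie is fun"
resp = requests.post('http://localhost:16543/predictions/electra',
                     files={'data': (None, '이 영화는 재미가 있다')})
print(resp.json())  # e.g. {"text": "이 영화는 재미가 있다", "results": "1"}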

Citation

@misc{iclassifier,
  author = {dsindex},
  title = {iclassifier},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/dsindex/iclassifier}},
}

References
