toolkits for NLP

intent

Built to make it easier for me to learn and understand things, and to implement some of my own ideas.

Update info:

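2021.8.5 Added serving code, which can be used as a reference for online deployment. Code: serving
2021.8.4 Added a mixed-precision training experiment. Code: classification_tnews_mixed_precision
2021.5.13 Added GPT2 and a corresponding dialogue generation experiment. Code: basic_language_model_gpt2_gen
2021.5.12 Added GPT and a corresponding dialogue generation experiment. Code: basic_language_model_gpt_gen
2021.5.1 Added ReZero and a corresponding text classification experiment. Code: tnews_rezero_pretrain_finetuning
2021.3.26 Added RealFormer (residual attention) and a corresponding text classification experiment. Code: classification tnews pet realformer
2021.1.13 Added a demo reproducing SBERT. Code: sbert-stsb
2020.11.26 Added a pretrain + fine-tuning example. Code: classification tnew pretrain before fine-tuning
2020.11.10 Added external_embedding_weights to NEZHA. This parameter lets you fuse additional information into the NEZHA token embedding. Usage: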

from toolkit4nlp.models import build_transformer_model
# Build your own embeddings_matrix, aligned with the vocabulary
config_path = ''
checkpoint_path = ''
embeddings_matrix = None
nezha = build_transformer_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    model='nezha',
    external_embedding_size=100,
    external_embedding_weights=embeddings_matrix)
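
The NEZHA snippet above leaves `embeddings_matrix` as `None`. As a minimal sketch of how such a matrix could be built so that row `i` corresponds to token id `i` of the vocabulary, the helper below fills known tokens from a lookup table and gives the rest small random values. The helper name `build_embedding_matrix`, the toy vocabulary, and the toy vectors are all hypothetical and not part of toolkit4nlp; it is also an assumption that the row width should equal `external_embedding_size`.

```python
import random


def build_embedding_matrix(vocab, vectors, dim, seed=0):
    """Return one row per vocabulary token, in vocabulary order.

    vocab   : list of tokens where the list index is the token id (hypothetical input)
    vectors : dict mapping token -> list of `dim` floats (hypothetical input)
    Tokens that are missing from `vectors` get small random values.
    """
    rng = random.Random(seed)
    matrix = []
    for token in vocab:
        vec = vectors.get(token)
        if vec is None:
            # No external information for this token: fall back to random init.
            vec = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        elif len(vec) != dim:
            raise ValueError(
                f"vector for {token!r} has length {len(vec)}, expected {dim}"
            )
        matrix.append(list(vec))
    return matrix


# Toy data for illustration only.
vocab = ["[PAD]", "[UNK]", "我", "爱"]
vectors = {"我": [0.1, 0.2, 0.3], "爱": [0.4, 0.5, 0.6]}
embeddings_matrix = build_embedding_matrix(vocab, vectors, dim=3)
print(len(embeddings_matrix), len(embeddings_matrix[0]))
```

Before passing the result to `build_transformer_model`, it would probably need to be converted to a float array (for example with `numpy.asarray`); the exact type the library expects is not stated here.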

2020.11.3 Added CCF 2020 QA match baselines: ccf_2020_qa_match_pair and ccf_2020_qa_match_point
2020.10.19 Added the AdaBelief optimizer and a corresponding example. Code: classification use AdaBelief
2020.10.16 Added focal loss and a corresponding example. Code: classification_focal_loss
2020.09.27 Added an implementation of NEZHA. Usage:

from toolkit4nlp.models import build_transformer_model
config_path = '/home/mingming.xu/pretrain/NLP/chinese_nezha_base/config.json'
checkpoint_path = '/home/mingming.xu/pretrain/NLP/chinese_nezha_base/model_base.ckpt'

model = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path, model='nezha')
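
For the focal loss entry (2020.10.16) in the list above, here is a minimal pure-Python sketch of the standard formula, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), for a single example. It is not the code from classification_focal_loss; the function names and the default `gamma` and `alpha` values are assumptions chosen for illustration.

```python
import math


def focal_loss(probs, label, gamma=2.0, alpha=1.0, eps=1e-7):
    """Focal loss for one example.

    probs : predicted class probabilities (should sum to 1)
    label : index of the true class
    """
    # Clip to avoid log(0).
    p_t = min(max(probs[label], eps), 1.0 - eps)
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(probs, label, eps=1e-7):
    """Plain cross-entropy, i.e. focal loss with gamma = 0 and alpha = 1."""
    return focal_loss(probs, label, gamma=0.0, alpha=1.0, eps=eps)


easy = [0.05, 0.90, 0.05]  # true class predicted with high confidence
hard = [0.60, 0.30, 0.10]  # true class predicted with low confidence
label = 1

for name, probs in (("easy", easy), ("hard", hard)):
    print(name, cross_entropy(probs, label), focal_loss(probs, label))
```

With `gamma = 2`, the easy example keeps only about 1% of its cross-entropy loss while the hard example keeps about 49%, which is how focal loss shifts the training signal toward hard examples.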

2020.09.22 Added an implementation of FastBERT. Code: classification ifytek with FastBERT
2020.09.15 Added two experiments that construct auxiliary tasks to try to improve performance on a classification task. Code: classification ifytek with similarity and classification ifytek with seq2seq
2020.09.10 Added a Knowledge Distillation BERT example. Code: distilling knowledge bert
2020.08.24 Added an example that uses UniLM for question-answer generation. Code: qa question answer generation
2020.08.20 Added an example that uses UniLM for question generation. Code: qa question generation
2020.08.20 Added UniLM and LM models. Usage:

from toolkit4nlp.models import build_transformer_model
config_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/config.json'
checkpoint_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/electra_base.ckpt'


# lm
model = build_transformer_model(
  config_path=config_path,
  checkpoint_path=checkpoint_path,
  application='lm'
)

# unilm
model = build_transformer_model(
  config_path=config_path,
  checkpoint_path=checkpoint_path,
  application='unilm'
)
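
The two `application` values above differ mainly in the attention mask applied inside the transformer. The sketch below shows the masks commonly used for each mode: a lower-triangular mask for `lm`, and for `unilm` a mask where the first segment is fully visible and the second segment is causal. This is an illustration in plain Python, not toolkit4nlp's implementation; the function names `lm_mask` and `unilm_mask` are hypothetical.

```python
def lm_mask(length):
    """Causal mask: position i may attend to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(length)] for i in range(length)]


def unilm_mask(segment_ids):
    """Seq2seq mask built from segment ids (0 = source, 1 = target).

    Position i may attend to position j when the running sum of segment ids
    at j does not exceed the running sum at i. Source tokens therefore see
    the whole source, and target tokens see the source plus earlier targets.
    """
    cumulative = []
    total = 0
    for s in segment_ids:
        total += s
        cumulative.append(total)
    n = len(segment_ids)
    return [
        [1 if cumulative[j] <= cumulative[i] else 0 for j in range(n)]
        for i in range(n)
    ]


segment_ids = [0, 0, 0, 1, 1]  # three source tokens, two target tokens
for row in unilm_mask(segment_ids):
    print(row)
```

Rows are query positions and columns are key positions, so the last two rows of the printed matrix show the target tokens attending to the full source and only to earlier target tokens.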

2020.08.19 Added the ELECTRA model. Usage:

from toolkit4nlp.models import build_transformer_model


config_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/config.json'
checkpoint_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/electra_base.ckpt'

model =  build_transformer_model(
  config_path=config_path,
  checkpoint_path=checkpoint_path,
  model='electra',
)

2020.08.17 Added a two-stage-fine-tuning experiment to test whether the theseus_model in bert-of-theseus is necessary. Code: two_stage_fine_tuning
2020.08.14 Added bert-of-theseus experiments for NER. Code: sequence_labeling_ner_bert_of_theseus
2020.08.11 Added bert-of-theseus experiments for text classification. Code: classification_ifytek_bert_of_theseus
2020.08.06 Added a cws-crf example. Code: cws_crf_example
2020.08.05 Added a ner-crf example. Code: ner_crf_example
2020.08.01 Added bert + dgcnn for the QA task. Code: qa_dgcnn_example
2020.07.27 Added pretraining. For usage, see pretraining/README.md
2020.07.18 Added the tokenizer. Usage:

from toolkit4nlp.tokenizers import Tokenizer
vocab = ''
tokenizer = Tokenizer(vocab, do_lower_case=True)
tokenizer.encode('我爱你中国')
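
For readers unfamiliar with BERT-style tokenizers, the following is a minimal sketch of greedy longest-match-first WordPiece splitting, the sub-word algorithm that BERT vocabularies are designed for. It is a generic illustration and not the `Tokenizer` class from toolkit4nlp; the function name `wordpiece` and the toy vocabulary are hypothetical.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into sub-word pieces by greedy longest match.

    Pieces that do not start the word carry the '##' prefix. If any part of
    the word cannot be matched, the whole word maps to the unknown token.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabulary for illustration only.
toy_vocab = {"[UNK]", "un", "##aff", "##able"}
print(wordpiece("unaffable", toy_vocab))
print(wordpiece("xyz", toy_vocab))
```

For Chinese text such as the example above, BERT-style tokenizers typically split the input into single characters first, so each character is looked up in the vocabulary on its own.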

2020.07.16 Finished loading pretrained weights for BERT. Usage:

from toolkit4nlp.models import build_transformer_model

config_path = ''
checkpoints_path = ''
model = build_transformer_model(config_path, checkpoints_path)

This project draws mainly on bert, bert4keras, and keras_bert.
