- This repository provides the pretrained dialogue generation models (GPT-2 and Meena) of Pingpong, ScatterLab.
- You can refer to our blog post for the detailed pre-training process and experiment results.
- Check out our Korean demo and Japanese demo for a chatting experience.
- You can download the pretrained GPT-2 and Meena models from the Releases page.
base_gpt_trained_on_dialogue_data_kr.pth
- base size GPT-2 trained only on Korean dialogue data
large_gpt_trained_on_dialogue_data_kr.pth
- large size GPT-2 trained only on Korean dialogue data
base_gpt_trained_on_wiki_and_dialogue_data_kr.pth
- base size GPT-2 trained on Korean dialogue data, Wikipedia, and Namuwiki
large_gpt_trained_on_wiki_and_dialogue_data_kr.pth
- (Recommended) large size GPT-2 trained on Korean dialogue data, Wikipedia, and Namuwiki
base_meena_trained_on_filtered_data_kr.pth
- base size Meena trained on filtered Korean dialogue data
large_meena_trained_on_filtered_data_kr.pth
- (Recommended) large size Meena trained on filtered Korean dialogue data
base_meena_trained_on_non_filtered_data_kr.pth
- base size Meena trained on unfiltered Korean dialogue data
large_meena_trained_on_non_filtered_data_kr.pth
- large size Meena trained on unfiltered Korean dialogue data
base_meena_trained_on_filtered_data_jp.pth
- base size Meena trained on approximately 500 million Japanese daily-conversation examples
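If you want to sanity-check a downloaded checkpoint before wiring it into the example scripts, a minimal sketch is shown below. It assumes the .pth files are ordinary PyTorch checkpoints loadable with torch.load; the file name and the possible "state_dict" nesting are illustrative assumptions, not guarantees about the release format.

# Minimal sketch for inspecting a downloaded checkpoint.
# Assumption: the .pth file is a standard PyTorch checkpoint whose values
# are tensors; the file name below is just one example from the Releases page.
import torch

checkpoint = torch.load(
    "large_gpt_trained_on_wiki_and_dialogue_data_kr.pth",
    map_location="cpu",  # inspect on CPU; no GPU needed
)

# Some checkpoints wrap the weights, e.g. under a "state_dict" key.
state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint

# Print a few parameter names and shapes to confirm the file loaded correctly.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))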
- GPT
PYTHONPATH=. python examples/run_gpt.py \
--pretrained-model-path $PRETRAINED_MODEL_PATH \
--model-config-path $MODEL_CONFIG_PATH \
--tokenizer-model-path $TOKENIZER_MODEL_PATH \
--decoding-method $DECODING_METHOD
- Meena
PYTHONPATH=. python examples/run_meena.py \
--pretrained-model-path $PRETRAINED_MODEL_PATH \
--model-config-path $MODEL_CONFIG_PATH \
--tokenizer-model-path $TOKENIZER_MODEL_PATH \
--decoding-method $DECODING_METHOD
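For a concrete picture of how the placeholder variables above might be filled in, here is a hedged sketch that launches the Meena example from Python. The config path, tokenizer path, and decoding-method string are hypothetical placeholders; substitute the files you actually downloaded and whichever decoding-method values examples/run_meena.py accepts.

# Sketch: invoke the example script with concrete (hypothetical) arguments.
import os
import subprocess

env = dict(os.environ, PYTHONPATH=".")  # same as PYTHONPATH=. on the command line
subprocess.run(
    [
        "python", "examples/run_meena.py",
        "--pretrained-model-path", "large_meena_trained_on_filtered_data_kr.pth",
        "--model-config-path", "configs/large_meena_config.json",   # hypothetical path
        "--tokenizer-model-path", "tokenizer/kr_tokenizer.model",   # hypothetical path
        "--decoding-method", "top_p",                               # hypothetical value; use a string the script accepts
    ],
    env=env,
    check=True,
)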
- We implement two decoding methods, Top-p Sampling and Beam Search, as examples.
- The two decoding methods trade off Accuracy (Sensibleness) against Diversity (Specificity).
- Beam Search is a good choice if you prefer more accurate answers, and Top-p Sampling is a good choice if you prefer more diverse answers (see the top-p sampling sketch below).
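To make the trade-off concrete, the sketch below shows a generic implementation of top-p (nucleus) sampling over a vector of next-token logits: only the smallest set of tokens whose cumulative probability exceeds p is kept, which injects diversity compared with always expanding the highest-probability beams. This is a generic illustration, not the decoding code used by examples/run_gpt.py or examples/run_meena.py.

# Generic top-p (nucleus) sampling sketch over next-token logits.
# This is NOT the repository's decoder; it only illustrates the idea.
import torch
import torch.nn.functional as F

def top_p_sample(logits: torch.Tensor, p: float = 0.9) -> int:
    """Sample one token id from the smallest set of tokens whose
    cumulative probability mass exceeds p."""
    probs = F.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)

    # Keep tokens until the cumulative probability first reaches p
    # (always keep at least the single most probable token).
    cutoff = int(torch.searchsorted(cumulative, torch.tensor([p])).item()) + 1
    kept_probs = sorted_probs[:cutoff]
    kept_ids = sorted_ids[:cutoff]

    # Renormalize within the nucleus and sample.
    kept_probs = kept_probs / kept_probs.sum()
    choice = torch.multinomial(kept_probs, num_samples=1)
    return int(kept_ids[choice].item())

# Example with dummy logits over a toy vocabulary of size 10.
example_logits = torch.randn(10)
print(top_p_sample(example_logits, p=0.8))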
- The models' outputs are predictions produced by statistical machine learning; they may be unrelated to facts and do not reflect the views or decisions of ScatterLab/Pingpong.
- The models' outputs are fictional generated dialogue and are not guaranteed to be factually accurate.
- ScatterLab/Pingpong takes no responsibility for the outputs generated by the released models, nor for any loss or damage arising from their use.
- This repository does not include the pre-training code for the models.
- The released models differ in some respects, in both size and architecture, from the GPT-2 and Meena models proposed in the original papers.
- The released models have only completed pre-training (on large-scale KakaoTalk dialogue data for the Korean models and daily-conversation data for the Japanese model), so we recommend fine-tuning them for your target purpose before using them in practice.
- For commercial use of the models, please contact support@pingpong.us.
The pretrained models and the code in this repository are distributed under the terms of the Apache-2.0 License.
If you use our software for research, please cite:
@misc{pingpong2020dial_gen_models,
  author = {Chaehun Park and Sangwoo Seo and Dawoon Jung},
  title = {dialogue-generation-models},
  year = {2020},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/pingpong-ai/dialogue-generation-models}}
}
@techreport{radford2019gpt2,
  title = {Language Models are Unsupervised Multitask Learners},
  author = {Alec Radford and Jeffrey Wu and Rewon Child and David Luan and Dario Amodei and Ilya Sutskever},
  institution = {OpenAI},
  year = {2019}
}
@misc{adiwardana2020meena,
  title = {Towards a Human-like Open-Domain Chatbot},
  author = {Daniel Adiwardana and Minh-Thang Luong and David R. So and Jamie Hall and Noah Fiedel and Romal Thoppilan and Zi Yang and Apoorv Kulshreshtha and Gaurav Nemade and Yifeng Lu},
  year = {2020},
  eprint = {2001.09977},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL}
}
To train the models, we used Cloud TPUs provided by the TensorFlow Research Cloud program.