Running the code on DailyDialog but getting terrible results #5
It works for other datasets. You can change the data loading process in data_load.py.
Hi, thanks for your response. I have some questions while trying to reproduce ReCoSa and hope to hear back from you. Thank you very much.
Yes, ReCoSa adds an LSTM encoder in front of the Transformer.
I have the same problem, and I have already changed the data loading process in data_load.py. How can I solve this?
Have you run `python propro.py` to build the vocab? At line 216 of train.py, you could print x and y to check that they are consistent with the vocab in the preprocessed folder.
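The sanity check suggested above can be sketched as a small helper that maps token ids back to words so you can eyeball whether x and y line up with the vocab. This is an illustrative snippet, not code from this repo; the `idx2word` mapping and the function name are assumptions, so adapt them to however the vocab in the preprocessed folder is actually stored:

```python
# Hypothetical helper: decode a batch of id sequences back to words,
# assuming idx2word is an {id: word} dict built by the preprocessing step.
def decode_batch(id_batch, idx2word, unk="<unk>"):
    """Return the word form of each row of token ids, falling back to
    an <unk> marker for ids missing from the vocab."""
    return [[idx2word.get(i, unk) for i in row] for row in id_batch]
```

Printing `decode_batch(x, idx2word)` next to the raw x at that point in train.py makes vocab mismatches obvious.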
Oh, I reproduced ReCoSa on the DailyDialog and Cornell datasets, and the performance is slightly worse than HRED with an attention module. It should be noted that the performance of the code in this repo is bad, so I reproduced ReCoSa based on your paper (replacing the RNN context encoder with a Transformer; the Transformer implementation is borrowed from PyTorch 1.3).
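The architecture described in this thread (a word-level LSTM per utterance, then Transformer self-attention over the utterance representations in place of HRED's RNN context encoder) can be sketched in PyTorch roughly as follows. This is a minimal illustration assuming `nn.TransformerEncoder` as the context encoder; the class name and dimensions are made up, not the author's exact code:

```python
import torch
import torch.nn as nn

class ReCoSaContextEncoder(nn.Module):
    """Sketch: encode each utterance with an LSTM, then let a
    Transformer encoder attend over the per-utterance states."""
    def __init__(self, vocab_size, embed_dim=256, hidden=256, nhead=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.utt_rnn = nn.LSTM(embed_dim, hidden, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=nhead)
        self.ctx_encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, context):
        # context: (batch, n_turns, seq_len) token ids
        b, t, s = context.shape
        emb = self.embed(context.view(b * t, s))    # (b*t, seq_len, embed_dim)
        _, (h, _) = self.utt_rnn(emb)               # h: (1, b*t, hidden)
        utt = h[-1].view(b, t, -1).transpose(0, 1)  # (n_turns, batch, hidden)
        return self.ctx_encoder(utt)                # (n_turns, batch, hidden)
```

The decoder (omitted here) would then attend over these context states when generating the response.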
Maybe the characteristics of the datasets are different. JDC is a customer-service dataset with a topic-drift phenomenon.
Thank you for your response; I agree with you. I will try to adjust the parameters of my implementation and look for a better setting.
Thank you for your reply! I have just started learning about dialogue generation and found your paper very interesting. I am experimenting with the Ubuntu dataset, and I'm not sure whether this format is correct; below are about 600 lines of data.
The format is correct; you just need to train longer. I tried these 600 lines, and it takes about 100 epochs before the effect shows; training the full model takes roughly 20,000 epochs in my code. Whether a dialogue generation model has converged is not just about the dev-set metrics; don't worry about overfitting, and instead train as long as possible to see the effect. The final generation quality will hit a bottleneck, but the dev metrics start fluctuating very early while the quality keeps improving, mainly because those metrics don't reflect the quality of dialogue generation.
I have tried ttzhang511's data of about 600 lines; it takes about 100 epochs to train on this small dataset. For my code, it needs at least 20,000 epochs. You could train the model longer, regardless of the dev metrics.
Emmm, but the DailyDialog dataset is actually big, so that many epochs would take a very long time to converge.
Yes, you could check the generated sentences after training longer.
Now I have run 30 epochs on DailyDialog (Cornell is bigger than DailyDialog). From the performance curves, I found that nearly all the metrics have converged (BLEU-1~4, ROUGE, dist-1, dist-2, BERTScore).
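Of the metrics listed above, dist-1/dist-2 are simple enough to compute directly: the ratio of unique n-grams to total n-grams across the generated responses. A minimal sketch (my own helper, not from this repo):

```python
def distinct_n(sentences, n):
    """dist-n diversity metric: unique n-grams / total n-grams
    over a list of tokenized generated sentences."""
    total, uniq = 0, set()
    for toks in sentences:
        grams = [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
        total += len(grams)
        uniq.update(grams)
    return len(uniq) / total if total else 0.0
```

A low dist-1/dist-2 is a typical symptom of the generic "I'm sorry"-style responses mentioned later in this thread.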
I will try 100 epochs and analyze the performance, then report the results. Thank you for your response.
I ran 500 epochs on DailyDialog and got a lot of "I'm sorry" responses, so do I need to run more epochs?
I reproduced ReCoSa in PyTorch 1.3 using the official Transformer implementation from PyTorch 1.3. I trained ReCoSa for 30 epochs on the DailyDialog dataset, and here are my partial results. Due to some special reasons, I won't show more details of the comparison. @ttzhang511, I hope the comparison is helpful. I will make my repo public in about a month.
Oh, I forgot some essential information about my experiments:
Thank you so much! @gmftbyGMFTBY |
@gmftbyGMFTBY Hi. When I run ReCoSa on DailyDialog, I get some frequently occurring bad examples.
Hi, I ran the author's code in this repo, but it actually didn't work. In my opinion, and as the author has acknowledged, ReCoSa simply replaces the RNN-based context encoder with a vanilla Transformer encoder. So in my implementation, I used the official Transformer architecture in PyTorch 1.3 (I also think you could try transformers 2.0), and the generation seems normal. But after comparing with HRED and other baselines, I found that ReCoSa is slightly worse than these baselines (although BLEU and other automatic metrics may not be suitable for evaluating open-domain dialogue systems). I also tried some other hyperparameters, but the conclusion is the same. You can try reproducing ReCoSa yourself and see the result.
Thank you. Do you have any suggestions for a high-quality TensorFlow (<2.0) implementation of the Transformer? I'm not sure whether the implementation I used caused these bad examples.
I'm not very familiar with TensorFlow. Transformers 2.0 (Hugging Face) seems to support TensorFlow, so you could try it.
@ttzhang511 @katherinelyx @zhanghainan Hi guys, I have made my repo public; it contains the PyTorch version of ReCoSa and other multi-turn dialogue models. You are welcome to use it. If you have any questions, feel free to open an issue and let me know; I will do my best to respond. Thank you so much, and I will close this issue. Repo: MultiTurnDialogZoo
Hi, thanks for open-sourcing the code for this work.
I tried to apply your code to a new dataset, DailyDialog, but I found that the model's outputs are all the token '.', which means nothing.
So I'm curious whether this code is inappropriate for other datasets?
Can you help me troubleshoot the issue?