
Run the code on DailyDialog but get terrible results #5

Closed
gmftbyGMFTBY opened this issue Sep 14, 2019 · 26 comments

@gmftbyGMFTBY

gmftbyGMFTBY commented Sep 14, 2019

Hi, thanks for open-sourcing the code for this work.
I tried to apply it to a new dataset, DailyDialog, but I found that the model's outputs are all the token '.', which means nothing.

So I'm curious: is this code not appropriate for other datasets?
Can you help me troubleshoot the issue?

@zhanghainan
Owner

It is suitable for other datasets. You can change the data input process in data_load.py.

@gmftbyGMFTBY
Author

Hi, thanks for your response. I have a question while trying to reproduce ReCoSa.
After analyzing the model architecture, I think the main difference between ReCoSa and HRED-based hierarchical models is the transformer-based context encoder.
Can I simply think of ReCoSa as replacing the RNN context encoder with a transformer?

I hope to hear back from you. Thank you very much.

@zhanghainan
Owner

Yes, ReCoSa adds an LSTM word-level encoder in front of the transformer.
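
A minimal sketch of that structure, assuming PyTorch's nn.TransformerEncoder (which the PyTorch reproduction discussed later in this thread also uses); module and variable names here are illustrative, not this repo's actual API:

```python
import torch
import torch.nn as nn

class ReCoSaStyleEncoder(nn.Module):
    """Word-level LSTM per utterance, then transformer self-attention
    over the utterance representations (replacing HRED's context RNN)."""

    def __init__(self, vocab_size, embed_dim=512, hidden=512, n_heads=8,
                 n_layers=6, max_turns=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Shared word-level encoder applied to each utterance.
        self.utt_rnn = nn.LSTM(embed_dim, hidden, batch_first=True)
        # Learned turn-position embeddings, added before self-attention.
        self.turn_pos = nn.Embedding(max_turns, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads)
        self.ctx_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, turns):
        # turns: [batch, n_turns, seq_len] token ids
        b, t, s = turns.shape
        words = self.embed(turns.view(b * t, s))        # [b*t, s, embed]
        _, (h, _) = self.utt_rnn(words)                 # h: [1, b*t, hidden]
        utts = h.squeeze(0).view(b, t, -1)              # [b, t, hidden]
        utts = utts + self.turn_pos(torch.arange(t, device=turns.device))
        # nn.TransformerEncoder expects [seq, batch, dim] in PyTorch 1.3.
        ctx = self.ctx_encoder(utts.transpose(0, 1))    # [t, b, hidden]
        return ctx.transpose(0, 1)                      # [b, t, hidden]
```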

@ttzhang511

ttzhang511 commented Dec 17, 2019

> Hi, thanks for open-sourcing the code for this work.
> I tried to apply it to a new dataset, DailyDialog, but I found that the model's outputs are all the token '.', which means nothing.
>
> So I'm curious: is this code not appropriate for other datasets?
> Can you help me troubleshoot the issue?

I have the same problem, even though I have already changed the data input process in data_load.py. How can I solve it?

@zhanghainan
Owner

Have you run `python propro.py` to build the vocab? At line 216 of train.py, you could print x and y to check that they are consistent with the vocab in the preprocessed folder.
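
For example, a check along these lines could go at that point in train.py (the vocab path and variable names here are assumptions, not the repo's actual layout):

```python
# Hypothetical sanity check: map a batch of ids back to tokens and
# eyeball that they match the vocab produced by the preprocessing step.
with open('preprocessed/vocab.txt', encoding='utf-8') as f:
    idx2word = {i: w.strip() for i, w in enumerate(f)}

def ids_to_text(ids):
    return ' '.join(idx2word.get(int(i), '<unk>') for i in ids)

print('x:', ids_to_text(x[0]))  # first source sequence in the batch
print('y:', ids_to_text(y[0]))  # first target sequence
```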

@ttzhang511

Yes, x and y are correct, but the preds look like this:
[screenshot of the predicted outputs]

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Dec 17, 2019

Oh, I reproduced ReCoSa on the DailyDialog and Cornell datasets, and its performance is slightly worse than HRED with an attention module.

It should be noted that the performance of the code in this repo is bad, so I reproduced ReCoSa based on your paper (replacing the RNN context encoder with the transformer; the transformer implementation is borrowed from PyTorch 1.3).

@zhanghainan
Owner

> Yes, x and y are correct, but the preds look like this:
> [screenshot of the predicted outputs]

Could you give me 100 lines of training data? I will run it to see the problem.

@zhanghainan
Owner

> Oh, I reproduced ReCoSa on the DailyDialog and Cornell datasets, and its performance is slightly worse than HRED with an attention module.
>
> It should be noted that the performance of the code in this repo is bad, so I reproduced ReCoSa based on your paper (replacing the RNN context encoder with the transformer; the transformer implementation is borrowed from PyTorch 1.3).

Maybe the characteristics of the datasets are different. JDC is a customer-service dataset with a topic-drift phenomenon.

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Dec 18, 2019

Thank you for your response; I agree with you. I will try to tune the parameters of my implementation and find a better setting.

@ttzhang511

Thank you for your reply! I have just started learning about dialogue generation, and I found your paper very interesting. My experiments use the Ubuntu dataset, and I'm not sure whether this format is problematic. Below are about 600 lines of data.

ubuntu_train_answer_8.txt
ubuntu_train_query_8.txt

@zhanghainan
Owner

> Thank you for your reply! I have just started learning about dialogue generation, and I found your paper very interesting. My experiments use the Ubuntu dataset, and I'm not sure whether this format is problematic. Below are about 600 lines of data.
>
> ubuntu_train_answer_8.txt
> ubuntu_train_query_8.txt

The format is correct; you just need to train longer. I tried these 600 lines, and it takes roughly 100 epochs before you can see any effect; training the whole model takes about 20,000 epochs in my code. Whether a dialogue generation model has converged is not just a matter of the dev-set metrics, and you don't need to worry about overfitting; train as long as possible before judging the output. The final generation quality does hit a bottleneck, but the dev metrics start fluctuating very early while the generations keep improving, mainly because the metrics cannot reflect the quality of dialogue generation.

@zhanghainan
Owner

> Thank you for your response; I agree with you. I will try to tune the parameters of my implementation and find a better setting.

I have tried ttzhang511's data (about 600 lines); it takes about 100 epochs to train on this small dataset. For my code, it needs at least 20,000 epochs. You should train the model longer, regardless of the dev metrics.

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Dec 18, 2019

Emmm, but the DailyDialog dataset is actually big, so that many epochs would take a very long time to converge.

@zhanghainan
Owner

Yes, but you could look at the generated sentences after training longer.

@gmftbyGMFTBY
Author

For now, I have run 30 epochs on DailyDialog (Cornell is bigger than DailyDialog). From the performance curves, I found that nearly all the metrics converge (BLEU-1~4, ROUGE, dist-1, dist-2, BERTScore).
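
For reference, dist-1/dist-2 are just the ratios of distinct unigrams/bigrams over the generated corpus; a minimal sketch (the function name is mine, not from any of the repos discussed here):

```python
def distinct_n(sentences, n):
    """Ratio of unique n-grams to total n-grams across generated replies."""
    total, unique = 0, set()
    for sent in sentences:
        tokens = sent.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

replies = ["i am not sure .", "i am sorry ."]
print(distinct_n(replies, 1), distinct_n(replies, 2))  # dist-1, dist-2
```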

@gmftbyGMFTBY
Author

I will try 100 epochs and analyze the performance, then report the results here. Thank you for your response.

@ttzhang511

I ran 500 epochs on DailyDialog and got a lot of "I'm sorry" responses. Do I need to run more epochs?

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Dec 19, 2019

I reproduced ReCoSa in PyTorch 1.3, using its official Transformer implementation. ReCoSa was trained for 30 epochs on the DailyDialog dataset, and here are my partial results:

[three screenshots of the result tables]

For particular reasons, I can't show more details of the comparison.
It can be seen that ReCoSa is slightly worse than the baselines on some automatic evaluation metrics. We will use human annotation to test ReCoSa's performance in the future.

@ttzhang511, I hope the comparison is helpful. I will make my repo public in about a month.

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Dec 19, 2019

Oh, I forgot some essential information about my experiments:

  1. Training ReCoSa for 30 epochs took nearly 8.5 hours.
  2. I didn't use the whole DailyDialog dataset, for particular reasons (so the low BLEU is explicable). If the whole DailyDialog dataset were used, the training time would be even longer.

@ttzhang511

Thank you so much! @gmftbyGMFTBY

@katherinelyx

@gmftbyGMFTBY Hi. When I ran ReCoSa on DailyDialog, I got some frequently occurring bad examples:

  1. 'yes, and and and and and', with many repetitions.
  2. 'i i will take it', which always starts with the word 'i'.

Have you seen such problems? Or is something wrong with my setup? Could you please provide some suggestions for handling these issues?
Thank you very much.

@gmftbyGMFTBY
Author

Hi, I ran the author's code in this repo, but it actually didn't work.
So I reproduced ReCoSa myself in PyTorch (the source code in this repo is written in TensorFlow, but I don't think the issue is caused by the deep learning framework).

In my opinion, and the author has confirmed this above, ReCoSa simply replaces the RNN-based context encoder with a vanilla transformer encoder. So in my implementation I use the official transformer modules from PyTorch 1.3 (you could also try huggingface Transformers 2.0), and the generations look normal.

But after comparing with HRED and the other baselines, I found that ReCoSa is slightly worse than these baselines (although BLEU and the other automatic metrics may not be suitable for measuring open-domain dialogue systems). I also tried some different hyperparameters, but the conclusion is the same. You can try to reproduce ReCoSa yourself and see the result.
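
As a side note on the repetition problem above: one standard decoding-time mitigation (not something either repo is confirmed to use) is to forbid bigrams that have already been generated during greedy decoding. A minimal sketch, assuming a step function that returns next-token logits for a given prefix:

```python
import torch

def greedy_no_repeat_bigrams(step_logits_fn, bos_id, eos_id, max_len=30):
    """Greedy decoding that never emits a bigram already generated,
    a cheap guard against loops like 'and and and and'."""
    prefix = [bos_id]
    seen_bigrams = set()
    for _ in range(max_len):
        logits = step_logits_fn(prefix).clone()  # 1-D tensor over the vocab
        # Ban tokens that would repeat a bigram we have already produced.
        banned = [tok for (prev, tok) in seen_bigrams if prev == prefix[-1]]
        if banned:
            logits[banned] = float('-inf')
        nxt = int(torch.argmax(logits))
        seen_bigrams.add((prefix[-1], nxt))
        prefix.append(nxt)
        if nxt == eos_id:
            break
    return prefix
```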

@katherinelyx

katherinelyx commented Dec 28, 2019 via email

@gmftbyGMFTBY
Author

I'm not very familiar with TensorFlow. Transformers 2.0 (huggingface) seems to support TensorFlow, so you could try it.

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Feb 9, 2020

@ttzhang511 @katherinelyx @zhanghainan Hi guys, I have made my repo public; it contains the PyTorch version of ReCoSa and other multi-turn dialogue models. You are welcome to use it. If you have any questions, feel free to open an issue and let me know; I will try my best to provide solutions and responses. Thank you so much, and I will close this issue.

Repo: MultiTurnDialogZoo
