
Run the code on DailyDialog but get terrible results #5

Closed
gmftbyGMFTBY opened this issue Sep 14, 2019 · 26 comments

@gmftbyGMFTBY

gmftbyGMFTBY commented Sep 14, 2019

Hi, thanks for open-sourcing the code for this work.
I tried to apply it to a new dataset, DailyDialog, but I found that the model's outputs are all the token '.', which means nothing.

So I'm curious: is this code not appropriate for other datasets?
Can you help me troubleshoot the issue?

@zhanghainan
Owner

It is suitable for other datasets. You can change the data input process in data_load.py.

@gmftbyGMFTBY
Author

Hi, thanks for your response. I have a question while trying to reproduce ReCoSa.
After analyzing the model architecture, I think the main difference between ReCoSa and HRED-based hierarchical models is the transformer-based context encoder.
Can I simply think of ReCoSa as replacing the RNN context encoder with a transformer?

I hope to hear back from you. Thank you very much.

@zhanghainan
Owner

Yes, ReCoSa adds an LSTM word-level encoder in front of the transformer.
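
A minimal sketch of that structure, assuming PyTorch's nn.TransformerEncoder (which the PyTorch reproduction discussed later in this thread also uses); module and variable names here are illustrative, not this repo's actual API:

```python
import torch
import torch.nn as nn

class ReCoSaStyleEncoder(nn.Module):
    """Word-level LSTM per utterance, then transformer self-attention
    over the utterance representations (replacing HRED's context RNN)."""

    def __init__(self, vocab_size, embed_dim=512, hidden=512, n_heads=8,
                 n_layers=6, max_turns=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Shared word-level encoder applied to each utterance.
        self.utt_rnn = nn.LSTM(embed_dim, hidden, batch_first=True)
        # Learned turn-position embeddings, added before self-attention.
        self.turn_pos = nn.Embedding(max_turns, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads)
        self.ctx_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, turns):
        # turns: [batch, n_turns, seq_len] token ids
        b, t, s = turns.shape
        words = self.embed(turns.view(b * t, s))        # [b*t, s, embed]
        _, (h, _) = self.utt_rnn(words)                 # h: [1, b*t, hidden]
        utts = h.squeeze(0).view(b, t, -1)              # [b, t, hidden]
        utts = utts + self.turn_pos(torch.arange(t, device=turns.device))
        # nn.TransformerEncoder expects [seq, batch, dim] in PyTorch 1.3.
        ctx = self.ctx_encoder(utts.transpose(0, 1))    # [t, b, hidden]
        return ctx.transpose(0, 1)                      # [b, t, hidden]
```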

@ttzhang511

ttzhang511 commented Dec 17, 2019

> Hi, thanks for open-sourcing the code for this work.
> I tried to apply it to a new dataset, DailyDialog, but I found that the model's outputs are all the token '.', which means nothing.
>
> So I'm curious: is this code not appropriate for other datasets?
> Can you help me troubleshoot the issue?

I have the same problem, even though I have already changed the data input process in data_load.py. How can I solve it?

@zhanghainan
Owner

Have you run `python propro.py` to build the vocab? At line 216 of train.py, you could print x and y to check that they are consistent with the vocab in the preprocessed folder.
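
For example, a check along these lines could go at that point in train.py (the vocab path and variable names here are assumptions, not the repo's actual layout):

```python
# Hypothetical sanity check: map a batch of ids back to tokens and
# eyeball that they match the vocab produced by the preprocessing step.
with open('preprocessed/vocab.txt', encoding='utf-8') as f:
    idx2word = {i: w.strip() for i, w in enumerate(f)}

def ids_to_text(ids):
    return ' '.join(idx2word.get(int(i), '<unk>') for i in ids)

print('x:', ids_to_text(x[0]))  # first source sequence in the batch
print('y:', ids_to_text(y[0]))  # first target sequence
```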

@ttzhang511

Yes, x and y are correct, but the preds look like this:
[screenshot of the predicted outputs]

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Dec 17, 2019

Oh, I reproduced ReCoSa on the DailyDialog and Cornell datasets, and its performance is slightly worse than HRED with an attention module.

It should be noted that the performance of the code in this repo is bad, so I reproduced ReCoSa based on your paper (replacing the RNN context encoder with the transformer; the transformer implementation is borrowed from PyTorch 1.3).

@zhanghainan
Owner

> Yes, x and y are correct, but the preds look like this:
> [screenshot of the predicted outputs]

Could you give me 100 lines of training data? I will run it to see the problem.

@zhanghainan
Owner

> Oh, I reproduced ReCoSa on the DailyDialog and Cornell datasets, and its performance is slightly worse than HRED with an attention module.
>
> It should be noted that the performance of the code in this repo is bad, so I reproduced ReCoSa based on your paper (replacing the RNN context encoder with the transformer; the transformer implementation is borrowed from PyTorch 1.3).

Maybe the characteristics of the datasets are different. JDC is a customer-service dataset with a topic-drift phenomenon.

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Dec 18, 2019

Thank you for your response; I agree with you. I will try to tune the parameters of my implementation and find a better setting.

@ttzhang511

Thank you for your reply! I have just started learning about dialogue generation, and I found your paper very interesting. My experiments use the Ubuntu dataset, and I'm not sure whether this format is problematic. Below are about 600 lines of data.

ubuntu_train_answer_8.txt
ubuntu_train_query_8.txt

@zhanghainan
Owner

> Thank you for your reply! I have just started learning about dialogue generation, and I found your paper very interesting. My experiments use the Ubuntu dataset, and I'm not sure whether this format is problematic. Below are about 600 lines of data.
>
> ubuntu_train_answer_8.txt
> ubuntu_train_query_8.txt

The format is correct; you just need to train longer. I tried these 600 lines, and it takes roughly 100 epochs before you can see any effect; training the whole model takes about 20,000 epochs in my code. Whether a dialogue generation model has converged is not just a matter of the dev-set metrics, and you don't need to worry about overfitting; train as long as possible before judging the output. The final generation quality does hit a bottleneck, but the dev metrics start fluctuating very early while the generations keep improving, mainly because the metrics cannot reflect the quality of dialogue generation.

@zhanghainan
Owner

> Thank you for your response; I agree with you. I will try to tune the parameters of my implementation and find a better setting.

I have tried ttzhang511's data (about 600 lines); it takes about 100 epochs to train on this small dataset. For my code, it needs at least 20,000 epochs. You should train the model longer, regardless of the dev metrics.

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Dec 18, 2019

Emmm, but the DailyDialog dataset is actually big, so that many epochs would take a very long time to converge.

@zhanghainan
Owner

Yes, but you could look at the generated sentences after training longer.

@gmftbyGMFTBY
Author

For now, I have run 30 epochs on DailyDialog (Cornell is bigger than DailyDialog). From the performance curves, I found that nearly all the metrics converge (BLEU-1~4, ROUGE, dist-1, dist-2, BERTScore).
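
For reference, dist-1/dist-2 are just the ratios of distinct unigrams/bigrams over the generated corpus; a minimal sketch (the function name is mine, not from any of the repos discussed here):

```python
def distinct_n(sentences, n):
    """Ratio of unique n-grams to total n-grams across generated replies."""
    total, unique = 0, set()
    for sent in sentences:
        tokens = sent.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

replies = ["i am not sure .", "i am sorry ."]
print(distinct_n(replies, 1), distinct_n(replies, 2))  # dist-1, dist-2
```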

@gmftbyGMFTBY
Author

I will try 100 epochs and analyze the performance, then report the results here. Thank you for your response.

@ttzhang511

I ran 500 epochs on DailyDialog and got a lot of "I'm sorry" responses. Do I need to run more epochs?

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Dec 19, 2019

I reproduced ReCoSa in PyTorch 1.3, using its official Transformer implementation. ReCoSa was trained for 30 epochs on the DailyDialog dataset, and here are my partial results:

[three screenshots of the result tables]

For particular reasons, I can't show more details of the comparison.
It can be seen that ReCoSa is slightly worse than the baselines on some automatic evaluation metrics. We will use human annotation to test ReCoSa's performance in the future.

@ttzhang511, I hope the comparison is helpful. I will make my repo public in about a month.

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Dec 19, 2019

Oh, I forgot some essential information about my experiments:

  1. Training ReCoSa for 30 epochs took nearly 8.5 hours.
  2. I didn't use the whole DailyDialog dataset, for particular reasons (so the low BLEU is explicable). If the whole DailyDialog dataset were used, the training time would be even longer.

@ttzhang511

Thank you so much! @gmftbyGMFTBY

@katherinelyx

@gmftbyGMFTBY Hi. When I ran ReCoSa on DailyDialog, I got some frequently occurring bad examples:

  1. 'yes, and and and and and', with many repetitions.
  2. 'i i will take it', which always starts with the word 'i'.

Have you seen such problems? Or is something wrong with my setup? Could you please provide some suggestions for handling these issues?
Thank you very much.

@gmftbyGMFTBY
Author

Hi, I ran the author's code in this repo, but it actually didn't work.
So I reproduced ReCoSa myself in PyTorch (the source code in this repo is written in TensorFlow, but I don't think the issue is caused by the deep learning framework).

In my opinion, and the author has confirmed this above, ReCoSa simply replaces the RNN-based context encoder with a vanilla transformer encoder. So in my implementation I use the official transformer modules from PyTorch 1.3 (you could also try huggingface Transformers 2.0), and the generations look normal.

But after comparing with HRED and the other baselines, I found that ReCoSa is slightly worse than these baselines (although BLEU and the other automatic metrics may not be suitable for measuring open-domain dialogue systems). I also tried some different hyperparameters, but the conclusion is the same. You can try to reproduce ReCoSa yourself and see the result.
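
As a side note on the repetition problem above: one standard decoding-time mitigation (not something either repo is confirmed to use) is to forbid bigrams that have already been generated during greedy decoding. A minimal sketch, assuming a step function that returns next-token logits for a given prefix:

```python
import torch

def greedy_no_repeat_bigrams(step_logits_fn, bos_id, eos_id, max_len=30):
    """Greedy decoding that never emits a bigram already generated,
    a cheap guard against loops like 'and and and and'."""
    prefix = [bos_id]
    seen_bigrams = set()
    for _ in range(max_len):
        logits = step_logits_fn(prefix).clone()  # 1-D tensor over the vocab
        # Ban tokens that would repeat a bigram we have already produced.
        banned = [tok for (prev, tok) in seen_bigrams if prev == prefix[-1]]
        if banned:
            logits[banned] = float('-inf')
        nxt = int(torch.argmax(logits))
        seen_bigrams.add((prefix[-1], nxt))
        prefix.append(nxt)
        if nxt == eos_id:
            break
    return prefix
```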

@katherinelyx

katherinelyx commented Dec 28, 2019 via email

@gmftbyGMFTBY
Author

I'm not very familiar with TensorFlow. Transformers 2.0 (huggingface) seems to support TensorFlow, so you could try it.

@gmftbyGMFTBY
Author

gmftbyGMFTBY commented Feb 9, 2020

@ttzhang511 @katherinelyx @zhanghainan Hi guys, I have made my repo public; it contains the PyTorch version of ReCoSa and other multi-turn dialogue models. You are welcome to use it. If you have any questions, feel free to open an issue and let me know; I will try my best to provide solutions and responses. Thank you so much, and I will close this issue.

Repo: MultiTurnDialogZoo
