
Issues reproducing Table 1 results on the Commonsense Conversation Dataset (CCD) #13

Open
Silin159 opened this issue Mar 16, 2023 · 1 comment

Comments

@Silin159

Hi, I tried to use your script (ccd.sh) to reproduce the Table 1 results on the Commonsense Conversation Dataset, but my reproduced results (BLEU: 0.154, ROUGE-L: 6.38) are far below the reported values (BLEU: 1.02, ROUGE-L: 8.59). Could you check whether the hyperparameters in ccd.sh are the ones you actually used? It would also help if you could provide the evaluation scripts for computing BLEU and ROUGE-L (if I run it correctly, the inference scripts currently only save the test outputs, with no metric results). Besides, are any model checkpoints or test outputs available?
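
For reference, a minimal sketch of the kind of evaluation script being asked for, assuming sacrebleu and rouge_score as the scorers and one-sentence-per-line hypothesis/reference files (`preds.txt` and `refs.txt` are hypothetical names, not files the repo produces):

```python
# Minimal sketch: corpus BLEU + average sentence ROUGE-L over saved test outputs.
# File names and scorer choices are assumptions, not from the repo.
from sacrebleu.metrics import BLEU
from rouge_score import rouge_scorer

with open("preds.txt") as f:
    hyps = [line.strip() for line in f]
with open("refs.txt") as f:
    refs = [line.strip() for line in f]

# Corpus-level BLEU; sacrebleu expects a list of reference streams.
bleu = BLEU().corpus_score(hyps, [refs])
print(f"BLEU: {bleu.score:.2f}")

# Sentence-level ROUGE-L F1, averaged over the test set.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = sum(scorer.score(r, h)["rougeL"].fmeasure
              for r, h in zip(refs, hyps)) / len(hyps)
print(f"ROUGE-L: {100 * rouge_l:.2f}")
```

Note that whether the paper's numbers are on a 0-1 or 0-100 scale would also need to be confirmed when comparing.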

@Yuanhy1997
Owner

I think you can select the checkpoint around 100,000 training steps using the validation data. The number in the paper is out of date; the new result is a bit lower, 0.84 BLEU. BTW, CCD is a pretty bizarre dataset in that models easily overfit the training data, and the outputs actually require commonsense knowledge. (DiffuSeq also only achieved a BLEU of around 1, which means the outputs barely correlate with the labels.)
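
A rough sketch of the validation-based checkpoint selection described above; `decode()` here is a hypothetical wrapper around the repo's inference script, not an actual function in the codebase:

```python
# Sketch: pick the checkpoint with the best validation BLEU.
# decode(path, sources) -> list[str] is assumed, not part of the repo.
from sacrebleu.metrics import BLEU

def select_checkpoint(checkpoints, val_sources, val_refs, decode):
    best_path, best_bleu = None, float("-inf")
    for path in checkpoints:
        hyps = decode(path, val_sources)          # run inference with this checkpoint
        score = BLEU().corpus_score(hyps, [val_refs]).score
        if score > best_bleu:
            best_path, best_bleu = path, score
    return best_path, best_bleu
```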
