Hi, thanks for your wonderful work on the ConvLab series!
I found that the end-to-end performance you reported on MultiWOZ does not quite match the component-level evaluation.
For example, BERTNLU+RuleDST+RulePolicy+TemplateNLG has a lower Complete rate and Success rate than MILU+RuleDST+RulePolicy+TemplateNLG, even though BERTNLU performs better than MILU in the module-level evaluation.
Why does this happen?
Also, given a new, unseen evaluation dataset, how can I decide which pipeline configuration performs best? There are too many combinations of different modules to try them all, and the module-level evaluation does not match the end-to-end evaluation well.
I know these are very open questions, and I am now doing some research on them. I would be very grateful if you could share any insights or literature.
To decide which configuration to use, I would try several of the models that perform best in the module-wise evaluation. Also, try using the pre-trained model in an end-to-end setting.
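A minimal sketch of the "shortlist, then evaluate end-to-end" advice, in Python. Everything here is hypothetical: the stage names, the model names, the toy scores, and `evaluate_end_to_end`, which stands in for whatever end-to-end simulation you run (it is not a ConvLab function). The idea is to keep only the top-k models per module by module-level score, then pick the final pipeline by its end-to-end score rather than by the module scores.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple


def shortlist(module_scores: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k models with the highest module-level score."""
    return sorted(module_scores, key=module_scores.get, reverse=True)[:k]


def select_pipeline(
    candidates: Dict[str, Dict[str, float]],
    evaluate_end_to_end: Callable[[Tuple[str, ...]], float],
    k: int = 2,
) -> Tuple[Optional[Tuple[str, ...]], float]:
    """Evaluate every combination of the top-k models per stage end-to-end.

    candidates maps a stage name (hypothetical, e.g. "nlu", "policy") to
    {model_name: module_level_score}. evaluate_end_to_end is a hypothetical
    callable returning an end-to-end metric (e.g. success rate) for a pipeline.
    """
    stages = list(candidates)
    pools = [shortlist(candidates[stage], k) for stage in stages]
    best_combo: Optional[Tuple[str, ...]] = None
    best_score = float("-inf")
    # The cost is k ** len(stages) end-to-end runs instead of every combination.
    for combo in product(*pools):
        score = evaluate_end_to_end(combo)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


if __name__ == "__main__":
    # Toy placeholder numbers, not real ConvLab results.
    candidates = {
        "nlu": {"nlu_a": 0.90, "nlu_b": 0.80, "nlu_c": 0.50},
        "policy": {"pol_a": 0.70, "pol_b": 0.60},
    }
    toy_success = {
        ("nlu_a", "pol_a"): 0.60,
        ("nlu_a", "pol_b"): 0.55,
        ("nlu_b", "pol_a"): 0.65,
        ("nlu_b", "pol_b"): 0.50,
    }
    print(select_pipeline(candidates, lambda combo: toy_success[combo], k=2))
```

With these toy numbers, `nlu_b` wins end-to-end even though `nlu_a` has the higher module-level score, which is the kind of mismatch described in the question. That is why the module scores are used only to prune the search space, and the final choice is left to the end-to-end evaluation.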