Why not "next_strategy_id" used in Blender mode code? #1
Comments
Hi. Partially true. You can successfully train a model just by adding the strategy during data preprocessing; there is no difference between codes/src/transformers and the Hugging Face transformers in the modeling process. next_strategy_id is used in the code if you'd like to generate a response conditioned on a designated strategy; if you only reproduce the Joint model, it is not necessary. Role id and turn id are unused: I tried using this information but didn't get a significant improvement. Besides the above, there is one more difference: I changed the way repetitive tokens are penalized (mentioned in the appendix of the paper). You can see that in the generation part.
In line 1392, I changed the logits_processor's inputs.
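For context, this is what a standard repetition penalty in a logits processor does; the sketch below is the generic CTRL-style technique that the paper's modified version builds on, not the repo's actual code, and the function name and penalty value are illustrative.

```python
# Generic repetition-penalty logits processing (illustrative sketch, not the
# paper's modified variant): down-weight tokens that were already generated.
def penalize_repeats(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` with previously generated token ids penalized.

    Standard CTRL-style rule: divide positive scores by the penalty,
    multiply negative scores by it, so both move toward lower probability.
    """
    logits = list(logits)
    for tok in set(generated_ids):
        score = logits[tok]
        logits[tok] = score / penalty if score > 0 else score * penalty
    return logits

# Token 0 (score 2.0) and token 1 (score -1.0) were already generated.
print(penalize_repeats([2.0, -1.0, 0.5], [0, 1], penalty=2.0))
# [1.0, -2.0, 0.5]
```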
You may see magic numbers in https://github.com/thu-coai/Emotional-Support-Conversation/blob/b0c98059edeb0607f762f62587c09f17fda2a2c1/codes/src/transformers/generation_utils.py, such as 50257 or 54945; they are the vocabulary sizes used by Blender and DialoGPT. BTW, in the original implementation I forgot to apply masks to the padding tokens in the encoder, which may cause a small difference between your reproduced results and the paper's results. I have corrected the code in that file.
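To illustrate what the padding-mask fix amounts to: the encoder attention mask should be 0 at padding positions and 1 elsewhere. This is a minimal sketch, assuming a padding id of 0; the real code uses the tokenizer's pad_token_id and tensor operations rather than Python lists.

```python
# Hypothetical illustration of the encoder attention-mask fix: mark padding
# tokens with 0 so attention ignores them. `pad_id` is an assumed value.
def build_attention_mask(input_ids, pad_id):
    """Return 1 for real tokens, 0 for padding, for each sequence in the batch."""
    return [[0 if tok == pad_id else 1 for tok in seq] for seq in input_ids]

batch = [[12, 34, 56, 0, 0],
         [78, 90, 11, 22, 0]]
print(build_attention_mask(batch, pad_id=0))
# [[1, 1, 1, 0, 0], [1, 1, 1, 1, 0]]
```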
I thought args.strategy was a mode option for Vanilla vs. Joint/Oracle. But now I think I understand your implementation given your explanations above, so I am closing this ticket.
Sorry, your original thought is correct. The Oracle model and the Joint model are the same model in training, so they both use strategy; they only differ at generation time. args.strategy == True means it works as the models with strategy. In particular, when generating, you should put next_strategy_id and cls_position (the position of the strategy you want to add) into the params dict and run model.generate(input_ids, **params, ...) if you want to give a designated strategy. args.strategy == False means it works as the vanilla model. In training, there is no need for the model to use next_strategy_id (it is only used to designate a strategy in generation) because the strategy tokens are already in input_ids and label_ids. I noticed there is also an input parameter next_strategy_ids in the modeling code. It seems I created it for a classification head; when I found the classification head didn't bring much improvement, I gave up on it, the same way I treated role_id and turn_id. But next_strategy_id is used in generation. Happy to hear from you if you have more questions.
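The calling pattern described above can be sketched roughly as follows. The parameter names next_strategy_id and cls_position come from the thread; the stub generate function below only stands in for model.generate to show how the params dict is unpacked into keyword arguments, and its body is illustrative, not the repo's implementation.

```python
# Stand-in for model.generate, just to demonstrate the **params call pattern
# described above. The real method conditions decoding on next_strategy_id
# when args.strategy is True; this stub only echoes what it receives.
def generate(input_ids, next_strategy_id=None, cls_position=None, max_length=40):
    return {
        "input_ids": input_ids,
        "next_strategy_id": next_strategy_id,
        "cls_position": cls_position,
        "max_length": max_length,
    }

# Designate a strategy at generation time (ids here are made up for the example).
params = {
    "next_strategy_id": 3,  # id of the strategy you want the response to follow
    "cls_position": 0,      # position of the strategy token in the input
}
out = generate([101, 2054, 2003], **params, max_length=40)
```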
I see "next_strategy_id" is not used in Emotional-Support-Conversation/codes/src/transformers/models/blenderbot_small/modeling_blenderbot_small.py.
Apart from the unused code for role and turn id, I can't tell any other code differences between codes/src/transformers and the Hugging Face transformers.
That is to say, can I successfully reproduce your results just by adding the strategy in data preprocessing?