
Why isn't "next_strategy_id" used in the Blender model code? #1

Closed
nanzhao opened this issue Jun 29, 2021 · 5 comments

nanzhao commented Jun 29, 2021

I see that "next_strategy_id" is not used in Emotional-Support-Conversation/codes/src/transformers/models/blenderbot_small/modeling_blenderbot_small.py.

Apart from the unused code for role id and turn id, I can't find any other code differences between codes/src/transformers and the Hugging Face transformers library.

That is to say, can I reproduce your results just by adding the strategy tokens in data preprocessing?

lsy641 (Collaborator) commented Jun 29, 2021

Hi, partially true. You can successfully train a model just by adding the strategy tokens in data preprocessing; there is no difference between using codes/src/transformers and the Hugging Face transformers in the modeling process.

next_strategy_id is used when you want to generate a response conditioned on a designated strategy. If you are reproducing the Joint model, it is not necessary.

Role id and turn id are unused. I tried using this information but didn't get a significant improvement.

Besides the above, there is one more difference: I changed the way repetitive tokens are penalized (mentioned in the appendix of the paper). You can see this in the generation code. I will show you where the code is later.
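For readers following along, the "adding strategy in data preprocessing" step can be sketched roughly as below. This is a minimal illustration, not the repo's actual code; the bracketed tokens mirror the strategy set described in the paper, and the helper name is hypothetical.

```python
# Minimal sketch: prepend a special strategy token to each target response
# so the model learns to generate the strategy token first. The strategy
# names follow the ESConv paper's strategy set; the helper is hypothetical.

STRATEGIES = [
    "[Question]",
    "[Restatement or Paraphrasing]",
    "[Reflection of feelings]",
    "[Self-disclosure]",
    "[Affirmation and Reassurance]",
    "[Providing Suggestions]",
    "[Information]",
    "[Others]",
]

def add_strategy_token(response_text: str, strategy: str) -> str:
    """Prefix the target response with its annotated strategy token."""
    assert strategy in STRATEGIES, f"unknown strategy: {strategy}"
    return f"{strategy} {response_text}"

example = add_strategy_token("What makes you feel that way?", "[Question]")
# -> "[Question] What makes you feel that way?"
```

Because the strategy token is simply the first token of the response, the Joint model can predict its own strategy at generation time with no extra machinery.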

lsy641 (Collaborator) commented Jun 29, 2021

https://github.com/thu-coai/Emotional-Support-Conversation/blob/b0c98059edeb0607f762f62587c09f17fda2a2c1/codes/src/transformers/generation_utils.py

At line 1392, I changed the logits_processor's inputs:

```python
if encoder_input_ids is not None:
    next_token_scores = logits_processor(
        torch.cat([encoder_input_ids, input_ids], dim=-1), next_token_logits)
    # next_token_scores = logits_processor(input_ids, next_token_scores)
```
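To see why the concatenation matters: a repetition-penalty logits processor down-weights every token id that appears in the ids it receives, so passing the dialogue context (encoder_input_ids) along with the decoder prefix penalizes repeating tokens from the context too, not just tokens generated so far. A minimal sketch of that logic, modeled on Hugging Face's RepetitionPenaltyLogitsProcessor (the tensors are toy values):

```python
import torch

def repetition_penalty(input_ids: torch.LongTensor,
                       scores: torch.FloatTensor,
                       penalty: float = 1.2) -> torch.FloatTensor:
    """Down-weight logits of every token id present in input_ids."""
    score = torch.gather(scores, 1, input_ids)
    # Positive logits are divided, negative logits multiplied, so the
    # penalized token always becomes less likely.
    score = torch.where(score < 0, score * penalty, score / penalty)
    return scores.scatter(1, input_ids, score)

encoder_input_ids = torch.tensor([[5, 7]])  # dialogue context (toy ids)
decoder_input_ids = torch.tensor([[9]])     # generated so far
logits = torch.ones(1, 12)                  # toy next-token logits

# With the concatenation, context tokens 5 and 7 are penalized as well as 9:
penalized = repetition_penalty(
    torch.cat([encoder_input_ids, decoder_input_ids], dim=-1), logits)
```

Without the concatenation, only token 9 would have been penalized.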

lsy641 (Collaborator) commented Jun 29, 2021

You may see magic numbers in https://github.com/thu-coai/Emotional-Support-Conversation/blob/b0c98059edeb0607f762f62587c09f17fda2a2c1/codes/src/transformers/generation_utils.py, such as 50257 or 54945. They are the vocabulary sizes used by either Blender or DialoGPT.

By the way, in the original implementation I forgot to apply masks to the padding tokens in the encoder, which may cause a small difference between your reproduced results and the paper's results. I have corrected the code in that file.
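The padding-mask fix amounts to passing the encoder an attention mask that zeroes out pad positions. A minimal sketch, assuming an illustrative pad token id of 0 (the real pad id comes from the Blender tokenizer):

```python
import torch

PAD_TOKEN_ID = 0  # illustrative; use the tokenizer's actual pad id

def build_attention_mask(input_ids: torch.LongTensor) -> torch.LongTensor:
    """1 for real tokens, 0 for padding, so the encoder ignores pads."""
    return (input_ids != PAD_TOKEN_ID).long()

batch = torch.tensor([[12, 34, 56, 0, 0],
                      [78, 90,  0, 0, 0]])
mask = build_attention_mask(batch)
# mask -> [[1, 1, 1, 0, 0],
#          [1, 1, 0, 0, 0]]
# then: model(input_ids=batch, attention_mask=mask, ...)
```

Without this mask the encoder attends to pad positions, which is the small discrepancy mentioned above.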

Happy to reply to you if you have any other questions.

nanzhao (Author) commented Jun 29, 2021

I thought args.strategy was a mode option for Vanilla vs. Joint/Oracle.

But now I think I understand your implementation from your explanations above:
args.strategy == True means it works in Oracle mode.
args.strategy == False means it works in Joint mode.

So I am closing this ticket; no issue.

nanzhao closed this as completed Jun 29, 2021
lsy641 (Collaborator) commented Jun 29, 2021

> I thought args.strategy was a mode option for Vanilla vs. Joint/Oracle.
>
> But now I think I understand your implementation from your explanations above:
> args.strategy == True means it works in Oracle mode.
> args.strategy == False means it works in Joint mode.
>
> So I am closing this ticket; no issue.

Sorry, your original thought was correct. The Oracle model and the Joint model are the same model during training, so both use strategy; they differ only at generation time.

args.strategy == True means it works as the model with strategy. In particular, when generating, if you want to designate a strategy, put next_strategy_id and cls_position (the position of the strategy token you want to add) into the params dict and run model.generate(input_ids, **params, ...).

args.strategy == False means it works as the Vanilla model.

In training, there is no need for the model to use next_strategy_id (it is only used to designate a strategy during generation), because the strategy tokens are already in input_ids and label_ids.

I noticed there is also an input parameter next_strategy_ids in the modeling code. It seems I created it for a classification head. When I found the classification head didn't bring much improvement, I gave up on it, just as with role_id and turn_id. But next_strategy_id is used in generation.

Happy to hear from you if you have more questions.

lsy641 reopened this Jun 29, 2021
lsy641 closed this as completed Jun 30, 2021