Why not "next_strategy_id" used in Blender mode code? #1
Comments
Hi. Partially true. You can successfully train a model just by adding the strategy during data preprocessing; there is no difference between codes/src/transformers and the Hugging Face transformers in the modeling process. next_strategy_id is used in the code if you'd like to generate a response conditioned on a designated strategy; if you only reproduce the Joint model, it is not necessary. Role id and turn id are unused: I tried using this information but didn't get a significant improvement. Besides the above, there is one more difference: I changed the way repetitive tokens are penalized (mentioned in the appendix of the paper). You can see that in the generation part.
In line 1392, I changed the logits_processor's inputs.
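For context, this is what a standard repetition penalty in a logits processor does; the sketch below is the generic CTRL-style technique that the paper's modified version builds on, not the repo's actual code, and the function name and penalty value are illustrative.

```python
# Generic repetition-penalty logits processing (illustrative sketch, not the
# paper's modified variant): down-weight tokens that were already generated.
def penalize_repeats(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` with previously generated token ids penalized.

    Standard CTRL-style rule: divide positive scores by the penalty,
    multiply negative scores by it, so both move toward lower probability.
    """
    logits = list(logits)
    for tok in set(generated_ids):
        score = logits[tok]
        logits[tok] = score / penalty if score > 0 else score * penalty
    return logits

# Token 0 (score 2.0) and token 1 (score -1.0) were already generated.
print(penalize_repeats([2.0, -1.0, 0.5], [0, 1], penalty=2.0))
# [1.0, -2.0, 0.5]
```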
You may see magic numbers in https://github.com/thu-coai/Emotional-Support-Conversation/blob/b0c98059edeb0607f762f62587c09f17fda2a2c1/codes/src/transformers/generation_utils.py, such as 50257 or 54945; they are the vocabulary sizes used by Blender and DialoGPT. BTW, in the original implementation I forgot to apply masks to the padding tokens in the encoder, which may cause a small difference between your reproduced results and the paper's results. I have corrected the code in that file.
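To illustrate what the padding-mask fix amounts to: the encoder attention mask should be 0 at padding positions and 1 elsewhere. This is a minimal sketch, assuming a padding id of 0; the real code uses the tokenizer's pad_token_id and tensor operations rather than Python lists.

```python
# Hypothetical illustration of the encoder attention-mask fix: mark padding
# tokens with 0 so attention ignores them. `pad_id` is an assumed value.
def build_attention_mask(input_ids, pad_id):
    """Return 1 for real tokens, 0 for padding, for each sequence in the batch."""
    return [[0 if tok == pad_id else 1 for tok in seq] for seq in input_ids]

batch = [[12, 34, 56, 0, 0],
         [78, 90, 11, 22, 0]]
print(build_attention_mask(batch, pad_id=0))
# [[1, 1, 1, 0, 0], [1, 1, 1, 1, 0]]
```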
I thought args.strategy was a mode option for Vanilla vs. Joint/Oracle. But now I think I understand your implementation given your explanations above, so I am closing this ticket.
Sorry, your original thought is correct. The Oracle model and the Joint model are the same model in training, so they both use strategy; they only differ at generation time. args.strategy == True means it works as the models with strategy. In particular, when generating, you should put next_strategy_id and cls_position (the position of the strategy you want to add) into the params dict and run model.generate(input_ids, **params, ...) if you want to give a designated strategy. args.strategy == False means it works as the vanilla model. In training, there is no need for the model to use next_strategy_id (it is only used to designate a strategy in generation) because the strategy tokens are already in input_ids and label_ids. I noticed there is also an input parameter next_strategy_ids in the modeling code. It seems I created it for a classification head; when I found the classification head didn't bring much improvement, I gave up on it, the same way I treated role_id and turn_id. But next_strategy_id is used in generation. Happy to hear from you if you have more questions.
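The calling pattern described above can be sketched roughly as follows. The parameter names next_strategy_id and cls_position come from the thread; the stub generate function below only stands in for model.generate to show how the params dict is unpacked into keyword arguments, and its body is illustrative, not the repo's implementation.

```python
# Stand-in for model.generate, just to demonstrate the **params call pattern
# described above. The real method conditions decoding on next_strategy_id
# when args.strategy is True; this stub only echoes what it receives.
def generate(input_ids, next_strategy_id=None, cls_position=None, max_length=40):
    return {
        "input_ids": input_ids,
        "next_strategy_id": next_strategy_id,
        "cls_position": cls_position,
        "max_length": max_length,
    }

# Designate a strategy at generation time (ids here are made up for the example).
params = {
    "next_strategy_id": 3,  # id of the strategy you want the response to follow
    "cls_position": 0,      # position of the strategy token in the input
}
out = generate([101, 2054, 2003], **params, max_length=40)
```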
I see "next_strategy_id" is not used in Emotional-Support-Conversation/codes/src/transformers/models/blenderbot_small/modeling_blenderbot_small.py.
Apart from the unused code for role and turn id, I can't tell any other code differences between codes/src/transformers and the Hugging Face transformers.
That is to say, can I successfully reproduce your results just by adding the strategy in data preprocessing?